a month ago
My project industrious-analysis has been experiencing Redis timeouts ever since the earlier outage, and despite trying to redeploy all services after the outage was declared resolved, I am still experiencing the issue. I don't know what else to try at this point since I haven't changed anything on my end since last night.
8 Replies
a month ago
Does the Redis service show any logs?
nothing that seems notable. I am not sure the request is even making it there
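One way to check whether a request reaches Redis at all is to send a raw PING over TCP from the same place the app runs. This is only a minimal sketch, assuming a plain (non-TLS) Redis endpoint; the function name `redis_ping` is made up for illustration.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send an inline PING and return the first reply from the server.

    Raises OSError (timeout, refused, DNS failure) if the server
    cannot be reached at all.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Redis accepts "inline" commands terminated by CRLF.
        sock.sendall(b"PING\r\n")
        reply = sock.recv(1024)
    return reply.decode("utf-8", errors="replace").strip()
```

Called with the service's private hostname (for example `redis_ping("redis.railway.internal")`, a hypothetical name), a reply of `+PONG` or an error reply such as `-NOAUTH Authentication required.` both mean the request reached Redis. A timeout or refused connection surfaces as an `OSError` instead, which points at networking rather than Redis itself.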
a month ago
Are you using the private service URL to connect to it?
a month ago
What exactly are the error messages you're getting?
a month ago
Could you send the exact error message?
@name:"api-gateway" AND @err.message:"failed with status code 500" AND @err.stack:"Error: failed with status code 500\n at onResFinished (/app/node_modules/.pnpm/pino-http@11.0.0/node_modules/pino-http/logger.js:115:39)\n at ServerResponse.onResponseComplete (/app/node_modules/.pnpm/pino-http@11.0.0/node_modules/pino-http/logger.js:178:14)\n at ServerResponse.emit (node:events:531:35)\n at onFinish (node:_http_outgoing:1082:10)\n at callback (node:internal/streams/writable:766:21)\n at afterWrite (node:internal/streams/writable:710:5)\n at afterWriteTick (node:internal/streams/writable:696:10)\n at process.processTicksAndRejections (node:internal/process/task_queues:89:21)" AND @err.type:"Error" @req.id:3 AND @req.method:"POST" AND @req.remoteAddress:"fd12:59bf:c23a:0:a000:75:159b:3779" AND @req.remotePort:38392 AND @req.url:"/ai/generate?wait=true" AND @req.query.wait:"true" AND @req.headers.accept:"*/*" AND @req.headers.accept-encoding:"gzip, deflate" AND @req.headers.accept-language:"*" AND @req.headers.connection:"keep-alive" AND @req.headers.content-length:"56045" AND @req.headers.content-type:"application/json" AND @req.headers.host:"api-gateway.railway.internal:3000" AND @req.headers.sec-fetch-mode:"cors" AND @req.headers.user-agent:"node"
I was hoping it would be a transient thing that would self-heal, but nope, it's been hoooours
grrr now my production Postgres is hanging and the redeploy is too
today has been a bad day
it finally deployed but I still can't connect to it. wtf
doesn't even work on the website
a month ago
about the Postgres service, do you see any kind of error logs?
a month ago
can you remove your current deployment and then try a fresh deployment by hitting CTRL + K and then deploy latest commit
a month ago
Can I get a link to that service?
a month ago
That option doesn't exist on a Postgres image. It's alright
a month ago
waiting for it to come online and looking at it
a month ago
That's indeed odd. I'll wait for the team's response on that, sorry for the trouble.
a month ago
The "creating containers" step can sometimes take a while. We are working on a project that will fix that and a few other issues to make things much faster
I'm really frustrated that both my dev and prod environments stopped working after lunch today and have been hosed for hours
a month ago
The degraded performance in US-East?
tbh it might also be partly a Qdrant issue. my app logs show it hanging a lot. or it's a networking thing idk. AWS has been unreliable lately
a month ago
Is your Qdrant hosted on AWS?
a month ago
I don't see qdrant on this project
I hosted it externally because I didn't know any better. does Railway offer hosting for it?
it's on qdrant.io
a month ago
Railway does! I host my qdrant instances on Railway always
a month ago
Direct link for it:
a month ago
beat me to it
a month ago
Interesting… Let me check something internally
lol it's still unable to be connected to. I'm working on setting up Qdrant on Railway and I hope it actually works
a month ago
Need to bring this up to the right team. You should be good to connect with an external DB viewer like tableplus
a month ago
going to get this fixed though
a month ago
Fixed (Paulo the goat)
Reload your web page and it should be good!
a month ago
External connections to Postgres are working.
I double checked the password and port on the proxy and everything. it was working before today's mess
a month ago
Prod works fine with regard to both private and public connections; we have verified that.
a month ago
Make sure that your username credential is correct, from what I remember Railway uses the railway username.
I double checked all the credentials, which haven't changed since yesterday, when everything was fine
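For double-checking credentials without pasting the password anywhere, the connection URL can be split into its parts and compared against what the database client is using. A rough sketch with the standard library only; `describe_db_url` is a hypothetical helper, and it assumes any special characters in the password are already percent-encoded.

```python
from urllib.parse import urlsplit


def describe_db_url(url: str) -> dict:
    """Break a database URL into its parts, hiding the password itself."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        # Only report whether a password is present, never its value.
        "password_set": bool(parts.password),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

Running this on the public proxy URL and comparing the username, host, and port against the client's settings rules out a stale value on one side.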
my suspicion is that some networking stuff broke badly during the incident earlier
as a separate example, for some reason I can't write to Qdrant.io but I can read. bizarre
a month ago
Any connection issues would be on your end. We have verified that the database is accessible via both private and public connections.
a month ago
There could be a firewall on your end blocking the current TCP port that is in use. Generating a new TCP proxy will get you a different port.
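To tell a blocked port apart from other failures, a plain TCP probe from the affected machine can classify the outcome. This is a sketch under the assumption that a silently dropped connection shows up as a timeout, while a reachable host with nothing listening shows up as a refusal; `probe_tcp` is a hypothetical name.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: often a firewall or a routing problem.
        return "timeout"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on that port.
        return "refused"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns_error"
    except OSError as exc:
        return f"error: {exc}"
```

Probing the TCP proxy host and port from the local machine, and the private hostname from inside another service, shows which path fails and how. An "open" result only proves the port is reachable; it says nothing about whether the Postgres login itself succeeds.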
again, this was working before the incident. I had zero problems. so it's not my firewall
a month ago
Again, we have verified that the database is accessible via both private and public connections.
if it was my firewall I would have trouble accessing both dev and prod. and since nothing changed in my firewall between yesterday and now, it is not the issue
a month ago
I'm sorry, but I don't know what to tell you at this point. The database is accessible to the public internet without issue.
a month ago
We have internal tools that I couldn't disclose, but I can show you Telnet being able to communicate.

a month ago
Is that another database? As it differs from her screenshot (the proxy domain)
telnet doesn't tell me anything really. a real test would be to connect from outside using the same approach (via the postgres connection)
a month ago
sorry!
a month ago
I am most certainly outside. I am not anywhere near a Railway data center.
a month ago
When using a database client, I get a "wrong password" error instead of a connection failure like you're experiencing.

a month ago
Also, I can see that DBeaver is showing some information about the database (like versions), so I'm guessing the issue here is the credentials. I also believe that you're copying the right password.
a month ago
That would be the best option, but I'm unsure if changing the environment variable would make a difference. You would need to SSH into the container and change it manually.
a month ago
We have a way to regen the password in our UI now.
a month ago
Do you get any kind of errors when trying it?
a month ago
Maybe the password regeneration feature uses the current credentials to reset it?
the issue with even regenerating the password may be a hint about what's happening under the hood
I was an idiot and accidentally leaked my dev password on Github today and had no issue making a new password for my dev DB via the same UI
a month ago
Same happening to me. Website completely down with Redis errors.
a month ago
Hey, would it be possible to open a help thread about your problem? It helps us organize threads and get you a faster response.
a month ago
I did. I replied here so it doesn’t get treated as a misconfiguration or isolated incident.
I'm trying to use this in my development environment and I either get refused connections or timeouts (the former for the internal networking, the latter for the public URL). not very promising, unless I'm doing something dumb
proxy works, but it's dumb that I have to use that for internal services talking to each other
brody
shxkm, please open your own thread.
a month ago
As I said clearly, I DID open my own thread. But Railway tends to not acknowledge its own bugs and issues so I replied here as a “me too”.
By the way, the thread you told me to open has ZERO replies from Railway employees. My production app has been down for more than 12 hours. Friday is the busiest day for my app. I wish I didn’t move here from Heroku.
https://station.railway.com/questions/redis-ttimeouts-all-over-site-not-respo-e871fa03
a month ago
I'm lucky that my app is just a project for myself and a few friends, but unfortunately for me I rely on it heavily and it really ruined my day to have it down for that many hours.
a month ago
I have hundreds of customers. Some of them will be issuing refund requests because of this.
I hope you learned your lesson because I surely learned mine.
a month ago
Unfortunately I don't have a ton of energy to do lots of manual infrastructure provisioning myself, which is why I use Railway. This is the first major problem I've had since starting to use the service a few months ago, but admittedly my application got more complex recently, with multiple interconnected services. It's frustrating because most of it works, but because my app is for an AI use case, RAG is very important to me, and my connection to an external Qdrant provider has been failing repeatedly. I was advised that Railway now offers Qdrant, so I've been trying to sync my stuff to stay in the ecosystem, but even that has been failing badly due to timeouts. I'm really not impressed and hope that this gets resolved soon so I can go back to actually using my app rather than losing sleep over it. I stayed up till like 3:30am and I'm kinda screwed for today because I have an AWS certification exam that I'll be taking on very little sleep.
I'll have to check the status of my app when I have a chance but it would be helpful to know if anyone is looking into my issues or if I have to pull teeth and do most of the debugging myself
hmm, I was able to regen the password and can connect locally again. thanks to whoever fixed it
that particular image doesn't give me access to the database tab like a regular Postgres instance
a month ago
that's expected as it isn't a database template made by Railway
a month ago
I can't confirm it right now but can't you install the pgvector extensions directly onto the official template?
a month ago
you would need to do it via SQL then, that modal is detecting it as an official template when it's not
don't you love it when you're vibe coding and the AI leaks your password 🙃 happened twice now
a month ago
a #🤗|feedback thread about it is more than welcome :)
a month ago
can't you tell your vibe coding tool to ignore .env files?
a month ago
!s
Status changed to Solved passos • about 1 month ago








