4 months ago
My project industrious-analysis has been experiencing Redis timeouts ever since the earlier outage, and despite trying to redeploy all services after the outage was declared resolved, I am still experiencing the issue. I don't know what else to try at this point since I haven't changed anything on my end since last night.
162 Replies
4 months ago
Does the Redis service show any logs?
4 months ago
nothing that seems notable. I am not sure the request is even making it there
4 months ago
I suspect a network issue
4 months ago
Are you using the private service URL to connect to it?
4 months ago
yes
4 months ago
What exactly are the error messages you're getting?
4 months ago
timeouts
4 months ago
Could you send the exact error message?
4 months ago
well that was weird. it's working now despite me not doing anything??
4 months ago
hmm it is likely to time out again though
4 months ago
it worked once in prod and now it's taking forever in dev
4 months ago
BTW the outage broke both my environments
4 months ago
@name:"api-gateway" AND @err.message:"failed with status code 500" AND @err.stack:"Error: failed with status code 500\n at onResFinished (/app/node_modules/.pnpm/pino-http@11.0.0/node_modules/pino-http/logger.js:115:39)\n at ServerResponse.onResponseComplete (/app/node_modules/.pnpm/pino-http@11.0.0/node_modules/pino-http/logger.js:178:14)\n at ServerResponse.emit (node:events:531:35)\n at onFinish (node:_http_outgoing:1082:10)\n at callback (node:internal/streams/writable:766:21)\n at afterWrite (node:internal/streams/writable:710:5)\n at afterWriteTick (node:internal/streams/writable:696:10)\n at process.processTicksAndRejections (node:internal/process/task_queues:89:21)" AND @err.type:"Error"
4 months ago
@req.id:3 AND @req.method:"POST" AND @req.remoteAddress:"fd12:59bf:c23a:0:a000:75:159b:3779" AND @req.remotePort:38392 AND @req.url:"/ai/generate?wait=true" AND @req.query.wait:"true" AND @req.headers.accept:"*/*" AND @req.headers.accept-encoding:"gzip, deflate" AND @req.headers.accept-language:"*" AND @req.headers.connection:"keep-alive" AND @req.headers.content-length:"56045" AND @req.headers.content-type:"application/json" AND @req.headers.host:"api-gateway.railway.internal:3000" AND @req.headers.sec-fetch-mode:"cors" AND @req.headers.user-agent:"node"
4 months ago
I can't find redis failures anymore after all the redeployments
4 months ago
I hate doing this from my phone
4 months ago
I haven't been home all day
4 months ago
lol only one message made it through. we're back to timeouts
4 months ago
it's still broken. ugh
4 months ago
I was hoping it would be a transient thing that would self heal but nope it's been hoooours
4 months ago
stopped working a bit after lunchtime
4 months ago
I'll work on adding better logging in the meantime
4 months ago
Railway nightmare today
4 months ago
grrr now my production Postgres is hanging and the redeploy is too
today has been a bad day
it finally deployed but I still can't connect to it. wtf
doesn't even work on the website
4 months ago
about the Postgres service, do you see any kind of error logs?
4 months ago
I redeployed it like twice. it says active but nothing shows up here

4 months ago
can't even connect via the website

4 months ago
can you remove your current deployment and then try a fresh deployment by hitting CTRL + K and then deploy latest commit
4 months ago
Can I get a link to that service?
4 months ago
it doesn't give me that option but I did deploy source image or whatever
4 months ago
That option doesn't exist on a Postgres image. It's alright.
4 months ago
waiting for it to come online and looking at it
4 months ago
That's indeed odd. I'll wait for the team's response on that; sorry for the trouble.
4 months ago
it's taking its sweet time
4 months ago
this is like, my third attempt at redeploying it
4 months ago
The "creating containers" step can sometimes take a while. We are working on a project that will fix that and a few other issues to make things much faster.
4 months ago
I'm really frustrated that both my dev and prod environments stopped working after lunch today and have been hosed for hours
4 months ago
even now the dev bot isn't working
4 months ago
the incident earlier broke my app
4 months ago
The degraded performance in US-East?
4 months ago
yes. everything was fine until that incident
4 months ago
afterwards? it has been broken all day
4 months ago
tbh it might also be partly a Qdrant issue. my app logs show it hanging a lot. or it's a networking thing idk. AWS has been unreliable lately
4 months ago
Is your Qdrant hosted on AWS?
4 months ago
I don't see qdrant on this project
4 months ago
I hosted it externally because I didn't know any better. does Railway offer hosting for it?
4 months ago
it's on qdrant.io
4 months ago
and they use us-east-1 for my cluster
4 months ago
Railway does! I host my qdrant instances on Railway always
4 months ago
hmm then it would be good to just migrate
4 months ago
Direct link for it:
4 months ago
beat me to it
4 months ago
it shows as active but the db connection still isn't working
4 months ago
Interesting… Let me check something internally
4 months ago
lol I still can't connect to it. I'm working on setting up Qdrant on Railway and I hope it actually works
4 months ago
I need to bring this up with the right team. In the meantime, you should be able to connect with an external DB viewer like TablePlus.
4 months ago
going to get this fixed though
4 months ago
it's not working
4 months ago
I use JetBrains and their DB connectivity stuff
4 months ago
I can connect to my DEV postgres but not PROD
4 months ago
Fixed (Paulo the goat)
Reload your web page and it should be good!
4 months ago
the web page works but the external connection is still not working
4 months ago
tbh I don't like using the web page because it's so limited
4 months ago
hard to do any serious database management
4 months ago
it's handy when I'm on my phone but that's about it
4 months ago
External connections to Postgres are working.
4 months ago
my dev one works fine, but prod doesn't
4 months ago
I double checked the password and port on the proxy and everything. it was working before today's mess
4 months ago
Prod works fine with regard to both private and public connections; we have verified that.
4 months ago


4 months ago
as I said, one works, the other fails
4 months ago
Make sure that your username is correct; from what I remember, Railway uses the railway username.
4 months ago
they use postgres for the username actually
4 months ago
I double checked all the credentials, which haven't changed since yesterday, when everything was fine
4 months ago
my suspicion is that some networking stuff broke badly during the incident earlier
4 months ago
as a separate example, for some reason I can't write to Qdrant.io but I can read. bizarre
4 months ago
Any connection issues would be on your end. We have verified that the database is accessible via both private and public connections.
4 months ago
the proxy domain might be broken idk
4 months ago
I guess I can try generating a new one
4 months ago
There could be a firewall on your end blocking the current TCP port that is in use. Generating a new TCP proxy will get you a different port.
4 months ago
tried a new domain and same issue
4 months ago
again, this was working before the incident. I had zero problems. so it's not my firewall
4 months ago
Again, we have verified that the database is accessible via both private and public connections.
4 months ago
if it was my firewall I would have trouble accessing both dev and prod. and since nothing changed in my firewall between yesterday and now, it is not the issue
4 months ago
basic process of elimination
4 months ago
I'm sorry, but I don't know what to tell you at this point. The database is accessible to the public internet without issue.
4 months ago
how exactly are you verifying that?
4 months ago
I am using the TCP proxy feature
4 months ago
We have internal tools that I couldn't disclose, but I can show you Telnet being able to communicate.

4 months ago
Is that another database? As it differs from her screenshot (the proxy domain)
4 months ago
telnet doesn't tell me anything really. a real test would be to connect from outside using the same approach (via the postgres connection)
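For readers hitting the same wall: the distinction drawn here, between a port that merely answers and a server that actually speaks the Postgres protocol, can be probed with the Python standard library alone. The sketch below is an illustration under stated assumptions, not Railway tooling. `probe_postgres` is a hypothetical helper name, and any host and port passed to it are placeholders. It opens a TCP connection and sends the Postgres `SSLRequest` packet, to which a Postgres server replies with a single byte, `S` or `N`. It does not authenticate, so it cannot confirm that a password is correct.

```python
import socket
import struct

# Postgres SSLRequest packet: int32 length (8) followed by int32 code 80877103.
SSL_REQUEST = struct.pack("!II", 8, 80877103)


def probe_postgres(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify how far a connection attempt gets (hypothetical helper).

    Returns "refused", "timeout", "tcp-only", "postgres", or "error: ...".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"

    with sock:
        try:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)
        except OSError:
            # The port accepted the connection, but the exchange failed or stalled.
            return "tcp-only"

    # A Postgres server answers SSLRequest with b"S" (TLS available) or b"N".
    return "postgres" if reply in (b"S", b"N") else "tcp-only"
```

A result of "postgres" would suggest the proxy reaches a live Postgres server, leaving credentials or client configuration as the likelier culprits; "timeout" or "refused" would point at the network path instead.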
4 months ago
*her screenshot
4 months ago
and I changed it already as a debugging step
4 months ago
sorry!
4 months ago
it's the correct proxy

4 months ago
I am most certainly outside. I am not anywhere near a Railway data center.
4 months ago
When using a database client, I get a "wrong password" error instead of a connection failure like you're experiencing.

4 months ago
still fails on the new proxy

4 months ago
I have pasted the password so many times
4 months ago
Also, I can see that DBeaver is showing some information about the database (like versions), so I'm guessing the issue here is the credentials. I also believe that you're copying the right password.
4 months ago
maybe I should regen the password?
4 months ago
That would be the best option, but I'm unsure if changing the environment variable would make a difference. You would need to SSH into the container and change it manually.
4 months ago
We have a way to regen the password in our UI now.
4 months ago
it didn't work lol
4 months ago
I tried regen and it didn't make a new password
4 months ago
so not even the UI for the password regen is working
4 months ago
it worked earlier when I generated a new dev password
4 months ago
Do you get any kind of errors when trying it?
4 months ago
nope, just doesn't actually update it
4 months ago
Maybe the password regeneration feature uses the current credentials to reset it?
4 months ago
oh actually it popped up an error that said failed to fetch
4 months ago
this time it didn't say "failed to fetch"

4 months ago
the issue with even regenerating the password may be a hint about what's happening under the hood
4 months ago
I was an idiot and accidentally leaked my dev password on GitHub today and had no issue making a new password for my dev DB via the same UI
4 months ago
Same happening to me. Website completely down with Redis errors.
4 months ago
Hey, would it be possible to open a help thread about your problem? It helps us organize threads and get you a faster response.
4 months ago
I tried again and same issue
4 months ago
I did. I replied here so it doesn’t get treated as a misconfiguration or isolated incident.
4 months ago
I'm trying to use this in my development environment and I either get refused connections or timeouts (the former for the internal networking, the latter for the public URL). not very promising, unless I'm doing something dumb
4 months ago
trying the TCP proxy now I guess
4 months ago
proxy works, but it's dumb that I have to use that for internal services talking to each other
brody
shxkm, please open your own thread.
4 months ago
As I said clearly, I DID open my own thread. But Railway tends to not acknowledge its own bugs and issues so I replied here as a “me too”.
By the way, the thread you told me to open has ZERO replies from Railway employees. My production app has been down for more than 12 hours. Friday is the busiest day for my app. I wish I didn’t move here from Heroku.
https://station.railway.com/questions/redis-ttimeouts-all-over-site-not-respo-e871fa03
4 months ago
I'm lucky that my app is just a project for myself and a few friends, but unfortunately for me I rely on it heavily and it really ruined my day to have it down for that many hours.
4 months ago
I have hundreds of customers. Some of them will be issuing refund requests because of this.
I hope you learned your lesson because I surely learned mine.
4 months ago
Unfortunately I don't have a ton of energy to do lots of manual infrastructure provisioning myself, which is why I use Railway. This is the first major problem I've had since starting to use the service a few months ago, but admittedly my application got more complex recently, with multiple interconnected services. It's frustrating because most of it works, but because my app is for an AI use case, RAG is very important to me, and my connection to an external Qdrant provider has been failing repeatedly. I was advised that Railway now offers Qdrant, so I've been trying to sync my stuff to stay in the ecosystem, but even that has been failing badly due to timeouts. I'm really not impressed and hope that this gets resolved soon so I can go back to actually using my app rather than losing sleep over it. I stayed up till like 3:30am and I'm kinda screwed for today because I have an AWS certification exam that I'll be taking on very little sleep.
4 months ago
I'll have to check the status of my app when I have a chance but it would be helpful to know if anyone is looking into my issues or if I have to pull teeth and do most of the debugging myself
4 months ago
hmm, I was able to regen the password and can connect locally again. thanks to whoever fixed it
4 months ago
having issues with creating collections on Railway Qdrant now
4 months ago
ugh
4 months ago
if it's not one thing it's another
4 months ago
nvm it's listing collections that keeps timing out
4 months ago
either way
4 months ago
and deleting apparently
4 months ago
I'm gonna just give up on Qdrant tbh. too much hassle
4 months ago
at least my Postgres is working. I can switch to pgvector and call it a day
4 months ago
ok new issue - how do I rotate credentials with pgvector?
4 months ago
that particular image doesn't give me access to the database tab like a regular Postgres instance
4 months ago
that's expected as it isn't a database template made by Railway
4 months ago
how do I do it then?

4 months ago
that popup is wrong because there is no such tab
4 months ago
I can't confirm it right now but can't you install the pgvector extensions directly onto the official template?
4 months ago
I tried
4 months ago
it's not a thing
4 months ago
I wouldn't have made a new db if I could have just installed that
4 months ago
you would need to do it via SQL then, that modal is detecting it as an official template when it's not
4 months ago
yeah that's what I ended up doing
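For anyone landing here with the same template, rotating the password "via SQL" might look roughly like the sketch below. It only generates a random password and the matching `ALTER ROLE` statement, to be run in a SQL client already connected to the database. `build_rotation_sql` is a hypothetical helper, and the role name `postgres` is an assumption based on the username mentioned earlier in this thread; the role in a given pgvector template may differ.

```python
import secrets


def build_rotation_sql(role: str = "postgres", nbytes: int = 24) -> tuple[str, str]:
    """Return (new_password, ALTER ROLE statement) for the given role.

    Hypothetical helper; the default role name is an assumption.
    """
    # token_urlsafe emits only letters, digits, "-" and "_", so the password
    # needs no escaping inside the single-quoted SQL literal.
    password = secrets.token_urlsafe(nbytes)
    # Double any embedded double quotes so the role is a valid quoted identifier.
    quoted_role = '"' + role.replace('"', '""') + '"'
    return password, f"ALTER ROLE {quoted_role} WITH PASSWORD '{password}';"
```

Calling `password, statement = build_rotation_sql()` gives a statement to paste into the SQL client. Any service variables that still hold the old password would presumably need updating by hand afterwards, since the UI modal does not apply to this template.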
4 months ago
this is a usability problem though
4 months ago
that modal is not applicable to this template
4 months ago
don't you love it when you're vibe coding and the AI leaks your password 🙃 happened twice now
4 months ago
a #🤗|feedback thread about it is more than welcome :)
4 months ago
can't you tell your vibe coding tool to ignore .env files?
4 months ago
.env is already ignored
4 months ago
it committed it to a todo list file 🤦♀️
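Since ignoring `.env` did not help here, one possible guard is to scan files for credentials before committing. The sketch below is a minimal illustration that assumes the leak takes the form of a connection URL with an embedded password (for example `postgres://user:password@host`); the thread does not say what form the leak took. `find_leaks` and the pattern are hypothetical, and a dedicated secret scanner would cover far more cases.

```python
import re

# Matches connection URLs that embed a password, e.g. postgres://user:secret@host.
# Covers postgres://, postgresql://, redis:// and rediss:// only.
CREDENTIAL_URL = re.compile(r"\b(?:postgres(?:ql)?|rediss?)://[^\s:/@]*:[^\s@]+@")


def find_leaks(text: str) -> list[int]:
    """Return 1-based line numbers containing a URL with an embedded password."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CREDENTIAL_URL.search(line)
    ]
```

Running `find_leaks` over the contents of each staged file, todo lists included, and refusing the commit on any hit would catch this particular kind of leak. It will also flag URLs whose password is only a placeholder.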
4 months ago
the credentials are rotated now. but man that was annoying
4 months ago
done: #pgvector template doesn't allow UI-based password rotation
4 months ago
alright I'm good to close this thread
4 months ago
got my stuff fixed after a very stressful couple of days
4 months ago
!s
Status changed to Solved passos • 4 months ago