requests timing out
andremaytorena
PROOP

a month ago

Requests to our API have been timing out or just never responding. It started happening earlier today, then stopped. I redeployed the API and nothing's working. This is very urgent; nothing in our setup has changed and it's been fine all week. This is happening to about 6 different databases.

Solved

78 Replies

andremaytorena
PROOP

a month ago

Screenshot_2026-05-12_at_03.28.06.png

Attachments


andremaytorena
PROOP

a month ago

Screenshot_2026-05-12_at_03.29.31.png

Attachments


What region is this deployed in?


andremaytorena
PROOP

a month ago

us east


andremaytorena
PROOP

a month ago

All requests are failing right now


andremaytorena
PROOP

a month ago

No errors on the application or db


andremaytorena
PROOP

a month ago

Never had this issue before, any clues? Could it be a Railway issue?


andremaytorena
PROOP

a month ago

Some requests are now going through


andremaytorena
PROOP

a month ago

Ok they aren't failing anymore


andremaytorena
PROOP

a month ago

So this is most likely not an app issue on my end, or at least I can't see anything leading me to believe that


andremaytorena
PROOP

a month ago

This happened again like 3 hours ago, went to bed, just woke up to many messages, so would like some confirmation from someone if this is on my side or railway if possible


andremaytorena
PROOP

a month ago

Any ideas? It’s on and off


andremaytorena
PROOP

a month ago

Would adding replicas help against this?


andremaytorena
PROOP

a month ago

Idk how accurate this is since it's AI, but just want to try to figure out what caused this

Screenshot_2026-05-12_at_09.57.45.png

Attachments


andremaytorena
PROOP

a month ago

Hi this is happening again currently


andremaytorena
PROOP

a month ago

Pretty urgent, so if anyone has any ideas 🙂


andremaytorena
PROOP

a month ago

Screenshot_2026-05-12_at_18.40.57.png

Attachments


alexop1000
PRO

a month ago

Happening for us too


andremaytorena
PROOP

a month ago

Scaling to 2 replicas didn't work, databases seem fine, redis seems fine, no clue what's going on


andremaytorena
PROOP

a month ago

499 errors?


alexop1000
PRO

a month ago

Yeah


alexop1000
PRO

a month ago

image.png

Attachments


andremaytorena
PROOP

a month ago

Could anyone from the team confirm if it's an app issue?


andremaytorena
PROOP

a month ago

pls


andremaytorena
PROOP

a month ago

Checking DB logs:

```
2026-05-12 16:49:27.871 UTC [37676] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:27.880 UTC [37671] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:27.890 UTC [37675] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:27.884 UTC [37671] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:27.890 UTC [37675] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:28.634 UTC [37604] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:28.635 UTC [37682] LOG:  could not receive data from client: Connection reset by peer
```

andremaytorena
PROOP

a month ago

This was after a postgres restart


andremaytorena
PROOP

a month ago

Redeploying the db now is just stuck on deploying and inaccessible

Screenshot_2026-05-12_at_18.55.16.png

Attachments


alexop1000
PRO

a month ago

I am not using postgres btw. Could be an internal networking issue, since I'm using Dragonfly


andremaytorena
PROOP

a month ago

one of my postgres instances won't even deploy anymore 🙁


andremaytorena
PROOP

a month ago

here's more logs:

```
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in full_dispatch_request
    rv = self.ensure_sync(before_func)()
    g.db_conn = get_connection()
  File "/app/app/db.py", line 29, in get_connection
    _pools[host] = pool.ThreadedConnectionPool(1, 20, dsn)
  File "/app/.venv/lib/python3.10/site-packages/psycopg2/pool.py", line 59, in __init__
  File "/app/.venv/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
psycopg2.OperationalError: connection to server at "postgres-88e0f11e.railway.internal" (fd12:685a:3643:0:a000:2f:ca06:13c5), port 5432 failed: Connection timed out
    Is the server running on that host and accepting TCP/IP connections?
```
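
A connect timeout like the one in this traceback can be checked independently of the app. Below is a minimal sketch, using only the Python standard library, of a TCP probe that reports whether a connection attempt connects, is refused, or times out. The function name `probe_tcp` and the 3-second default are illustrative assumptions, not something from the app or from Railway.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> dict:
    """Try to open a TCP connection and report how the attempt ended."""
    started = time.monotonic()
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        # No answer within `timeout` seconds: packets are probably being dropped.
        outcome = "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        outcome = "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors land here.
        outcome = f"error: {exc}"
    return {
        "host": host,
        "port": port,
        "outcome": outcome,
        "seconds": round(time.monotonic() - started, 3),
    }
```

Calling `probe_tcp("postgres-88e0f11e.railway.internal", 5432)` from inside the app's own container would exercise the same private-network path as the failing pool. As a rough rule of thumb, "timed out" points at the network path, while "refused" points at the database process itself not listening.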

andremaytorena
PROOP

a month ago

still happening to you?


alexop1000
PRO

a month ago

Yes


alexop1000
PRO

a month ago

Switching to external redis connection for the time being


andremaytorena
PROOP

a month ago

gonna try that


andremaytorena
PROOP

a month ago

Didn't work for me, switched redis and postgres to public


alexop1000
PRO

a month ago

I meant that I started a redis on Upstash and pointed my app to the external url


andremaytorena
PROOP

a month ago

ahh


andremaytorena
PROOP

a month ago

that fixed it?


alexop1000
PRO

a month ago

Yeah


alexop1000
PRO

a month ago

Something is breaking with the internal networking


andremaytorena
PROOP

a month ago

ok i disabled all redis and seems to work


andremaytorena
PROOP

a month ago

Still not working for me


andremaytorena
PROOP

a month ago

ffs


andremaytorena
PROOP

a month ago

Please can someone from the team take a look


andremaytorena
PROOP

a month ago

it's now been an hour of constant 499 errors


Anonymous
FREE

a month ago

Your workspace has been restricted. You cannot create new resources.

Contact Support why


andremaytorena
PROOP

a month ago

The server works for the first 10 seconds after a redeploy, then it goes back to 499 errors. We've had this API running for a year+ and have never encountered these issues, nor have we changed anything recently to cause this


alexop1000
PRO

a month ago

Okay this has escalated to full downtime now, is it possible to get someone on this?


andremaytorena
PROOP

a month ago

^ Anyone


alexop1000
PRO

a month ago

There's an AWS us-east-1 outage, could be because of that


andremaytorena
PROOP

a month ago

do u have a link to that?


andremaytorena
PROOP

a month ago

strange that would affect us tho



alexop1000
PRO

a month ago

Or Claude was lying

image.png

Attachments


andremaytorena
PROOP

a month ago

I don’t see it


andremaytorena
PROOP

a month ago

I don't see it


andremaytorena
PROOP

a month ago

Anyone? pls


andremaytorena
PROOP

a month ago

@Alex Op seems to be recovering, you?


alexop1000
PRO

a month ago

Yeah


andremaytorena
PROOP

a month ago

Can anyone from the team advise? if it was an app issue it would most likely not get resolved by itself no?


a month ago

!t


Status changed to Awaiting Railway Response Railway about 1 month ago


andremaytorena
PROOP

a month ago

If anyone from the team does have an idea on the cause would love to know now, just don’t want it happening overnight again


andremaytorena
PROOP

a month ago

Hi, wanted to check in if there's an update?


andremaytorena
PROOP

a month ago

Hi, reaching out again. This has happened multiple times, and I don't want it to get overlooked


codydearkland
EMPLOYEE

a month ago

Hey Andre — sorry for the delay here; it took a bit to dig up all the details. The issue was on a physical machine in our us-east region that hosts your services and databases. One of its network cables started failing late May 11 and kept flapping — dropping and recovering on its own — which is why you saw intermittent timeouts across that whole window instead of one clean outage.

It also explains why redeploys, scaling replicas, and switching to public DB URLs didn't help: the problem was at the host's network layer, not your app or your DB config. A platform engineer fully isolated the bad cable on May 12, which is when your Postgres recovered, and your remaining services settled that day. A technician was onsite the next day to physically replace the failed hardware, so it can't recur the same way.
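
For short drops like the flapping described above, one common client-side pattern is to retry failed connection attempts with capped exponential backoff and jitter. The sketch below is illustrative only and is not something recommended in this thread: the name `retry_with_backoff` and all default values are assumptions, and it would not help with an outage that lasts longer than the total retry window.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the listed exceptions with capped backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # The cap doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # the actual wait is drawn uniformly from [0, cap] so that many
            # clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

With psycopg2, the exception to retry on would be `psycopg2.OperationalError` (the one in the traceback earlier in the thread), passed as `retry_on=(psycopg2.OperationalError,)` around whatever function opens the connection.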


Status changed to Awaiting User Response Railway 29 days ago


andremaytorena
PROOP

a month ago

Hi, I appreciate the response. Is there anything we can do to mitigate this in the future? I understand things break, but honestly, lately it feels like I'm on these threads pretty often due to issues, and without communication, especially when our production servers are just failing, it puts me in a position where I have no idea what to do but wait. I love Railway and do not want to migrate away, but nothing's been said about what the plans are to mitigate these types of issues, or at least about better recognition or support when they do arise, as I had to wait a whole day for it to be acknowledged when our whole server was down. That also means I have to waste hours of my time trying to fix the server issues while responding to my clients as they continue to message. I'd just like to know if there's anything ongoing to try to avoid these problems, or at least some better communication with the users they impact.


Status changed to Awaiting Railway Response Railway 29 days ago


codydearkland
EMPLOYEE

a month ago

Yeah, before I say anything else, I just want to say - I hear you, and totally get where you're coming from on it. The feedback is well heard, and obviously while I can't copy/paste internal conversations, know that it's something we're spending a lot of energy on internally to improve (re: stability). Things are scaling really quickly, and like you mentioned - things do break - but ultimately, it doesn't feel great on the receiving end.

There's a lot in flight to improve stability in these areas, and a lot of it is already in place - despite some of these bumps that come up.

Your ask is very fair (better comms, sooner, and ways to mitigate). I can for sure take that feedback to the team, and it's something we're taking very seriously right now.

Really sorry about how this one played out, and the lag in response. We'll keep chasing down making it better.

On this one, let me know if there's anything I can do to help calm the worries/frustrations down. I've got this thread tagged - I'll be checking back in.


Status changed to Awaiting User Response Railway 28 days ago


andremaytorena
PROOP

a month ago

Hey, I completely understand, I guess there's not much that can be done now, I just hope in the future these issues will be responded to faster. Without any responses I can't tell my clients what's going on, and I can't even guarantee them when the dashboard will be up which is obviously a huge issue.


Status changed to Awaiting Railway Response Railway 27 days ago


a month ago

Completely agree, but I feel obligated to mention that this was posted on Discord. The team does not actively monitor Discord, and Discord threads do not come with any response guarantees whatsoever.

The Central Station would be the correct place to open a thread when the issue pertains to the Railway platform.


andremaytorena
PROOP

a month ago

Good to know now. It wasn't clear at first, since Discord threads appear in my threads in Central Station


a month ago

They are bridged to Central Station, but they are not put in front of the admin view.


andremaytorena
PROOP

a month ago

got it


a month ago

Though in this case, this thread was picked up because Noah escalated it, but that's not something to count on always happening.


marw11n
PRO

a month ago

image.png

Attachments


marw11n
PRO

a month ago

I just woke up to this


andremaytorena

Hey, I completely understand, I guess there's not much that can be done now, I just hope in the future these issues will be responded to faster. Without any responses I can't tell my clients what's going on, and I can't even guarantee them when the dashboard will be up which is obviously a huge issue.

codydearkland
EMPLOYEE

a month ago

Totally fair take on this, and I agree. Thanks for being open to talking about it, and keeping the feedback coming. Let us/me know how we can help.


Status changed to Awaiting User Response Railway 26 days ago


andremaytorena
PROOP

24 days ago

I guess this didn't age well, server down for the whole night :(


Status changed to Awaiting Railway Response Railway 24 days ago


Hey there,

Closing the loop on this one: the May 11-12 timeouts were caused by a flapping network cable in our US East region, which our platform team isolated on the 12th and the on-site tech replaced the next day. Apologies for the disruption.

The May 20 overnight issue you mentioned was a separate event: our GCP cloud account got auto-restricted, taking down our API for a few hours. Full writeup here: https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage. Recovery is complete on our side.

For future urgent issues, posting via Central Station is the right channel; Discord isn't actively monitored by support and doesn't carry response guarantees.

Thanks,

Angelo


Status changed to Awaiting User Response Railway 23 days ago


Railway
BOT

16 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 16 days ago

