requests timing out

andremaytorena

PROOP

2 months ago

Our requests to our API have been timing out or just never responding. It started happening earlier today, then stopped. I redeployed the API and nothing's working. This is very urgent; nothing in our setup has changed and it's been fine all week. This is happening to about 6 different databases.

Solved

78 Replies

andremaytorena

PROOP

2 months ago

Screenshot_2026-05-12_at_03.28.06.png

andremaytorena

PROOP

2 months ago

Screenshot_2026-05-12_at_03.29.31.png

0x5b62656e5d

MODERATOR

2 months ago

What region is this deployed in?

andremaytorena

PROOP

2 months ago

us east

andremaytorena

PROOP

2 months ago

All requests are failing right now

andremaytorena

PROOP

2 months ago

No errors on the application or db

andremaytorena

PROOP

2 months ago

Never had this issue before, any clues? Could it be a Railway issue?

andremaytorena

PROOP

2 months ago

Some requests are now going through

andremaytorena

PROOP

2 months ago

Ok they aren't failing anymore

andremaytorena

PROOP

2 months ago

So this is most likely not an app issue on my end, or at least I can't see anything leading me to believe that

andremaytorena

PROOP

2 months ago

This happened again like 3 hours ago, went to bed, just woke up to many messages, so would like some confirmation from someone if this is on my side or railway if possible

andremaytorena

PROOP

2 months ago

Any ideas? It’s on and off

andremaytorena

PROOP

2 months ago

Would adding replicas help against this?

andremaytorena

PROOP

2 months ago

Idk how accurate this is since it's AI, but just want to try to figure out what caused this

Screenshot_2026-05-12_at_09.57.45.png

andremaytorena

PROOP

2 months ago

Hi this is happening again currently

andremaytorena

PROOP

2 months ago

Pretty urgent, so if anyone has any ideas 🙂

andremaytorena

PROOP

2 months ago

Screenshot_2026-05-12_at_18.40.57.png

alexop1000

PRO

2 months ago

Happening for us too

andremaytorena

PROOP

2 months ago

Scaling to 2 replicas didn't work, databases seem fine, redis seems fine, no clue what's going on

andremaytorena

PROOP

2 months ago

499 errors?

alexop1000

PRO

2 months ago

Yeah

alexop1000

PRO

2 months ago

Attachments

image.png

andremaytorena

PROOP

2 months ago

anyone from the team could confirm if it's an app issue?

andremaytorena

PROOP

2 months ago

pls

andremaytorena

PROOP

2 months ago

Checking DB logs:

```
2026-05-12 16:49:27.871 UTC [37676] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:27.880 UTC [37671] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:27.890 UTC [37675] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:27.884 UTC [37671] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:27.890 UTC [37675] LOG:  unexpected EOF on client connection with an open transaction
2026-05-12 16:49:28.634 UTC [37604] LOG:  could not receive data from client: Connection reset by peer
2026-05-12 16:49:28.635 UTC [37682] LOG:  could not receive data from client: Connection reset by peer
```

andremaytorena

PROOP

2 months ago

This was after a postgres restart

andremaytorena

PROOP

2 months ago

Redeploying the DB now, it's just stuck on deploying and inaccessible

Screenshot_2026-05-12_at_18.55.16.png

alexop1000

PRO

2 months ago

I am not using postgres btw. Could be an internal networking issue, since I'm using Dragonfly

andremaytorena

PROOP

2 months ago

one of my postgres instances won't even deploy anymore 🙁

andremaytorena

PROOP

2 months ago

here's more logs:

```
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in full_dispatch_request
    rv = self.ensure_sync(before_func)()
    g.db_conn = get_connection()
  File "/app/app/db.py", line 29, in get_connection
    _pools[host] = pool.ThreadedConnectionPool(1, 20, dsn)
  File "/app/.venv/lib/python3.10/site-packages/psycopg2/pool.py", line 59, in __init__
  File "/app/.venv/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
psycopg2.OperationalError: connection to server at "postgres-88e0f11e.railway.internal" (fd12:685a:3643:0:a000:2f:ca06:13c5), port 5432 failed: Connection timed out
    Is the server running on that host and accepting TCP/IP connections?
```
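
Side note for anyone landing here with the same traceback: the pool is created inside the request path, so each request can hang for the full OS-level connect timeout. Below is a minimal sketch of one way to fail faster and retry with backoff. It is not the code from this app: `connect_with_retry` and its parameters are hypothetical names, and the psycopg2 usage in the comment is an assumption (it relies on libpq's `connect_timeout` connection parameter). It only smooths over brief drops and would not have fixed a host-level network fault like the one described later in this thread.

```python
import time


def connect_with_retry(connect, attempts=4, base_delay=0.5,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect   -- zero-argument callable that returns a connection or pool
    attempts  -- total number of tries before the last error is re-raised
    retry_on  -- exception types treated as retryable
    sleep     -- injectable so the helper can be tested without real waiting
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of tries: surface the original error
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage with psycopg2 (assumed, not executed here):
#   pool = connect_with_retry(
#       lambda: psycopg2.pool.ThreadedConnectionPool(1, 20, dsn, connect_timeout=5),
#       retry_on=(psycopg2.OperationalError,),
#   )
```

Passing the connect call in as a callable keeps the helper independent of any one driver, so the same wrapper could be reused for a Redis or Dragonfly client.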

andremaytorena

PROOP

2 months ago

still happening to you?

alexop1000

PRO

2 months ago

Yes

alexop1000

PRO

2 months ago

Switching to external redis connection for the time being

andremaytorena

PROOP

2 months ago

gonna try that

andremaytorena

PROOP

2 months ago

Didn't work for me, switched redis and postgres to public

alexop1000

PRO

2 months ago

I meant that I started a redis on Upstash and pointed my app to the external url

andremaytorena

PROOP

2 months ago

ahh

andremaytorena

PROOP

2 months ago

that fixed it?

alexop1000

PRO

2 months ago

Yeah

alexop1000

PRO

2 months ago

Something is breaking with the internal networking

andremaytorena

PROOP

2 months ago

ok i disabled all redis and seems to work

andremaytorena

PROOP

2 months ago

Still not working for me

andremaytorena

PROOP

2 months ago

ffs

andremaytorena

PROOP

2 months ago

Please can someone from the team take a look

andremaytorena

PROOP

2 months ago

it's now been an hour of constant 499 errors

Anonymous

FREE

2 months ago

Your workspace has been restricted. You cannot create new resources.

Contact Support why

andremaytorena

PROOP

2 months ago

The server works for the first 10 seconds after a redeploy, after that it goes back to 499 errors. We've had this API running for a year+ and have never encountered these issues, nor have we changed anything recently to cause this

alexop1000

PRO

2 months ago

Okay this has escalated to full downtime now, is it possible to get someone on this?

andremaytorena

PROOP

2 months ago

^ Anyone

alexop1000

PRO

2 months ago

There's an AWS us-east-1 outage, could be because of that

andremaytorena

PROOP

2 months ago

do u have a link to that?

andremaytorena

PROOP

2 months ago

strange that would affect us tho

alexop1000

PRO

2 months ago

https://health.aws.amazon.com/health/status

alexop1000

PRO

2 months ago

~~Or Claude is lying~~ Claude was lying

Attachments

image.png

andremaytorena

PROOP

2 months ago

I don’t see it

andremaytorena

PROOP

2 months ago

Anyone? pls

andremaytorena

PROOP

2 months ago

@Alex Op seems to be recovering, you?

alexop1000

PRO

2 months ago

Yeah

andremaytorena

PROOP

2 months ago

Can anyone from the team advise? If it was an app issue, it would most likely not get resolved by itself, no?

noahd

EMPLOYEE

2 months ago

Status changed to Awaiting Railway Response Railway • about 2 months ago

andremaytorena

PROOP

2 months ago

If anyone from the team does have an idea on the cause would love to know now, just don’t want it happening overnight again

andremaytorena

PROOP

2 months ago

Hi, wanted to check in if there's an update?

andremaytorena

PROOP

2 months ago

Hi, reaching out again, this happened multiple times, and I don't want this to get overlooked

codydearkland

EMPLOYEE

2 months ago

Hey Andre — sorry for the delay here; it took a bit to dig up all the details. The issue was on a physical machine in our us-east region that hosts your services and databases. One of its network cables started failing late May 11 and kept flapping — dropping and recovering on its own — which is why you saw intermittent timeouts across that whole window instead of one clean outage.

It also explains why redeploys, scaling replicas, and switching to public DB URLs didn't help: the problem was at the host's network layer, not your app or your DB config. A platform engineer fully isolated the bad cable on May 12, which is when your Postgres recovered, and your remaining services settled that day. A technician was onsite the next day to physically replace the failed hardware, so it can't recur the same way.
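
For readers trying to tell an app fault from a network-layer fault in a similar situation, a bare TCP connect from inside the app container can help: if the connect itself times out, the problem sits below the application and the database driver. This is a minimal sketch using only the Python standard library; `probe_tcp` is a hypothetical helper name, and the host and port in the usage comment are taken from the traceback earlier in the thread purely as an illustration.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect and return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"


# Illustrative usage (host taken from the traceback above, not executed here):
#   ok, elapsed, err = probe_tcp("postgres-88e0f11e.railway.internal", 5432)
#   print(ok, round(elapsed, 2), err)
```

Running the probe in a loop and logging the results gives a timeline of drops and recoveries, which is the kind of evidence that distinguishes a flapping link from a misconfigured app.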

Status changed to Awaiting User Response Railway • about 2 months ago

andremaytorena

PROOP

2 months ago

Hi, I appreciate the response. Is there anything we can do to mitigate this in the future? I understand things break, but honestly lately it feels like I'm on these threads pretty often due to issues, and without communication, especially when our production servers are just failing, it puts me in a position where I have no idea what to do but wait. I love Railway and do not want to migrate away, but nothing's been said about what the plans are to mitigate these types of issues, or at least about better recognition or support when they do arise, as I had to wait a whole day for it to be acknowledged when our whole server was absolutely down. That also means I have to waste hours of my time trying to fix the server issues while paying attention to my clients as they continue to message. I'd just like to know if there's anything ongoing to try to avoid these problems, or at least some better communication with the users it impacts.

Status changed to Awaiting Railway Response Railway • about 2 months ago

codydearkland

EMPLOYEE

2 months ago

Yeah, before I say anything else, I just want to say - I hear you, and totally get where you're coming from on it. The feedback is well heard, and obviously while I can't copy/paste internal conversations, know that it's something we're spending a lot of energy on internally to improve on (re: stability). Things are scaling really quickly, and like you mentioned - things do break - but ultimately, it doesn't feel great on the receiving end.

There's a lot in flight to improve stability in these areas, and a lot of it is already in place - despite some of these bumps that come up.

Your ask is very fair (better comms, sooner, and ways to mitigate). I can for sure take that feedback into the team, but it's totally something we're taking very seriously right now.

Really sorry about how this one played out, and the lag in response. We'll keep chasing down making it better.

On this one, let me know if there's anything I can do to help calm the worries/frustrations down. I've got this thread tagged - I'll be checking back in.

Status changed to Awaiting User Response Railway • about 2 months ago

andremaytorena

PROOP

2 months ago

Hey, I completely understand, I guess there's not much that can be done now, I just hope in the future these issues will be responded to faster. Without any responses I can't tell my clients what's going on, and I can't even guarantee them when the dashboard will be up which is obviously a huge issue.

Status changed to Awaiting Railway Response Railway • about 2 months ago

brody

EMPLOYEE

2 months ago

Completely agree, but I feel obligated to mention that this was posted on Discord. The team does not actively monitor Discord, and Discord threads do not come with any response guarantees whatsoever.

The Central Station would be the correct place to open a thread when the issue pertains to the Railway platform.

andremaytorena

PROOP

2 months ago

Good to know now, wasn't clear at first since discord threads appear in my threads in the central station

brody

EMPLOYEE

2 months ago

They are bridged to Central Station, but they are not put in front of the admin view.

andremaytorena

PROOP

2 months ago

got it

brody

EMPLOYEE

2 months ago

Though in this case this thread was seen because Noah escalated it, but that's not something to count on always happening.

marw11n

PRO

2 months ago

Attachments

image.png

marw11n

PRO

2 months ago

I just woke up to this

codydearkland

EMPLOYEE

2 months ago

Totally fair take on this, and I agree. Thanks for being open to talking about it, and keeping the feedback coming. Let us/me know how we can help.

Status changed to Awaiting User Response Railway • about 2 months ago

andremaytorena

PROOP

2 months ago

I guess this didn't age well, server down for the whole night :(

Status changed to Awaiting Railway Response Railway • about 2 months ago

angelo-railway

EMPLOYEE

2 months ago

Hey there,

Closing the loop on this one: the May 11-12 timeouts were caused by a flapping network cable in our US East region, which our platform team isolated on the 12th and the on-site tech replaced the next day. Apologies for the disruption.

The May 20 overnight issue you mentioned was a separate event: our GCP cloud account got auto-restricted, taking down our API for a few hours. Full writeup here: https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage. Recovery is complete on our side.

For future urgent issues, posting via Central Station is the right channel; Discord isn't actively monitored by support and doesn't carry response guarantees.

Thanks,

Angelo

Status changed to Awaiting User Response Railway • about 2 months ago

Railway

BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • about 1 month ago
