16 days ago
Is there an ongoing incident? We have elevated error rates on some services.
29 Replies
Status changed to Open Railway • 16 days ago
16 days ago
Same across the board; seems there are general issues that have yet to be flagged.
16 days ago
same
16 days ago
same
16 days ago
Seeing connection issues to Railway-hosted Redis here too since around 0945 GMT
16 days ago
Same with redis for me
16 days ago
same
redis
16 days ago
@railway Can you please call an incident?! Why does it always take you such a long time??
16 days ago
EU West
16 days ago
No, will do and report back
16 days ago
A lil update: the team is currently investigating the elevated rate of TCP timeouts in EU West.
16 days ago
Same here! It seems every week we have something as a present from Railway!! That's great!
16 days ago
Possibly the same incident (or related): from US-East, TCP proxy connections via tramway.proxy.rlwy.net are reset on idle within 10–45s. Started ~09:45 UTC today.
TCP-level repro:
# DIES in 10-45s:
ssh -o ServerAliveInterval=5 user@tramway.proxy.rlwy.net "sleep 45"
# → client_loop: send disconnect: Connection reset
# LIVES indefinitely (continuous stdout):
ssh ... "for i in $(seq 30); do date; sleep 2; done"
Eliminated locally:
- Container health (no OOM, CPU not throttled, sshd auth.log clean, tmux survives → not restarting)
- Container-specific bug (reproduces on two services with different ports/images: 35946 and 41991)
- Client network (reproduces from mobile-data hotspot AND another physical device — different ISPs)
So this is upstream of the container. Variable timing (10-45s, not fixed) suggests probabilistic middlebox eviction. Redeploy won't help — already verified container is healthy.
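For anyone who wants to check this without SSH in the loop, here is a minimal sketch of an idle-connection probe in Python. It is not Railway tooling; `probe_idle` is a hypothetical helper name. It opens a plain TCP connection, sends nothing, and reports how long the connection lasts and how it ends. The host and port in the usage section are the ones mentioned in the post above; substitute your own. One assumption to keep in mind: a middlebox that silently drops connection state (rather than sending an RST or FIN to the client) would not be detected by a purely silent probe, so an "alive" result does not prove the path is healthy.

```python
import socket
import time
from typing import Tuple


def probe_idle(host: str, port: int, max_idle: float = 60.0) -> Tuple[str, float]:
    """Open a TCP connection, stay silent, and report how it ends.

    Returns (outcome, seconds_since_connect), where outcome is one of:
      "alive"  - still open after max_idle seconds
      "closed" - peer (or something in between) closed cleanly (FIN)
      "reset"  - connection was reset (RST)
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        start = time.monotonic()
        try:
            while True:
                remaining = max_idle - (time.monotonic() - start)
                if remaining <= 0:
                    return "alive", time.monotonic() - start
                sock.settimeout(remaining)
                # Discard anything the server sends (e.g. an SSH banner);
                # an empty read means the other side closed the connection.
                data = sock.recv(4096)
                if not data:
                    return "closed", time.monotonic() - start
        except socket.timeout:
            return "alive", time.monotonic() - start
        except ConnectionResetError:
            return "reset", time.monotonic() - start


if __name__ == "__main__":
    # Host and port taken from the report above; replace with your own proxy.
    outcome, seconds = probe_idle("tramway.proxy.rlwy.net", 35946, max_idle=60.0)
    print(f"{outcome} after {seconds:.1f}s")
```

Running it a few times and comparing the reported durations against the 10-45s window would show whether the drop also happens with no application traffic at all. Note that an SSH server may close unauthenticated connections on its own after its login grace period, so keep `max_idle` below that.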
16 days ago
Hi @0x5b62656e5d — this has happened a few times now where I noticed and reported incident-like behavior hours before it was acknowledged on your side.
Is there a better escalation path or internal-facing channel where I can send early signals when I see this happening? I’m happy to help provide timely heads-ups if that’s useful.
16 days ago
Hi,
Sorry about this issue. The incident has been updated with details, and it should be mitigated now.
I wanted to confirm: are folks seeing recovery?
Status changed to Awaiting User Response Railway • 16 days ago
16 days ago
It looks like it's sorted for now, yes.
Status changed to Awaiting Railway Response Railway • 16 days ago
Status changed to Solved Railway • 16 days ago
16 days ago
Yes, it seems normal again
Status changed to Awaiting Railway Response Railway • 16 days ago
Status changed to Solved haayhappen • 16 days ago
3 days ago
This is happening again.
Status changed to Awaiting Railway Response Railway • 3 days ago
3 days ago
@lawrencegripperwrk @0x5b62656e5d
3 days ago
And it happened yesterday, at the exact same time
3 days ago
I have a similar problem
3 days ago
I have the same issues
2 days ago
I'm losing my patience here... this is the third day in a row and I have heard NOTHING from the team. PLEASE INVESTIGATE ASAP
2 days ago
It’s an absolute disgrace… and yet you’re still seeing a 99% availability rate… they’re a complete mess – I can’t think of any other word for it.
2 days ago
Are you guys playing around? Do you have structured workflows? Is nothing tested before you apply changes? We don't have a single week where nothing happens, and still you'll say this only affected 1/10000 services, but unfortunately we are all affected every time. Horrible & frustrating.


