"Server has closed the connection." - Some services have increased error rates 502
haayhappen
PRO, OP

16 days ago

Is there an ongoing incident? We have elevated error rates on some services.

$20 Bounty

29 Replies

Status changed to Open Railway 16 days ago


Anonymous
HOBBY

16 days ago

Same across the board; it seems there are general issues that have yet to be flagged.


andremaytorena
PRO

16 days ago

same


panickz1
PRO

16 days ago

same


andreazero
PRO

16 days ago

Facing the issue here too.


tomcporter
HOBBY

16 days ago

Seeing connection issues to Railway-hosted Redis here too since around 0945 GMT


andremaytorena
PRO

16 days ago

Same with redis for me


gadatos
PRO

16 days ago

same

redis


haayhappen
PRO, OP

16 days ago

@railway Can you please call an incident?! Why does it always take you such a long time??


What region are such services hosted in?


haayhappen
PRO, OP

16 days ago

EU West


Have you tried redeploying the affected services?


haayhappen
PRO, OP

16 days ago

No, will do and report back


A lil update, the team is currently investigating the elevated rate of TCP timeouts in EU West.


diegogalocha
PRO

16 days ago

Same here! It seems every week we have something as a present from Railway!! That's great!


romanychlogin
PRO

16 days ago

Possibly the same incident (or related): from US-East, TCP proxy connections via tramway.proxy.rlwy.net are reset on idle within 10–45s. Started ~09:45 UTC today.

TCP-level repro:

# DIES in 10-45s:
ssh -o ServerAliveInterval=5 user@tramway.proxy.rlwy.net "sleep 45"
# → client_loop: send disconnect: Connection reset

# LIVES indefinitely (continuous stdout):
# (single quotes so that $(seq 30) is expanded by the remote shell, not locally)
ssh ... 'for i in $(seq 30); do date; sleep 2; done'

Eliminated locally:

  • Container health (no OOM, CPU not throttled, sshd auth.log clean, tmux survives → not restarting)
  • Container-specific bug (reproduces on two services with different ports/images: 35946 and 41991)
  • Client network (reproduces from mobile-data hotspot AND another physical device — different ISPs)

So this is upstream of the container. Variable timing (10-45s, not fixed) suggests probabilistic middlebox eviction. Redeploy won't help — already verified container is healthy.
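For anyone who wants to check the same thing without SSH, here is a minimal sketch of a raw TCP probe in Python. It opens a connection, sends nothing, and reports how long the connection survives. The function name `measure_idle_lifetime` and its parameters are made up for illustration, and it only sees a reset or close that is actually delivered to the client; a middlebox that silently drops state would show up as "alive" here.

```python
import socket
import time


def measure_idle_lifetime(host, port, max_wait=60.0, connect_timeout=5.0):
    """Connect over TCP, stay idle, and report how long the connection lasts.

    Returns (elapsed_seconds, outcome), where outcome is one of:
      "reset"  - the peer or something on the path sent a TCP RST
      "closed" - the peer closed the connection cleanly (FIN)
      "alive"  - the connection was still open after max_wait seconds
    """
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        start = time.monotonic()
        deadline = start + max_wait
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return time.monotonic() - start, "alive"
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                return time.monotonic() - start, "alive"
            except ConnectionResetError:
                return time.monotonic() - start, "reset"
            if not chunk:
                # An empty read means the other side closed the connection.
                return time.monotonic() - start, "closed"
            # Any data received (for example a server banner) is discarded;
            # the probe keeps waiting without sending anything back.
```

Calling something like `measure_idle_lifetime("tramway.proxy.rlwy.net", 35946)` a few times and comparing the elapsed values should show whether the drop time varies in the same 10-45s range. Note that an SSH server will normally close an unauthenticated connection by itself after its login grace period, so keep `max_wait` below that when probing an sshd port.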


haayhappen
PRO, OP

16 days ago

Hi @0x5b62656e5d — this has happened a few times now where I noticed and reported incident-like behavior hours before it was acknowledged on your side.

Is there a better escalation path or internal-facing channel where I can send early signals when I see this happening? I’m happy to help provide timely heads-ups if that’s useful.


haayhappen
PRO, OP

16 days ago

A re-deployment did not help by the way



lawrencegripperwrk
EMPLOYEE

16 days ago

Hi,

Sorry about this issue, incident is updated with details, it should be mitigated now.

I wanted to confirm: are folks seeing recovery?


Status changed to Awaiting User Response Railway 16 days ago


diegogalocha
PRO

16 days ago

It looks like it's sorted for now, yes.


Status changed to Awaiting Railway Response Railway 16 days ago


Status changed to Solved Railway 16 days ago


haayhappen
PRO, OP

16 days ago

Yes, it seems normal again


Status changed to Awaiting Railway Response Railway 16 days ago


Status changed to Solved haayhappen 16 days ago


haayhappen
PRO, OP

3 days ago

This is happening again.


Status changed to Awaiting Railway Response Railway 3 days ago


haayhappen
PRO, OP

3 days ago

@lawrencegripperwrk @0x5b62656e5d


haayhappen
PRO, OP

3 days ago

And it happened yesterday, at the exact same time


gadatos
PRO

3 days ago

I have a similar problem.


samehes
PRO

3 days ago

I have the same issues.


haayhappen
PRO, OP

2 days ago

I'm losing my patience here... this is the third day in a row and I have heard NOTHING from the team. PLEASE INVESTIGATE ASAP.


diegogalocha
PRO

2 days ago

It’s an absolute disgrace… and yet you’re still seeing a 99% availability rate… they’re a complete mess – I can’t think of any other word for it.


diegogalocha
PRO

2 days ago

Are you guys playing? Do you have structured workflows? Is nothing tested before you apply changes? Not a week goes by without something happening, and still you'll say this only affected 1 in 10,000 services, but unfortunately we are all affected every time. Horrible and frustrating.

