"Server has closed the connection." - Some services have increased 502 error rates

haayhappen

PRO (OP)

2 months ago

Is there an ongoing incident? We have elevated error rates on some services.

$20 Bounty

Status changed to Open Railway • 2 months ago

Anonymous

HOBBY

2 months ago

Same across the board. There seem to be general issues that have yet to be flagged.

andremaytorena

PRO

2 months ago

same

panickz1

PRO

2 months ago

same

andreazero

PRO

2 months ago

Facing the issue here too.

tomcporter

HOBBY

2 months ago

Seeing connection issues to Railway-hosted Redis here too, since around 09:45 GMT.

andremaytorena

PRO

2 months ago

Same with Redis for me.

gadatos

PRO

2 months ago

Same, with Redis.

haayhappen

PRO (OP)

2 months ago

@railway Can you please call an incident?! Why does it always take you such a long time??

0x5b62656e5d

MODERATOR

2 months ago

Which region are the affected services hosted in?

haayhappen

PRO (OP)

2 months ago

EU West

0x5b62656e5d

MODERATOR

2 months ago

Have you tried redeploying the affected services?

haayhappen

PRO (OP)

2 months ago

No, will do and report back

0x5b62656e5d

MODERATOR

2 months ago

A little update: the team is currently investigating the elevated rate of TCP timeouts in EU West.

diegogalocha

PRO

2 months ago

Same here! It seems every week we have something as a present from Railway!! That's great!

romanychlogin

PRO

2 months ago

Possibly the same incident (or related): from US-East, TCP proxy connections via tramway.proxy.rlwy.net are reset on idle within 10–45s. Started ~09:45 UTC today.

TCP-level repro:

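# DIES in 10-45s (no application traffic, only SSH keepalives every 5s):
ssh -o ServerAliveInterval=5 user@tramway.proxy.rlwy.net "sleep 45"
# → client_loop: send disconnect: Connection reset

# LIVES indefinitely (continuous stdout).
# Single quotes so that $(seq 30) expands on the remote host, not locally:
ssh ... 'for i in $(seq 30); do date; sleep 2; done'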

Eliminated locally:

- Container health (no OOM, CPU not throttled, sshd auth.log clean, tmux survives → not restarting)
- Container-specific bug (reproduces on two services with different ports/images: 35946 and 41991)
- Client network (reproduces from a mobile-data hotspot AND another physical device, on different ISPs)

So this is upstream of the container. The variable timing (10-45s, not fixed) suggests probabilistic middlebox eviction. A redeploy won't help, since the container is already verified healthy.
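
For anyone who wants to collect comparable numbers, the repro above can be wrapped in a small measurement loop. This is only a rough sketch, not an official Railway diagnostic: `probe_idle`, `sweep_idle`, and the `TARGET` and `PORT` variables are hypothetical names, and the default host and port are placeholders to replace with your own service's TCP proxy details. It assumes key-based SSH auth, so no password prompt interrupts the timing.

```shell
#!/usr/bin/env bash
# Sketch: measure how long an idle SSH session through the TCP proxy survives.
# TARGET and PORT are placeholders (assumptions); point them at your own service.
TARGET="${TARGET:-user@tramway.proxy.rlwy.net}"
PORT="${PORT:-22}"

# probe_idle SECONDS
# Runs a remote `sleep` so the session carries no application data, then
# prints "<requested> <elapsed> <ok|dropped>". Any non-zero ssh exit status
# (ssh returns 255 on connection errors) is reported as a drop.
probe_idle() {
  local idle="$1" start end status
  start=$(date +%s)
  if ssh -p "$PORT" -o ServerAliveInterval=5 "$TARGET" "sleep $idle"; then
    status=ok
  else
    status=dropped
  fi
  end=$(date +%s)
  echo "$idle $((end - start)) $status"
}

# sweep_idle SECONDS...
# Probes each idle duration in turn and prints one result line per probe.
sweep_idle() {
  local idle
  for idle in "$@"; do
    probe_idle "$idle"
  done
}
```

Running something like `sweep_idle 5 10 20 45` a few times gives a rough picture of where sessions start dropping. An elapsed time well below the requested idle time alongside `dropped` would match the reset-on-idle pattern described above.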

haayhappen

PRO (OP)

2 months ago

Hi @0x5b62656e5d — this has happened a few times now where I noticed and reported incident-like behavior hours before it was acknowledged on your side.

Is there a better escalation path or internal-facing channel where I can send early signals when I see this happening? I’m happy to help provide timely heads-ups if that’s useful.

haayhappen

PRO (OP)

2 months ago

A redeploy did not help, by the way.

0x5b62656e5d

MODERATOR

2 months ago

An incident has been called: https://status.railway.com/incident/V08OD1KI

lawrencegripperwrk

EMPLOYEE

2 months ago

Hi,

Sorry about this issue. The incident has been updated with details, and it should be mitigated now.

I wanted to confirm: are folks seeing recovery?

Status changed to Awaiting User Response Railway • 2 months ago

diegogalocha

PRO

2 months ago

It looks like it's sorted for now, yes.

Status changed to Awaiting Railway Response Railway • 2 months ago

Status changed to Solved Railway • 2 months ago

haayhappen

PRO (OP)

2 months ago

Yes, it seems normal again

Status changed to Awaiting Railway Response Railway • 2 months ago

Status changed to Solved haayhappen • 2 months ago

haayhappen

PRO (OP)

2 months ago

This is happening again.

Status changed to Awaiting Railway Response Railway • about 2 months ago

haayhappen

PRO (OP)

2 months ago

@lawrencegripperwrk @0x5b62656e5d

haayhappen

PRO (OP)

2 months ago

And it happened yesterday, at the exact same time

gadatos

PRO

2 months ago

I have a similar problem.

samehes

PRO

2 months ago

I have the same issues.

haayhappen

PRO (OP)

2 months ago

I'm losing my patience here... this is the third day in a row and I have heard NOTHING from the team. PLEASE INVESTIGATE ASAP.

diegogalocha

PRO

2 months ago

It’s an absolute disgrace… and yet you’re still seeing a 99% availability rate… they’re a complete mess – I can’t think of any other word for it.

diegogalocha

PRO

2 months ago

Are you guys playing? Do you have structured workflows? Is nothing tested before you apply changes? We don't have a week where nothing happens, and still you'll say this only affected 1/10000 services, but unfortunately we are all affected every time. Horrible & frustrating.
