Internal Mesh Outage
alfredbenoel
PROOP

17 days ago

Hi!

We've hit the same problem twice now (May 4 and again on May 22): intermittent connectivity issues took the internal mesh down, our backend couldn't reach other services over private networking, and external traffic returned SSL handshake failures and 529s. Both incidents affected Southeast Asia (Singapore).

Two questions:

  1. Is this a known recurring issue with the networking control plane / edge proxy? What's being done structurally to prevent it from happening again?

  2. Is there anything we can do architecturally to survive mesh outages? We considered aggressive health checks that trigger a redeploy, since a redeploy seemed to fix it on May 22, but that doesn't address the root cause, which is on the network layer, and might not work every time. Are there patterns you'd recommend (e.g. public networking fallback, multi-region) for services that need higher availability?
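A minimal sketch of the public networking fallback idea from question 2, to make it concrete. Everything here is an assumption for illustration: `PRIVATE_URL` and `PUBLIC_URL` are hypothetical endpoints, and `fetch_with_fallback` is a made-up helper, not anything the platform provides. It tries the private address first and only uses the public one when the private call fails at the network level.

```python
import urllib.request
from typing import Callable, Sequence, Tuple

# Hypothetical endpoints, for illustration only.
PRIVATE_URL = "http://orders.internal.example:8080/health"
PUBLIC_URL = "https://orders.example.com/health"


def fetch(url: str, timeout: float = 2.0) -> bytes:
    """Plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(
    urls: Sequence[str],
    fetcher: Callable[[str], bytes] = fetch,
) -> Tuple[str, bytes]:
    """Try each URL in order; return (url, body) for the first that succeeds."""
    last_error = None
    for url in urls:
        try:
            return url, fetcher(url)
        except OSError as exc:
            # URLError, timeouts, and ConnectionResetError (ECONNRESET)
            # are all subclasses of OSError.
            last_error = exc
    raise ConnectionError(f"all endpoints failed: {last_error}")
```

Usage would look like `fetch_with_fallback([PRIVATE_URL, PUBLIC_URL])`. This only helps if the public path stays healthy while the mesh is down, which was not the case for us when external traffic was also failing, so it is a partial mitigation at best.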

photo_2026-05-27_21.34.10.jpeg


9 Replies

(+1 on this today, we had an ECONNRESET at 12pm UTC)



alfredbenoel
PROOP

17 days ago

@Brody @angelo


alfredbenoel
PROOP

15 days ago

@Railway


did you observe this again?


alfredbenoel
PROOP

15 days ago

hey @angelo we didn't observe this again, but we want to better understand why this happens so we can put more proactive measures in place to recover from it
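One proactive measure along these lines, as a rough sketch only: a watchdog that runs a recovery action after several consecutive failed health probes, so a single blip doesn't cause a redeploy. `FailureWatchdog`, `probe`, and `on_unhealthy` are hypothetical names; the recovery callback is a placeholder for whatever action you choose (alerting, a redeploy hook, switching endpoints), and no platform API is assumed.

```python
from typing import Callable


class FailureWatchdog:
    """Calls on_unhealthy after `threshold` consecutive failed probes."""

    def __init__(
        self,
        probe: Callable[[], bool],
        on_unhealthy: Callable[[], None],
        threshold: int = 3,
    ) -> None:
        self.probe = probe
        self.on_unhealthy = on_unhealthy
        self.threshold = threshold
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if the recovery action was triggered."""
        try:
            healthy = bool(self.probe())
        except OSError:
            # Network-level errors (timeouts, resets) count as a failed probe.
            healthy = False

        if healthy:
            self.failures = 0
            return False

        self.failures += 1
        if self.failures >= self.threshold:
            # Reset the counter so the action is not re-fired on every tick.
            self.failures = 0
            self.on_unhealthy()
            return True
        return False
```

Calling `tick()` on a fixed interval from a scheduler or loop is assumed. Resetting the counter after each trigger gives the recovery action time to take effect before it can fire again.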


alfredbenoel
PROOP

6 days ago

hey @angelo @Brody this just happened to us again. We redeployed twice and it only worked on the second redeploy.

could we please get some guidance and visibility into why this happens and what we can do to fix it?


6 days ago

Please do not ping the team - #🛂|readme #5


alfredbenoel
PROOP

6 days ago

really sorry about that, but we just faced the same problem again and we still haven't gotten any help on this matter

