Internal Mesh Outage

Question

Hi!

We've now hit the same problem twice (May 4 and May 22): intermittent connectivity issues took the internal mesh down, so our backend couldn't reach other services over private networking, and external traffic returned SSL handshake failures and 529 errors. Both incidents affected the Southeast Asia (Singapore) region.

Two questions:

1. Is this a known recurring issue with the networking control plane / edge proxy? What's being done structurally to prevent it from happening again?

2. Is there anything we can do architecturally to survive mesh outages? We've considered aggressive health checks that trigger a redeploy, since a redeploy appeared to resolve the issue on May 22. However, that only treats the symptom: the root cause is in the network layer, and a redeploy may not fix it. Are there patterns you'd recommend (e.g. a public networking fallback or a multi-region setup) for services that need higher availability?

![photo_2026-05-27_21.34.10.jpeg](https://station-server.railway.com/attachments/att_01ksnp5wpkfk68rfckf340vvhe)