Service keeps returning 502 every few days
razrinn
PRO OP

15 days ago

Hello, every 1-2 days I get reports from my users that our app is down. They see the Cloudflare 502 page showing that the host (the Railway app) is erroring. I think this has been happening for a week already (totally unacceptable). Nothing has changed on my app's side, and the traffic is not even high. Each time, I have just restarted/redeployed the service and it came back up. But it is only a matter of time until the service goes down again, and I have to keep restarting it.

I see the status page here https://status.railway.com/cmltkyu8905sl13amlju6j5yh says it is resolved, but it just happened again ~1 hour ago. It's not ideal to keep restarting/redeploying every time it crashes. Any suggestions from your side? I might migrate off of Railway if this keeps going on.

Solved

8 Replies

What happened was that some workloads were hit by SYN flood attacks. You may have had Cloudflare in front of your app; however, attacks on non-WAF workloads could still have affected its availability. We have implemented short-term measures to prevent this going forward. Since then, we've expanded interconnect capacity with new peering and Direct Connect, deployed per-container eBPF firewalls, hardened our edge, and built continuous connectivity monitoring between hosts. We're also rolling out a WAF for all customers, providing edge-level protection against malicious request patterns across all plans.

Was this one recent?


Status changed to Awaiting User Response Railway 15 days ago


razrinn
PRO OP

15 days ago

Yes, it's recent. It has been happening almost daily since last week.


Status changed to Awaiting Railway Response Railway 15 days ago


15 days ago

Then you should be good going forward, but please do let us know if you continue to see this, and we can dig in further!


Status changed to Awaiting User Response Railway 15 days ago


razrinn
PRO OP

15 days ago

Okay, let's check in again in a couple of days. I really like the platform; it's sad to see the service quality fall short.

By the way, this is the response time over the last 7 days; the last occurrence was ~5 hours ago. You can see it sometimes goes up to 30s. I had to restart the service to make it work again.


Status changed to Awaiting Railway Response Railway 15 days ago


15 days ago

Noted, please keep an eye on it and let us know if it spikes again.


Status changed to Awaiting User Response Railway 15 days ago


razrinn
PRO OP

15 days ago

ummm...

Anyway, how can I tell whether the problem is on my app's side or on Railway's side?

Edit:
CPU/RAM usage seems normal on my end.



Status changed to Awaiting Railway Response Railway 15 days ago


15 days ago

Some form of APM (application performance monitoring) and tracing is underrated and underutilized. Adding it to your app would help a lot here.
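As a minimal sketch of the in-app side of that comparison, assuming a Python WSGI app (the thread does not say what the app is written in), the hypothetical `TimingMiddleware` below records how long each request spends inside the app. If these timings stay low while users see 30s responses or Cloudflare 502s, the delay is probably happening before requests reach the app; if they spike too, the app itself is the place to dig. This is only a stand-in: a real APM/tracing tool (for example OpenTelemetry) records far more, such as database and outbound HTTP spans.

```python
import logging
import time

logger = logging.getLogger("request_timing")


class TimingMiddleware:
    """Hypothetical WSGI middleware that times each request handled by the app.

    Illustration only; not a replacement for a real APM agent.
    """

    def __init__(self, app, slow_threshold_s=5.0):
        self.app = app
        self.slow_threshold_s = slow_threshold_s
        self.samples = []  # list of (path, status line, seconds spent in the app)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        # Default used if the wrapped app raises before setting a status.
        status = {"value": "500 (unhandled exception)"}

        def recording_start_response(status_line, headers, exc_info=None):
            status["value"] = status_line
            return start_response(status_line, headers, exc_info)

        start = time.perf_counter()
        try:
            result = self.app(environ, recording_start_response)
            try:
                # Consume the body here so the timing includes body generation.
                # This buffers the whole response, so it is unsuitable for streaming.
                body = b"".join(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        finally:
            elapsed = time.perf_counter() - start
            self.samples.append((path, status["value"], elapsed))
            # Log slow requests at WARNING so they stand out from normal traffic.
            level = logging.WARNING if elapsed >= self.slow_threshold_s else logging.INFO
            logger.log(level, "%s -> %s in %.3fs", path, status["value"], elapsed)
        return [body]
```

With Flask, for example, it could be attached via `app.wsgi_app = TimingMiddleware(app.wsgi_app)`. The 5-second `slow_threshold_s` default is an arbitrary choice for the sketch.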


Status changed to Awaiting User Response Railway 15 days ago


razrinn
PRO OP

14 days ago

Okay, I've finished setting up monitoring and tracing. I'll resolve the thread for now and will report back if it still happens.

Thank you!


Status changed to Awaiting Railway Response Railway 14 days ago


Status changed to Solved razrinn 14 days ago

