15 days ago
Hello, every 1-2 days I get reports from my users that our app is down. They see the Cloudflare 502 page, which shows the host (the Railway app) in an error state. I think this has been happening for a week already (totally unacceptable). Nothing has changed on my app's side, and the traffic is not even high. When it happens, I usually just restart/redeploy the service and it comes back up. But it's only a matter of time until the service goes down again and I have to restart it once more.
I see the status page here https://status.railway.com/cmltkyu8905sl13amlju6j5yh says it's resolved. But it just happened again ~1 hour ago. It's not ideal to keep restarting/redeploying every time it crashes. Any suggestions from your side? I might migrate off Railway if this keeps going on.
8 Replies
15 days ago
What happened was that we had SYN attacks against some workloads. Even if you had Cloudflare in front of your app, attacks on non-WAF workloads could still have affected its availability. As such, we have implemented short-term measures to prevent this going forward. Since then we've expanded interconnect capacity with new peering and Direct Connect, deployed per-container eBPF firewalls, hardened our edge, and built continuous connectivity monitoring between hosts. We're also rolling out a WAF for all customers: edge-level protection against malicious request patterns across all plans.
Was this one recent?
Status changed to Awaiting User Response Railway • 15 days ago
15 days ago
Yes, recent. It has happened almost daily since last week.
Status changed to Awaiting Railway Response Railway • 15 days ago
15 days ago
Then you should be good going forward, but please do let us know if you continue to see this, and we can dig in further!
Status changed to Awaiting User Response Railway • 15 days ago
15 days ago
Okay, let's check in again in a couple of days. I really like the platform, so it's sad to see the service quality fall short.
By the way, this is the response time over the last 7 days; the last occurrence was ~5 hours ago. You can see it sometimes went up to 30s. I had to restart the service to make it work again.
Status changed to Awaiting Railway Response Railway • 15 days ago
Status changed to Awaiting User Response Railway • 15 days ago
15 days ago
ummm...
Anyway, how can I tell whether the problem is on my app's side or on Railway's side?
edit:
CPU/RAM usage seems normal on my end.
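One rough way to narrow this down is to probe the app from outside on a schedule and compare the results with the app's own request logs. Below is a minimal sketch using only the Python standard library. The function name `probe` and the URL in the comment are hypothetical, chosen for illustration; this is not something Railway or Cloudflare provides.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=35.0):
    """Fetch `url` once.

    Returns (status_code_or_None, elapsed_seconds, error_text_or_None).
    """
    started = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer in the measured time
            return resp.status, time.perf_counter() - started, None
    except urllib.error.HTTPError as exc:
        # The server (or a proxy in front of it) answered with 4xx/5xx,
        # for example a 502 page.
        return exc.code, time.perf_counter() - started, str(exc.reason)
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP response at all: DNS failure, refused connection, timeout.
        return None, time.perf_counter() - started, str(exc)


if __name__ == "__main__":
    # Hypothetical URL; replace with your app's public address.
    status, elapsed, error = probe("https://your-app.example.com/")
    print(f"status={status} elapsed={elapsed:.2f}s error={error}")
```

As a heuristic only: if the probe records a 502 or a long delay at a time when the app logged no matching request, the request probably never reached the app. If the app did log the request and was slow to answer, the cause is more likely on the app side.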
Status changed to Awaiting Railway Response Railway • 15 days ago
15 days ago
Some form of APM and tracing is underrated and underutilized. Adding that to your app would help with so much.
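As a starting point before adopting a full APM product, even a small timing layer inside the app can show whether slow responses originate there. The sketch below assumes the app is a Python WSGI app, which the thread does not state; `TimingMiddleware` and `slow_threshold_s` are hypothetical names, and the same idea applies to other stacks.

```python
import logging
import time
from collections import deque

logger = logging.getLogger("request_timing")


class TimingMiddleware:
    """WSGI middleware that records how long the wrapped app takes per request."""

    def __init__(self, app, slow_threshold_s=1.0, keep_last=1000):
        self.app = app
        self.slow_threshold_s = slow_threshold_s
        # Recent (method, path, status, seconds) tuples, bounded in size.
        self.samples = deque(maxlen=keep_last)

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        captured = {"status": None}

        def recording_start_response(status, headers, exc_info=None):
            captured["status"] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, recording_start_response)
        try:
            # Consume the whole body so the timing covers response generation.
            # Note: this buffers the response, so it is unsuitable for streaming.
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        elapsed = time.perf_counter() - started
        method = environ.get("REQUEST_METHOD", "")
        path = environ.get("PATH_INFO", "")
        self.samples.append((method, path, captured["status"], elapsed))

        # Slow requests are logged at WARNING so they stand out.
        level = logging.WARNING if elapsed >= self.slow_threshold_s else logging.INFO
        logger.log(level, "%s %s -> %s in %.3fs", method, path, captured["status"], elapsed)
        return [body]
```

Wrapping the app is a single line, for example `app = TimingMiddleware(app)`. If these logs show fast responses while users still see 502s or 30s delays, that points away from the app's request handling, though it does not by itself prove where the fault lies.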
Status changed to Awaiting User Response Railway • 15 days ago
14 days ago
Okay, I've finished setting up monitoring and tracing. I'll resolve the thread for now and report again if it still happens.
thank you
Status changed to Awaiting Railway Response Railway • 14 days ago
Status changed to Solved razrinn • 14 days ago