5 months ago
I have a FastAPI server that runs with ~8 workers. It is on a maxed-out Railway instance (32 vCPUs and 32 GB of RAM) in US East.
At peak I am expecting about 300 requests per second.
When I fire off requests at only 1 instance, nearly all of my requests succeed. The occasional one is dropped with a network error or a 502, but those succeed via a retry mechanism.
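For reference, here is a minimal sketch of the kind of retry mechanism described above. It is not the actual client code: `call_with_retry`, the `send` callable, and the set of retryable statuses are hypothetical names and choices made for illustration, and it assumes the requests are safe to repeat.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumption: these gateway-style statuses are treated as transient.
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status.

    send is a hypothetical callable returning (status_code, body).
    Raises RuntimeError once max_attempts have been used up.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
            if status not in RETRYABLE_STATUSES:
                return status, body
            last_error = RuntimeError(f"retryable status {status}")
        except ConnectionError as exc:
            # Dropped connections are retried the same way as 502s.
            last_error = exc
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so retries from many
            # clients do not all arrive at the same moment.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so the backoff can be skipped in tests. With a retry like this, a small number of 502s stays invisible to callers, which matches what I see with a single instance.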
However, when I scale up on Railway to run multiple parallel instances, I get MANY 502s from Railway's edge network.
I read the docs at https://docs.railway.com/reference/errors/application-failed-to-respond; however, I've verified that the port and IP are set correctly. I don't think it's an "Application Under Heavy Load" error either, since you'd expect each instance to have LESS load after adding horizontal scaling.
I checked usage logs, memory, CPU utilization, and network throughput. All of the numbers seem quite reasonable for our provisioned resources...
This leads me to believe there's an error with Railway's edge proxy network...
Error:
{"time": "2025-03-07T23:59:59.630028", "error": "Request failed with status 502 - JSON response: {'status': 'error', 'code': 502, 'message': 'Application failed to respond', 'request_id': '3TrIcMNnSHaQVqjQtwtNCg_2654280189'} | Status: 502 | Response Text: {\"status\":\"error\",\"code\":502,\"message\":\"Application failed to respond\",\"request_id\":\"3TrIcMNnSHaQVqjQtwtNCg_2654280189\"} | Response JSON: {'status': 'error', 'code': 502, 'message': 'Application failed to respond', 'request_id': '3TrIcMNnSHaQVqjQtwtNCg_2654280189'} | Headers: {'content-length': '120', 'content-type': 'application/json', 'server': 'railway-edge', 'x-railway-edge': 'railway/us-west1', 'x-railway-fallback': 'true', 'x-railway-request-id': '3TrIcMNnSHaQVqjQtwtNCg_2654280189', 'date': 'Sat, 08 Mar 2025 07:59:59 GMT'}"}
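In case it helps with tracing, below is a minimal sketch of how the Railway headers from a failed response could be pulled out for reporting. The header names are the ones visible in the log line above; the function name `summarize_edge_failure` is hypothetical, and I am not assuming anything about what the header values mean beyond what the log shows.

```python
from typing import Dict, Mapping, Optional


def summarize_edge_failure(
    status: int, headers: Mapping[str, str]
) -> Dict[str, Optional[str]]:
    """Collect the fields worth quoting in a support report."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "status": str(status),
        "request_id": lowered.get("x-railway-request-id"),
        "edge": lowered.get("x-railway-edge"),
        "fallback": lowered.get("x-railway-fallback"),
    }


# Values copied from the error log above.
example_headers = {
    "server": "railway-edge",
    "x-railway-edge": "railway/us-west1",
    "x-railway-fallback": "true",
    "x-railway-request-id": "3TrIcMNnSHaQVqjQtwtNCg_2654280189",
}
summary = summarize_edge_failure(502, example_headers)
```

For the request above, the summary reports the edge as `railway/us-west1` and the fallback flag as `true`.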
Additionally, I communicate between two Railway services and had NEVER seen 502 errors between them before, but I now experience regular network errors when the microservices send messages to each other.
This is critically blocking me from serving one of my biggest customers.
Project ID: 7c37f583-3c7b-4fb6-8fc8-9b57b0eb3606
Service 1 ID: 1eee9966-3a1e-4f0b-9313-5737db82166d
Service 2 ID: affec715-fa3e-4e25-a6ea-d3aa8165a01e
1 Reply
5 months ago
Are you making requests between services using the edge proxy? Could you attempt to use the private network instead?
If you're still seeing it on the private network, then it would be an issue with the application code.
If it disappears, please do let us know and we can look into it from the edge proxy level. This would be the only report of this we have so far.
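To make the comparison concrete, here is a minimal sketch of switching a service-to-service client between the public edge URL and the private network. It assumes the private hostname has the form `<service-name>.railway.internal`, that private traffic is plain HTTP on an explicit port, and that the target service listens on the port given; please check these against the private networking docs for your project. The names `service_base_url`, `service-2`, `8000`, and the public domain are all hypothetical.

```python
def service_base_url(
    use_private: bool,
    service_name: str,
    private_port: int,
    public_domain: str,
) -> str:
    """Return the base URL a client should use to reach another service."""
    if use_private:
        # Assumption: internal DNS name and plain HTTP, bypassing the edge proxy.
        return f"http://{service_name}.railway.internal:{private_port}"
    # Public route: HTTPS through the edge proxy.
    return f"https://{public_domain}"


# Hypothetical values for illustration only.
private_url = service_base_url(True, "service-2", 8000, "service-2.example.com")
public_url = service_base_url(False, "service-2", 8000, "service-2.example.com")
```

Running the same load test once against each base URL should show whether the 502s follow the edge proxy or stay with the application.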
Status changed to Awaiting User Response Railway • 5 months ago