9 days ago
We're seeing a severe latency spike in the Singapore region.
There have been no deployments, config changes, or traffic pattern changes on our side, but response times have increased by more than 10x compared to normal. p95/p99 latency is now spiking into the multi-second to tens-of-seconds range.
Railway's status page still shows everything as normal, but this looks like an incident or host-level degradation to us.
Could the Railway team please investigate and resolve this as soon as possible?
12 Replies
THIS IS A MAJOR INCIDENT. We’re seeing severe latency spikes every 4 minutes, with no traffic changes on our side. This strongly looks like a host-level issue.
We tried scaling out replicas, but it did not help.
This exact repeating spike pattern has happened before, during an earlier Railway incident.
Could someone from the Railway team please look into this urgently? This looks like host-level or platform-level degradation, and we are losing tons of money right now because of it.
9 days ago
Do you have performance logging in place to pinpoint exactly where the latency originates? That would help isolate the cause.
At the moment, it looks DB-related from our side. The latency is showing up mostly inside ActiveRecord / PostgreSQL queries, including normally fast indexed lookups. We're not seeing any corresponding deployments, config changes, or traffic changes that would explain this.
Also, scaling out app replicas did not help, which makes us suspect the bottleneck is not in the Rails app layer itself but somewhere around the database or the underlying host/network path.
We're still investigating, but based on what we can see so far, the slowdown appears to be happening around DB access.
9 days ago
Thanks for sharing, this is very useful information. Regarding the PostgreSQL database: did you deploy the Railway-provided one (Right Click -> Database -> PostgreSQL), or is it a custom one / template deployment?
9 days ago
Thank you, I'm asking some more people about this 🙏
9 days ago
Same in Europe. We didn't change anything, and our metrics are the same.
9 days ago
Same issue here. My services were neither redeployed nor modified, yet a request that used to take 1-2 seconds now takes 7 to 30+ seconds.
9 days ago
With the additional reports, I'll escalate this to the team. Thanks for flagging, guys!
9 days ago
This thread has been escalated to the Railway team.
Status changed to Awaiting Railway Response dev • 9 days ago
9 days ago
This thread is a duplicate of "Repeated incidents, host level performance variability.. is anyone else experiencing this?". Closing this one.
Status changed to Duplicate Railway • 9 days ago