2 months ago
Hi,
We're currently experiencing a gradual increase in latency without any changes on our side, and wanted to check if there might be an issue affecting our host.
Starting around Apr 4th, both our database latency and API response times have been steadily increasing over time.
- Database query times have gradually increased
- API latency (p95) has risen accordingly
- No code changes or traffic spikes were introduced during this period
From our monitoring:
- The increase is gradual rather than a sudden spike
- It affects multiple queries, not just a specific one
- This suggests a broader database or host-level degradation rather than a query regression
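For anyone trying to reproduce the "gradual rather than a sudden spike" observation from their own data, here is a minimal sketch. It assumes you can export daily p95 latencies from your monitoring tool. The function names and the `jump_factor` and `growth_factor` thresholds are hypothetical illustration values, not anything Railway or Sentry provides.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of latency samples (needs >= 2 samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def classify_trend(daily_p95, jump_factor=2.0, growth_factor=1.5):
    """Label a series of positive daily p95 values as 'spike', 'gradual', or 'flat'.

    Hypothetical heuristic:
    - 'spike'   if any single day-over-day ratio reaches jump_factor
    - 'gradual' if no single jump does, but last/first reaches growth_factor
    - 'flat'    otherwise
    """
    if len(daily_p95) < 2:
        return "flat"
    # Day-over-day growth ratios between consecutive days.
    steps = [later / earlier for earlier, later in zip(daily_p95, daily_p95[1:])]
    if max(steps) >= jump_factor:
        return "spike"
    if daily_p95[-1] / daily_p95[0] >= growth_factor:
        return "gradual"
    return "flat"


# Made-up numbers, in milliseconds, purely for illustration.
print(classify_trend([100, 110, 125, 140, 160]))
print(classify_trend([100, 100, 300, 300]))
```

Running the same classification per query, rather than on the aggregate, is one way to check whether the slowdown really affects multiple queries at once.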
Given past incidents related to host-level issues, we're wondering:
- Is our database currently on a degraded or heavily loaded host?
- Were there any underlying issues around this timeframe?
- Would it be possible to move this workload to a less crowded host?
This is impacting production traffic, so any insight would be greatly appreciated.
Thanks in advance
14 Replies
Status changed to Awaiting Railway Response Railway • about 2 months ago
a month ago
Hey, thanks for the detailed report. The gradual latency increase starting Apr 4th is consistent with a host-level issue rather than a query regression. We're checking which host your database is on and whether it's under elevated load. If so, we can migrate your workload to a healthier host. Will follow up shortly.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Confirmed, your database is on a host that experienced a significant load spike starting April 4th. System load jumped from ~25 to 95+ and has remained elevated since, with IO wait times and memory pressure both increasing. (Although we should be good now.)
This directly correlates with the latency increase you observed. Are you in a better spot now?
a month ago
Hi! Latency has improved since the peak on April 5th, but it hasn't fully recovered yet.
We haven't had a chance to observe the impact of the changes you just made yet, but we will continue monitoring closely and will let you know if anything changes.
Really appreciate your support!
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Hello. This should be resolved now. We were validating a config to speed up reads, but for specific instances it had a negative effect on writes.
We've now rolled out a new config, which should increase the speed of both reads AND writes (thus solving your issue).
Please let us know if that's not the case!
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Based on our observations, it has improved, though it hasn't fully returned to previous levels yet.
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Hi, is this possibly degraded again?
We're seeing a very simple query taking over 7 seconds:
SELECT "sessions".*
FROM "sessions"
WHERE "sessions"."tenant" = $1 AND "sessions"."token" = $2
LIMIT $3
Sorry for repeatedly suspecting host performance, but we haven't been able to identify any significant issues on our side in terms of database queries.
Would appreciate it if you could take another look. Thanks for your help as always!
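One way to separate a slow query from a slow host is to time the lookup next to a trivial baseline query over the same connection. If both are slow, the query itself is less likely to be the cause. The sketch below is only an illustration. It uses an in-memory SQLite database as a stand-in so it runs anywhere, and the `time_query` helper, the table contents, and the run count are all hypothetical. Against the real database you would swap in your own driver's connection and placeholder syntax.

```python
import sqlite3
import time
from statistics import median, quantiles


def time_query(conn, sql, params=(), runs=50):
    """Run `sql` `runs` times; return (median_ms, p95_ms) of wall-clock time."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return median(timings), quantiles(timings, n=20)[-1]


# Stand-in database (assumption: NOT the production schema or engine).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (tenant TEXT, token TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(f"t{i % 10}", f"tok{i}") for i in range(1000)],
)

# Trivial query: approximates per-round-trip overhead.
baseline = time_query(conn, "SELECT 1")
# Same shape as the slow query in this post, with made-up parameter values.
lookup = time_query(
    conn,
    "SELECT * FROM sessions WHERE tenant = ? AND token = ? LIMIT ?",
    ("t3", "tok3", 1),
)

print(f"baseline median/p95 (ms): {baseline[0]:.3f} / {baseline[1]:.3f}")
print(f"lookup   median/p95 (ms): {lookup[0]:.3f} / {lookup[1]:.3f}")
```

If the baseline is fast and only the lookup is slow, a missing index or a plan change becomes the more likely suspect.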
Status changed to Solved injung • about 1 month ago
Status changed to Awaiting Railway Response Railway • about 1 month ago
21 days ago
Hey, how are you collecting those metrics? I've looked at the host and a few other metrics, and I'm not really seeing any major signs of trouble here.
Definitely don't want to be a blocker to you here though.
Status changed to Awaiting User Response Railway • 21 days ago
21 days ago
I'm collecting these metrics from both Sentry and the Railway dashboard. They show similar trends, but the Railway dashboard is a bit harder to interpret, so I've attached the Sentry data instead.
As you can see, this reflects the P95 latency fluctuations we've been experiencing since Apr 17.
Status changed to Awaiting Railway Response Railway • 21 days ago
19 days ago
We're seeing the issue again.
There have still been no deploys or traffic changes on our side, but:
- We experienced a period of timeouts for about 30 minutes
- Overall latency has increased significantly — now roughly 3x slower than baseline
- Database and API latency are both affected
This looks very similar to the previous incident and is happening again without any changes on our end.
Could you take another look at the host or underlying infrastructure?
This is now happening repeatedly and is impacting production, so we'd really appreciate urgent investigation.
18 days ago
Same!
14 days ago
Apologies for the delay here.
We're aware of this and actively investigating. We'll update you as soon as we have more to share.
In the meantime, if you're experiencing impact, redeploying your service to a different region can help as an immediate workaround.
Status changed to Awaiting User Response Railway • 14 days ago
13 days ago
It finally seems to be fixed. We didn't change anything on our side, but latency has returned to normal. I hope it stays this way.
Status changed to Awaiting Railway Response Railway • 13 days ago
11 days ago
Glad to hear it's back to normal. We identified and addressed the root cause on the host your database was running on. Please don't hesitate to reopen this thread if the latency returns.
Status changed to Awaiting User Response Railway • 11 days ago
Status changed to Solved codydearkland • 11 days ago