Gradual increase in database latency
injung
PRO · OP

2 months ago

Hi,

We're currently experiencing a gradual increase in latency without any changes on our side, and wanted to check if there might be an issue affecting our host.

Starting around Apr 4th, both our database latency and API response times have been steadily increasing over time.

  • Database query times have gradually increased
  • API latency (p95) has risen accordingly
  • No code changes or traffic spikes were introduced during this period

From our monitoring:

  • The increase is gradual rather than a sudden spike
  • It affects multiple queries, not just a specific one
  • This suggests a broader database or host-level degradation rather than a query regression
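
One way to make the "gradual rather than sudden" observation above concrete is to compute a p95 per day and look at day-over-day ratios. The following is only a minimal sketch: the function names, the 2x spike threshold, and the 1.5x overall-growth threshold are hypothetical choices for illustration, not values taken from this thread or from any monitoring tool.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile of a list of latency samples (e.g. in ms)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def classify_trend(daily_p95, spike_factor=2.0, growth_factor=1.5):
    """Label a series of daily p95 values as 'spike', 'gradual', or 'flat'.

    Both thresholds are assumptions picked for illustration only.
    """
    # Ratio of each day's p95 to the previous day's p95.
    steps = [b / a for a, b in zip(daily_p95, daily_p95[1:])]
    if any(step >= spike_factor for step in steps):
        return "spike"
    total_growth = daily_p95[-1] / daily_p95[0]
    rising_days = sum(1 for step in steps if step > 1.0)
    # "Gradual": clear overall growth, spread across most of the days.
    if total_growth >= growth_factor and rising_days >= 0.7 * len(steps):
        return "gradual"
    return "flat"


# Made-up daily p95 values (ms), not data from this incident.
print(classify_trend([100, 110, 125, 140, 160, 185]))  # gradual
print(classify_trend([120, 118, 310, 125]))            # spike
```

A check like this only describes the shape of the curve; it cannot tell whether the cause sits in the queries or on the host.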

Given past incidents related to host-level issues, we're wondering:

  • Is our database currently on a degraded or heavily loaded host?
  • Were there any underlying issues around this timeframe?
  • Would it be possible to move this workload to a less crowded host?

This is impacting production traffic, so any insight would be greatly appreciated.

Thanks in advance

Solved

14 Replies

Status changed to Awaiting Railway Response Railway about 2 months ago


Hey, thanks for the detailed report. The gradual latency increase starting Apr 4th is consistent with a host-level issue rather than a query regression. We're checking which host your database is on and whether it's under elevated load. If so, we can migrate your workload to a healthier host. Will follow up shortly.


Status changed to Awaiting User Response Railway about 1 month ago


Confirmed, your database is on a host that experienced a significant load spike starting April 4th. System load jumped from ~25 to 95+ and has remained elevated since, with IO wait times and memory pressure both increasing. (Although we should be good now.)

This directly correlates with the latency increase you observed. Are you in a better spot now?


injung
PRO · OP

a month ago

Hi! It does seem to have improved since the peak on April 5th, but it hasn't fully recovered yet.

We haven't had a chance to observe the impact of the changes you just made yet, but we will continue monitoring closely and will let you know if anything changes.

Really appreciate your support!


Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

Hello. This should be resolved now. We were validating a config to speed up reads, but on specific instances it had a negative effect on writes.

We've now rolled out a new config that should speed up both reads AND writes (thus solving your issue).

Please let us know if that's not the case!


Status changed to Awaiting User Response Railway about 1 month ago


injung
PRO · OP

a month ago

Based on our observations, it has improved, though it hasn't fully returned to previous levels yet.


Status changed to Awaiting Railway Response Railway about 1 month ago


injung
PRO · OP

a month ago

Hi, is this possibly degraded again?

We're seeing a very simple query taking over 7 seconds:

SELECT "sessions".*
FROM "sessions"
WHERE "sessions"."tenant" = $1 AND "sessions"."token" = $2
LIMIT $3
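
Before attributing a slow lookup like the one above to the host, it can help to confirm that the query side is ruled out by checking that the planner uses an index on the two filter columns. The sketch below is an assumption-heavy stand-in: it uses SQLite from Python's standard library only so that it runs anywhere, the `sessions` schema and the index name `idx_sessions_tenant_token` are hypothetical, and nothing in this thread says an index was missing. On PostgreSQL the equivalent check would be `EXPLAIN (ANALYZE, BUFFERS)` on the same query, and its planner output looks different.

```python
import sqlite3

# Same shape as the query in the post, with SQLite-style "?" placeholders.
QUERY = (
    'SELECT "sessions".* FROM "sessions" '
    'WHERE "sessions"."tenant" = ? AND "sessions"."token" = ? LIMIT ?'
)


def query_plan(conn, sql, params):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real table definition is not shown in the thread.
conn.execute(
    "CREATE TABLE sessions (id INTEGER PRIMARY KEY, tenant TEXT, token TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO sessions (tenant, token, data) VALUES (?, ?, ?)",
    [(f"t{i % 50}", f"tok{i}", "x") for i in range(5000)],
)

params = ("t1", "tok1", 1)
plan_without_index = query_plan(conn, QUERY, params)

# Composite index covering both equality filters (hypothetical name).
conn.execute("CREATE INDEX idx_sessions_tenant_token ON sessions (tenant, token)")
plan_with_index = query_plan(conn, QUERY, params)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

If the plan already shows an index search and the query is still slow, that supports looking at IO wait, memory pressure, or other host-level factors instead.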

Sorry for repeatedly suspecting host performance, but we haven't been able to identify any significant issues on our side in terms of database queries.

Would appreciate it if you could take another look. Thanks for your help as always!


Status changed to Solved injung about 1 month ago


injung
PRO · OP

a month ago

Hey, it's getting worse. Any updates on this?


Status changed to Awaiting Railway Response Railway about 1 month ago


21 days ago

Hey, how are you collecting those metrics? I've looked at the host and a few other metrics, and I'm not really seeing any major signs of a problem here.

Definitely don't want to be a blocker to you here though.


Status changed to Awaiting User Response Railway 21 days ago


injung
PRO · OP

21 days ago

I'm collecting these metrics from both Sentry and the Railway dashboard. They show similar trends, but the Railway dashboard is a bit harder to interpret, so I've attached the Sentry data instead.

As you can see, this reflects the P95 latency fluctuations we've been experiencing since Apr 17.

Attachments


Status changed to Awaiting Railway Response Railway 21 days ago


injung
PRO · OP

19 days ago

We're seeing the issue again.

There have still been no deploys or traffic changes on our side, but:

  • We experienced a period of timeouts for about 30 minutes
  • Overall latency has increased significantly — now roughly 3x slower than baseline
  • Database and API latency are both affected
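
For reports like the list above, two numbers are easy to attach: the slowdown factor against a known-good baseline, and the length of the timeout window. This is a rough sketch under stated assumptions: the helper names are hypothetical, the sample values are made up rather than taken from this incident, and it assumes latency samples and per-minute timeout counts have already been exported from the monitoring tool.

```python
from statistics import median


def slowdown_factor(baseline_ms, current_ms):
    """Ratio of the current median latency to the baseline median latency."""
    return median(current_ms) / median(baseline_ms)


def longest_timeout_minutes(timeouts_per_minute):
    """Longest run of consecutive minutes that each had at least one timeout."""
    longest = run = 0
    for count in timeouts_per_minute:
        run = run + 1 if count > 0 else 0
        longest = max(longest, run)
    return longest


# Made-up samples (ms) for illustration only.
baseline = [40, 42, 45, 41, 43]
current = [120, 130, 126, 135, 129]
print(round(slowdown_factor(baseline, current), 1))  # 3.1

# Made-up per-minute timeout counts: 5 quiet minutes, 30 bad ones, 5 quiet.
timeouts = [0] * 5 + [3] * 30 + [0] * 5
print(longest_timeout_minutes(timeouts))  # 30
```

The median is used here instead of the mean so that a handful of extreme timeouts does not dominate the ratio.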

This looks very similar to the previous incident and is happening again without any changes on our end.

Could you take another look at the host or underlying infrastructure?

This is now happening repeatedly and is impacting production, so we'd really appreciate urgent investigation.


Anonymous
PRO

18 days ago

Same!


14 days ago

Apologies for the delay here.

We're aware of this and actively investigating. We'll update you as soon as we have more to share.

In the meantime, if you're experiencing impact, redeploying your service to a different region can help as an immediate workaround.


Status changed to Awaiting User Response Railway 14 days ago


injung
PRO · OP

13 days ago

It finally seems to be fixed. We didn't change anything on our side, but latency has returned to normal. I hope it stays this way.


Status changed to Awaiting Railway Response Railway 13 days ago


codydearkland
EMPLOYEE

11 days ago

Glad to hear it's back to normal. We identified and addressed the root cause on the host your database was running on. Please don't hesitate to reopen this thread if the latency returns.


Status changed to Awaiting User Response Railway 11 days ago


Status changed to Solved codydearkland 11 days ago

