Random intermittent long running database operations

rcgeorge23

PROOP

4 months ago

Hi railway support,

In my application I have noticed that I see occasional requests that take 30+ seconds to complete. These long running requests always involve database operations. I use railway postgres for persistent storage.

We do not have a significant amount of data in postgres - we find that long running database operations are random. A query that executed in a few milliseconds on one occasion will take 30 seconds on another.

I suspect this is an issue with stale connections but it is difficult to prove. I wonder if you have any advice / recommendations for db connection pool tuning settings that have been shown to work well on railway.

For context, the application is a java / spring boot application. We use Hikari for connection pooling. We currently have the following hikari configuration:

hikari:
  maximum-pool-size: 8
  minimum-idle: 2
  connection-timeout: 5000
  idle-timeout: 600000
  max-lifetime: 1800000 # 30 mins
  leak-detection-threshold: 300000 # 5 mins
  keepalive-time: 60000 # 60 seconds
  validation-timeout: 2000

Thanks

Richard

Solved$10 Bounty

Pinned Solution

domehane

FREE

4 months ago

Hello,

i suggest to you two things worth trying:

first, hikaricp docs actually recommend setting minimum-idle equal to maximum-pool-size so all connections stay warm. right now you have minimum-idle at 2 and max-pool-size at 8, meaning 6 connections are cold and created on demand. set minimum-idle to 8 and keepalive will then apply to all connections not just 2

second, worth ruling out postgres lock contention. next time you see a slow request run this on your db: select pid, state, wait_event_type, wait_event, query_start, query from pg_stat_activity where state != 'idle' order by query_start and check if anything shows wait_event_type = 'lock' or state = 'idle in transaction'. if yes the issue is locking not connections. you can guard against that by adding this to hikari config: connection-init-sql: "SET idle_in_transaction_session_timeout = '10000'"

i hope this help you :)

4 Replies

chandrika

EMPLOYEE

4 months ago

Your app and Postgres are both in the same project communicating over private networking, and we see no dropped packets or connectivity issues on the platform side. That rules out several common causes like cross-region latency or public network overhead.

For diagnosing the intermittent spikes, our HTTP logs support filtering by duration, so you can use @totalDuration:>1000 or @upstreamRqDuration:>1000 in your service's log filters to isolate the slow requests and correlate them with your database operations. If upstreamRqDuration is consistently high on those requests, the delay is happening within your application or database rather than in the network path. More detail on that is in our troubleshooting slow applications guide.

Status changed to Awaiting User Response Railway • 4 months ago

chandrika

EMPLOYEE

4 months ago

HikariCP tuning is outside our first-party support scope, but we do have an active community that may be able to help with that specific configuration. I'm going to bounty this question so our community can help you out further

domehane

FREE

4 months ago

Hello,

i suggest to you two things worth trying:

first, hikaricp docs actually recommend setting minimum-idle equal to maximum-pool-size so all connections stay warm. right now you have minimum-idle at 2 and max-pool-size at 8, meaning 6 connections are cold and created on demand. set minimum-idle to 8 and keepalive will then apply to all connections not just 2

second, worth ruling out postgres lock contention. next time you see a slow request run this on your db: select pid, state, wait_event_type, wait_event, query_start, query from pg_stat_activity where state != 'idle' order by query_start and check if anything shows wait_event_type = 'lock' or state = 'idle in transaction'. if yes the issue is locking not connections. you can guard against that by adding this to hikari config: connection-init-sql: "SET idle_in_transaction_session_timeout = '10000'"

i hope this help you :)

rcgeorge23

PROOP

4 months ago

Many thanks domehane and chandrika - I will give these suggestions a try