18 days ago
Our apps have had high response times since 7:54 GMT-3 today.
Nothing changed on our side.
Status changed to Open Railway • 18 days ago
18 days ago
We experienced high response times on our backend. Since nothing had changed on our side (no deploys, no traffic spikes), we started suspecting the database.
After investigating, we diagnosed that the slowness was coming from I/O latency on Railway's storage layer.
We made several tuning changes that significantly improved the situation, but didn't fully eliminate the issue:
- Increased shared_buffers from 128MB to 6GB
- Increased checkpoint_timeout from 5min to 15min
- Set synchronous_commit = off (Commits no longer wait for the WAL flush to disk. The tradeoff is that a hard crash can lose the most recent commits, up to about three times wal_writer_delay, so roughly 600ms at the default of 200ms. There is no corruption risk. This gave us an immediate improvement in commit latency.)
These changes reduced the severity and frequency of the spikes, but the underlying storage limitation on Railway's side is still there.
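If some writes cannot tolerate that loss window, synchronous_commit can also be overridden per transaction. A minimal sketch, assuming a hypothetical payments table (the table and column names are illustrative, not from our schema):

```sql
BEGIN;
-- Applies only to this transaction; the global "off" stays in effect elsewhere.
SET LOCAL synchronous_commit = on;
-- Hypothetical critical write that must survive a hard crash.
UPDATE payments SET status = 'captured' WHERE id = 42;
COMMIT;  -- waits for the WAL flush before returning
```

This keeps the latency gain for ordinary traffic while the few transactions that need full durability still pay for the flush.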
1. Confirm the root cause — run this during a slow period:
SELECT pid, now() - query_start AS duration, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
If many active sessions show wait_event_type = IO with wait_event = WALSync or WALWrite, the bottleneck is WAL I/O.
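With many active sessions the per-query list can be hard to read. A rough way to summarize it is to count sessions per wait event; this is only a sketch built on the same pg_stat_activity view:

```sql
-- Count active sessions per wait event, excluding this session itself.
SELECT wait_event_type, wait_event, count(*) AS sessions
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;
```

Running it a few times during a slow period should show whether the WAL-related events dominate consistently or only appear in one sample.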
2. Quick wins with no restart required:
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();
3. Requires restart — but high impact:
ALTER SYSTEM SET shared_buffers = '2GB'; -- example value; size it to your plan's RAM (we went from 128MB to 6GB)
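To check which of these settings are already live and which are still waiting for a restart, the pg_settings view can be queried. A minimal sketch, assuming PostgreSQL 9.5 or later (where the pending_restart column exists):

```sql
-- "setting" is the value currently in effect; pending_restart = true means
-- a changed value has been written but will only apply after a restart.
SELECT name, setting, unit, context, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'checkpoint_timeout', 'synchronous_commit');
```

After step 2, checkpoint_timeout and synchronous_commit should show the new values right away, while shared_buffers should report pending_restart = true until the database is restarted.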
What does your pg_stat_activity show during the slowdowns?
Status changed to Solved abreu • 17 days ago