I/O Latency: Severe Database Performance Issues and 'Extend' Wait Events in EU West
tim-nocode
PRO · OP

2 months ago

Hi Railway Support Team,

I am reaching out about a critical performance degradation of my Postgres instance in the EU West (Amsterdam) region. Despite a 50GB volume and a tuned configuration, the database is hitting severe I/O bottlenecks that are paralyzing my production environment (n8n).

Technical Evidence of Infrastructure Issues:

  1. Extreme 'Extend' Wait Events: Our pg_stat_activity monitoring shows that simple INSERT operations on the execution_metadata table are frequently stuck in an 'extend' wait state for over 4 minutes (260+ seconds). This suggests the storage layer is extremely slow to allocate new blocks when the table grows.
  2. Abnormal Checkpoint Durations: Logs show that a relatively small checkpoint (writing only ~48MB of buffers) took 263.4 seconds to complete. That is roughly 0.18 MB/s, far below what an SSD-backed volume should deliver.
  3. High WALWrite Latency: We observed COMMIT operations and WALWrite events taking upwards of 1 second, even after setting synchronous_commit = off.
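A minimal Python sketch of the kind of check described in item 1, assuming the rows have already been fetched (for example with psql or a driver such as psycopg, not shown). The SQL uses standard pg_stat_activity columns; `ActivityRow`, `stuck_on_extend`, the 60-second threshold and the sample rows are hypothetical illustrations, not the poster's actual monitoring. Note that `running_s` measures time since `query_start`, not time spent in the current wait.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Query to run against the affected database; the Python below only
# post-processes rows that were already fetched.
ACTIVITY_QUERY = """
SELECT pid,
       wait_event_type,
       wait_event,
       EXTRACT(EPOCH FROM now() - query_start) AS running_s,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
"""


@dataclass
class ActivityRow:
    pid: int
    wait_event_type: Optional[str]
    wait_event: Optional[str]
    running_s: float  # seconds since query_start (hypothetical field name)
    query: str


def stuck_on_extend(rows: Iterable[ActivityRow], threshold_s: float = 60.0) -> List[ActivityRow]:
    """Return backends waiting on relation extension longer than threshold_s.

    Matches any wait_event containing "extend" (case-insensitive), so it
    catches the Lock/'extend' event and extend-related IO events alike.
    """
    return [
        r for r in rows
        if r.wait_event is not None
        and "extend" in r.wait_event.lower()
        and r.running_s > threshold_s
    ]


# Hypothetical sample rows shaped like the thread's symptoms.
sample = [
    ActivityRow(101, "Lock", "extend", 262.0, "INSERT INTO execution_metadata ..."),
    ActivityRow(102, "IO", "WALWrite", 1.2, "COMMIT"),
    ActivityRow(103, "Lock", "extend", 5.0, "INSERT INTO execution_metadata ..."),
]

for row in stuck_on_extend(sample, threshold_s=60):
    print(row.pid, row.wait_event, f"{row.running_s:.0f}s")  # prints: 101 extend 262s
```

Matching on a substring rather than an exact event name keeps the filter tolerant of differences in wait-event naming across Postgres versions.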

Environment Details:

  • Service: Postgres
  • Region: EU West (Amsterdam)
  • Volume Size: 50GB (range-volume)
  • Current Config: We have already tuned shared_buffers (8GB), max_connections (300), and adjusted checkpoint intervals, but the underlying I/O latency persists.

Impact: My n8n Primary and Worker instances are constantly failing with "Database connection timed out" because the database becomes unresponsive during these I/O spikes.

Could you please investigate the health of the underlying storage node or the physical host where this instance runs? It looks like a "noisy neighbor" issue or degraded EBS/volume performance in this specific zone.

Solved

1 Reply

Status changed to Awaiting Railway Response Railway about 2 months ago


Sorry for the late reply.

Your Postgres volume is spec'd at 3,000 read/write IOPS, so a 48MB checkpoint taking 263 seconds (~0.18 MB/s throughput) is far below expected performance and indicates a storage-layer issue on our side.
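The arithmetic in the reply can be checked with a small Python sketch. It assumes each write is one default 8 KiB Postgres block and treats "MB" as 10^6 bytes; the function names are hypothetical. Only the 48 MB, 263 s and 3,000 IOPS figures come from the thread.

```python
BLOCK_SIZE_BYTES = 8192  # assumption: default Postgres block size, one block per write
SPEC_IOPS = 3_000        # from the reply: volume spec'd at 3,000 read/write IOPS
MB = 1_000_000           # treat "MB" as 10^6 bytes


def throughput_mb_s(written_mb: float, seconds: float) -> float:
    """Average write throughput of the checkpoint in MB/s."""
    return written_mb / seconds


def iops_ceiling_mb_s(iops: int, io_size_bytes: int = BLOCK_SIZE_BYTES) -> float:
    """Throughput the IOPS spec allows if every I/O moves one block."""
    return iops * io_size_bytes / MB


def implied_iops(written_mb: float, seconds: float, io_size_bytes: int = BLOCK_SIZE_BYTES) -> float:
    """Block writes per second the checkpoint actually achieved."""
    return written_mb * MB / io_size_bytes / seconds


observed = throughput_mb_s(48, 263)
ceiling = iops_ceiling_mb_s(SPEC_IOPS)
achieved = implied_iops(48, 263)

print(f"observed: {observed:.2f} MB/s")   # prints: observed: 0.18 MB/s
print(f"ceiling:  {ceiling:.1f} MB/s")    # prints: ceiling:  24.6 MB/s
print(f"implied IOPS: {achieved:.0f}")    # prints: implied IOPS: 22
```

This treats the whole 263 s as I/O time. Postgres deliberately spreads checkpoint writes according to `checkpoint_completion_target`, so a long checkpoint on its own is weaker evidence than the `extend` and WALWrite waits. Still, roughly 22 writes per second against a 3,000 IOPS spec is consistent with the reply's conclusion.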

You can migrate the database to a different region and then migrate it back, which should move it to an uncontended host.


Status changed to Awaiting User Response Railway about 2 months ago


Railway
BOT

2 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway about 2 months ago

