Our PostgreSQL database is experiencing severe I/O performance degradation
critter-rafael
PRO · OP

2 months ago

Problem Description

Database checkpoints are taking excessively long (80-270 seconds instead of the expected 10-15 seconds), which blocks queries and causes the n8n worker to time out after 10 seconds when attempting to connect.

Root Cause Analysis

Based on PostgreSQL logs, we've identified the following potential issues:

  1. Shared Storage Bottleneck – Railway's shared storage infrastructure may be causing I/O contention
  2. IOPS Throttling – The current plan may have insufficient I/O operations per second (IOPS) allocated
  3. Database Growth – Excessive data writes are overwhelming the disk subsystem
  4. Disk Space Constraints – Limited available disk space may be degrading PostgreSQL performance

Evidence

  • Checkpoint at 11:50-11:55 took 269.9 seconds (wrote 12,438 buffers / 75.9%)
  • Checkpoint at 12:10-12:13 took 143.4 seconds (wrote 1,418 buffers / 8.7%)
  • Checkpoint at 13:05-13:07 took 84.3 seconds (wrote 196 buffers / 1.2%)
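For a rough sense of scale, the figures above can be converted into effective write rates. This is only a sketch: it assumes PostgreSQL's default 8 kB block size, which the thread does not confirm, and the helper name is made up for illustration. PostgreSQL also paces checkpoint writes deliberately, so these are averages over each checkpoint, not a measure of what the disk can do.

```python
# Figures quoted in the Evidence list: (window, buffers written, duration in seconds).
CHECKPOINTS = [
    ("11:50-11:55", 12_438, 269.9),
    ("12:10-12:13", 1_418, 143.4),
    ("13:05-13:07", 196, 84.3),
]

# Assumption: PostgreSQL's default block size of 8 kB (not confirmed in the thread).
BLOCK_SIZE_BYTES = 8192


def effective_write_rate_mib_s(buffers: int, seconds: float) -> float:
    """Average MiB written per second over the whole checkpoint (hypothetical helper)."""
    return buffers * BLOCK_SIZE_BYTES / (1024 * 1024) / seconds


RATES = {
    window: effective_write_rate_mib_s(buffers, seconds)
    for window, buffers, seconds in CHECKPOINTS
}

if __name__ == "__main__":
    for window, rate in RATES.items():
        print(f"{window}: {rate:.3f} MiB/s")
```

Under that assumption the three checkpoints average roughly 0.36, 0.08 and 0.02 MiB/s. The last one wrote only about 1.5 MiB yet still took 84.3 seconds, which suggests the time is not explained by write volume alone.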

Requested Actions

  1. Review current IOPS allocation and disk I/O performance metrics
  2. Confirm available disk space on the PostgreSQL instance
  3. Assess if plan upgrade is necessary for increased I/O capacity
  4. Provide recommendations for optimization or resource scaling
$20 Bounty

4 Replies

By default n8n writes the state of every node during execution to the database. To drastically reduce database writes, you can apply these env variables to your n8n deployment:

EXECUTIONS_DATA_SAVE_ON_SUCCESS=none - saves only failed executions, for debugging

EXECUTIONS_DATA_SAVE_ON_PROGRESS=false - stops n8n from writing to the DB after every single node completes

EXECUTIONS_DATA_PRUNE=true - deletes finished executions along with their execution data and binary data

EXECUTIONS_DATA_MAX_AGE=24 - keeps data for a maximum of 24 hours (use less if possible)

You can read more about this here: https://docs.n8n.io/hosting/scaling/execution-data/
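To confirm the variables actually reached the running n8n process, a small check like the one below could be run inside the n8n container. It is a minimal sketch: the variable names and values are the ones listed in this reply, while `find_mismatches` is a hypothetical helper, not part of n8n or Railway.

```python
import os

# Values recommended in this reply.
RECOMMENDED = {
    "EXECUTIONS_DATA_SAVE_ON_SUCCESS": "none",
    "EXECUTIONS_DATA_SAVE_ON_PROGRESS": "false",
    "EXECUTIONS_DATA_PRUNE": "true",
    "EXECUTIONS_DATA_MAX_AGE": "24",
}


def find_mismatches(env):
    """Return {name: (expected, actual)} for each setting that differs.

    `actual` is None when the variable is not set at all.
    """
    return {
        name: (expected, env.get(name))
        for name, expected in RECOMMENDED.items()
        if env.get(name) != expected
    }


if __name__ == "__main__":
    mismatches = find_mismatches(os.environ)
    for name, (expected, actual) in mismatches.items():
        print(f"{name}: expected {expected!r}, found {actual!r}")
    if not mismatches:
        print("All recommended settings are applied.")
```

The comparison is an exact string match, so a value such as `False` or `TRUE` is reported as a mismatch even if n8n would accept it.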


critter-rafael
PRO · OP

2 months ago

Thanks! I'll give it a try.


critter-rafael
PRO · OP

2 months ago

I tried it: I applied those variables and moved my Railway deployment to US East. Nothing has improved, and the errors are still there.


chandrika
EMPLOYEE

20 hours ago

Hey, apologies — this thread should have gotten a response much sooner. We've identified the root cause: your Postgres instance is on a storage host with degraded I/O performance, which is what's been causing the extreme checkpoint times and worker timeouts.

We've staged a migration to a healthier host. To apply it, trigger a redeploy on the Postgres service at your convenience — the volume will migrate with it. There will be brief downtime during the migration (typically a few minutes).

Once you're on the new host, checkpoint times should return to normal.

