2 months ago
Problem Description
Database checkpoints are taking excessively long (80-270 seconds instead of the expected 10-15 seconds), which blocks queries and causes the n8n worker to time out after 10 seconds when it tries to connect.
Root Cause Analysis
Based on PostgreSQL logs, we've identified the following potential issues:
- Shared Storage Bottleneck – Railway's shared storage infrastructure may be causing I/O contention
- IOPS Throttling – The current plan may have insufficient I/O operations per second (IOPS) allocated
- Database Growth – A high volume of data writes may be overwhelming the disk subsystem
- Disk Space Constraints – Limited available disk space may be degrading PostgreSQL performance
Evidence
- Checkpoint at 11:50-11:55 took 269.9 seconds (wrote 12,438 buffers, 75.9% of shared buffers)
- Checkpoint at 12:10-12:13 took 143.4 seconds (wrote 1,418 buffers, 8.7%)
- Checkpoint at 13:05-13:07 took 84.3 seconds (wrote 196 buffers, 1.2%)
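A quick sanity check on the evidence figures can be sketched in Python. It assumes PostgreSQL's default 8 kB block size (one buffer = one block). Dividing buffers written by the percentage gives the implied size of the buffer pool; all three rows land near 16,384 buffers, which would be 128 MB, the stock shared_buffers default. That pool size is an inference from the logged numbers, not something the logs state. The `summarize` helper is hypothetical.

```python
# Figures copied from the evidence above:
# (window, buffers written, percent of shared buffers, total seconds)
CHECKPOINTS = [
    ("11:50-11:55", 12_438, 75.9, 269.9),
    ("12:10-12:13", 1_418, 8.7, 143.4),
    ("13:05-13:07", 196, 1.2, 84.3),
]

# Assumption: PostgreSQL's default 8 kB block size (one buffer = one block).
BLOCK_SIZE_BYTES = 8192


def summarize(window, buffers, percent, seconds):
    """Derive the implied pool size, data written, and average write rate."""
    written_bytes = buffers * BLOCK_SIZE_BYTES
    return {
        "window": window,
        # buffers / fraction written ~= total buffers in shared_buffers
        "implied_pool_buffers": round(buffers / (percent / 100)),
        "written_mib": round(written_bytes / 2**20, 1),
        # averaged over the whole checkpoint, including any deliberate pacing
        "avg_kib_per_s": round(written_bytes / 1024 / seconds, 1),
    }


if __name__ == "__main__":
    for row in CHECKPOINTS:
        print(summarize(*row))
```

Even the largest checkpoint wrote under 100 MiB, and the smallest about 1.5 MiB. The durations therefore reflect a very low average write rate rather than a large volume of data.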
Requested Actions
- Review current IOPS allocation and disk I/O performance metrics
- Confirm available disk space on the PostgreSQL instance
- Assess whether a plan upgrade is needed for more I/O capacity
- Provide recommendations for optimization or resource scaling
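When reviewing I/O performance, it can help to look at the full `checkpoint complete` log lines rather than only the totals. PostgreSQL splits each checkpoint into a `write=` phase and a `sync=` phase. The write phase is deliberately spread out according to `checkpoint_completion_target`. The sync phase is the fsync of the written files. A sync phase that takes many seconds is often a clearer sign of slow storage than a long write phase. Below is a minimal Python sketch, assuming `log_checkpoints` is enabled and the lines follow the stock format. The sample line is made up for illustration and is not from this incident. `parse_checkpoint` and `CheckpointStats` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns for the "checkpoint complete" line PostgreSQL logs when log_checkpoints is on.
# Each part is matched separately, so extra fields added by newer versions are ignored.
WROTE_RE = re.compile(r"wrote (\d+) buffers \(([\d.]+)%\)")
TIMING_RE = re.compile(r"write=([\d.]+) s, sync=([\d.]+) s, total=([\d.]+) s")


@dataclass
class CheckpointStats:
    buffers: int
    percent: float
    write_s: float
    sync_s: float
    total_s: float


def parse_checkpoint(line: str) -> Optional[CheckpointStats]:
    """Return the stats from a 'checkpoint complete' line, or None if it isn't one."""
    if "checkpoint complete" not in line:
        return None
    wrote = WROTE_RE.search(line)
    timing = TIMING_RE.search(line)
    if not (wrote and timing):
        return None
    return CheckpointStats(
        buffers=int(wrote.group(1)),
        percent=float(wrote.group(2)),
        write_s=float(timing.group(1)),
        sync_s=float(timing.group(2)),
        total_s=float(timing.group(3)),
    )


if __name__ == "__main__":
    # Hypothetical log line for illustration only; not taken from this incident.
    sample = (
        "LOG:  checkpoint complete: wrote 100 buffers (0.6%); "
        "0 WAL file(s) added, 0 removed, 1 recycled; "
        "write=10.000 s, sync=2.500 s, total=12.600 s; "
        "sync files=5, longest=1.200 s, average=0.500 s"
    )
    stats = parse_checkpoint(sample)
    print(stats)
    if stats:
        print(f"sync share of total: {stats.sync_s / stats.total_s:.0%}")
```

Feeding every matching line from the Postgres logs through the parser shows whether the extra time sits in the sync phase. That breakdown is useful context to attach to a support request.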
2 months ago
By default n8n saves the data of every execution, successful or failed, to the database. To drastically reduce database writes, you can set these environment variables on your n8n deployment:
- EXECUTIONS_DATA_SAVE_ON_SUCCESS=none – only save failed executions, for debugging
- EXECUTIONS_DATA_SAVE_ON_PROGRESS=false – stops n8n from writing to the DB after every single node completes
- EXECUTIONS_DATA_PRUNE=true – deletes finished executions along with their execution data and binary data
- EXECUTIONS_DATA_MAX_AGE=24 – keep data for a maximum of 24 hours, or less if possible
You can read more about this here: https://docs.n8n.io/hosting/scaling/execution-data/
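Because these are environment variables rather than commands, it is worth confirming that they actually reached the n8n process after it restarts. Below is a minimal Python sketch, assuming you can run it with the same environment as the n8n service being checked, for example from a shell attached to that service. `missing_or_different` and `is_ok` are hypothetical helpers. Following "or less if possible", the check accepts any `EXECUTIONS_DATA_MAX_AGE` from 1 to 24 hours.

```python
import os
from typing import Dict, Mapping, Optional

# Values recommended in the reply above (EXECUTIONS_DATA_MAX_AGE is in hours).
RECOMMENDED = {
    "EXECUTIONS_DATA_SAVE_ON_SUCCESS": "none",
    "EXECUTIONS_DATA_SAVE_ON_PROGRESS": "false",
    "EXECUTIONS_DATA_PRUNE": "true",
    "EXECUTIONS_DATA_MAX_AGE": "24",
}


def is_ok(name: str, value: Optional[str]) -> bool:
    """True if the variable is set to the recommended value (max age: 1-24 hours)."""
    if value is None:
        return False
    value = value.strip().lower()
    if name == "EXECUTIONS_DATA_MAX_AGE":
        return value.isdigit() and 0 < int(value) <= 24
    return value == RECOMMENDED[name]


def missing_or_different(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Map each recommended variable that is unset or set differently
    to its current value (None when unset)."""
    return {
        name: env.get(name)
        for name in RECOMMENDED
        if not is_ok(name, env.get(name))
    }


if __name__ == "__main__":
    problems = missing_or_different(os.environ)
    if not problems:
        print("All recommended n8n execution-data variables are set.")
    for name, current in problems.items():
        print(f"{name}: expected {RECOMMENDED[name]!r}, found {current!r}")
```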
2 months ago
Thanks! I'll give it a try.
2 months ago
I tried those settings and also moved my Railway service to the US East region, but nothing has improved and the errors persist.
20 hours ago
Hey, apologies: this thread should have gotten a response much sooner. We've identified the root cause: your Postgres instance is on a storage host with degraded I/O performance, which is what's been causing the extreme checkpoint times and worker timeouts.
We've staged a migration to a healthier host. To apply it, trigger a redeploy of the Postgres service at your convenience; the volume will migrate with it. There will be brief downtime during the migration (typically a few minutes).
Once you're on the new host, checkpoint times should return to normal.