Postgres unclean shutdown + recurring crashes on Legacy region (n8n)
pashahoholko
HOBBY OP

21 days ago

Hi Railway Support,

I'm experiencing recurring Postgres crashes causing "Database is not ready!" errors in my n8n instance. All my services are already on US West (California, USA) — Metal region.

Project: shimmering-consideration

Environment: production

Affected service: Postgres (deployment cd9443cf)

— Recurring issues (last few weeks) —

• Frequent "Database connection timed out" errors

• Collation version mismatch warnings: 2.36 vs 2.41

(I ran ALTER DATABASE railway REFRESH COLLATION VERSION; — didn't resolve it)

• n8n crashing with "Database is not ready!" every few minutes, self-recovering in 30–50 seconds
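For the collation warning, here is a minimal Python sketch of the usual remediation order. It assumes PostgreSQL 15 or later, where `pg_database.datcollversion` and `pg_database_collation_actual_version()` exist. The `DbCollation` row shape and the `remediation_sql` helper are hypothetical, and the example rows are made up. The script only builds the SQL text: it rebuilds indexes first, then refreshes the recorded version. Running the statements is left to you.

```python
from dataclasses import dataclass
from typing import List, Optional

# Query that would produce the rows below (PostgreSQL 15+).
VERSION_QUERY = (
    "SELECT datname, datcollversion, "
    "pg_database_collation_actual_version(oid) FROM pg_database;"
)


@dataclass
class DbCollation:
    """Hypothetical row shape: one database and its collation versions."""
    name: str
    recorded: Optional[str]  # datcollversion: glibc version recorded for the DB
    actual: Optional[str]    # version the OS C library provides now


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def remediation_sql(rows: List[DbCollation]) -> List[str]:
    """Build, for each mismatched database, REINDEX followed by REFRESH.

    Indexes on text columns were sorted with the old glibc rules, so they are
    rebuilt first. Only then is the recorded version updated, which clears
    the warning. Run each pair while connected to that database, outside a
    transaction block (REINDEX DATABASE cannot run inside one).
    """
    statements: List[str] = []
    for row in rows:
        if row.recorded == row.actual:
            continue  # versions agree (or both NULL): nothing to do
        ident = quote_ident(row.name)
        statements.append(f"REINDEX DATABASE {ident};")
        statements.append(f"ALTER DATABASE {ident} REFRESH COLLATION VERSION;")
    return statements


if __name__ == "__main__":
    # Illustrative rows only, not taken from the real database.
    rows = [DbCollation("railway", "2.36", "2.41"), DbCollation("postgres", "2.41", "2.41")]
    for stmt in remediation_sql(rows):
        print(stmt)
```

Putting REINDEX before the refresh reflects the HINT Postgres prints with the mismatch warning. It tells you to rebuild objects that use the default collation before running REFRESH COLLATION VERSION.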

— Today's incident (May 1, ~10:37 UTC) —

Postgres Deploy Logs showed:

• "database system was interrupted; last known up at 2026-05-01 10:37:39 UTC"

• "the database system was not properly shut down; automatic recovery in progress"

• "invalid record length at 1/77E38AF8: expected at least 24, got 0"

• Recovery completed at ~10:47:50 UTC (~10 minutes of downtime)
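The messages above can be triaged with a small script. This is a hypothetical Python sketch. The labels and regular expressions are assumptions written against the exact lines quoted in this report; older Postgres versions print "wanted 24" instead of "expected at least 24". The script also converts a WAL position such as 1/77E38AF8 into a byte offset, using the same arithmetic as `pg_wal_lsn_diff`. Finally, it measures the gap between "last known up" and the end of recovery.

```python
import re
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical labels and patterns for the recovery messages quoted above.
PATTERNS = [
    ("crash_detected", re.compile(
        r"database system was interrupted; last known up at "
        r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")),
    ("unclean_shutdown", re.compile(
        r"was not properly shut down; automatic recovery in progress")),
    # End of valid WAL during replay: normal at the end of crash recovery.
    # Older releases say "wanted 24" instead of "expected at least 24".
    ("end_of_wal", re.compile(
        r"invalid record length at (?P<lsn>[0-9A-F]+/[0-9A-F]+): "
        r"(?:expected at least|wanted) \d+, got 0")),
]


def classify(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (label, captured fields) for a known message, else None."""
    for label, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return label, match.groupdict()
    return None


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '1/77E38AF8' (high/low 32-bit halves, hex) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_bytes(start_lsn: str, end_lsn: str) -> int:
    """Bytes of WAL between two positions, like pg_wal_lsn_diff(end, start)."""
    return lsn_to_int(end_lsn) - lsn_to_int(start_lsn)


def gap_seconds(start: str, end: str) -> int:
    """Seconds between two 'YYYY-MM-DD HH:MM:SS' timestamps in the same timezone."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())


if __name__ == "__main__":
    print(classify("invalid record length at 1/77E38AF8: expected at least 24, got 0"))
    # Recovery finished at ~10:47:50 UTC per the report above.
    print(gap_seconds("2026-05-01 10:37:39", "2026-05-01 10:47:50"))
```

The "last known up" timestamp is when Postgres last updated its control file, usually at a checkpoint. A gap computed from it is therefore an upper bound on the real outage. For the times above, that bound is 611 seconds, or about 10 minutes. To measure how much WAL was replayed, compare the "redo starts at" position with the end-of-WAL position using `wal_bytes`.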

No changes were made on my end. All services (Postgres, Redis, Primary, Worker) are on US West (California, USA) Metal region.

My questions:

1. Was there any infrastructure maintenance or forced restart around 10:37 UTC today?

2. What is causing the recurring unclean shutdowns on Metal region?

3. What can I do to prevent this from happening again?

Thank you

$10 Bounty


Status changed to Open by Railway, 21 days ago


mrhopper199345
HOBBY

3 days ago

Hey Pasha,

  1. Was there any infrastructure maintenance or forced restart? Only Railway Support can confirm whether there was host maintenance on the Metal region at that exact minute. However, sudden, unannounced restarts like this are most often OOM (Out of Memory) kills. When a container exceeds its allocated RAM limit, Railway's infrastructure forcefully terminates it (SIGKILL) without warning to protect the host node. That leaves Postgres no chance to shut down cleanly.

  2. What is causing the recurring unclean shutdowns? The logs point to forced terminations (SIGKILL), most likely caused by resource exhaustion, rather than to a Postgres bug or data corruption. Here is what those messages mean:

     • "database system was interrupted" and "not properly shut down": the Postgres container was killed instantly, bypassing the normal, graceful shutdown process.

     • "invalid record length... got 0": this looks alarming but is a red herring. It is a standard message during crash recovery. It means Postgres has reached the end of the valid Write-Ahead Log (WAL), and everything before that point has been replayed.

     • Collation mismatch: this warning appears when the underlying Linux image ships a newer C library (glibc), for example after a redeploy. As a result, the version recorded for the database (2.36) no longer matches the one the OS provides (2.41). It does not cause crashes. However, REFRESH COLLATION VERSION on its own only updates the recorded version. The hint Postgres logs with the warning asks you to rebuild objects that use the default collation first (for example with REINDEX DATABASE railway;), because indexes on text columns may sort differently under the new glibc. The warning is also reported per database, so if it persists, check which database it names and repeat the steps there.

     The root cause is likely related to n8n database growth. n8n saves data for every workflow execution. If that data isn't pruned aggressively, the execution tables keep growing, and the queries and vacuuming against them get heavier. That can push Postgres memory and CPU usage up to the plan limit until the container is killed. The ~10-minute recovery suggests Postgres had a large amount of WAL to replay since its last checkpoint, although container restart time may account for part of it.

  3. What can I do to prevent this from happening again? To stabilize the database, I recommend these steps:

     • Check the metrics: look at the Memory usage graph for the Postgres service in your Railway dashboard leading up to 10:37 UTC. If it flatlines at the top of your plan limit right before the crash, that is a strong sign of an OOM kill.

     • Enable n8n execution pruning: if it isn't already on, add these environment variables to your n8n service so the execution data stops growing without limit:

         EXECUTIONS_DATA_PRUNE=true
         EXECUTIONS_DATA_MAX_AGE=168 (in hours, so 7 days of execution data is kept; adjust as needed)

       n8n's vacuum-on-startup option (DB_SQLITE_VACUUM_ON_STARTUP) only applies to SQLite, so it won't help with Postgres. On Postgres, autovacuum makes the space freed by pruning reusable. After a large first prune, you can also run VACUUM manually. VACUUM FULL returns space to the OS, but it rewrites tables under an exclusive lock.

     • Scale resources: if pruning is already enabled and working, you may simply need to allocate more RAM to the Postgres service in your Railway project settings to handle the volume of workflows you are running.
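Here is a hypothetical sketch of the "flatline at the top of the plan limit" check from step 3. It assumes you have exported the Postgres memory graph as (timestamp, GB used) pairs. The export format, the 5-minute window, the 95% threshold and the 8 GB limit in the usage are all assumptions, not Railway values.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical export format: (timestamp, memory used in GB) per sample.
Sample = Tuple[datetime, float]


def flatlined_at_limit(
    samples: List[Sample],
    limit_gb: float,
    crash_at: datetime,
    window: timedelta = timedelta(minutes=5),
    threshold: float = 0.95,
) -> bool:
    """True if every sample in the window before the crash is >= threshold * limit.

    A sustained plateau at the memory limit right before an unclean shutdown
    is the pattern that points to an OOM kill. Returns False when the window
    has no samples, since nothing can be concluded.
    """
    recent = [used for ts, used in samples if crash_at - window <= ts < crash_at]
    if not recent:
        return False
    return min(recent) >= threshold * limit_gb


if __name__ == "__main__":
    crash = datetime(2026, 5, 1, 10, 37, 39)  # approximate crash time
    # Made-up samples for illustration, with an assumed 8 GB limit.
    samples = [
        (crash - timedelta(minutes=10), 3.0),
        (crash - timedelta(minutes=4), 7.8),
        (crash - timedelta(minutes=2), 7.9),
        (crash - timedelta(minutes=1), 7.95),
    ]
    print(flatlined_at_limit(samples, limit_gb=8.0, crash_at=crash))
```

Using `min()` requires usage to stay near the limit for the whole window, which matches a flatline. Swapping in `max()` would also flag a short spike, at the cost of more false positives.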

Hope this helps!

