Stuck on db loading loop: "We are unable to connect to the database via SSH"
toby237
HOBBYOP

3 months ago

Hi. I have a n8n environment with 4 services, Worker, Primary, Redis and Postgres (from template https://railway.com/deploy/n8n-with-workers).

My Worker suddenly crashed after updating image, and it seems that the logs are pointing to it being a Postgres/db issue.

I wanted to test a bit and revert changes if it didn't solve it, but my first change caused the issue I'm now having, where it just loads on the database connection part in the Postgres service. I tried to add public address and use "psql" too, but that also times out now.

The change I did was to use the GUI in Postgres service and entered the "workflow_entity" table and set a row's "versionId" to null or empty string.

Immediately after pressing "save" it got stuck in this loop that seems to be impossible to get out of, I've tried restart, redeploy (redeploy fails) and log out and in again

Should I expect to be able to connect to the database again after doing this, or was that a "dangerous" change and it might not be a fix and I must delete my database/volume?

$10 Bounty

3 Replies

Railway
BOT

3 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


marxsen4you-svg
HOBBY

3 months ago

firecracker emojiWHAT HAPPENED (The Real Issue)

When you edited workflow_entity.versionId manually in the Railway Postgres GUI, you did something n8n does not allow at the DB level:

versionId is a NOT NULL + foreign key–dependent field

In n8n:

  • workflow_entity.versionId is required

  • It references workflow_version.id

  • It is used during boot to migrate, load workflows, and lock versions

  • Setting it to NULL or empty string means…

rotating_light emoji n8n boot MIGRATIONS now crash mid-transaction

When the Worker/Primary starts, n8n loads all workflows:

SELECT ... FROM workflow_entity w
JOIN workflow_version v ON w.versionId = v.id

But since a row has versionId = null, Postgres throws:

ERROR:  null value in column "versionId" violates not-null constraint

Depending on where the failure occurred, n8n leaves a transaction open or keeps retrying.

firecracker emojiWHY THE POSTGRES SERVICE NOW HANGS

Railway Postgres logs show the “Database Connection” spinner forever because:

exclamation emoji n8n is repeatedly attempting migrations → locking system catalog

This causes:

  • Table locks

  • Long-running transactions

  • Any new connection (psql, Railway GUI) = timeout

firecracker emojiGOOD NEWS: You probably do NOT need to delete your database

Because…

fire emoji Railway almost never becomes unrecoverable

99% of these cases are fixed by killing the bad query and correcting the corrupted row.

white_check_mark emojiTHE FIX (safe, no data loss)

You need to:

  1. Stop the Worker + Primary

  2. Kill stuck connections

  3. Connect to Postgres

  4. Restore the broken row’s versionId

STEP 1 — Stop services

In Railway:

Stop:

  • x emoji Worker

  • x emoji Primary (main n8n)

Leave only:

  • white_check_mark emoji Postgres

  • white_check_mark emoji Redis

This prevents n8n from instantly reconnecting and re-locking the DB.

STEP 2 — Kill all active pids in Postgres

Railway → Postgres → Shell → run:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
AND pid <> pg_backend_pid();

If GUI is frozen, run via psql:

psql "postgresql://USER:PASSWORD@HOST:PORT/railway"

Then run the terminate query again.

STEP 3 — Fix the broken row

Find the corrupted workflow:

SELECT id, name, versionId FROM workflow_entity WHERE versionId IS NULL OR versionId = '';

You should get 1 row—the one you changed.

Now find the latest version:

SELECT id FROM workflow_version WHERE workflowId = '<workflow-id>' ORDER BY createdAt DESC LIMIT 1;

Then restore it:

UPDATE workflow_entity
SET "versionId" = '<the version id you found>'
WHERE id = '<workflow id>';

STEP 4 — Verify the schema

Check:

\d workflow_entity

Confirm from output:

versionId UUID NOT NULL

If it's NOT NULL, you MUST set a value again (n8n requires it).

STEP 5 — Restart n8n Primary (only)

If it boots normally, restart Worker after.

green_circle emojiIf this works → your DB is safe

red_circle emojiIf all connections still hang after terminating backends

You have 2 possibilities:

Option A — There's still an open transaction from n8n (most likely)

Run:

SELECT * FROM pg_stat_activity;

Look for long-running queries like:

idle in transaction

Kill them with:

SELECT pg_terminate_backend(<pid>);

Keep terminating until none remain.

Option B — Corrupt row causes boot-loop

If you cannot restore the versionId because you don't know it:

You can copy version from another workflow version:

UPDATE workflow_entity SET "versionId" = (
  SELECT id FROM workflow_version WHERE workflowId = workflow_entity.id LIMIT 1
)
WHERE workflow_entity.versionId IS NULL OR workflow_entity.versionId = '';

🧨 MUST I DELETE THE DATABASE? (Rare)

Only if ALL of these are true:

  • You cannot connect

  • Terminating backends fails

  • Even psql times out when services are stopped

  • Railway DB logs show corrupted Postgres cluster, not locked tables

From your description, this is only a lock, not corruption.

You should NOT need to delete your DB.


ramfat
HOBBY

3 months ago

Hi, I would ignore the obvious bot response before mine. Here's what I think is going on as I'm having the exact same issue (not with N8n though). Internal networking seems to not work at all in my new project. Using public Host & Port / URL works everytime. So far i've tried with a Mysql DB, Redis and MinIO service..


Loading...