Stuck on db loading loop: "We are unable to connect to the database via SSH"

toby237

HOBBYOP

6 months ago

Hi. I have a n8n environment with 4 services, Worker, Primary, Redis and Postgres (from template https://railway.com/deploy/n8n-with-workers).

My Worker suddenly crashed after updating image, and it seems that the logs are pointing to it being a Postgres/db issue.

I wanted to test a bit and revert changes if it didn't solve it, but my first change caused the issue I'm now having, where it just loads on the database connection part in the Postgres service. I tried to add public address and use "psql" too, but that also times out now.

The change I did was to use the GUI in Postgres service and entered the "workflow_entity" table and set a row's "versionId" to null or empty string.

Immediately after pressing "save" it got stuck in this loop that seems to be impossible to get out of, I've tried restart, redeploy (redeploy fails) and log out and in again

Should I expect to be able to connect to the database again after doing this, or was that a "dangerous" change and it might not be a fix and I must delete my database/volume?

$10 Bounty

3 Replies

Railway

BOT

6 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!

marxsen4you-svg

HOBBY

6 months ago

🧨WHAT HAPPENED (The Real Issue)

When you edited workflow_entity.versionId manually in the Railway Postgres GUI, you did something n8n does not allow at the DB level:

`versionId` is a NOT NULL + foreign key–dependent field

In n8n:

workflow_entity.versionId is required
It references workflow_version.id
It is used during boot to migrate, load workflows, and lock versions
Setting it to NULL or empty string means…

🚨 n8n boot MIGRATIONS now crash mid-transaction

When the Worker/Primary starts, n8n loads all workflows:

SELECT ... FROM workflow_entity w
JOIN workflow_version v ON w.versionId = v.id

But since a row has versionId = null, Postgres throws:

ERROR:  null value in column "versionId" violates not-null constraint

Depending on where the failure occurred, n8n leaves a transaction open or keeps retrying.

🧨WHY THE POSTGRES SERVICE NOW HANGS

Railway Postgres logs show the “Database Connection” spinner forever because:

❗ n8n is repeatedly attempting migrations → locking system catalog

This causes:

Table locks
Long-running transactions
Any new connection (psql, Railway GUI) = timeout

🧨**GOOD NEWS: You probably do NOT need to delete your database**

Because…

🔥 Railway almost never becomes unrecoverable

99% of these cases are fixed by killing the bad query and correcting the corrupted row.

THE FIX (safe, no data loss)

You need to:

Stop the Worker + Primary
Kill stuck connections
Connect to Postgres
Restore the broken row’s versionId

STEP 1 — Stop services

In Railway:

Stop:

Worker
Primary (main n8n)

Leave only:

Postgres
Redis

This prevents n8n from instantly reconnecting and re-locking the DB.

STEP 2 — Kill all active pids in Postgres

Railway → Postgres → Shell → run:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
AND pid <> pg_backend_pid();

If GUI is frozen, run via psql:

psql "postgresql://USER:PASSWORD@HOST:PORT/railway"

Then run the terminate query again.

STEP 3 — Fix the broken row

Find the corrupted workflow:

SELECT id, name, versionId FROM workflow_entity WHERE versionId IS NULL OR versionId = '';

You should get 1 row—the one you changed.

Now find the latest version:

SELECT id FROM workflow_version WHERE workflowId = '<workflow-id>' ORDER BY createdAt DESC LIMIT 1;

Then restore it:

UPDATE workflow_entity
SET "versionId" = '<the version id you found>'
WHERE id = '<workflow id>';

STEP 4 — Verify the schema

Check:

\d workflow_entity

Confirm from output:

versionId UUID NOT NULL

If it's NOT NULL, you MUST set a value again (n8n requires it).

STEP 5 — Restart n8n Primary (only)

If it boots normally, restart Worker after.

🟢If this works → your DB is safe

🔴If all connections still hang after terminating backends

You have 2 possibilities:

Option A — There's still an open transaction from n8n (most likely)

Run:

SELECT * FROM pg_stat_activity;

Look for long-running queries like:

idle in transaction

Kill them with:

SELECT pg_terminate_backend(<pid>);

Keep terminating until none remain.

Option B — Corrupt row causes boot-loop

If you cannot restore the versionId because you don't know it:

You can copy version from another workflow version:

UPDATE workflow_entity SET "versionId" = (
  SELECT id FROM workflow_version WHERE workflowId = workflow_entity.id LIMIT 1
)
WHERE workflow_entity.versionId IS NULL OR workflow_entity.versionId = '';

🧨 MUST I DELETE THE DATABASE? (Rare)

Only if ALL of these are true:

You cannot connect
Terminating backends fails
Even psql times out when services are stopped
Railway DB logs show corrupted Postgres cluster, not locked tables

From your description, this is only a lock, not corruption.

You should NOT need to delete your DB.

ramfat

HOBBY

6 months ago

Hi, I would ignore the obvious bot response before mine. Here's what I think is going on as I'm having the exact same issue (not with N8n though). Internal networking seems to not work at all in my new project. Using public Host & Port / URL works everytime. So far i've tried with a Mysql DB, Redis and MinIO service..

Welcome!

🧨WHAT HAPPENED (The Real Issue)

versionId is a NOT NULL + foreign key–dependent field

🚨 n8n boot MIGRATIONS now crash mid-transaction

🧨WHY THE POSTGRES SERVICE NOW HANGS

❗ n8n is repeatedly attempting migrations → locking system catalog

🧨GOOD NEWS: You probably do NOT need to delete your database

🔥 Railway almost never becomes unrecoverable

THE FIX (safe, no data loss)

STEP 1 — Stop services

STEP 2 — Kill all active pids in Postgres

STEP 3 — Fix the broken row

STEP 4 — Verify the schema

STEP 5 — Restart n8n Primary (only)

🟢If this works → your DB is safe

🔴If all connections still hang after terminating backends

Option A — There's still an open transaction from n8n (most likely)

Option B — Corrupt row causes boot-loop

🧨 MUST I DELETE THE DATABASE? (Rare)

`versionId` is a NOT NULL + foreign key–dependent field

🧨**GOOD NEWS: You probably do NOT need to delete your database**