2 months ago
I've been trying for several days to redeploy n8n, but I always get this error:
Initialization - OK
Deploy - OK
Network > Healthcheck - FAILED (after 4m53s)
Post-deploy - Did not start
May someone help me?
24 Replies
2 months ago
Railway
Hey there! We've found the following might help you get unblocked faster:
- [🧵 Healthcheck Failure!!](https://station.railway.com/questions/healthcheck-failure-ecac6092)
- [🧵 Unable to redeploy n8n - failing healthcheck](https://station.railway.com/questions/unable-to-redeploy-n8n-failing-healthc-c7997df8)
- [🧵 Eternal Healthcheck failure](https://station.railway.com/questions/healthcheck-failure-00d29c2b)
If you find the answer from one of these, please let us know by solving the thread!
2 months ago
This does not help.
vinibr
This does not help.
2 months ago
Set a variable N8N_PORT to whatever port you defined in the service settings under "Networking -> Public Networking". You can remove the healthcheck or verify it's set to /healthz. Removing it alone likely won't help, though, because your service seems misconfigured.
2 months ago
Thank you! The variable N8N_PORT wasn't there. I created it, set it, and redeployed, but it didn't work.
2 months ago
i think the healthcheck is failing because n8n's /healthz endpoint is disabled by default. try adding this environment variable to your primary service:
N8N_METRICS=true
this should enable the healthcheck endpoint. also double-check that your healthcheck path in railway settings (under networking) is set to /healthz
if that doesn't work, there might be a port mismatch, make sure the PORT variable matches what n8n is actually listening on. you can check the deploy logs to see what port n8n binds to
let me know if this helps
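btw, you can also probe the endpoint yourself to see what railway's healthcheck sees. a minimal sketch (the URL below is a dead placeholder with nothing listening, so it prints the failure branch; point it at your service's real public railway URL instead):

```shell
#!/bin/sh
# Probe an n8n healthcheck endpoint the way Railway's healthcheck would.
# URL is a placeholder with nothing listening on it, so this prints the
# failure branch; replace it with your service's public URL.
URL="http://127.0.0.1:9"
if curl -sf --max-time 3 "$URL/healthz" >/dev/null 2>&1; then
  echo "healthcheck ok"
else
  echo "healthcheck failed"
fi
```

a healthy n8n instance answers /healthz with HTTP 200, so `curl -f` exits 0 and you'd see "healthcheck ok".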
2 months ago
Thank you, ilyassbreath! It didn't work. I added the variable and tried the redeploy, but the deploy still failed. The logs don't show the port number.
2 months ago
thanks for the update! if the logs aren't showing a port number, n8n probably isn't starting up properly. this is usually a database connection issue
can you share your deploy logs? just go to the primary service → deployments → click the latest one → and copy/paste what you see in the logs here
also, can you check these variables in your primary service:
do you have DB_TYPE set to postgresdb?
what's your DB_POSTGRESDB_HOST set to? (it should be something like postgres.railway.internal or the name of your postgres service)
do you have all the postgres connection variables? (DB_POSTGRESDB_DATABASE, DB_POSTGRESDB_USER, DB_POSTGRESDB_PASSWORD, DB_POSTGRESDB_PORT)
the healthcheck is probably failing because n8n can't connect to the database and never actually starts up. once we fix the database connection, the healthcheck should work
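for reference, a complete set would look something like this (the values here are placeholders/assumptions; copy the real ones from your Postgres service's Variables tab):

```shell
# Hypothetical example values; copy the real ones from the Postgres
# service's Variables tab in Railway.
export DB_TYPE=postgresdb
export DB_POSTGRESDB_HOST=postgres.railway.internal
export DB_POSTGRESDB_PORT=5432
export DB_POSTGRESDB_DATABASE=railway
export DB_POSTGRESDB_USER=postgres
export DB_POSTGRESDB_PASSWORD=changeme   # placeholder

# Quick sanity echo of the connection target.
echo "$DB_TYPE@$DB_POSTGRESDB_HOST:$DB_POSTGRESDB_PORT/$DB_POSTGRESDB_DATABASE"
```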
2 months ago
LOGs:
Jan 9, 2026, 10:05 PM
Starting Container
Initializing n8n process
There was an error initializing DB
Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
Error: Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
at DbConnection.init (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+db@file+packages+@n8n+db_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@1._ab22bba05a964211b9fe14bf4b841570/node_modules/@n8n/db/src/connection/db-connection.ts:58:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/base-command.ts:104:3)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/start.ts:203:3)
at CommandRegistry.execute (/usr/local/lib/node_modules/n8n/src/command-registry.ts:82:4)
at /usr/local/lib/node_modules/n8n/bin/n8n:63:2
Connection terminated due to connection timeout
Last session crashed
Initializing n8n process
There was an error initializing DB
Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
Error: Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
at DbConnection.init (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+db@file+packages+@n8n+db_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@1._ab22bba05a964211b9fe14bf4b841570/node_modules/@n8n/db/src/connection/db-connection.ts:58:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/base-command.ts:104:3)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/start.ts:203:3)
at CommandRegistry.execute (/usr/local/lib/node_modules/n8n/src/command-registry.ts:82:4)
at /usr/local/lib/node_modules/n8n/bin/n8n:63:2
Connection terminated due to connection timeout
Last session crashed
Initializing n8n process
2 months ago
do you have DB_TYPE set to postgresdb? Yes.
what's your DB_POSTGRESDB_HOST set to? Yes: postgres.railway.internal
do you have all the postgres connection variables? Yes, all of them.
How can I test the database connection? Can I test it in n8n?
Thank you!!🙏
2 months ago
okay i think n8n can't reach the postgres database. the error says it's timing out trying to connect to postgres.railway.internal
check this:
is your postgres service actually running? go to your railway project then check if the postgres service shows as "deployed" (green) , if it's crashed or not running, redeploy it
what's your postgres service actually called in railway? your DB_POSTGRESDB_HOST should match the exact service name. for example, if your service is called "postgres-production" then the host should be postgres-production.railway.internal, NOT just postgres.railway.internal
are your n8n and postgres both in the "production" environment? or are they in different environments?
can you check these and let me know what you find? most likely your postgres service name doesn't match what you have in the HOST variable
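if you want to verify resolution from inside a container, here's a tiny sketch (localhost is just a stand-in that resolves everywhere; swap in <your-service-name>.railway.internal when you run it on railway):

```shell
#!/bin/sh
# Check whether a hostname resolves inside the container.
# "localhost" is a placeholder stand-in; replace it with
# <your-postgres-service-name>.railway.internal on Railway.
HOST="localhost"
if getent hosts "$HOST" >/dev/null 2>&1; then
  echo "resolves: $HOST"
else
  echo "no-resolve: $HOST"
fi
```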
2 months ago
I have the same issue, and tried a bunch of things. You are correct, n8n cannot reach the Postgres DB.
@ilyassbreth
1. Postgres is running ("online"), but the database connection cannot be established. Redis is online, too. Worker and Primary crash due to the failing DB connection.
2. What do you mean by "service name"? The environment is called "production" and the 4 services are Postgres, Redis, Worker, and Primary.
3. Yes, they are in the same environment.
The issue appeared without having made any changes to any settings, and redeployments (for updates) worked before.
Postgres variable "PGHOST" = postgres.railway.internal
Primary variable "DB_POSTGRESDB_HOST" = postgres.railway.internal
2 months ago
Here is a summary of what I have tried so far:
I’m hosting n8n on Railway with a Primary + Worker + Postgres + Redis setup (queue mode). It used to work, but now deployments fail and the app becomes unreachable.
Railway shows the healthcheck failing on /healthz with repeated 503 Service Unavailable (“Starting Healthcheck… Attempt #… failed with service unavailable”). When I open the public Railway URL, I get “Application failed to respond.”
In the Primary deploy logs, n8n repeatedly crashes during startup with:
“There was an error initializing DB”
“Could not establish database connection within the configured timeout of 120,000 ms”
“Connection terminated due to connection timeout”
So n8n never reaches a “ready” state and Railway healthchecks keep retrying / restarting.
Current environment / config
n8n runs in queue mode (EXECUTIONS_MODE=queue) with a Worker service.
Postgres and Redis are Online in Railway.
DB variables in n8n are set like:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres.railway.internal
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=...
DB_POSTGRESDB_CONNECTION_TIMEOUT=120000 (and higher previously)
Networking variables:
ENABLE_ALPINE_PRIVATE_NETWORKING=true
N8N_LISTEN_ADDRESS=:: [I also tried 0.0.0.0]
PORT=5678
Railway healthcheck path is /healthz and it gets stuck retrying.
What I tried (and what happened)
Confirmed DB config (DB host/user/db/port values look correct).
Increased DB timeout (even very high), but n8n still times out connecting to Postgres.
Tried SSL settings (because it worked before without SSL), but it didn’t resolve the issue.
Tried switching execution mode (queue vs regular) to reduce moving parts; still blocked by DB connection on startup.
Created an Alpine diagnostic service to test connectivity to the private Postgres host and ran nc from logs (since no shell is available).
Result: nc: bad address 'postgres.railway.internal'
This suggests private DNS resolution for *.railway.internal is failing from inside the services.
Checked Postgres variables: it provides both:
DATABASE_URL → points to postgres.railway.internal:5432
DATABASE_PUBLIC_URL → points to a ...proxy.rlwy.net:<port> public endpoint
Railway shows a warning that using the public endpoint may incur egress fees.
Tried rolling back to an earlier n8n deployment. Railway shows “deployment successful/online”, but the app URL still shows “Application failed to respond”, and logs still show DB init failing.
What I suspect / what I need help with
It looks like n8n cannot reach Postgres because either:
Railway private networking / private DNS isn't working, so postgres.railway.internal isn't resolvable, or
there's a Railway networking/region/environment mismatch causing that internal hostname not to resolve.
I’d like help confirming:
Why postgres.railway.internal would return "bad address" inside a service even though Postgres is online.
Whether I must switch n8n to the public Postgres proxy URL to make it work (and what the best practice is to avoid egress fees).
Any specific Railway/n8n settings needed for private networking with Postgres + Worker.
If you want, I can also share redacted screenshots of:
Primary deploy logs showing “error initializing DB”
Healthcheck retries on /healthz
Alpine diag logs showing bad address postgres.railway.internal
Postgres variables page showing DATABASE_URL and DATABASE_PUBLIC_URL
Does anyone have an idea what can be done here?
2 months ago
I wonder why it's called "/healthz" and not "/health". Could that be the issue?
2 months ago
I thought Railway offered some kind of official support here.
This is serious!
We have a security flaw in n8n and can't update because of technical issues.
I'm going to move to a VPS instead.
Thank you guys for all the help.
2 months ago
that alpine test nailed it: postgres.railway.internal literally isn't resolving
check your actual postgres service name in railway dashboard. the hostname must be <exact-service-name>.railway.internal - if it's named "Postgres-prod" or anything other than just "postgres", that's your issue
if the service name is correct and dns still fails, it's a railway platform bug. temp fix: use the DATABASE_PUBLIC_URL proxy address instead of the internal one
always happy to help 
2 months ago
Thank you, ilyassbreth! Please see the screenshots; the settings are exactly the same.
I will try the public URL address...
Attachments
2 months ago
I created a small Alpine diagnostic service on Railway to test database connectivity separately from n8n. I first tested the public Postgres endpoint from DATABASE_PUBLIC_URL (the …proxy.rlwy.net:<port> address) using nc (netcat), and it showed “open”, which means the TCP port is reachable from inside Railway.
Then I tried a real Postgres login against that same public endpoint using psql with SSL required (PGSSLMODE=require). Even though the port was open, psql failed with “server closed the connection unexpectedly”. At the same time, n8n was failing with a similar symptom (ECONNRESET / “error initializing DB”). So I proved the public proxy endpoint is reachable at the network level, but the actual Postgres session/handshake is being terminated when a real client connects.
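That kind of nc probe can be sketched as a small helper (assumptions: a busybox/GNU nc that supports -z and -w; the target below is a placeholder on a port that is almost never bound, so it reports closed, while on Railway you would pass the ...proxy.rlwy.net host and port from DATABASE_PUBLIC_URL):

```shell
#!/bin/sh
# Minimal TCP reachability probe, similar to the nc test described above.
# The target here is a placeholder (127.0.0.1 port 1 is almost never bound),
# so this prints "closed"; substitute your proxy host and port.
check_tcp() {
  if nc -z -w 5 "$1" "$2" 2>/dev/null; then
    echo "open $1:$2"
  else
    echo "closed $1:$2"
  fi
}

check_tcp 127.0.0.1 1
```

Note that an "open" result only proves TCP reachability, exactly as above: the Postgres handshake can still fail afterwards.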
2 months ago
But I do not understand what to do now. Any ideas?
2 months ago
I finally fixed this using ChatGPT and thanks to ilyassbreth's indications. Here is how (summary from ChatGPT):
1. Overview
The root problem was that n8n (Primary and Worker) couldn’t reach Redis, so both services kept crashing with Redis client connect ETIMEDOUT and exiting after 10 seconds. Because your n8n setup uses queue mode / task broker, Redis is a required dependency. The fix was to (1) stabilize Primary by temporarily running without Redis, then (2) switch both Primary and Worker to use Redis’s public TCP proxy URL (REDIS_PUBLIC_URL) instead of the private internal URL (REDIS_URL), and finally (3) re-enable queue mode once Redis connectivity was confirmed.
2. Steps
Primary (what I did)
Stabilize Primary first
Set EXECUTIONS_MODE=regular so Primary could start without Redis.
Stop Redis crash loops
Remove/disable queue/Redis-related settings temporarily (so Primary wouldn't exit on Redis timeouts).
After Redis was fixed
Switch EXECUTIONS_MODE back to queue.
Configure Primary to use Redis via the public proxy (REDIS_PUBLIC_URL parts).
Redeploy Primary and confirm /healthz passes.
Worker (what I did)
Keep Worker from running while Redis was broken (or accept that it will crash in queue mode).
Configure Worker to use the same Redis public proxy endpoint as Primary:
Set EXECUTIONS_MODE=queue
Set Bull Redis connection vars to the public proxy host/port/password.
Redeploy Worker and confirm it stays online (no Redis timeout loop).
Redis (what I used)
In Redis service Variables, we identified two endpoints:
REDIS_URL → private/internal (redis.railway.internal:6379)
REDIS_PUBLIC_URL → public TCP proxy (*.proxy.rlwy.net:<port>)
Because internal routing was timing out, we chose the public proxy and used it for n8n.
3. Node Configuration (exact variables I used)
Redis → choose the right URL
From Redis Variables:
Use REDIS_PUBLIC_URL (format: redis://default:<PASSWORD>@<HOST>:<PORT>)
Extract:
Host = <HOST>
Port = <PORT>
Password = <PASSWORD>
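If it helps, that extraction can be done with plain POSIX parameter expansion (the URL below is a made-up example value; use your real REDIS_PUBLIC_URL):

```shell
#!/bin/sh
# Split a Railway-style Redis URL into host/port/password.
# The URL below is a hypothetical example value.
REDIS_PUBLIC_URL="redis://default:s3cretpass@maglev.proxy.rlwy.net:12345"

rest="${REDIS_PUBLIC_URL#redis://}"       # default:s3cretpass@maglev.proxy.rlwy.net:12345
creds="${rest%%@*}"                       # default:s3cretpass
hostport="${rest##*@}"                    # maglev.proxy.rlwy.net:12345

QUEUE_BULL_REDIS_PASSWORD="${creds#*:}"   # s3cretpass
QUEUE_BULL_REDIS_HOST="${hostport%%:*}"   # maglev.proxy.rlwy.net
QUEUE_BULL_REDIS_PORT="${hostport##*:}"   # 12345

echo "$QUEUE_BULL_REDIS_HOST $QUEUE_BULL_REDIS_PORT"
```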
Worker variables (queue mode)
Railway → Worker → Variables:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<HOST from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<PORT from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<PASSWORD from REDIS_PUBLIC_URL>
Redeploy Worker.
Primary variables (safe sequence)
Phase 1 (stabilize)
EXECUTIONS_MODE=regular
Redeploy Primary.
Phase 2 (enable queue after Worker/Redis confirmed)
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<HOST from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<PORT from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<PASSWORD from REDIS_PUBLIC_URL>
Redeploy Primary.
4. Optional Enhancements
Once stable, you can try switching back from REDIS_PUBLIC_URL to REDIS_URL (private) to avoid any potential egress fees, but only after you have a stable baseline.
Keep all services (Primary/Worker/Redis/Postgres) in the same region to reduce latency and timeouts.
5. Final Notes
The key insight was: queue mode requires Redis, and the internal Redis endpoint was timing out, so the reliable fix was using the Redis public TCP proxy and switching Primary back to queue only after the Redis path was confirmed working.
2 months ago
n8n is now working again. The first thing I did was to back up all workflows ;-).
2 months ago
Below is the extended version of all the things I did (again, a summary from my ChatGPT chat). I have to split it up because it is too long for one reply. I hope it helps:
1. Overview
This is the full “A to Z” runbook we ended up using to fix your Railway deployment: n8n Primary/Worker were crash-looping because Redis and Postgres connectivity via private/internal Railway networking was unreliable (timeouts / DNS issues). The solution was to (1) prove connectivity with a dedicated Alpine test service, (2) switch n8n to use the public TCP proxy endpoints for Postgres and Redis, (3) stabilize Primary in regular mode first, and only then (4) re-enable queue mode and bring Worker back.
Below is the complete step-by-step guide with the exact successful sequence.
2. Steps (A to Z)
A) Capture the real crash reason (don’t guess)
Railway → Primary → Deploy Logs
Look for the first “real error” before it restarts:
DB errors (Postgres): timeout, auth failed, ECONNRESET
Redis errors:
connect ETIMEDOUT + "Exiting process due to Redis connection error"
In our case we saw both at different times:
Postgres connection problems (initially)
Redis timeouts (later)
B) Create a “diagnostic Alpine” service (this was key)
This lets you test networking inside Railway, instead of guessing.
Railway → New Service
Choose Docker Image
Use image: alpine:3.19
Deploy it (it will be an "unexposed service", that's fine)
We then used this service to test Postgres and Redis connectivity with real commands.
C) Diagnose Postgres connectivity with Alpine
C1) Use the Postgres public proxy
Railway → Postgres service → Variables
Find:
DATABASE_PUBLIC_URL (this is the public TCP proxy)
DATABASE_URL (often internal/private)
Copy the DATABASE_PUBLIC_URL value.
C2) Put Postgres public URL into Alpine
Railway → Alpine service → Variables
Add:
DATABASE_URL=<paste DATABASE_PUBLIC_URL>
C3) Alpine start command to test Postgres
Alpine → Settings / Deploy / Start Command (where Railway lets you set it), set:
sh -lc "
apk add --no-cache postgresql15-client >/dev/null 2>&1;
echo 'Testing Postgres...';
psql \"$DATABASE_URL\" -c 'select version();';
sleep 999999
"
If it prints Postgres version → network + URL work.
Note: When we tried internal/private domains, we saw errors like nc: bad address or it simply wouldn't resolve.
2 months ago
D) Fix n8n (Primary) Postgres by switching to the public proxy
When Postgres test succeeds in Alpine, apply the same idea to n8n.
Railway → Primary → Variables
Set Postgres variables to match the public proxy connection details:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=<proxy host from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_PORT=<proxy port from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=<password from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_SSL_ENABLED=true
DB_POSTGRESDB_SSL_REJECT_UNAUTHORIZED=false
(Optional) DB_POSTGRESDB_CONNECTION_TIMEOUT=120000
Redeploy Primary
Confirm:
Healthcheck passes
n8n UI loads
This was the step that made Primary come online reliably again.
E) Diagnose Redis connectivity (the cause of later crash loops)
When Primary later crashed, logs showed:
[Redis client] connect ETIMEDOUT
"Exiting process due to Redis connection error"
That means n8n was configured to require Redis (queue/task broker), but Redis endpoint wasn’t reachable.
F) Stabilize Primary so it stops crashing during Redis issues
Before touching Redis, keep Primary alive.
Railway → Primary → Variables
Set:
EXECUTIONS_MODE=regular
Temporarily disable anything that forces Redis:
N8N_RUNNERS_ENABLED=false (if present)
OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=false (if present)
Remove/unset any QUEUE_BULL_REDIS_* variables on Primary (temporarily)
Redeploy Primary
Primary stays up even if Redis is broken.
G) Identify Redis endpoints (you found the key detail)
Railway → Redis service → Variables
You saw:
REDIS_URL → private/internal (redis.railway.internal:6379)
REDIS_PUBLIC_URL → public proxy (*.proxy.rlwy.net:<port>)
Because internal networking was unreliable, we used REDIS_PUBLIC_URL.
H) Fix Worker first using Redis public proxy
This is the safe order: Worker must have Redis in queue mode.
Railway → Worker → Variables
Set:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<host from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<port from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<password from REDIS_PUBLIC_URL>
Redeploy Worker
Confirm Worker stays online (no ETIMEDOUT loops)
This stopped Worker crash-looping.
I) Switch Primary back to queue mode (after Worker is stable)
Railway → Primary → Variables
Set:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<same public proxy host>
QUEUE_BULL_REDIS_PORT=<same port>
QUEUE_BULL_REDIS_PASSWORD=<same password>
Redeploy Primary
Confirm Primary logs no longer show Redis timeouts
This restored queue mode without crashing.
J) Final validation checklist
Primary:
Healthcheck passes
UI loads
No repeated restart loops
Worker:
Stays online
No Redis client connect ETIMEDOUT errors
Redis:
Confirm REDIS_PUBLIC_URL is the one being used by n8n
Postgres:
n8n works (proves DB connectivity)
Ignore Railway Postgres “Database Connection” UI spinner if it stays stuck (it may still be trying internal connection paths)
2 months ago
3. Node Configuration (the exact variable “recipes”)
Primary (final working state)
Postgres via public proxy:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_PORT=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_SSL_ENABLED=true
DB_POSTGRESDB_SSL_REJECT_UNAUTHORIZED=false
Redis via public proxy:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<from REDIS_PUBLIC_URL>
Worker (final working state)
Same Redis public proxy settings as Primary
Same Postgres settings as Primary (DB must also be reachable by Worker)
Redis (what mattered)
Use REDIS_PUBLIC_URL for n8n if internal routing is flaky.
4. Optional Enhancements
Once everything is stable, you can try switching back to private (REDIS_URL / internal Postgres) to reduce potential egress fees, but only after you have a stable baseline and only if internal networking works again.
Create a simple "health workflow" in n8n:
Cron → Postgres “SELECT 1” → Redis ping (HTTP/Code) → alert on failure.
5. Final Notes
The big winning pattern here was: Stop guessing; test connectivity from inside Railway (Alpine), then use the public proxy URLs for the dependencies that were timing out on private/internal networking. Once both Postgres + Redis were reachable, Primary and Worker stopped crash-looping and the healthcheck succeeded.
2 months ago
Thank you!!
Status changed to Open brody • about 2 months ago
Status changed to Solved brody • about 2 months ago