2 months ago
I've been trying for several days to redeploy n8n, but I always get this error:
Initialization - OK
Deploy - OK
Network > Healthcheck - FAILED (after 4m53s)
Post-deploy - Did not start
May someone help me?
24 Replies
2 months ago
Railway
Hey there! We've found the following might help you get unblocked faster:
- [🧵 Healthcheck Failure!!](https://station.railway.com/questions/healthcheck-failure-ecac6092)
- [🧵 Unable to redeploy n8n - failing healthcheck](https://station.railway.com/questions/unable-to-redeploy-n8n-failing-healthc-c7997df8)
- [🧵 Eternal Healthcheck failure](https://station.railway.com/questions/healthcheck-failure-00d29c2b)
If you find the answer from one of these, please let us know by solving the thread!
2 months ago
This does not help.
vinibr
This does not help.
2 months ago
Set a variable N8N_PORT to whatever port you defined in the service settings under "Networking -> Public Networking". You can remove the healthcheck or verify it's set to /healthz. Removing it alone likely won't help, though, because your service seems misconfigured.
2 months ago
Thank you! The variable N8N_PORT wasn't there. I created it, set it, and redeployed, but it didn't work.
2 months ago
i think the healthcheck is failing because n8n's /healthz endpoint is disabled by default. try adding this environment variable to your primary service:
N8N_METRICS=true
this should enable the healthcheck endpoint. also double-check that your healthcheck path in railway settings (under networking) is set to /healthz
if that doesn't work, there might be a port mismatch, make sure the PORT variable matches what n8n is actually listening on. you can check the deploy logs to see what port n8n binds to
let me know if this helps
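btw, you can also probe the endpoint yourself to see what railway's healthcheck sees. a minimal sketch (the URL below is a dead placeholder with nothing listening, so it prints the failure branch; point it at your service's real public railway URL instead):

```shell
#!/bin/sh
# Probe an n8n healthcheck endpoint the way Railway's healthcheck would.
# URL is a placeholder with nothing listening on it, so this prints the
# failure branch; replace it with your service's public URL.
URL="http://127.0.0.1:9"
if curl -sf --max-time 3 "$URL/healthz" >/dev/null 2>&1; then
  echo "healthcheck ok"
else
  echo "healthcheck failed"
fi
```

a healthy n8n instance answers /healthz with HTTP 200, so `curl -f` exits 0 and you'd see "healthcheck ok".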
2 months ago
Thank you, ilyassbreath! It didn't work. I added the variable and tried the redeploy, but the deploy still failed. The logs don't show the port number.
2 months ago
thanks for the update! if the logs aren't showing a port number, n8n probably isn't starting up properly. this is usually a database connection issue
can you share your deploy logs? just go to the primary service → deployments → click the latest one → and copy/paste what you see in the logs here
also, can you check these variables in your primary service:
do you have DB_TYPE set to postgresdb?
what's your DB_POSTGRESDB_HOST set to? (it should be something like postgres.railway.internal or the name of your postgres service)
do you have all the postgres connection variables? (DB_POSTGRESDB_DATABASE, DB_POSTGRESDB_USER, DB_POSTGRESDB_PASSWORD, DB_POSTGRESDB_PORT)
the healthcheck is probably failing because n8n can't connect to the database and never actually starts up. once we fix the database connection, the healthcheck should work
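for reference, a complete set would look something like this (the values here are placeholders/assumptions; copy the real ones from your Postgres service's Variables tab):

```shell
# Hypothetical example values; copy the real ones from the Postgres
# service's Variables tab in Railway.
export DB_TYPE=postgresdb
export DB_POSTGRESDB_HOST=postgres.railway.internal
export DB_POSTGRESDB_PORT=5432
export DB_POSTGRESDB_DATABASE=railway
export DB_POSTGRESDB_USER=postgres
export DB_POSTGRESDB_PASSWORD=changeme   # placeholder

# Quick sanity echo of the connection target.
echo "$DB_TYPE@$DB_POSTGRESDB_HOST:$DB_POSTGRESDB_PORT/$DB_POSTGRESDB_DATABASE"
```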
2 months ago
LOGs:
Jan 9, 2026, 10:05 PM
Starting Container
Initializing n8n process
There was an error initializing DB
Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
Error: Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
at DbConnection.init (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+db@file+packages+@n8n+db_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@1._ab22bba05a964211b9fe14bf4b841570/node_modules/@n8n/db/src/connection/db-connection.ts:58:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/base-command.ts:104:3)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/start.ts:203:3)
at CommandRegistry.execute (/usr/local/lib/node_modules/n8n/src/command-registry.ts:82:4)
at /usr/local/lib/node_modules/n8n/bin/n8n:63:2
Connection terminated due to connection timeout
Last session crashed
Initializing n8n process
There was an error initializing DB
Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
Error: Could not establish database connection within the configured timeout of 120,000 ms. Please ensure the database is configured correctly and the server is reachable. You can increase the timeout by setting the 'DB_POSTGRESDB_CONNECTION_TIMEOUT' environment variable.
at DbConnection.init (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+db@file+packages+@n8n+db_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@1._ab22bba05a964211b9fe14bf4b841570/node_modules/@n8n/db/src/connection/db-connection.ts:58:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/base-command.ts:104:3)
at Start.init (/usr/local/lib/node_modules/n8n/src/commands/start.ts:203:3)
at CommandRegistry.execute (/usr/local/lib/node_modules/n8n/src/command-registry.ts:82:4)
at /usr/local/lib/node_modules/n8n/bin/n8n:63:2
Connection terminated due to connection timeout
Last session crashed
Initializing n8n process
2 months ago
do you have DB_TYPE set to postgresdb? Yes.
what's your DB_POSTGRESDB_HOST set to? Yes: postgres.railway.internal
do you have all the postgres connection variables? Yes, all of them.
How can I test the database connection? Can I test it in n8n?
Thank you!!🙏
2 months ago
okay i think n8n can't reach the postgres database. the error says it's timing out trying to connect to postgres.railway.internal
check this:
is your postgres service actually running? go to your railway project then check if the postgres service shows as "deployed" (green) , if it's crashed or not running, redeploy it
what's your postgres service actually called in railway? your DB_POSTGRESDB_HOST should match the exact service name. for example, if your service is called "postgres-production" then the host should be postgres-production.railway.internal, NOT just postgres.railway.internal
are your n8n and postgres both in the "production" environment? or are they in different environments?
can you check these and let me know what you find? most likely your postgres service name doesn't match what you have in the HOST variable
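if you want to verify resolution from inside a container, here's a tiny sketch (localhost is just a stand-in that resolves everywhere; swap in <your-service-name>.railway.internal when you run it on railway):

```shell
#!/bin/sh
# Check whether a hostname resolves inside the container.
# "localhost" is a placeholder stand-in; replace it with
# <your-postgres-service-name>.railway.internal on Railway.
HOST="localhost"
if getent hosts "$HOST" >/dev/null 2>&1; then
  echo "resolves: $HOST"
else
  echo "no-resolve: $HOST"
fi
```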
2 months ago
I have the same issue, and tried a bunch of things. You are correct, n8n cannot reach the Postgres DB.
@ilyassbreth
1. Postgres is running ("online"), but the database connection cannot be established. Redis is online, too. Worker and Primary crash due to the failing DB connection.
2. What do you mean by "service name"? The environment is called "production" and the 4 services are Postgres, Redis, Worker, and Primary.
3. Yes, they are in the same environment.
The issue appeared without having made any changes to any settings, and redeployments (for updates) worked before.
Postgres variable "PGHOST" = postgres.railway.internal
Primary variable "DB_POSTGRESDB_HOST" = postgres.railway.internal
2 months ago
Here is a summary of what I have tried so far:
I’m hosting n8n on Railway with a Primary + Worker + Postgres + Redis setup (queue mode). It used to work, but now deployments fail and the app becomes unreachable.
Railway shows the healthcheck failing on /healthz with repeated 503 Service Unavailable (“Starting Healthcheck… Attempt #… failed with service unavailable”). When I open the public Railway URL, I get “Application failed to respond.”
In the Primary deploy logs, n8n repeatedly crashes during startup with:
“There was an error initializing DB”
“Could not establish database connection within the configured timeout of 120,000 ms”
“Connection terminated due to connection timeout”
So n8n never reaches a “ready” state and Railway healthchecks keep retrying / restarting.
Current environment / config
n8n runs in queue mode (EXECUTIONS_MODE=queue) with a Worker service.
Postgres and Redis are Online in Railway.
DB variables in n8n are set like:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres.railway.internal
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=...
DB_POSTGRESDB_CONNECTION_TIMEOUT=120000 (and higher previously)
Networking variables:
ENABLE_ALPINE_PRIVATE_NETWORKING=true
N8N_LISTEN_ADDRESS=:: [I also tried 0.0.0.0]
PORT=5678
Railway healthcheck path is /healthz and it gets stuck retrying.
What I tried (and what happened)
Confirmed DB config (DB host/user/db/port values look correct).
Increased DB timeout (even very high), but n8n still times out connecting to Postgres.
Tried SSL settings (because it worked before without SSL), but it didn’t resolve the issue.
Tried switching execution mode (queue vs regular) to reduce moving parts; still blocked by DB connection on startup.
Created an Alpine diagnostic service to test connectivity to the private Postgres host and ran nc from logs (since no shell is available).
Result: nc: bad address 'postgres.railway.internal'
This suggests private DNS resolution for *.railway.internal is failing from inside the services.
Checked Postgres variables: it provides both:
DATABASE_URL → points to postgres.railway.internal:5432
DATABASE_PUBLIC_URL → points to a ...proxy.rlwy.net:<port> public endpoint
Railway shows a warning that using the public endpoint may incur egress fees.
Tried rolling back to an earlier n8n deployment. Railway shows “deployment successful/online”, but the app URL still shows “Application failed to respond”, and logs still show DB init failing.
What I suspect / what I need help with
It looks like n8n cannot reach Postgres because either:
Railway private networking / private DNS isn't working, so postgres.railway.internal isn't resolvable, or
there's a Railway networking/region/environment mismatch causing that internal hostname not to resolve.
I’d like help confirming:
Why postgres.railway.internal would return "bad address" inside a service even though Postgres is online.
Whether I must switch n8n to the public Postgres proxy URL to make it work (and what the best practice is to avoid egress fees).
Any specific Railway/n8n settings needed for private networking with Postgres + Worker.
If you want, I can also share redacted screenshots of:
Primary deploy logs showing “error initializing DB”
Healthcheck retries on /healthz
Alpine diag logs showing bad address postgres.railway.internal
Postgres variables page showing DATABASE_URL and DATABASE_PUBLIC_URL
Does anyone have an idea what can be done here?
2 months ago
I wonder why it's called "/healthz" and not "/health". Could that be the issue?
2 months ago
I thought Railway offered some kind of official support here.
This is serious!
We have a security flaw in n8n and can't update because of technical issues.
I'm going to move to a VPS instead.
Thank you guys for all the help.
2 months ago
that alpine test nailed it: postgres.railway.internal literally isn't resolving
check your actual postgres service name in railway dashboard. the hostname must be <exact-service-name>.railway.internal - if it's named "Postgres-prod" or anything other than just "postgres", that's your issue
if the service name is correct and dns still fails, it's a railway platform bug. temp fix: use the DATABASE_PUBLIC_URL proxy address instead of the internal one
always happy to help 
2 months ago
Thank you, ilyassbreth! Please see the screenshots; the settings are exactly the same.
I will try the public URL address...
Attachments
2 months ago
I created a small Alpine diagnostic service on Railway to test database connectivity separately from n8n. I first tested the public Postgres endpoint from DATABASE_PUBLIC_URL (the …proxy.rlwy.net:<port> address) using nc (netcat), and it showed “open”, which means the TCP port is reachable from inside Railway.
Then I tried a real Postgres login against that same public endpoint using psql with SSL required (PGSSLMODE=require). Even though the port was open, psql failed with “server closed the connection unexpectedly”. At the same time, n8n was failing with a similar symptom (ECONNRESET / “error initializing DB”). So I proved the public proxy endpoint is reachable at the network level, but the actual Postgres session/handshake is being terminated when a real client connects.
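That kind of nc probe can be sketched as a small helper (assumptions: a busybox/GNU nc that supports -z and -w; the target below is a placeholder on a port that is almost never bound, so it reports closed, while on Railway you would pass the ...proxy.rlwy.net host and port from DATABASE_PUBLIC_URL):

```shell
#!/bin/sh
# Minimal TCP reachability probe, similar to the nc test described above.
# The target here is a placeholder (127.0.0.1 port 1 is almost never bound),
# so this prints "closed"; substitute your proxy host and port.
check_tcp() {
  if nc -z -w 5 "$1" "$2" 2>/dev/null; then
    echo "open $1:$2"
  else
    echo "closed $1:$2"
  fi
}

check_tcp 127.0.0.1 1
```

Note that an "open" result only proves TCP reachability, exactly as above: the Postgres handshake can still fail afterwards.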
2 months ago
But I do not understand what to do now. Any ideas?
2 months ago
I finally fixed this using ChatGPT and thanks to ilyassbreth's indications. Here is how (summary from ChatGPT):
1. Overview
The root problem was that n8n (Primary and Worker) couldn’t reach Redis, so both services kept crashing with Redis client connect ETIMEDOUT and exiting after 10 seconds. Because your n8n setup uses queue mode / task broker, Redis is a required dependency. The fix was to (1) stabilize Primary by temporarily running without Redis, then (2) switch both Primary and Worker to use Redis’s public TCP proxy URL (REDIS_PUBLIC_URL) instead of the private internal URL (REDIS_URL), and finally (3) re-enable queue mode once Redis connectivity was confirmed.
2. Steps
Primary (what I did)
Stabilize Primary first
Set EXECUTIONS_MODE=regular so Primary could start without Redis.
Stop Redis crash loops
Remove/disable queue/Redis-related settings temporarily (so Primary wouldn't exit on Redis timeouts).
After Redis was fixed
Switch EXECUTIONS_MODE back to queue.
Configure Primary to use Redis via the public proxy (REDIS_PUBLIC_URL parts).
Redeploy Primary and confirm /healthz passes.
Worker (what I did)
Keep Worker from running while Redis was broken (or accept that it will crash in queue mode).
Configure Worker to use the same Redis public proxy endpoint as Primary:
Set EXECUTIONS_MODE=queue
Set Bull Redis connection vars to the public proxy host/port/password.
Redeploy Worker and confirm it stays online (no Redis timeout loop).
Redis (what I used)
In Redis service Variables, we identified two endpoints:
REDIS_URL → private/internal (redis.railway.internal:6379)
REDIS_PUBLIC_URL → public TCP proxy (*.proxy.rlwy.net:<port>)
Because internal routing was timing out, we chose the public proxy and used it for n8n.
3. Node Configuration (exact variables I used)
Redis → choose the right URL
From Redis Variables:
Use REDIS_PUBLIC_URL (format: redis://default:<PASSWORD>@<HOST>:<PORT>)
Extract:
Host = <HOST>
Port = <PORT>
Password = <PASSWORD>
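If it helps, that extraction can be done with plain POSIX parameter expansion (the URL below is a made-up example value; use your real REDIS_PUBLIC_URL):

```shell
#!/bin/sh
# Split a Railway-style Redis URL into host/port/password.
# The URL below is a hypothetical example value.
REDIS_PUBLIC_URL="redis://default:s3cretpass@maglev.proxy.rlwy.net:12345"

rest="${REDIS_PUBLIC_URL#redis://}"       # default:s3cretpass@maglev.proxy.rlwy.net:12345
creds="${rest%%@*}"                       # default:s3cretpass
hostport="${rest##*@}"                    # maglev.proxy.rlwy.net:12345

QUEUE_BULL_REDIS_PASSWORD="${creds#*:}"   # s3cretpass
QUEUE_BULL_REDIS_HOST="${hostport%%:*}"   # maglev.proxy.rlwy.net
QUEUE_BULL_REDIS_PORT="${hostport##*:}"   # 12345

echo "$QUEUE_BULL_REDIS_HOST $QUEUE_BULL_REDIS_PORT"
```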
Worker variables (queue mode)
Railway → Worker → Variables:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<HOST from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<PORT from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<PASSWORD from REDIS_PUBLIC_URL>
Redeploy Worker.
Primary variables (safe sequence)
Phase 1 (stabilize)
EXECUTIONS_MODE=regular
Redeploy Primary.
Phase 2 (enable queue after Worker/Redis confirmed)
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<HOST from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<PORT from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<PASSWORD from REDIS_PUBLIC_URL>
Redeploy Primary.
4. Optional Enhancements
Once stable, you can try switching back from REDIS_PUBLIC_URL to REDIS_URL (private) to avoid any potential egress fees, but only after you have a stable baseline.
Keep all services (Primary/Worker/Redis/Postgres) in the same region to reduce latency and timeouts.
5. Final Notes
The key insight was: queue mode requires Redis, and the internal Redis endpoint was timing out, so the reliable fix was using the Redis public TCP proxy and switching Primary back to queue only after the Redis path was confirmed working.
2 months ago
n8n is now working again. The first thing I did was to back up all workflows ;-).
2 months ago
Below is the extended version of all the things I did (again, a summary from my ChatGPT chat). I have to split it up because it is too long for one reply. I hope it helps:
1. Overview
This is the full “A to Z” runbook we ended up using to fix your Railway deployment: n8n Primary/Worker were crash-looping because Redis and Postgres connectivity via private/internal Railway networking was unreliable (timeouts / DNS issues). The solution was to (1) prove connectivity with a dedicated Alpine test service, (2) switch n8n to use the public TCP proxy endpoints for Postgres and Redis, (3) stabilize Primary in regular mode first, and only then (4) re-enable queue mode and bring Worker back.
Below is the complete step-by-step guide with the exact successful sequence.
2. Steps (A to Z)
A) Capture the real crash reason (don’t guess)
Railway → Primary → Deploy Logs
Look for the first “real error” before it restarts:
DB errors (Postgres): timeout, auth failed, ECONNRESET
Redis errors:
connect ETIMEDOUT + "Exiting process due to Redis connection error"
In our case we saw both at different times:
Postgres connection problems (initially)
Redis timeouts (later)
B) Create a “diagnostic Alpine” service (this was key)
This lets you test networking inside Railway, instead of guessing.
Railway → New Service
Choose Docker Image
Use image: alpine:3.19
Deploy it (it will be an "unexposed service", that's fine)
We then used this service to test Postgres and Redis connectivity with real commands.
C) Diagnose Postgres connectivity with Alpine
C1) Use the Postgres public proxy
Railway → Postgres service → Variables
Find:
DATABASE_PUBLIC_URL (this is the public TCP proxy)
DATABASE_URL (often internal/private)
Copy the DATABASE_PUBLIC_URL value.
C2) Put Postgres public URL into Alpine
Railway → Alpine service → Variables
Add:
DATABASE_URL=<paste DATABASE_PUBLIC_URL>
C3) Alpine start command to test Postgres
Alpine → Settings / Deploy / Start Command (where Railway lets you set it), set:
sh -lc "
apk add --no-cache postgresql15-client >/dev/null 2>&1;
echo 'Testing Postgres...';
psql \"$DATABASE_URL\" -c 'select version();';
sleep 999999
"
If it prints Postgres version → network + URL work.
Note: When we tried internal/private domains, we saw errors like nc: bad address or it simply wouldn't resolve.
2 months ago
D) Fix n8n (Primary) Postgres by switching to the public proxy
When Postgres test succeeds in Alpine, apply the same idea to n8n.
Railway → Primary → Variables
Set Postgres variables to match the public proxy connection details:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=<proxy host from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_PORT=<proxy port from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=<password from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_SSL_ENABLED=true
DB_POSTGRESDB_SSL_REJECT_UNAUTHORIZED=false
(Optional) DB_POSTGRESDB_CONNECTION_TIMEOUT=120000
Redeploy Primary
Confirm:
Healthcheck passes
n8n UI loads
This was the step that made Primary come online reliably again.
E) Diagnose Redis connectivity (the cause of later crash loops)
When Primary later crashed, logs showed:
[Redis client] connect ETIMEDOUT
"Exiting process due to Redis connection error"
That means n8n was configured to require Redis (queue/task broker), but Redis endpoint wasn’t reachable.
F) Stabilize Primary so it stops crashing during Redis issues
Before touching Redis, keep Primary alive.
Railway → Primary → Variables
Set:
EXECUTIONS_MODE=regular
Temporarily disable anything that forces Redis:
N8N_RUNNERS_ENABLED=false (if present)
OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=false (if present)
Remove/unset any QUEUE_BULL_REDIS_* variables on Primary (temporarily)
Redeploy Primary
Primary stays up even if Redis is broken.
G) Identify Redis endpoints (you found the key detail)
Railway → Redis service → Variables
You saw:
REDIS_URL → private/internal (redis.railway.internal:6379)
REDIS_PUBLIC_URL → public proxy (*.proxy.rlwy.net:<port>)
Because internal networking was unreliable, we used REDIS_PUBLIC_URL.
H) Fix Worker first using Redis public proxy
This is the safe order: Worker must have Redis in queue mode.
Railway → Worker → Variables
Set:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<host from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<port from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<password from REDIS_PUBLIC_URL>
Redeploy Worker
Confirm Worker stays online (no ETIMEDOUT loops)
This stopped Worker crash-looping.
I) Switch Primary back to queue mode (after Worker is stable)
Railway → Primary → Variables
Set:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<same public proxy host>
QUEUE_BULL_REDIS_PORT=<same port>
QUEUE_BULL_REDIS_PASSWORD=<same password>
Redeploy Primary
Confirm Primary logs no longer show Redis timeouts
This restored queue mode without crashing.
J) Final validation checklist
Primary:
Healthcheck passes
UI loads
No repeated restart loops
Worker:
Stays online
No Redis client connect ETIMEDOUT errors
Redis:
Confirm REDIS_PUBLIC_URL is the one being used by n8n
Postgres:
n8n works (proves DB connectivity)
Ignore Railway Postgres “Database Connection” UI spinner if it stays stuck (it may still be trying internal connection paths)
2 months ago
3. Node Configuration (the exact variable “recipes”)
Primary (final working state)
Postgres via public proxy:
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_PORT=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_DATABASE=railway
DB_POSTGRESDB_USER=postgres
DB_POSTGRESDB_PASSWORD=<from DATABASE_PUBLIC_URL>
DB_POSTGRESDB_SSL_ENABLED=true
DB_POSTGRESDB_SSL_REJECT_UNAUTHORIZED=false
Redis via public proxy:
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=<from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PORT=<from REDIS_PUBLIC_URL>
QUEUE_BULL_REDIS_PASSWORD=<from REDIS_PUBLIC_URL>
Worker (final working state)
Same Redis public proxy settings as Primary
Same Postgres settings as Primary (DB must also be reachable by Worker)
Redis (what mattered)
Use REDIS_PUBLIC_URL for n8n if internal routing is flaky.
4. Optional Enhancements
Once everything is stable, you can try switching back to private (REDIS_URL / internal Postgres) to reduce potential egress fees, but only after you have a stable baseline and only if internal networking works again.
Create a simple "health workflow" in n8n:
Cron → Postgres “SELECT 1” → Redis ping (HTTP/Code) → alert on failure.
5. Final Notes
The big winning pattern here was: Stop guessing; test connectivity from inside Railway (Alpine), then use the public proxy URLs for the dependencies that were timing out on private/internal networking. Once both Postgres + Redis were reachable, Primary and Worker stopped crash-looping and the healthcheck succeeded.
2 months ago
Thank you!!
Status changed to Open brody • about 2 months ago
Status changed to Solved brody • about 2 months ago