2 months ago
Problem:
Every new build in the production environment crashes silently ~90 seconds after startup. The exact same code and config runs perfectly in the dev
environment for hours.
Evidence:
- Failing production deploy: d6a40d3b-3bb3-440d-b00f-fbed53ddf6e7 (Mar 13, commit 0138319)
  - Server starts, healthcheck passes, memory is stable at ~108MB RSS / 49MB heap
  - After ~90 seconds the process dies: no error logs, no uncaughtException, no V8 OOM (--max-old-space-size=450 was set)
  - Health endpoint returns 502 "connection refused"
- Working dev deploy: 32749624-f0c3-491e-9b7a-149ef393f55b (same code, same railway.json)
  - Runs for 4+ hours, handles scans, no issues
- Working production rollback: e8846ed5-a350-4213-9c56-bd09c0c98050 (old image from March 7)
  - Old image works fine, any new build crashes
Timeline:
- Last working fresh production build: March 7 (1f596faa)
- First broken production build: March 8 (ca6bef54) — triggered by a config patch, same commit as the working build
- Every new production build since then crashes, including pinned Dockerfile builds with node:20.20.1-slim
What we've ruled out:
- Not OOM — memory is 108MB, limit is 32GB, V8 heap cap at 450MB didn't trigger
- Not a code bug — same code runs fine in dev environment
- Not Node.js version — tried pinned node:20.20.1-slim Dockerfile, same crash
- Not uncaught exceptions — process.on('uncaughtException') handler doesn't fire
- Not database — pool.on('error') handler added, no errors logged
Conclusion:
The production container process is being SIGKILL'd by something outside Node.js. This only affects the production environment — the dev environment
with identical code and config is stable.
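One way to narrow down a silent exit like this is to log every exit path that Node can actually observe. The sketch below is only an illustration: `installExitDiagnostics` and its `proc`/`log` parameters are hypothetical names, not something from the deploys above. SIGKILL cannot be intercepted, so missing logs are consistent with an external kill but do not prove one.

```javascript
// Hypothetical helper, not part of the app in this thread.
// Logs the exit paths Node can observe. SIGKILL cannot be caught,
// so a kill from outside the process leaves no trace here.
function installExitDiagnostics(proc = process, log = console.error) {
  // 'exit' fires on normal exits and on process.exit(), with the exit code.
  proc.on('exit', (code) => log(`exit event, code=${code}`));

  for (const [signal, code] of [['SIGTERM', 143], ['SIGINT', 130]]) {
    proc.on(signal, () => {
      log(`received ${signal}`);
      // A signal listener replaces Node's default "terminate" behaviour,
      // so exit explicitly with the conventional 128 + signal number.
      proc.exit(code);
    });
  }
}
```

Calling `installExitDiagnostics()` once at startup would be enough. If the process then disappears without any of these lines in the logs, either it was killed with SIGKILL or it never exited at all.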
10 Replies
2 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 2 months ago
2 months ago
Hello, the code didn't change between the last working build and the first broken one; you said so yourself. The only thing that changed was the config patch on March 8, so that's your only lead. Whatever that patch changed in the config is what's killing production. Roll it back or diff it against the March 7 config and that's your culprit.
domehane
2 months ago
as I mentioned at the start of my message, configs are the same between dev and prod right now
- Container starts successfully ("Server running on port 3000" in logs)
- Process stays alive (memory logs every 30s confirm this)
- But all HTTP requests including /health return 502 from Railway's load balancer
- Identical code + config works in the dev environment
- Only production environment is affected, started March 8 after a config patch
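A process that is alive but unreachable can be told apart from a dead one by probing the port from inside the container. The following is a minimal sketch, assuming shell access to the running container; `isListening` is a hypothetical helper name, and the host and timeout defaults are arbitrary.

```javascript
const net = require('node:net');

// Hypothetical probe: resolves true if something accepts TCP connections
// on host:port, false on refusal or timeout.
function isListening(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (result) => {
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}
```

Running `isListening(3000)` and `isListening(Number(process.env.PORT))` side by side would show whether the app is bound to the port it logs or to the one the platform provides.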
2 months ago
It's honestly really strange that my thread was made public and I got no priority support from the Railway team 🙁
Status changed to Solved dealwith • 2 months ago
2 months ago
it's honestly the worst support experience
Status changed to Open brody • 2 months ago
dealwith
2 months ago
OK, so the process is not dying; it's alive. The 502 is coming from Railway's load balancer, meaning Railway can't reach your container on the port it expects. Your logs say the app is on port 3000, but the question is: what port is Railway actually trying to route to in production? Check what process.env.PORT is set to in the production environment vs dev. If Railway assigned a different port and your app is hardcoded to 3000 instead of listening on process.env.PORT, the load balancer will 502 every time even though the process looks healthy.
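A minimal sketch of that check, assuming the app knows which port it bound to: `describePortMismatch` is a hypothetical helper, and the wording of its messages is made up for illustration.

```javascript
// Hypothetical startup check: compares the port the app binds to with the
// PORT value present in the environment. Names here are illustrative.
function describePortMismatch(boundPort, env = process.env) {
  const raw = env.PORT;
  if (raw === undefined || raw === '') {
    return `PORT is not set; app is listening on ${boundPort}`;
  }
  const expected = Number.parseInt(raw, 10);
  if (Number.isNaN(expected)) {
    return `PORT="${raw}" is not a number; app is listening on ${boundPort}`;
  }
  return expected === boundPort
    ? `OK: listening on ${boundPort}, which matches PORT`
    : `MISMATCH: listening on ${boundPort} but PORT=${expected}`;
}
```

Logging `describePortMismatch(3000)` at startup in both environments would make a difference between dev and production visible in the deploy logs.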
domehane
2 months ago
Great suggestion. The way I checked it: I removed my own PORT variable, connected to the server, and looked at the PORT value Railway actually populates. By default it was 8080.
2 months ago
That's your bug: your app was hardcoded to listen on 3000 but Railway was routing traffic to 8080. Just change your app to listen on process.env.PORT instead of a hardcoded 3000 and it'll work.
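The change could look roughly like the sketch below. It is not the poster's actual server: `resolvePort` and `startServer` are hypothetical names, the `/health` route simply mirrors the endpoint mentioned in the thread, and binding to `0.0.0.0` is an assumption about what the platform needs in order to reach the container.

```javascript
const http = require('node:http');

// Use the platform-provided PORT when present; 3000 is only a local fallback.
function resolvePort(env = process.env, fallback = 3000) {
  const port = Number.parseInt(env.PORT ?? '', 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// Hypothetical minimal server with a /health route.
function startServer(port = resolvePort()) {
  const server = http.createServer((req, res) => {
    if (req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
      return;
    }
    res.writeHead(404);
    res.end();
  });
  // Bind to all interfaces so traffic from outside the container can reach it.
  server.listen(port, '0.0.0.0', () => {
    console.log(`Server running on port ${server.address().port}`);
  });
  return server;
}
```

Calling `startServer()` at the entry point picks up whatever PORT is set in the environment and falls back to 3000 only for local runs.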
domehane
2 months ago
That's not "my bug"; that's something Railway should tell you: "hey, we expect you to use port 8080."
Status changed to Open brody • 2 months ago
dealwith
2 months ago
Fair point, but Railway does actually document that you should listen on process.env.PORT and not a hardcoded port. As long as you haven't defined a PORT variable yourself, Railway will provide and expose one for you. That's standard for any cloud platform, so the expectation is on you to use their env variable, not a fixed port.
domehane
2 months ago
I had a PORT variable defined in both the prod and dev environments. For some reason it works fine in dev but not in prod; maybe it's cached in their infra, or idk.
Status changed to Solved dealwith • 2 months ago
