4 days ago
We are experiencing ongoing service restarts and don't have visibility into the root cause. My Cursor agent has created this summary and our ask.
-----------
Environment
Stack: NestJS backend (@savor/backend) running on Railway
DB: Railway PostgreSQL (postgres.railway.internal:5432)
Other services: Redis on Railway
Deploy time of current version: ~2025‑12‑01 20:03 PT
Symptoms
Backend process restarts periodically with logs like:
---
[Nest] 127 -12/01/2025, 4:21:27 AM LOG [NestApplication] Nest application successfully started +351ms
UsersController: syncing user data from frontend { auth0Id: '...', ... }
ELIFECYCLE Command failed.
Environment variables loaded from .env
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "railway", schema "public" at "postgres.railway.internal:5432"
15 migrations found in prisma/migrations
No pending migrations to apply.
> @savor/backend@0.0.1 start:prod /app
> node dist/main
---
The pattern is consistent:
Service starts normally, maps all routes, logs “Nest application successfully started”.
On first or early /users/sync call (user sync endpoint), we see UsersController: syncing user data from frontend ….
Immediately after, ELIFECYCLE Command failed. appears, and Railway restarts the container.
There is no JS stack trace or error logged before the restart, even though we have:
---
process.on('uncaughtException', ...)
process.on('unhandledRejection', ...)
---
wired at the top of main.ts.
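For reference, a minimal sketch of what those handlers look like (illustrative, not our exact main.ts). Notably, these only catch JavaScript-level errors; a native crash terminates the process before either handler runs, which would be consistent with seeing no stack trace:

```typescript
// Minimal sketch of process-level handlers wired at the top of main.ts.
// These catch JS-level errors only: a native crash (e.g. SIGSEGV from a
// native addon or query engine) kills the process before either fires,
// which matches the absence of any stack trace in the logs.
process.on("uncaughtException", (err: Error) => {
  console.error("[fatal] uncaughtException:", err.stack ?? err);
});

process.on("unhandledRejection", (reason: unknown) => {
  console.error("[fatal] unhandledRejection:", reason);
});
```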
Mitigations already applied
To rule out application‑level causes, we have:
Disabled Redis user caching:
USER_CACHE_TTL=0 → guards/interceptors no longer call Redis get/setex/del.
Disabled Bull job queues:
Added BULL_ENABLED flag; when BULL_ENABLED=false we do not call BullModule.forRoot(...).
Disabled major AI workflows.
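The two kill switches above can be plumbed roughly like this (the flag names are from our setup, but the helper itself is a hypothetical sketch, not our actual AppModule code):

```typescript
// Hypothetical sketch of the two kill switches described above:
// BULL_ENABLED=false skips queue-module registration entirely, and
// USER_CACHE_TTL=0 makes guards/interceptors bypass Redis.

interface FeatureFlags {
  bullEnabled: boolean;  // when false, BullModule.forRoot(...) is never called
  userCacheTtl: number;  // when 0, skip Redis get/setex/del in the user guard
}

function readFlags(env: Record<string, string | undefined>): FeatureFlags {
  return {
    bullEnabled: env.BULL_ENABLED !== "false",
    userCacheTtl: Number(env.USER_CACHE_TTL ?? "3600"),
  };
}

// In the NestJS module, the imports array is then built conditionally, e.g.:
// imports: [CoreModule, ...(flags.bullEnabled ? [BullModule.forRoot(redisConfig)] : [])]

const flags = readFlags({ BULL_ENABLED: "false", USER_CACHE_TTL: "0" });
console.log(flags.bullEnabled, flags.userCacheTtl); // false 0
```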
Despite this, restarts continue with the same pattern, even when the app is mostly idle.
What I’m asking Railway for
For the restarts around these timestamps (example): 2025‑12‑01T04:21:27Z and 2025‑12‑01T04:30:58Z
on service backend / deployment IDs:
Please provide the exact exit code and signal for the Node process (e.g. exit 1, exit 137, SIGKILL, etc.).
Check your lower-level logs for that container for any native/runtime errors, such as OOM kills, Node/V8 crashes, Prisma or native-client panics or segmentation faults, or health-check failures that cause you to terminate the process.
Confirm whether any resource limits or health checks are currently configured for this service that could be killing the process despite stable CPU/memory at the app level.
Having the exit code/signal and any native error message around those restart times will let me distinguish between:
A platform/runtime issue (OOM, crash, health‑check kill), vs.
Something inside my app/Prisma stack that isn’t visible through JS error handlers.
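The exit-code/signal distinction can be seen locally with a quick Node experiment (illustrative only; on Railway this information lives in the platform's supervisor, which is why we're asking support for it):

```typescript
// Illustrative: how a supervisor sees a signal death vs. a clean exit.
// A process killed by a signal reports status=null plus the signal name;
// a shell wrapper re-encodes that as 128+signal (e.g. 137=SIGKILL, 139=SIGSEGV).
import { spawnSync } from "node:child_process";

// Spawn a child Node process that delivers SIGSEGV to itself.
const result = spawnSync(process.execPath, [
  "-e",
  "process.kill(process.pid, 'SIGSEGV')",
]);

console.log(`status=${result.status} signal=${result.signal}`);
// status=null signal=SIGSEGV
```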
---
Thank you for helping us solve this confounding issue!
7 Replies
4 days ago
Hey there! We've found the following might help you get unblocked faster:
If you find the answer from one of these, please let us know by solving the thread!
4 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open ray-chen • 4 days ago
4 days ago
I'm facing this exact issue with a pretty much identical setup: NestJS, Prisma, Redis, and BullMQ.
No error logs. I tried disabling the BullMQ workers, and even extracted some workers into their own separate service in case it was a load issue; it's still crashing intermittently with no trace or error logs.
I was just about to disable user caching, but after stumbling across your post, it doesn't seem like that will do anything.
I just made a post here describing it too: https://station.railway.com/questions/nest-js-server-is-crashing-with-no-logs-a2d99b6d
Hope we find something helpful soon!
4 days ago
Same issue here - https://station.railway.com/questions/exit-code-c6aa8f4a
It seems Railway doesn't surface exit codes - I tried checking my app for resource bursts, but there are no exit codes, nothing.
Is asking support really the only way to get exit codes?
4 days ago
Found a strategy that works for me: wrap node like this inside the Dockerfile, so it emits the exit code even on hard failures:
CMD ["sh", "-c", "node dist/server.js & pid=$!; wait $pid; code=$?; echo \"[WRAPPER] node exited status=$code\" >&2; sleep 1; exit $code"]
With this I got logs like:
Segmentation fault
[WRAPPER] node exited status=139
(Status 139 = 128 + 11, i.e. the process was killed by SIGSEGV.)
4 days ago
Reviewing feedback above. Thanks!
3 days ago
Thanks @rafchik! I now have visibility on the failure.
What I'm seeing in the logs now is periodic restarts, preceded by:
Segmentation fault
[WRAPPER] backend exited with error status=139
Now I need to know what exactly is causing this. My Cursor agent noticed that my Prisma client and engine versions were not aligned. I fixed this by pinning both to 6.19.0 and verified the versions in production by adding logging.
However, that fix did not address the root cause. I'm still seeing periodic seg faults and exits with status 139.
Railway folks: here are the timestamps for some of the exits from our "backend" service. Can you look under the hood and give me any more clues as to the cause of the exits at these times?
---
2025-12-02T01:08:36.254122182Z [err] Segmentation fault
2025-12-02T01:08:36.254126194Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T01:33:39.249690585Z [err] Segmentation fault
2025-12-02T01:33:39.249704127Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T01:55:32.052545561Z [err] Segmentation fault
2025-12-02T01:55:32.052552147Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T06:37:58.196049192Z [err] Segmentation fault
2025-12-02T06:37:58.196055803Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T06:40:59.021172509Z [err] Segmentation fault
2025-12-02T06:40:59.021178704Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T06:45:52.244560766Z [err] Segmentation fault
2025-12-02T06:45:52.244565010Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T06:58:25.038102822Z [err] Segmentation fault
2025-12-02T06:58:25.038110579Z [err] [WRAPPER] backend exited with error status=139
2025-12-02T06:58:27.254112303Z [err] Segmentation fault
2025-12-02T06:58:27.254118294Z [err] [WRAPPER] backend exited with error status=139
2 days ago
I'm marking this as resolved. Thanks to @rafchik-dipstick's tip on the wrapper to catch exit codes, I was able to catch a seg-fault exit code of 139. This led to further examination of Prisma and associated interactions. There was a highly sub-optimal, redundant calling pattern in my web app that led to multiple concurrent calls fetching the same record from the database. This hit a known race-condition/concurrency bug in the Prisma library. The ultimate fix was to eliminate the unneeded concurrent calls. I'm also trying to switch to the binary version of the Prisma engine, which runs in a separate process from the JS app and apparently does not have this concurrency bug.
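The fix described above, collapsing concurrent requests for the same record into one in-flight query, can be sketched like this (a generic pattern, not the actual app code; `loadUser` stands in for the real Prisma call):

```typescript
// Generic in-flight deduplication: concurrent callers asking for the same key
// share one pending promise instead of issuing parallel DB queries.
const inFlight = new Map<string, Promise<unknown>>();

async function getOnce<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>; // reuse the in-flight query
  const p = loader().finally(() => inFlight.delete(key)); // clear when settled
  inFlight.set(key, p);
  return p;
}

// Example: three concurrent calls result in one underlying query.
let queries = 0;
const loadUser = async (id: string) => {
  queries++; // counts how many times the "database" is actually hit
  return { id };
};

async function demo() {
  const results = await Promise.all([
    getOnce("user:42", () => loadUser("42")),
    getOnce("user:42", () => loadUser("42")),
    getOnce("user:42", () => loadUser("42")),
  ]);
  console.log(queries, results.length); // 1 3
}
demo();
```

Separately, if you go the engine-swap route, Prisma's out-of-process engine is selected with `engineType = "binary"` in the `generator client` block of schema.prisma.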
Thank you @rafchik-dipstick for getting me unblocked and on a path to solving the problem!
Status changed to Solved ray-chen • 2 days ago
Status changed to Open ray-chen • 2 days ago
Status changed to Solved ray-chen • 2 days ago