NestJS Servers are crashing with no error logs
caydennn
PRO · OP

3 months ago

Hey

I have a couple of servers using NestJS, with BullMQ and Prisma as well.

(A) a main server

(B) a main jobs server

(C) an inngest jobs server

At random intervals, these servers crash with no error logs, and I'm currently relying on Railway's restart policy to keep serving traffic. Only server A is exposed to public traffic; the other two handle background jobs from BullMQ and Inngest.

They all communicate with Redis for caching and BullMQ.

Our DB is hosted on Supabase.

The following screenshots show logs filtered down to only the startup lines.

What's strange is that within a service, these restarts are happening extremely close to each other (sometimes 3 restarts in less than a second?)

image.png

There also doesn't seem to be any correlation between when the different servers crash, e.g. comparing the restart timings below between Server B (main jobs server, first image) and Server C (inngest jobs server, second image).

  • Even after adding extensive logging and error handling, no errors are logged to the console, including from listeners for uncaught exceptions and unhandled rejections.

  • Attaching Sentry also doesn't show anything when crashes happen.

  • Crashes are most frequent for Server A. Its memory peaks at only around 160-170 MB, so I'm doubtful it's a memory issue.
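For anyone reproducing this, here is a minimal sketch of the kind of process-level diagnostics described in the bullets above. It is not the exact code running on these servers: `installCrashDiagnostics`, `formatDiagnostic` and `logToStderr` are hypothetical names, and writing JSON lines to stderr is an assumption. It logs uncaught exceptions, unhandled rejections, SIGTERM/SIGINT and the final exit code, writing synchronously so the line is not lost while the process is shutting down.

```typescript
import { writeSync } from "node:fs";
import { constants } from "node:os";

type Log = (line: string) => void;

// Synchronous write to stderr (fd 2) so the line is flushed even during exit.
const logToStderr: Log = (line) => {
  writeSync(2, line + "\n");
};

// Hypothetical helper: builds one JSON log line with basic process state.
function formatDiagnostic(event: string, detail: unknown): string {
  const text =
    detail instanceof Error ? detail.stack ?? detail.message : String(detail);
  return JSON.stringify({
    event,
    detail: text,
    pid: process.pid,
    uptimeSec: Math.round(process.uptime()),
    rssMb: Math.round(process.memoryUsage().rss / 1024 / 1024),
  });
}

// Hypothetical helper: registers listeners for every catchable way the process can end.
function installCrashDiagnostics(log: Log = logToStderr): void {
  log(formatDiagnostic("boot", "process started"));

  process.on("uncaughtException", (err) => {
    log(formatDiagnostic("uncaughtException", err));
    process.exit(1);
  });

  process.on("unhandledRejection", (reason) => {
    log(formatDiagnostic("unhandledRejection", reason));
    process.exit(1);
  });

  // A custom signal listener replaces Node's default behaviour,
  // so the handler has to exit explicitly.
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.on(signal, () => {
      log(formatDiagnostic("signal", signal));
      process.exit(128 + constants.signals[signal]);
    });
  }

  // Runs on any normal exit path; only synchronous work is allowed here.
  process.on("exit", (code) => {
    log(formatDiagnostic("exit", `code=${code}`));
  });
}
```

Calling `installCrashDiagnostics()` at the very top of `main.ts`, before `NestFactory.create(...)`, would cover the whole bootstrap. SIGKILL cannot be caught by any listener, so a restart with a "boot" line but no preceding "signal" or "exit" line would be consistent with the process being killed from outside or crashing in native code, though it does not prove either.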

Questions:

  1. Is there a way we could verify that these crashes are happening because of the application and not because of Railway?

  2. Are there any container logs that we might not have access to that could give us clues?

  3. If anyone has any clues or guesses as to why this could be happening, that'd be really helpful too :')

Edit: I realised another post describes this exact issue: https://station.railway.com/questions/unexplained-node-nest-js-backend-restarts-883e4b29

They phrased their asks a lot better than I could, so I'm including them below, adapted to my current issue:

What I’m asking Railway for

  • For the restarts around these timestamps for Server A (example): 12/01/2025, 6:13:35 AM and 12/01/2025, 4:57:42 AM

on service backend / deployment IDs:

  • Please provide the exact exit code and signal for the Node process (e.g. exit 1, exit 137, SIGKILL, etc.).

  • Check your lower‑level logs for that container for any native/runtime errors, such as: OOM kills, Node or V8 crashes, Prisma/native client panics or segmentation faults, or health‑check failures that cause you to terminate the process.

  • Confirm whether any resource limits or health checks are currently configured for this service that could be killing the process despite stable CPU/memory at the app level.
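To make the exit-code ask concrete, here is a small sketch of how a reported exit status could be interpreted. It assumes the common shell convention that a status above 128 means "terminated by signal (status - 128)"; `describeExitCode` is a hypothetical helper, not something Railway or Node provides.

```typescript
import { constants } from "node:os";

// Hypothetical helper: describes an exit status using the 128 + N signal convention.
function describeExitCode(code: number): string {
  if (code === 0) {
    return "clean exit";
  }
  if (code > 128) {
    const signum = code - 128;
    // Look up the signal name for this number on the current platform.
    const match = Object.entries(constants.signals).find(
      ([, value]) => value === signum,
    );
    return match
      ? `killed by ${match[0]} (signal ${signum})`
      : `killed by unknown signal ${signum}`;
  }
  return `application error exit (code ${code})`;
}
```

On Linux, `describeExitCode(137)` maps to SIGKILL and `describeExitCode(139)` to SIGSEGV. A SIGKILL would fit an out-of-memory kill or a platform-initiated stop, and a SIGSEGV would point towards a native crash, but the code alone does not say who sent the signal, which is why the lower-level container logs are still needed.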

$10 Bounty

1 Reply

Railway
BOT

3 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!

