Postgres database container being terminated unexpectedly - causing cascading API restarts
bojidartonev
PRO · OP

a month ago

Hi Railway team,

We're experiencing unexpected restarts of our Postgres database container (created via Railway's Postgres template), which then causes our API service to restart as well.

## The Problem

Our Postgres database container is being terminated without warning, causing:

1. Database logs showing "database system was interrupted" and "database system was not properly shut down; automatic recovery in progress"

2. Our API receiving SIGTERM shortly after (we see our NestJS shutdown hooks being triggered)

3. This happens randomly, not during deploys or config changes

## Evidence

Database logs show abrupt termination (not graceful shutdown):
database system was interrupted; last known up at [timestamp]

database system was not properly shut down; automatic recovery in progress

API logs confirm it receives SIGTERM from Railway:
🛑 Shutdown initiated...

Database disconnect timed out after 5000ms

(The timeout proves the DB was already dead when API tried to disconnect)

## What we've ruled out

1. Not a code issue - Our API only exits on SIGTERM/SIGINT signals or uncaught exceptions. We don't see "UNCAUGHT EXCEPTION" in logs.

2. Not resource pressure - Database metrics show low CPU, stable memory, minimal network traffic at time of restarts.

3. Not health check failures - Our API health check at /health always returns 200 (pure liveness check).

4. We're on Pro plan - So this isn't the Hobby plan sleep/inactivity feature.

## Our setup

- Database: Railway Postgres template (managed)

- Plan: Pro

- Region: Amsterdam, EU West

- Services: API (NestJS), Orchestrator (NestJS), Web (React Vite)

## Questions

1. Why is Railway terminating our Postgres container unexpectedly?

2. Is there a way to see what triggered the container restart (platform logs, events)?

3. Is there a health check or liveness probe configured for the Postgres template that we're not aware of?

4. Could this be infrastructure migration or maintenance happening silently?

5. Is there a way to get notifications before Railway restarts our database?

## What would help us

- Access to platform-level logs showing WHY the container was terminated

- Information about any automated health checks Railway runs on Postgres templates

- Any maintenance schedules or infrastructure events around the times of our restarts

- Any general data about our issue. We are in production, we couldn't find any code issues, and the services keep stopping due to external behaviour, which causes cascading failures across our infrastructure.

Thank you for investigating this!

$20 Bounty

3 Replies

a month ago

Hello,

I have checked, and we have not terminated or restarted your container on our end, so I will open this up to the community so they may help you debug what on your end caused that.


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway about 1 month ago


kezcodes123
HOBBY

a month ago

Do you have serverless enabled in settings? If so, turn it off and try again.


hyper674
PRO

a month ago

Go to your Postgres service settings and make sure you don't have any restart policies or serverless settings enabled that would kill it during low activity.

Possible causes:

OOM killer - Even if memory looks stable in metrics, a sudden spike can cause Linux to kill Postgres. Check if there's a brief spike right before the crash. Postgres logs would show something like "server process (PID <n>) was terminated by signal 9: Killed" if this happened.

Disk space - Postgres crashes hard when the volume fills up. Check your volume usage in the Volume tab. If you're close to full, that's probably it.

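Connection exhaustion - If you hit max_connections, Postgres rejects new connections with "FATAL: sorry, too many clients already". That alone doesn't usually crash the server, but it can look like an outage from the API side. Check your connection pool settings in the API - you might be leaking connections that build up over time.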

WAL/checkpoint issues - If checkpoints are taking too long or WAL files are piling up, Postgres can crash. Check max_wal_size and checkpoint_timeout settings.
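For the connection exhaustion case above, a rough budget check can show whether the pools could ever exceed what Postgres accepts. This is only a sketch, assuming each service instance holds its own pool. The names `PoolConfig` and `connectionBudget`, the instance counts, and the pool sizes are hypothetical and not taken from this thread. Stock Postgres typically defaults to `max_connections = 100` with 3 slots reserved for superusers, but your instance may differ (check with `SHOW max_connections;`).

```typescript
// Hypothetical description of one service's connection pool.
interface PoolConfig {
  service: string;   // e.g. "api" or "orchestrator"
  instances: number; // number of running replicas of the service
  poolMax: number;   // maximum connections each replica's pool may open
}

interface BudgetResult {
  worstCase: number;   // connections opened if every pool is full
  available: number;   // slots left for ordinary clients
  overBudget: boolean; // true when the pools could exhaust the server
}

// Compares the worst-case number of client connections with what the
// server accepts after subtracting superuser-reserved slots.
function connectionBudget(
  pools: PoolConfig[],
  maxConnections: number,
  reservedForSuperuser: number = 3,
): BudgetResult {
  const worstCase = pools.reduce((sum, p) => sum + p.instances * p.poolMax, 0);
  const available = maxConnections - reservedForSuperuser;
  return { worstCase, available, overBudget: worstCase > available };
}

// Example values are assumptions for illustration only.
const budget = connectionBudget(
  [
    { service: "api", instances: 2, poolMax: 20 },
    { service: "orchestrator", instances: 1, poolMax: 20 },
  ],
  100,
);
```

If `overBudget` comes back true, the usual options are a smaller per-instance pool or a connection pooler in front of the database. A result within budget does not rule out a leak, since leaked connections can pile up outside the pool's accounting.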

Actual errors to look for in Postgres logs:

  • "PANIC" or "FATAL" messages

  • "terminating connection due to administrator command"

  • Signal 9 (OOM killer)

  • Anything about corruption or filesystem errors

What do the Postgres logs say in the 30 seconds before "database system was interrupted"? That's where the answer is.
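One way to act on that is to export the Postgres logs and pull out everything in the window before the recovery message. The sketch below assumes each log line has already been parsed into a timestamp and a message. The `LogLine` shape, the function name `linesBeforeInterruption`, the labels, and the sample lines are all hypothetical. The patterns mirror the list above.

```typescript
interface LogLine {
  time: Date;      // parsed timestamp of the log line
  message: string; // raw message text
}

interface TaggedLine {
  line: LogLine;
  labels: string[]; // empty when no suspect pattern matched
}

// Patterns taken from the list above, each mapped to a short label.
const SUSPECT_PATTERNS: { pattern: RegExp; label: string }[] = [
  { pattern: /PANIC/, label: "panic" },
  { pattern: /FATAL/, label: "fatal" },
  { pattern: /terminating connection due to administrator command/, label: "admin-terminate" },
  { pattern: /terminated by signal 9/, label: "sigkill" }, // possibly the OOM killer
  { pattern: /corrupt|No space left on device/i, label: "storage" },
];

const RECOVERY_MARKER = "database system was interrupted";

// Returns the lines logged within `windowMs` before the first recovery
// marker, each tagged with the suspect patterns it matches.
function linesBeforeInterruption(lines: LogLine[], windowMs: number = 30000): TaggedLine[] {
  const before: LogLine[] = [];
  let markerTime: number | null = null;
  for (const l of lines) {
    if (l.message.indexOf(RECOVERY_MARKER) !== -1) {
      markerTime = l.time.getTime();
      break;
    }
    before.push(l);
  }
  if (markerTime === null) return []; // no recovery marker in this export

  const cutoff = markerTime - windowMs;
  return before
    .filter((l) => l.time.getTime() >= cutoff)
    .map((l) => ({
      line: l,
      labels: SUSPECT_PATTERNS.filter((p) => p.pattern.test(l.message)).map((p) => p.label),
    }));
}

// Made-up sample lines for illustration; not from the poster's logs.
const sample: LogLine[] = [
  { time: new Date("2024-01-01T00:00:00Z"), message: "LOG:  checkpoint starting: time" },
  { time: new Date("2024-01-01T00:04:45Z"), message: "LOG:  server process (PID 123) was terminated by signal 9: Killed" },
  { time: new Date("2024-01-01T00:05:00Z"), message: "LOG:  database system was interrupted; last known up at 2024-01-01 00:04:40 UTC" },
];
const suspects = linesBeforeInterruption(sample);
```

An empty result is informative too. If Postgres logged nothing at all in the window, the process may have been killed before it could write anything, which points away from Postgres-level errors and toward something outside the database process.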

