22 days ago
Project: kolbertai-librechat-new
Project ID: 82d941c6-d0e9-4f6d-944b-b149fa174684
Service: MongoDB (ID: 463ce54c-654e-4b03-96a0-02e8aa94a6f9)
Environment: production
Region: europe-west4-drams3a
Hi Railway team,
My MongoDB service is being hard-killed by the platform every 30-90
seconds with no graceful shutdown. The mongod process logs NO shutdown
event — no SIGTERM, no SIGINT, no "received signal". The next thing in
the logs is a brand-new container starting up with a new hostname,
mounting the same volume.
Example from today (UTC):
17:08:13 mongod startup complete (container: previous host)
17:08:27 connection accepted from LibreChat (Mongoose)
17:08:37 another connection accepted
↓ ~35 seconds, no shutdown log AT ALL
17:09:12 "Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/..."
New MongoDB starting, host: f26a59bc81ec (NEW container ID)
Each restart logs "Detected unclean shutdown - Lock file is not empty"
because the previous mongod was SIGKILL-ed without cleanup.
Resource utilization (Pro plan):
- CPU: 0 vCPU avg, max 0.01 vCPU (limit 32 vCPU)
- Memory: 187 MB avg, 369 MB max (limit 32 GB)
- Disk: 2.08 GB used / 48.8 GB volume
- Network: 0 MB public traffic
So this is NOT OOM, NOT CPU starvation, NOT disk full.
This service was stable for ~6 weeks (from 2026-04-07). The problem
started around 2026-05-20 and has gotten progressively worse — initially
hours between crashes, now under a minute. Restarting via `railway
redeploy` only buys 1-5 minutes before the next platform-level kill.
Mongo version: 8.2.9 (mongo:latest)
Start command: docker-entrypoint.sh mongod --ipv6 --bind_ip ::,0.0.0.0
Healthcheck: none configured
Restart policy: ON_FAILURE, maxRetries: 10
Single instance, single region (europe-west4-drams3a)
Could you investigate what's killing the container at the platform
level? Is there a host rebalancing operation, volume migration, or
health probe causing this? The LibreChat app depends on this database
and is currently unstable as a result.
Volume bind-mount path I see in logs:
/var/lib/containers/railwayapp/bind-mounts/7087d3d1-350f-49ac-9b93-f034df2154ca/vol_4v9qoj7y0ncoix7r
Thanks!
21 days ago
I would separate two things: what Railway can confirm internally, and what you can rule out from the service side.
Because this service has no healthcheck configured and the restart policy is ON_FAILURE, this does not look like a normal app healthcheck loop. Railway should be restarting it only because the container process exits, or because something below mongod kills the container. The missing MongoDB shutdown line makes a SIGKILL-style stop plausible, but the runtime event should still have an exit code.
The first thing I would check is the exit code for the container that died between 17:08 and 17:09.
If it is 137, treat it as SIGKILL or a cgroup/platform-level kill, even if the service metrics do not show normal memory pressure.
If it is 143, it received SIGTERM somewhere.
If it is 1, 2, or 100, then mongod is exiting itself and the useful log line is probably earlier in the MongoDB output.
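As a rough illustration of the exit-code reading above, here is a minimal Python sketch. It assumes the common shell/Docker convention that a process killed by signal N is reported as exit code 128 + N; the function name `describe_exit_code` is hypothetical, and how Railway actually surfaces the code is not something this thread confirms.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough cause for a container exit code.

    Assumes the 128 + signal-number convention used by shells and Docker.
    """
    if code == 0:
        return "clean exit"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    # Small non-zero codes mean the process chose to exit itself.
    return "process exited on its own"


for code in (137, 143, 100):
    print(code, "->", describe_exit_code(code))
```

On Linux, 137 maps to SIGKILL (128 + 9) and 143 to SIGTERM (128 + 15), which is why those two values are the ones worth asking Railway about.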
Two service-side changes are worth making while Railway checks the host/container event:
- Stop running the database on `mongo:latest`. Pin the image to the exact MongoDB version you want, ideally the version that was running during the stable period. Docker Hub shows `library/mongo:latest` was updated recently, and your timeline starts after that, so pinning removes one big variable for a database service.
- Take a volume backup or snapshot before repeated unclean shutdowns continue. The `Detected unclean shutdown` message is recoverable until it is not, so I would protect the data first and debug second.
I would also run one controlled redeploy with the simplest MongoDB command, without the custom IPv6 bind flags, using the official image's normal bind behavior or --bind_ip_all. If that stabilizes it, the custom command or IPv6 path is involved. If it still dies with the same exit code and no MongoDB shutdown line, then the cause is probably below mongod and Railway needs to inspect the host/container runtime event for that exact service and time window.
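To compare runs before and after that controlled redeploy, a small log scan can help. The sketch below is only an illustration: it matches the phrases quoted in this thread ("Detected unclean shutdown", "received signal", "shutting down"), the helper name `summarize_log` is hypothetical, and the sample lines are synthetic stand-ins rather than real mongod output.

```python
import re

# Phrases quoted in this thread; real mongod output may word them differently.
UNCLEAN_START = re.compile(r"detected unclean shutdown", re.IGNORECASE)
SHUTDOWN_MARKER = re.compile(r"received signal|shutting down", re.IGNORECASE)


def summarize_log(lines):
    """Count unclean-startup messages and graceful-shutdown markers."""
    unclean = 0
    markers = 0
    for line in lines:
        if UNCLEAN_START.search(line):
            unclean += 1
        elif SHUTDOWN_MARKER.search(line):
            markers += 1
    return {"unclean_starts": unclean, "shutdown_markers": markers}


# Synthetic sample lines, for illustration only.
sample = [
    "mongod startup complete",
    "connection accepted",
    "Detected unclean shutdown - Lock file is not empty",
    "mongod startup complete",
]
print(summarize_log(sample))
```

Unclean starts with zero shutdown markers fit the pattern described in this thread; any shutdown marker before a restart would instead point toward a SIGTERM or a mongod self-exit.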
10 days ago
Hi team,
Following up on my earlier ticket about the MongoDB service in
kolbertai-librechat-new being hard-killed without SIGTERM.
I've applied all the recommendations from your previous response:
- ✅ Pinned image to specific version (mongo:8.2.9 by tag, was mongo:latest)
- ✅ Simplified the start command: removed `--ipv6 --bind_ip ::,0.0.0.0` and now using `docker-entrypoint.sh mongod --bind_ip_all` per your suggestion
- ✅ Bumped restart policy to ALWAYS / max 1000 retries
The image pin helped reduce the crash frequency (previously every
30-60 seconds, now intermittent over hours/days), but the issue is
NOT resolved. The container is STILL being hard-killed.
Latest evidence — TODAY 2026-06-02:
- 09:09 UTC: Mongo crashed (deployment 5db22f80) → manual redeploy
- 10:54 UTC: I redeployed (deployment 7df638ea, current)
- 10:55:01 UTC: mongod startup complete
- 10:56:00 UTC: ANOTHER unclean shutdown + restart, only 59 seconds later
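For reference, the uptime figures can be checked with a few lines of Python. This sketch assumes both timestamps fall on the same UTC day; `seconds_between` is a hypothetical helper, and the timestamps are the ones quoted in this thread.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Seconds from start to end, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Today's run: startup complete to the next unclean restart.
print(seconds_between("10:55:01", "10:56:00"))  # 59
# First report: startup complete to the new container mounting the volume.
print(seconds_between("17:08:13", "17:09:12"))  # 59
```

Both runs lasted 59 seconds from startup to the next restart, which may be worth mentioning to Railway in case it matches a fixed platform-side interval.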
The MongoDB process still logs "Detected unclean shutdown - Lock file
is not empty" on every startup, meaning the previous mongod was killed
without a graceful shutdown. No "received signal" or "shutting down" lines
appear in the MongoDB logs before each kill. That is consistent with a
SIGKILL from outside mongod rather than a mongod self-exit, and the exit
codes requested below would confirm it.
Project / service details:
- Project: kolbertai-librechat-new (id: 82d941c6-d0e9-4f6d-944b-b149fa174684)
- Service: MongoDB (id: 463ce54c-654e-4b03-96a0-02e8aa94a6f9)
- Environment: production (id: dc28c69b-5f26-48c5-abff-b97d80ecb21b)
- Region: europe-west4-drams3a
- Current deployment: 7df638ea-4606-4f0d-bdef-f04163380c64
- Plan: Pro
- Resources: CPU 0%, RAM 200-369 MB (limit 32 GB), Disk 2 GB / 48 GB; NO resource pressure
Please look up the container/host events for deployment IDs:
- 7df638ea (today 10:54 UTC, the current one)
- 5db22f80 (today 09:09 UTC)
- db8705e5 (2026-05-25 12:10 UTC)
- de13b2d7 (2026-05-23 21:40 UTC — was stable ~34h then crashed)
The exit code of those killed containers would tell us:
- 137 = SIGKILL from cgroup / platform
- 143 = SIGTERM
- 1/2/100 = mongod self-exit
If it's 137 with no resource pressure on our side, the kill is coming
from your platform's host management (rebalancing, migrations, or
similar). We need a stable host placement or at least an explanation
of what's triggering these kills so we can adapt.
LibreChat (the app depending on this DB) is in production use; every
Mongo crash takes the chat down for several minutes until I manually
redeploy.
Thanks!