16 days ago
Service: MongoDB (standalone)
Service ID: 8c63e7ef-a2f3-49d2-a699-0dcf9326b2f8
Summary
Our MongoDB service has been in a continuous OOM crash loop since your platform incident. The service restarts every ~60 seconds and never stabilizes, making it impossible to connect or extract data.
What we are seeing in the logs
- MongoDB starts up and immediately logs: "Memory available to mongo process is less than total system memory"
- Every boot shows: "Detected unclean shutdown - Lock file is not empty"
- The service has restarted at least 25 times across the following windows:
- 07:23–07:27 UTC
- 16:03–16:08 UTC
- 18:39–18:58 UTC
- 21:58–22:03 UTC (5 restarts in 5 minutes, every ~60 seconds)
- 5 minutes ago
- No errors in the logs, only info-level messages, which suggests the process is being killed externally by the container runtime (OOM killer)
It worked fine without issues before the incident.
Thank you
4 Replies
16 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 16 days ago
16 days ago
The clue is that MongoDB only logs normal/info messages, then disappears and Railway restarts it. That often means the container runtime killed the process from the outside because it hit the memory limit.
A few things I would try, in order:
- Check the MongoDB service Metrics tab and look at memory right before each restart. If it spikes to the limit, that confirms OOM.
- If possible, temporarily increase memory/resources so MongoDB has enough room to finish recovering from the unclean shutdown.
- Add an explicit WiredTiger cache cap before redeploying. For a small Railway Mongo service, something conservative like this is a reasonable first test:
mongod --bind_ip_all --wiredTigerCacheSizeGB 0.5
If the service has very little RAM, try 0.25 instead.
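As a rough sizing aid, here is a small sketch that mirrors MongoDB's documented default for the WiredTiger cache (the larger of 50% of (RAM - 1 GB) or 256 MB), computed from the container memory limit instead of by guesswork. The function name `wt_cache_gb` and its megabyte argument are made up for this example, and the cache is only part of what mongod uses, so treat the result as a starting point for testing rather than a guarantee.

```shell
#!/bin/sh
# wt_cache_gb is a hypothetical helper, not part of MongoDB or Railway.
# Argument: the service memory limit in MB.
# Output:   a candidate value for --wiredTigerCacheSizeGB, following the
#           default rule max(0.5 * (RAM - 1 GB), 0.25 GB).
wt_cache_gb() {
    awk -v mb="$1" 'BEGIN {
        gb = (mb / 1024 - 1) * 0.5
        # 0.25 GB is the smallest cache size MongoDB accepts
        if (gb < 0.25) gb = 0.25
        printf "%.2f\n", gb
    }'
}

# Example usage (the 2048 MB limit is an assumption):
#   mongod --bind_ip_all --wiredTigerCacheSizeGB "$(wt_cache_gb 2048)"
```

mongod also needs memory outside the cache (connections, recovery buffers, and so on), so a value below what this prints may be safer while the service is still crash looping.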
16 days ago
This strongly looks like a recovery-phase OOM loop after the platform incident.
The repeated:
“Detected unclean shutdown - Lock file is not empty”
suggests MongoDB is trying to recover WiredTiger state on every boot, but the container is likely getting killed before recovery can complete.
The previous recommendation about reducing WiredTiger cache size is a good approach. I would also try:
- temporarily scaling RAM higher for one recovery cycle
- checking service metrics for memory spikes immediately before restart
- ensuring no aggressive memory limits were re-applied after the outage
- allowing the instance enough uninterrupted time to complete recovery
Since the logs stop without a MongoDB fatal error, this really does look more like an external container/runtime kill than an internal Mongo crash.
16 days ago
I would avoid wiping the volume for now. From the logs you posted, this looks more like MongoDB is getting killed during startup/recovery rather than the data being completely lost.
The important part is this:
Memory available to mongo process is less than total system memory
combined with the fact that the service restarts every ~60 seconds and there are no real MongoDB error messages before it dies. That usually points to the container being killed externally, likely by OOM, before MongoDB has enough time/memory to finish recovering from the unclean shutdown.
The lock file message is probably a symptom of the repeated forced shutdowns:
Detected unclean shutdown - Lock file is not empty
I would try these in order:
- Do not wipe the volume.
- Temporarily increase the service memory/resources if possible.
- Let MongoDB stay up long enough to finish WiredTiger recovery.
- If it starts successfully, immediately run a dump/export.
- If it still crashes, clone/snapshot the volume first, then try recovery on the cloned volume.
- Only use mongod --repair as a last resort, and only after making a snapshot/backup, because repair can modify data.
If Railway allows you to change the MongoDB startup command, you can also try limiting WiredTiger cache size so MongoDB does not consume too much memory during startup, for example:
mongod --wiredTigerCacheSizeGB 0.25
or a higher value if your service has more memory available.
The goal is to get MongoDB stable just long enough to complete recovery and export the data. Wiping the volume should be the very last option, not the first recommendation.
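For the dump/export step, a minimal sketch using the standard `mongodump` and `mongorestore` tools could look like the following. The function names, the `SOURCE_URI` / `TARGET_URI` variables, and the archive path are placeholders for this example, not values from this thread. It assumes the database tools are installed wherever you run it and that the crashing service stays reachable long enough for the dump to finish.

```shell
#!/bin/sh
# Hypothetical helpers; the URIs and archive path are placeholders.

# Dump everything reachable through the given URI into one gzip archive.
dump_db() {
    uri=$1
    archive=$2
    mongodump --uri="$uri" --gzip --archive="$archive"
}

# Restore that archive into a different (fresh) MongoDB service.
restore_db() {
    uri=$1
    archive=$2
    mongorestore --uri="$uri" --gzip --archive="$archive"
}

# Example usage:
#   dump_db "$SOURCE_URI" /tmp/mongo-backup.archive.gz
#   restore_db "$TARGET_URI" /tmp/mongo-backup.archive.gz
```

Writing a single archive file keeps the dump to one artifact that is easy to copy off the machine before touching the volume again.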
16 days ago
So first of all, thank you for the replies, but that did not work out. It capped at 0.25 even though it was set to 0.5, and there was enough RAM left to go higher. Usually it took 0.6 and worked.
I got it fixed now, though I'm not sure how.
- I restarted it without any startup commands etc., and it lasted 6 minutes
- I tested it again, and after 2 more restarts it still lasted 6 minutes each time
- I completed the dump
- I created a new MongoDB service, restored my dump into it, and that worked.
After the app worked, I looked into the still-crashing MongoDB. I migrated it from EU-West to US-East, and without any further action on my part the database is working and no longer crashing.
Pretty strange behavior.
Thx
Status changed to Solved Railway • 16 days ago