4 months ago
We are experiencing critical instability on our MongoDB service in production.
The database becomes completely unreachable every 1–2 minutes, and no client can connect during these periods.
Connectivity impact:
- Application cannot connect
- mongosh cannot connect (internal or public URL)
- mongodump cannot run
- TCP connections are reset (ECONNRESET)
At the moment, there is no reliable way to access the database from anywhere.
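To put numbers on the 1–2 minute flapping described above, a small TCP probe can record when the port accepts connections and when it does not. This is a minimal sketch, not a Railway or MongoDB tool: the `probe` and `watch` helpers are hypothetical names, and the host and port you pass in are assumptions you would replace with your own internal or public endpoint. A successful TCP connect only shows that something is listening; it does not prove mongod is healthy.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 2.0) -> tuple[bool, str]:
    """Attempt one TCP connection and report (reachable, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # OSError covers refused and reset connections, timeouts,
        # and DNS resolution failures.
        return False, f"{type(exc).__name__}: {exc}"


def watch(host: str, port: int, interval: float = 5.0,
          rounds: int = 24) -> list[tuple[float, bool, str]]:
    """Probe repeatedly and return (timestamp, reachable, detail) samples."""
    samples = []
    for _ in range(rounds):
        ok, detail = probe(host, port)
        stamp = time.time()
        samples.append((stamp, ok, detail))
        print(f"{time.strftime('%H:%M:%S', time.localtime(stamp))} "
              f"{'UP  ' if ok else 'DOWN'} {detail}")
        time.sleep(interval)
    return samples
```

Calling `watch(host, 27017)` with your own hostname (27017 is MongoDB's default port; yours may differ) prints one line per sample, which makes it easier to line up outages with restart times in the deploy logs.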
Key observations
- Inside the MongoDB container, the mounted volume (/data/db) is reported as 100% full:
  df -h /data/db
  /dev/zd22208 4.4G 4.4G 0 100% /data/db
- In contrast, the Railway UI reports the same volume (industrious-volume) as using only ~300–500 MB.
- MongoDB repeatedly crashes or restarts, but crash logs are not visible:
  - No relevant crash logs appear in Railway logs
  - No clear shutdown or panic messages are exposed
  - This makes it difficult to diagnose the exact failure point while the service continues to flap.
This suggests filesystem-level disk exhaustion that is not reflected in Railway’s volume metrics, possibly combined with missing or inaccessible runtime logs.
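One way to narrow down the mismatch between df and the dashboard is to compare what the filesystem reports as used with the summed size of the files that are actually visible under the mount. The sketch below assumes Python is available somewhere the volume is mounted; `filesystem_usage` and `apparent_file_bytes` are hypothetical helper names, and /data/db is the path from the df output above. It sums apparent file sizes (st_size), not allocated blocks, so the result is only an approximation.

```python
import os
import shutil


def filesystem_usage(path: str) -> dict[str, int]:
    """Return total/used/free bytes for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def apparent_file_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files visible under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat avoids following symlinks out of the volume.
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can vanish mid-walk on a live database; skip them.
                continue
    return total


def report(path: str) -> str:
    """Format a one-line comparison of filesystem usage and visible files."""
    fs = filesystem_usage(path)
    visible = apparent_file_bytes(path)
    mib = 1024 * 1024
    return (f"{path}: filesystem used {fs['used'] / mib:.0f} MiB of "
            f"{fs['total'] / mib:.0f} MiB, visible files {visible / mib:.0f} MiB")
```

Running `print(report("/data/db"))` gives both figures side by side. If the visible files add up to far less than the filesystem's used space, the gap may come from something a directory walk cannot see, for example files that were deleted while a process still holds them open. That is only one possibility, and it would need confirming on the host.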
Impact
- 🚨 Production database completely inaccessible
- 🚨 Application downtime
- 🚨 No backups possible
- 🚨 No visibility into crash root cause (logs missing)
- 🚨 Risk of data corruption if instability continues
Environment
- Service: MongoDB
- Environment: Production
- Volume: industrious-volume
- MongoDB version: 7.0.x
- Region: europe-west4
Request
We need urgent help to:
- Restore stable access to the MongoDB database
- Understand why the volume appears full inside the container but not in Railway metrics
- Investigate why MongoDB crash/restart logs are not visible
- Ensure the database can be stabilized without data loss
This issue is production-blocking and critical.
Pinned Solution
4 months ago
Have you tried increasing the volume size in Volume > Settings > Volume Size?
Status changed to Solved brody • 4 months ago