MongoDB Volumes in Degraded State
derrick-dacosta
PRO OP

a month ago

In our development instance of MongoDB, which is barely using any vCPU or memory at all, we are seeing very poor performance as well as odd behavior when trying to connect to the instance in Railway to troubleshoot. Our app keeps timing out trying to connect to it, and we use MongoDB as part of our login/logout workflow.

Screenshot 2026-05-28 at 12.11.29 AM.png

Screenshot 2026-05-28 at 12.12.49 AM.png

Screenshot 2026-05-28 at 12.13.01 AM.png


$20 Bounty

3 Replies

Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 28 days ago


dev-charles254
PRO

a month ago

Hi Derrick,

The 19.77-second Write Latency shown in your second screenshot is the main reason your app is timing out. Even though your CPU and memory usage are low, your database is completely choked on disk operations.

Here are the specific issues and how to address them:

1. Disk I/O Throttling (The "Degraded State" Cause)

  • The Issue: A 19s write latency almost always means the underlying volume has run out of IOPS (Input/Output Operations Per Second) or throughput credits, causing severe throttling.
  • The Fix: Check your Railway project metrics for Volume Disk I/O. If you are hitting the plan limits, you will need to upgrade your volume size/tier to get higher IOPS capacity. Alternatively, check if your app has an accidental infinite loop performing rapid writes.
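
If the dashboard metrics are hard to find, one rough way to check disk latency directly is to time a handful of small synchronous writes. The following is only a sketch, assuming you can run a script somewhere that mounts the same volume; `probe_fsync_latency` and the `PROBE_DIR` environment variable are hypothetical names, not anything Railway or MongoDB provides.

```python
import os
import statistics
import tempfile
import time
from typing import List


def probe_fsync_latency(directory: str, samples: int = 20, size: int = 4096) -> List[float]:
    """Return per-write latencies (seconds) for small fsync'd writes in `directory`."""
    payload = os.urandom(size)
    latencies = []
    # Create the probe file inside the directory under test so writes hit that volume.
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync_probe_")
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush the data to the device before returning
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # PROBE_DIR is a hypothetical variable; point it at the volume's mount path.
    target = os.environ.get("PROBE_DIR", tempfile.gettempdir())
    results = probe_fsync_latency(target)
    print(
        f"median {statistics.median(results) * 1000:.2f} ms, "
        f"max {max(results) * 1000:.2f} ms over {len(results)} writes"
    )
```

On a healthy disk these writes usually complete in milliseconds. Numbers in the hundreds of milliseconds or more would be consistent with the throttling or volume problem described above, though this probe cannot tell you which of the two it is.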

2. Dashboard Connection Error

  • The Issue: The error Invalid database name: [object Object] in the UI suggests a configuration bug. The text [object Object] is what JavaScript produces when an object is converted to a string, so the Railway dashboard appears to be reading your database name variable and receiving a JavaScript object instead of a string.
  • The Fix: Check your Variables tab. Ensure your MongoDB connection string (MONGODB_URL or similar) is properly formatted as a string and explicitly includes the database name at the end (e.g., ...net/your_db_name?retryWrites...).
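
To sanity-check a connection string before pasting it into the Variables tab, a small helper like the one below can confirm that a database name is actually present. This is a sketch using only Python's standard library; `database_name` is a hypothetical helper and the URIs in the usage note are placeholders, not real hosts.

```python
from typing import Optional
from urllib.parse import urlsplit


def database_name(uri: str) -> Optional[str]:
    """Return the database name embedded in a MongoDB URI, or None if it is missing."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        # Catches values that are not connection strings at all, e.g. "[object Object]".
        raise ValueError(f"not a MongoDB URI: scheme is {parts.scheme!r}")
    # The database name is the path component after the host list, before any "?options".
    name = parts.path.lstrip("/")
    return name or None
```

For example, `database_name("mongodb+srv://user:pw@cluster0.example.net/your_db_name?retryWrites=true")` returns `"your_db_name"`, while a URI that ends in `/?retryWrites=true` returns `None`, which means the database name still needs to be added.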

Next Steps to Try Now:

  1. Redeploy/Restart the Service: Go to your MongoDB service settings and click Restart. If the volume is stuck in a bad physical host state, a restart can force it to remount cleanly.
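  2. Check WiredTiger Cache: If the restart doesn't fix it, the slow disk may be preventing MongoDB's WiredTiger engine from evicting dirty data from its cache, which stalls writes and can cause the connection drops. The cache statistics are reported under wiredTiger.cache in the output of db.serverStatus().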
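
As a rough illustration of the cache check, the sketch below computes how full and how dirty the WiredTiger cache is from a `serverStatus` document. `cache_pressure` is a hypothetical helper, and the sample numbers are made up for demonstration; the three field names are the ones `serverStatus` reports under `wiredTiger.cache`.

```python
from typing import Any, Dict, Mapping


def cache_pressure(server_status: Mapping[str, Any]) -> Dict[str, float]:
    """Compute WiredTiger cache fill and dirty ratios from a serverStatus document."""
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    return {
        "fill_ratio": cache["bytes currently in the cache"] / max_bytes,
        "dirty_ratio": cache["tracked dirty bytes in the cache"] / max_bytes,
    }


# Illustrative values only, not taken from the instance in this thread.
sample_status = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 1024,
            "bytes currently in the cache": 512,
            "tracked dirty bytes in the cache": 256,
        }
    }
}

# With the pymongo driver installed, a live document could be fetched like this:
#   from pymongo import MongoClient
#   status = MongoClient(uri).admin.command("serverStatus")
print(cache_pressure(sample_status))
```

A dirty ratio that stays high while write latency is also high would be consistent with the disk not keeping up with eviction. As far as I know, WiredTiger's defaults start stalling application threads at roughly 95% fill or 20% dirty, but verify those thresholds against the documentation for your MongoDB version.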


derrick-dacosta
PRO OP

a month ago

Hi dev-charles254, thank you for taking the time to troubleshoot this for me. Unfortunately, I have checked all your suggestions (except the Volume Disk I/O metric, as I can't find that anywhere in the Dashboard) and nothing is showing as the smoking gun. If my assumption is correct and Railway uses K8s for its orchestration layer, I personally think that the physical disk backing the volume mounted into the MongoDB pod is corrupted. And if Railway has data replication via OpenEBS, Rook-Ceph, Longhorn, etc., that is probably broken too.

My workaround, and probably the endgame solution, is that we moved our MongoDB deployment to MongoDB Atlas, and everything is working again.


dev-charles254
PRO

a month ago

Hi Derrick,

That makes total sense. Before migrating, you can easily spin up a completely fresh MongoDB service in your Railway project from the templates to test this. This will provision a brand-new persistent volume on a different host node.

Just update your app's environment variables to point to the new database connection string and see if the write timeouts disappear. If they do, that strongly suggests the original physical disk or orchestrator mount was at fault.

Let me know if the fresh instance clears up the connection issues!

