Service crashed post-incident, container stuck at "Mounting volume", no backups

theodorecooley-cyber

HOBBY OP

a month ago

Project: trash-talk (production)

Service: trash-talk

Volume: vol_j058xxhgwzhhs2kd

After the May 19 build queue incident, my service has been unable to restart. Every deploy attempt (including rollback to known-good images) hangs at "Mounting volume" with no subsequent logs. The container never starts, so no application errors are surfaced.

I'm on the hobby plan and have no volume backups. The volume contains my production SQLite database with user accounts and content for a live mobile app. Can you investigate whether the volume is recoverable, and if not, whether the underlying data can be exported?
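For readers in a similar position (a SQLite file on a volume with no platform-level backups), here is a minimal sketch of an application-side snapshot using Python's standard `sqlite3` module. The function name `backup_sqlite` and both paths are hypothetical, not something Railway provides. The sketch also assumes the copy is then shipped somewhere off the volume, which is not shown.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Write a consistent snapshot of the database at src_path to dest_path."""
    # sqlite3.connect() would silently create an empty database for a
    # missing file, so fail early instead.
    if not Path(src_path).is_file():
        raise FileNotFoundError(src_path)

    with closing(sqlite3.connect(src_path)) as src, \
            closing(sqlite3.connect(dest_path)) as dest:
        # Connection.backup() uses SQLite's online backup API, so it can
        # run while the application still has the database open.
        src.backup(dest)
```

Calling something like `backup_sqlite("/data/trashtalk.db", "/tmp/trashtalk-copy.db")` on a schedule and uploading the result to external storage would leave a recoverable copy even if the volume cannot be mounted. Both paths here are illustrative.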

Solved

3 Replies

Status changed to Awaiting Railway Response Railway • about 1 month ago

sam-a

EMPLOYEE

a month ago

Apologies for this canned message but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. [We've published a post-mortem if you'd like more information on the incident](https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage). It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact that it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.

Status changed to Awaiting User Response Railway • about 1 month ago

theodorecooley-cyber

HOBBY OP

a month ago

Thanks for the update. The bulk-response advice (redeploy to fix) does not apply to my situation, and I want to flag this case specifically because it involves data loss risk, not just downtime.

Recap of my case (project trash-talk, service trash-talk, volume vol_j058xxhgwzhhs2kd):

- The service is stuck at "Mounting volume" on every deploy attempt since the May 19 incident, including rollbacks to known-good images.
- I have already tried multiple redeploys. They all hang at the same mount step. Further redeploys will not change the outcome and are just adding to the queue.
- This is the production SQLite database for a live mobile app with 25 live users. I'm on the Hobby plan with no volume snapshots.

Specific questions I'd like Railway support to answer, in order of preference:

1. Can the volume vol_j058xxhgwzhhs2kd be inspected on the underlying storage host to confirm whether the SQLite file (trashtalk.db) and its WAL/SHM siblings are intact?
2. Can the volume be detached from the current service and reattached to a fresh service (different deploy, different image) to bypass whatever is causing the mount hang on this one?
3. Does Railway take any platform-level snapshots of Hobby-tier volumes (even short-retention ones) as part of normal operations that could be restored from?
4. If the volume itself is unrecoverable, can the raw file contents be exported by your storage team so I can extract the DB out-of-band?
5. If none of the above are possible, please confirm in writing so I can move to disaster-recovery planning.
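If the database files can be reached again (via a remounted volume or an exported copy), the "intact" question above could be answered with a check along these lines. This is a sketch under assumptions: `check_sqlite_copy` is a hypothetical helper, and it should be run against a copy of `trashtalk.db` together with its `-wal` and `-shm` files, since opening a database can checkpoint and remove the WAL.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def check_sqlite_copy(db_path: str) -> dict:
    """Report WAL/SHM presence and the result of PRAGMA integrity_check."""
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)

    # Record the sidecar files before connecting: SQLite may checkpoint
    # and delete them when the last connection closes.
    report = {
        "wal_present": Path(db_path + "-wal").exists(),
        "shm_present": Path(db_path + "-shm").exists(),
    }

    try:
        with closing(sqlite3.connect(db_path)) as con:
            rows = con.execute("PRAGMA integrity_check").fetchall()
        report["integrity"] = [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        # Raised when the file is too damaged to be read as a database.
        report["integrity"] = [str(exc)]

    # A healthy database returns exactly one row containing "ok".
    report["ok"] = report["integrity"] == ["ok"]
    return report
```

A result with `ok` set to `True` only says the file is structurally sound. It does not prove that the most recent writes survived, so comparing row counts or latest timestamps against what the app last showed would still be worthwhile.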

I am happy to share read-only access to the project or any other diagnostics that help. Please escalate this if it falls outside the generic redeploy guidance. Every hour of delay compounds the user-trust impact.

Thanks.

Status changed to Awaiting Railway Response Railway • about 1 month ago

mykal

EMPLOYEE

a month ago

Hey, apologies for the disruption. I've redeployed your service and you should be good now.

Following yesterday's outage, we worked through a heavy backlog of queued builds and deployments today. We also identified and rolled out a fix for an issue that caused some builders to run out of storage, which in turn made new deployments fail. The combination of these issues led to slow and unreliable deploys. In your case, it was a mixture of the build queue and issues with our builder.

Both issues are now resolved: the queue has fully drained and builds and deployments are processing normally across all plans. If you have a build that's still stuck or failing, trigger a redeploy from the dashboard or CLI and it should go through.

More context on the recovery effort here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Thanks for your patience throughout this. Let us know if anything still looks off on your end.

Status changed to Awaiting User Response Railway • about 1 month ago

Status changed to Solved mykal • about 1 month ago
