Systems offline
it-integral-solutions
HOBBY
OP

a month ago

Hello Railway Support,

We are experiencing a critical production outage affecting our SaaS platform hosted on Railway.

Project: TASS Pro

Region: US East

Current symptoms:

  • Deployments stuck in QUEUED state for hours
  • "Limited Access – Deploys have been paused temporarily"
  • Extremely slow or blocked builds
  • Intermittent 404 / 502 / 524 responses
  • Production web service unable to recover after deployment queue blockage
  • Backup jobs also affected/stuck
  • OAuth/CLI instability observed after the incident

Important context:

  • PostgreSQL is currently online
  • Some secondary services recovered
  • Main production web service cannot deploy due to Railway deployment throttling/queue state
  • We already identified and fixed an application-side issue related to SECRET_KEY hardening validation
  • A new deployment containing the fix is queued but Railway is not processing it

Impact:

Our CRM is currently offline for multiple travel agencies in production, preventing daily operations.

We need confirmation on:

  1. Current deployment queue status for our project
  2. Whether deploy execution is paused globally or only in specific regions
  3. Estimated recovery timeline
  4. Whether manual intervention/restart is possible for queued deployments

Attached screenshot shows:

  • Limited Access banner
  • Deploy stuck in processing
  • Queued services despite healthy Postgres

This is a production-critical incident.

Thank you.

Solved

1 Reply

Status changed to Awaiting Railway Response Railway about 1 month ago


sam-a
EMPLOYEE

a month ago

Apologies for this canned message, but in an effort to help all our customers get back up and running, we are sending this message in bulk. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident, you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 28 days ago

