Production app down 8h+, Postgres stuck, needs manual Railway intervention + credit request
marieamelies
HOBBY · OP

a month ago

Project: fortunate-joy / Environment: production / Plan: Hobby (paying)

My production marketplace (Glowp) has been completely DOWN since the May 19 outage (started 22:29 UTC). 8+ hours later it is still offline.

Railway's OWN deployment Diagnosis confirms this needs your team:

"The Postgres service is stuck and needs Railway support to resolve. The CREATE_CONTAINER step has been pending across multiple consecutive deployments, meaning the container never starts. A volume migration to the europe-west4-drams3a region appears to be involved in the scheduling failure."

Current state:

  • Postgres: deployment fails repeatedly, CREATE_CONTAINER stuck, container never starts.
  • Volume 3a73e3f6-2ba4-4a8a-a0ae-5325ddf5f3d5 (mounted at /var/lib/postgresql/data) holds ALL my production data. It MUST stay attached: do NOT migrate, wipe, detach or re-initialize it.
  • Glowp service: crash-looping (P1001 cannot reach postgres.railway.internal) because Postgres is down.
  • Both services correctly configured for europe-west4-drams3a. US East deployments were cancelled.
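Editor's aside on the crash loop described above: an app that exits immediately when its database is unreachable gets restarted over and over. One common mitigation is to wait for the database port with capped exponential backoff before starting the app. The following is a minimal sketch under stated assumptions, not Railway's or Prisma's own mechanism: the function names `backoff_delays` and `wait_for_tcp` are hypothetical, and port 5432 is assumed because it is the Postgres default. The hostname is the one quoted in the post. This would not have fixed the stuck CREATE_CONTAINER step; it only makes the dependent service fail more quietly.

```python
import socket
import time
from typing import Callable, Iterable, List


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 6) -> List[float]:
    """Return capped exponential delays in seconds (hypothetical helper)."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def wait_for_tcp(host: str, port: int, delays: Iterable[float],
                 connect: Callable = socket.create_connection,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt one TCP connection per delay; return True on first success."""
    for delay in delays:
        try:
            conn = connect((host, port), timeout=3)
            conn.close()
            return True
        except OSError:
            # Host unreachable or connection refused: wait, then retry.
            sleep(delay)
    return False


if __name__ == "__main__":
    # Assumption: Postgres listens on its default port, 5432.
    ready = wait_for_tcp("postgres.railway.internal", 5432, backoff_delays())
    print("database reachable" if ready else "database still unreachable")
```

A TCP check only shows that something is listening on the port; it does not prove the database accepts queries, so a real startup script might follow it with an actual connection attempt.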

What I need:

  1. Manual intervention to unstick the Postgres CREATE_CONTAINER scheduling failure in europe-west4-drams3a, with the existing volume attached, so my database comes back online with all data intact.
  2. Full credit refund for the Railway Agent usage I was forced to consume during this incident. I only used the Agent because your outage broke the normal recovery path. Over $3 of my $5 credit consumed.
  3. A service credit for the production downtime, per standard practice after a major incident.

This is a live marketplace with real sellers and buyers locked out right now. Please prioritize.

Thank you.

Solved

4 Replies

Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

Thanks for reaching out. We sincerely apologize for the service disruption.

We're seeing recovery in our API, builds, and deployments. If your service is experiencing an issue, please try redeploying it. We'll publish a public postmortem once we're fully recovered.

For all customers, we’ll publish a detailed postmortem outlining what happened and the steps we’re taking to prevent similar incidents in the future. For Enterprise customers, service credits are covered under our SLA and will be reviewed as part of our post-incident process.


Status changed to Awaiting User Response Railway about 1 month ago


Anonymous
PRO

a month ago

Looks like they aren't even giving credits for this extended downtime. If you are using railway for business I would suggest you look elsewhere.


Status changed to Awaiting Railway Response Railway about 1 month ago


Status changed to Awaiting User Response brody about 1 month ago



marieamelies
HOBBY · OP

a month ago

Thanks for confirming recovery — my app is back online.

I understand SLA service credits are reserved for Enterprise plans. However, I'm not raising an SLA claim. I'm raising a billing-fairness issue:

During the incident, your own dashboard's recovery path (Restart) was broken, and I was pushed to use the billable Railway Agent to recover my production database. It consumed over $3 of my $5 monthly credit.

The Agent also nearly triggered a destructive cross-region migration that would have wiped my production volume. I had to manually catch and reverse it.

I would never have spent that credit if your infrastructure had been working. Charging me for it is not fair.

I'm formally requesting a refund of the Agent usage consumed during this incident (May 19-20 outage). This is independent of any SLA discussion.

Separately: while I understand downtime credits are Enterprise-only, I'd genuinely appreciate any goodwill gesture for a 10+ hour production outage on a paid account. I run a live marketplace and this was a hard hit.

Please confirm the Agent refund. Thank you.


Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

You can request a refund by following the docs at https://docs.railway.com/reference/pricing/refunds#requesting-a-refund


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway about 1 month ago

