Road to recovery: Post GCP Outage/Builds

2 months ago

Hello all. The Support team is working through hundreds of threads, so we figured it was better to post something here until we can get you a direct reply. Forgive us; we are working as fast as we can.

Where we are now

Most services have been automatically restored. If yours has not, a manual redeploy from the dashboard or CLI will usually bring it back. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

Slow builds and deploys are the main issue we are dealing with right now, along with knock-on problems from applications waking up after we got the control plane online again, and from hosts on GCP. (The post-mortem is here: https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage)

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are awaiting a reply, please check whether any of the situations below match yours. These are the most common cases we are seeing, and most can be resolved without a back-and-forth with the Support team.

F.A.Q.

Q: Help, where are my Trial credits?

A: If your dashboard shows "Trial maxed out," a $0 balance, or your Usage page will not load: this is a display issue from the metrics pipeline still recovering. Your real credit balance and trial period are intact, no credits were consumed during the outage, and the dashboard will self-correct as we stabilize.

Q: Why is my Postgres DB still offline?

A: If your Postgres (or other) service is stuck in a crash loop with the error "catatonit: failed to exec pid1: No such file or directory", try one redeploy first. If it does not recover within a few minutes, reply with your project ID and service ID. The volume may need to be moved to a healthy node.

Q: Why isn't my application deploying from GitHub?

A: If GitHub auto-deploys stopped firing after the outage: webhooks sent during the incident window (02:25 to 07:57 UTC on May 20) were not received, and GitHub does not retry them. Push any new commit (an empty one created with git commit --allow-empty -m "redeploy" works) or click Redeploy in the dashboard, and auto-deploys will resume normally.
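To check whether a given push landed inside that window, a minimal Python sketch follows. It assumes the year 2026 (taken from the date in the incident-report URL above) and uses a hypothetical helper name, pushed_during_incident. A commit timestamp such as the one printed by git log -1 --format=%cI is only a proxy for the time the push reached GitHub, so treat the result as a hint rather than proof.

```python
from datetime import datetime, timezone

# Incident window quoted in the FAQ answer. The year is an assumption
# taken from the incident-report URL (may-19-2026).
WINDOW_START = datetime(2026, 5, 20, 2, 25, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 5, 20, 7, 57, tzinfo=timezone.utc)


def pushed_during_incident(pushed_at: str) -> bool:
    """Return True if an ISO 8601 timestamp falls inside the window.

    Timestamps without a UTC offset are treated as UTC (an assumption).
    """
    ts = datetime.fromisoformat(pushed_at)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return WINDOW_START <= ts <= WINDOW_END
```

If this returns True for your last commit, the webhook for that push was probably among those lost, and a new push or a manual Redeploy is the way forward.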

If your projects appear missing from your dashboard: check the workspace switcher in the top-left. The outage caused some users to land in a freshly-created empty workspace; your projects should still be in your original one.

Q: My customers were impacted, what are the plans for compensation?

A: Outage credits are part of Enterprise SLAs and are not offered on Hobby or Pro plans. We know that is disappointing given the impact, and we are sorry. The postmortem covers what we are doing to prevent a recurrence.

If none of the above match, or if you have tried the recommended action and your service is still down, please reply to this thread with your project ID, service ID, and what you are seeing. We will dig in.

Thanks for your patience, and again, we are sorry for the disruption.

120 Replies

nipdog001

PRO

2 months ago

My error isn't any of these. I am getting "application failed to respond" (502 Bad Gateway). I have a staging and a production environment; staging is up and running as expected, production is not. I redeployed and restarted and am still getting the same error. I have cleared my cache and done a hard refresh. There are no errors in my deploy logs and GitHub can still push out changes. Is there anything else I can check, or should I just wait for the all-clear status from you guys?

knarx

PRO

2 months ago

Postgres crash loop here, plus a more concerning issue beyond the standard scenario.

Project ID: 2497f3fe-fd67-4c1f-b8fb-c872f62c98f9

Service ID: pgvector (service in project trustworthy-forgiveness)

Volume currently attached: vol_jwwp0ua41m4nkdce

Standard crash loop symptoms (initdb against non-empty PGDATA on every restart), but diagnostic SSH into the volume shows the data on disk is from this service's initial provisioning on 2026-02-23 — three months of production writes are missing. The 97 MB backup may or may not contain my real data; the dashboard restore option only lets me restore in-place to the broken service, not to a fresh one for safe inspection.

Full technical detail and diagnostic findings in my existing thread:

https://station.railway.com/questions/postgres-crash-loop-2338ad96

Volume relocation as you mentioned would help with the crash loop, but I also need to know whether you have access to a snapshot or recovery state of my actual production data from before the incident. Happy to provide anything else useful.

kaladinlight

PRO

2 months ago

My services were restored, but no subsequent deployments are working, essentially freezing the state of all services. I have had multiple services stuck in Initializing for the past several hours...

sewmer

PRO

2 months ago

You took precautions so you would not have to reimburse catastrophic losses, but you did not take the precaution of having a plan B for this? Regrettable. Trust is being lost.

Anonymous

PRO

2 months ago

Matches the Postgres catatonit: failed to exec pid1 case from your FAQ.

Project ID: b9568529-d4e1-4c85-8260-953a98c2b343

Service ID: 1bd97dc8-17db-4452-b08f-d10cc5644edf (Postgres)

Environment ID: 470a95a9-4e24-438f-b5e0-869887e92c6d

Crash-looping since the outage window (~02:25 UTC May 20). Tried multiple redeploys via dashboard, no recovery. The service immediately exits with catatonit: failed to exec pid1: No such file or directory after each volume mount.

Per your FAQ this likely needs the volume moved to a healthy node. Volume contains live customer data, so please don't reprovision empty.

Thanks.

ducitymp

HOBBY

2 months ago

Postgres crash loop here, plus a more concerning issue beyond the standard scenario.

Project ID: 44956dd6-9463-4d94-afcf-d77c03ebfda8

Service ID: c580447b-4060-4931-9b3a-6a2a69d9dfac

Volume currently attached: 96104140-7ed8-488f-ae22-551d385cf2b2

In reply to nipdog001:

vitinx

HOBBY

2 months ago

I'm having the same problem. I have a production application and my clients can't access it; it returns a 502 Bad Gateway error. I don't know what to do.

kristof234567

HOBBY

2 months ago

I still can't deploy, even though I want to run a new build. I see these messages:

1.) Builds are slow to progress: We have pushed a fix and are now monitoring the incident.

2.) Limited Access: Deploys have been paused temporarily

In reply to kaladinlight:

popeye1958

PRO

2 months ago

I'm in the same boat, after 12 hours now.

krystianf

HOBBY

2 months ago

Project ID: 7e7dcdb2-7e66-46b4-81fc-26369e8e307d

Service ID: 10175118-324e-444b-aa2e-931963e030ba

Limited Access: Deploys have been paused temporarily

Anonymous

HOBBY

2 months ago

I am on the Hobby plan and I got this message: "Deploys have been paused temporarily". Why?

This is my Project ID: 41c4676a-c09e-4278-bdd2-3d5a6f23add0

davidjeppesen

HOBBY

2 months ago

I also have this problem. When can I expect to be able to deploy again?

samvoyagelink

HOBBY

2 months ago

If I upgrade to PRO will I be able to deploy new builds? We have a critical need and are considering upgrading, but not sure if it's worth it if PRO level will not let us deploy new builds. Do we know when hobby new builds will be allowed?

jbecher444

PRO

2 months ago

Same symptoms here on Pro — web service: project c6772164-8ef4-410a-812b-0cd527142449, service 6b14047b-6abc-47d1-9b87-73b6bb60b7d1. Postgres + Redis are healthy and reachable, deploys show Active/successful, but every request returns 502 in <20ms with X-Railway-Fallback: true. Tried redeploys, restarts, multiple startCommand variations — no luck. Private thread already filed with full detail. Thanks.
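For anyone trying to tell this symptom apart from an ordinary application error, here is a rough Python sketch. It assumes (this is not confirmed anywhere in this thread) that an X-Railway-Fallback: true header means the response came from the platform edge rather than from the application itself. The function name classify_response and its labels are hypothetical.

```python
def classify_response(status: int, headers: dict) -> str:
    """Label a response using its status code and the fallback header.

    Assumption: "X-Railway-Fallback: true" marks a response produced by
    the platform edge, not by the application container.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): str(value) for name, value in headers.items()}
    fallback = lowered.get("x-railway-fallback", "").strip().lower() == "true"

    if status == 502 and fallback:
        return "edge-fallback"  # the request likely never reached the app
    if status >= 500:
        return "app-error"      # the app, or something in front of it, failed
    return "other"
```

A very fast 502 labelled "edge-fallback" while the deploy shows as Active would suggest that the routing layer cannot reach the container, which is worth including in a report alongside the project and service IDs.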

poncho172525

HOBBY

2 months ago

Wow, you've had this issue for more than 12 hours? Really? I'm blocked on new deployments.

itreza7

PRO

2 months ago

My Docker images are not building and throw errors, for example:

Build Failed: build daemon returned an error < failed to solve: ResourceExhausted: failed to read dockerfile: failed to prepare as wgsgybnscmz48o1qe57ifegdd: failed to create prepare snapshot dir: failed to create temp dir: mkdir /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/new-752968274: no space left on device >

solvix-tech

HOBBY

2 months ago

My Postgres is stuck in a crash loop with catatonit: failed to exec pid1. Tried multiple redeploys, no recovery.

Project ID: 0ae91dc6-42eb-4039-acdf-0240d1720849

alaskanloosemoose-dotcom

PRO

2 months ago

Is this because of the issues?

internal load build definition from backend/Dockerfile (0ms)

Build Failed: build daemon returned an error < failed to solve: ResourceExhausted: failed to read dockerfile: failed to create lease: write /var/lib/buildkit/runc-overlayfs/containerdmeta.db: no space left on device >

tjallingvds

PRO

2 months ago

Project ID: 87ca913a-ba45-4c5c-b14b-5d17d22ae574

Service ID: 9acce407-2f44-45dd-9e5c-6ffa3bad1322 (Postgres)

Better hope my production data isn't wiped

millennialcpa

PRO

2 months ago

Project ID: 44bbf8e1-e95c-4ccc-a768-91ba78438299 (taxweave)

Service ID: 842a8653-de8b-4438-821d-f52d7c430dcc (taxweave-backend)

Environment: staging (576327b0-75b0-4af4-998d-134bf1f909f7)

Latest failed deployment: 75d2f762-97fa-4e91-98a6-b53da3db8871 (commit 5ffda9d, region us-east4-eqdc4a)

Seven consecutive builds in the last hour have failed with "No space left on device" on the assigned build host. Errors progress through the pipeline as the host clears space between attempts:

4 builds (5:41–5:53 PM ET) failed at mise/Python setup: mise ERROR No space left on device (os error 28) at /tmp/railpack/mise/cache/python/pyenv

1 build (6:11 PM) failed at pip install: OSError: [Errno 28] No space left on device

2 builds (6:21 PM, 6:43 PM) failed at final image snapshot: ResourceExhausted: failed to create temp dir: mkdir /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/...: no space left on device

Same error class throughout, same commit. Looks like a build host needs garbage collection or rescheduling onto a clean host.
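To confirm that a batch of failures really belongs to one error class, a small Python sketch like the following can tally log lines. The patterns are taken from the messages quoted in this thread; the names is_disk_full and summarize are hypothetical, and the pattern list is an assumption that may miss other wordings.

```python
import re
from collections import Counter

# Disk-exhaustion signatures seen in the build logs quoted in this thread.
DISK_FULL = re.compile(
    r"no space left on device|\bENOSPC\b|errno 28|os error 28",
    re.IGNORECASE,
)


def is_disk_full(line: str) -> bool:
    """Return True if a log line looks like a disk-exhaustion error."""
    return DISK_FULL.search(line) is not None


def summarize(lines):
    """Count non-empty log lines as 'disk-full' or 'other'."""
    counts = Counter(
        "disk-full" if is_disk_full(line) else "other"
        for line in lines
        if line.strip()
    )
    return dict(counts)
```

If every failing line falls into the disk-full bucket across different build stages, the problem is more likely on the build host than in the repository, which matches the reports above.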

tylerrr93

HOBBY

2 months ago

This is a little crazy that I just switched to a paid plan with you and I haven't been able to build or deploy in over 24 hours now. I understand the higher paying customers get priority but the other people paying for a service they aren't receiving with no compensation or credit to their plans is a little crazy.

In reply to millennialcpa:

alaskanloosemoose-dotcom

PRO

2 months ago

Me too. I guess it's just a matter of waiting. I wasn't sure if it was on my end or not.

jlaraep-nc

PRO

2 months ago

Hi team, Pro plan customer here. None of the F.A.Q. cases match exactly. Dropping our info as requested.

Project ID: 05650ba0-6627-4081-aa05-d5cd4b1596d1

Service ID: b777d658-ca20-4471-bce6-738ccbb57beb (the API)

Environment ID: 7c95dafd-a744-4bf1-872c-3a5354ab6e84

Latest failed deployment ID: 5efe84b6-4da7-4295-bf6d-eb261686f4cb

Plan: Pro

Region: us-west1

We've attempted 5 manual redeploys over the last ~2 hours (including after the 18:06 UTC and 18:27 UTC status updates). Every single one is scheduled on builder "production-builderv3-us-west1-gblf" and fails in 00:02s during "load build definition from Dockerfile". The error has actually gotten worse over the session:

Earlier attempts (~20:35 local / 18:35 UTC):

failed to create prepare snapshot dir: failed to create temp dir: mkdir /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/new-2506645505: no space left on device

Latest attempt (~20:49 local / 18:49 UTC):

failed to create lease: write /var/lib/buildkit/runc-overlayfs/containerdmeta.db: no space left on device

i.e. the builder's own metadata DB can no longer be written. The host is fully out of disk and not recovering on its own across redeploys.

Could you either reclaim disk on production-builderv3-us-west1-gblf or route our service to a healthy us-west1 builder? Our runtime service is fine (last deployed revision still serves /health → 200), we just cannot ship any new builds.

Blocking pre-launch hotfixes. Thank you.

gargi1709

HOBBY

2 months ago

Any ETA for resuming Hobby plan deployments? My changes are on GitHub and auto-deploy is enabled, but it is still not able to deploy them.

Anonymous

HOBBY

2 months ago

Hi, my Postgres service is stuck in a crash loop with catatonit: failed to exec pid1: No such file or directory. The other two services (frontend and backend) are up but my database is down. As a result my application cannot be accessed now in production environments. I am on the hobby plan and I have users in production. Can you please help me bring this back up ASAP.

Project ID: 9f7f2326-9b99-4160-afa6-7bbd4a4fc662
Service ID: 8c3ad0a6-6c1a-4e12-a2e6-c22909830feb
Environment ID: a1256b07-62e7-4801-a524-cf67ef88516b

In reply to itreza7:

kytra-kartel

PRO

2 months ago

I'm having the same issue.

leomarquezd

HOBBY

2 months ago

The following Redis instance is not working:

Project: 92312c43-dd49-4cd5-8c8e-40b366fd0ad3

Service ID: 90b26469-fbf5-4b42-9cae-4fcc65ef22ad

maxsilverman

HOBBY

2 months ago

Postgres crash loop here:

Project ID: f56c0c0a-cef1-4d7b-a6db-3be9d442afc7

Service ID: 9bc3197c-aec2-48b5-b444-7464f4278306

lgontijojr

PRO

2 months ago

This is the worst customer experience ever! I just migrated all of my apps from Heroku to Railway thinking that it was a smart decision, and now this... ALL of my apps were out for hours. I lost hundreds of signups. Not even a single email was sent out to customers to let us know that everything was down. And now we won't even get any sort of compensation? I am absolutely moving away from this platform.

dazed

HOBBY

2 months ago

Although I understand that there are multiple issues you are trying to fix, I believe I speak for various people when I say that punishing the "cheaper" plans and pausing deployments when we are all trying to work on projects on different levels is unfair. I believe we deserve better than a blanket "we're sorry" when we chose this platform over others.

In reply to lgontijojr:

lethabogs

HOBBY

2 months ago

Same :/ They mark all threads as resolved and they didn't even respond. Unacceptable.

In reply to lgontijojr:

tylerrr93

HOBBY

2 months ago

I can understand higher-paying customers being restored first, but not offering other PAYING tiers, Hobby or not, some type of compensation for a service they're paying for but not receiving is sad.

caseys

HOBBY

2 months ago

Hobby deploy seems to be blocked now and my only option is to upgrade?

plexifystudio-projects

HOBBY

2 months ago

I am still experiencing issues; the server is down, and I am unable to redeploy. Is there any update on this? Will it be resolved soon? As user @lgontijojr mentioned, we are losing a significant number of users, and our existing users are currently unable to access the server.

blackdahlia

HOBBY

2 months ago

My db crashed, I went to restart it, it's wiped. That's... great.

geovani29

PRO

2 months ago

I have my backend, my database and Redis running in a project, and everything seems to be working fine, but Redis is failing; there’s no connection to my backend or anything like that

angelo-railway

EMPLOYEE, OP

2 months ago

Yup, still working on the build queue. Working them as quickly as we get your reports.

In reply to angelo-railway:

lethabogs

HOBBY

2 months ago

Thank you :) Hope the queue isn't gonna take forever.

crazzeis

PRO

2 months ago

Error: "no space left on device" on builder node before Dockerfile could be read.

drattray45

PRO

2 months ago

Project Id: a15e5901-1471-4d5a-8d1f-655fcf5a6600

My error is this:

My production builds keep failing with ENOSPC (no space left on device) during pnpm run build. My railway.json on the main branch is set to the Railpack builder. I have a successful older deployment serving the site, but every new deployment fails on disk space. I cannot find a clear-build-cache option in the dashboard. Can you clear any stale build cache on your side and check the disk allocation on the build runner for this service?

In reply to nipdog001:

fev1n

HOBBY

2 months ago

I have also been getting the same error since last night: 502 Bad Gateway. Can someone help me with this?

Attachments

IMG_6823.jpeg

s8902114-art

HOBBY

2 months ago

Project ID: e00a5f14-06f6-464c-a2f9-17947005057c

Service ID: 470e0b9d-558b-4c68-9632-95b5bcb33436

Issue: Service shows Online but is crashed and not responding. Cannot redeploy due to Limited Access. Bot is not running.

In reply to nipdog001:

paulagot

HOBBY

2 months ago

I have this same issue

alexandrosanapolitanos

PRO

2 months ago

I love Railway, but honestly it is one incident after the other. Do you test the code you are writing? Do you have actual testers running tests?

I love the product, but please be a bit more careful with what you ship. You are a big company now, and people are relying on you.

In reply to paulagot:

fev1n

HOBBY

2 months ago

I was able to redeploy and it fixed the 502 Bad Gateway error.

solocommercenl

PRO

2 months ago

No compensation for Pro plans is very disappointing. IMHO the Pro plan is not very Pro: the gap between the Pro and Enterprise plans is way too big, and the gap between Hobby and Pro way too small...

It would be much better if there were a plan between the current Pro and Enterprise with some sort of SLA, better support, etc., for a higher price. A $20 plan should not be called "Pro" when there is not much pro about it. Make a $20 Basic or Starter plan and make the Pro plan $200 with better support, an SLA, and more resources...

I am pretty sure there is a huge group of users like us that fall right in this gap: too small for Enterprise, too big for the current Pro plan. Don't get me wrong, I love Railway, but there are just too many issues and too few support options other than opening a thread.

tayfunerbilen

PRO

2 months ago

queue takes like forever...

In reply to alaskanloosemoose-dotcom:

citylogical

HOBBY

2 months ago

I'm getting a similar error when I try to deploy. If it were clear to me that upgrading to Pro would get my projects deployed then I would do it, but unfortunately the current state of Railway is very unclear. I understand that things happen and Google Cloud is also to blame but I'm baffled no email was sent out to users when these issues first started more than 24hrs ago.

paul1404

PRO

2 months ago

Two Postgres services stuck in a catatonit crash loop post-GCP outage. Multiple redeploys failed on both. Volumes need to be moved to healthy nodes. Do not reprovision; data must be preserved.

esg

Project ID: deceed5e-8221-4e36-91e3-32c2328ba9d4

Service ID: b76cd947-bc1f-4a4b-a281-a289fe88ae1c

Volume ID: vol_mdennqnz59fo250f

RamPage

Project ID: 0c2cf01b-fe90-4fdc-8cfb-d0b38fcd12d6

Service ID: 60caa650-ad9f-4240-8864-6a6badb83642

Volume ID: vol_645zafxxyi1iaqee

andreayalaq

FREE

2 months ago

Postgres crash loop here:

Project ID: eebb77e4-1f7a-4e82-b8be-455f3b531d96

Service ID: fd764253-08e0-4838-b589-8eb450058745

In reply to crazzeis:

keemykeem

PRO

2 months ago

same here smh

gaslaksen

PRO

2 months ago

Stuck in a Postgres crash loop also:

Project ID: 2d7dbdad-ebee-46ae-86c2-227cac22e604

Service ID: cd7c5da2-f2b0-46b2-8d9c-a70f8d676c42

dkoo761

PRO

2 months ago

We are also seeing 2 different errors that are blocking builds across some (but not all) of our services...

==================

ERROR 1:

Project ID: 16dffee8-f224-4647-ac57-f84bc0c84d40

Service ID: 7c117564-9dac-4496-b262-62d48b47f348

Service ID: d562d7c2-765a-449f-8781-0ad0cde03a61

Error: "no space left on device" during builds

==================

ERROR 2:

Project ID: 7748b63c-e160-4bed-834b-54d57b4905a7

Service ID: 45fa8a64-8cf3-43c2-9bb4-32d587372187

Error:

W: GPG error: http://deb.debian.org/debian bookworm InRelease: At least one invalid signature was encountered.

E: The repository 'http://deb.debian.org/debian bookworm InRelease' is not signed.

W: GPG error: http://deb.debian.org/debian bookworm-updates InRelease: At least one invalid signature was encountered.

E: The repository 'http://deb.debian.org/debian bookworm-updates InRelease' is not signed.

W: GPG error: http://apt.postgresql.org/pub/repos/apt bookworm-pgdg InRelease: At least one invalid signature was encountered.

E: The repository 'http://apt.postgresql.org/pub/repos/apt bookworm-pgdg InRelease' is not signed.

W: GPG error: http://deb.debian.org/debian-security bookworm-security InRelease: At least one invalid signature was encountered.

E: The repository 'http://deb.debian.org/debian-security bookworm-security InRelease' is not signed.

In reply to alaskanloosemoose-dotcom:

tomagnus

PRO

2 months ago

same issue

olshchynskysn-svg

PRO

2 months ago

Matches the catatonit pid1 case from your FAQ. Tried multiple redeploys from dashboard — service exits immediately after volume mount, never reaches Postgres startup. Volume mounts successfully, data on disk should be intact.

Project ID: 5ce998e6-827c-4f7e-9d22-0ae8419f8289

Service ID: c971edd0-93f1-4bbe-babb-03d897ebecab

Environment: 5daf58ea-d37b-4bc4-a537-5defd7975ad2 (production)

Volume: vol_u38kdryvh6000fx8

Image: ghcr.io/railwayapp-templates/postgres-ssl:18

Production has been down ~15h. Upgraded Hobby → Pro today specifically to escalate this. No backups exist on this volume — please do not reprovision empty. Need volume moved to a healthy node.

Thanks.

venomghost16

HOBBY

2 months ago

4e3b077f-e541-4cf1-a114-1b35755b4159 this is my project id and it is queued in deployment, my authentication layer has been broken and it seems it is not working.

ddehueck

PRO

2 months ago

Project ID: 0848c2af-10a3-4248-8238-52408750a5fc

Service ID: ba90700b-b934-4304-9a4a-4476d4b8509b

Getting Error: "no space left on device" during builds

rooeinfn86

PRO

2 months ago

Project: (843f25ea-53e1-4304-b578-2c05d02cc530)

Error: no space left on device on builder

Node: production-builderv3-us-west1-s3s1

Multiple retries all failing with same error

jfm8duke

HOBBY

2 months ago

My Postgres volume is full and stuck in a crash loop after the GCP outage. It completes WAL recovery but then fails with FATAL: could not write to file "pg_wal/xlogtemp.35": No space left on device on every restart attempt.

Project ID: 38c22a1f-42ce-4be3-8837-056ee222aaed

Service ID: 91ccec3f-1bc0-41bd-ad76-2ed0fb65e28d

Volume: vol_supkwekmcmp3vnyq

Please expand the volume or move it to a node with available space.

ngonzo95

PRO

2 months ago

Project: 8d66af41-5f12-46ba-934a-ef7a9907840b

Service: 6fc4ac4e-ea34-4605-88bb-4a165d28745d?

environmentId: Production

Getting Error: "no space left on device" during builds. Another container is able to deploy.

Anonymous

HOBBY

2 months ago

ResourceExhausted: no space left on device

production-builderv3-us-west1-s3s1

angelo-railway

EMPLOYEE, OP

2 months ago

Try now, platform reporting that all builders are cycled.

watersmash3

HOBBY

2 months ago

Postgres getting this error: ERROR (catatonit:2): failed to exec pid1: No such file or directory

Project: 03228f7b-5a0d-451f-9960-d83f5ca434b3

Service: ddefcd13-22d5-478f-97b6-ea69922189bb?environmentId=5ca08a48-b05f-4b6d-8442-4f9b5a96efdb

In reply to angelo-railway:

samoh

PRO

2 months ago

finally works.

angelo-railway

EMPLOYEE, OP

2 months ago

Okay, we are fully back. For the WAL reports, we're going one by one and fixing that.

paschaldev

FREE

2 months ago

I was part of a team, but now I no longer see the team when I log in

In reply to gaslaksen:

gaslaksen

PRO

2 months ago

Things are not fully back... my Postgres redeploy is still failing with the same error: ERROR (catatonit:2): failed to exec pid1: No such file or directory

srfpth

HOBBY

2 months ago

Same as the person commenting immediately above: Postgres redeploy is still failing with the same error: ERROR (catatonit:2): failed to exec pid1: No such file or directory

bmclain07

PRO

2 months ago

I got my issue fixed, so ignore any of my comments or requests! Thank you for the help.

youcedom

PRO

2 months ago

Are we going to get credited for the services that were down?

nicoturelli

HOBBY

2 months ago

My Postgres is stuck in a crash loop with catatonit: failed to exec pid1. Already tried redeploy, still crashing.

Project ID: e65d3d2a-0d54-4520-aac4-921ae9d7452b

Service ID: 566c8d91-0606-4898-b667-930f3dd0d59e

Please move the volume to a healthy node.

kyledorchester

HOBBY

2 months ago

My Postgres service is still offline after redeploy.

Project ID: 74e1c5a2-0489-4844-b6ca-be0c008b48df

Service ID: 3658c827-51f4-4b6d-9e3d-14371382f5f0

The service is stuck in a crash loop with:

ERROR (catatonit:2): failed to exec pid1: No such file or directory

It looks like the volume may need to be moved to a healthy node.

Please recover the Postgres service/volume. Do not delete the volume.

Thanks!

hafizkanji

HOBBY

2 months ago

My Postgres service is still offline after redeploy.

Project ID: 3c81147d-d082-4cc9-bd54-cdc5cbb14e3f

Service ID: 4ff06ede-c82c-4016-9be3-f5d89ca8b601

The service is stuck in a crash loop with:

ERROR (catatonit:2): failed to exec pid1: No such file or directory

It looks like the volume may need to be moved to a healthy node.

Please recover the Postgres service/volume. Do not delete the volume.

Thank you

jacobxo0

HOBBY

2 months ago

Still hitting build failures on broken builder

Project ID: 2e3c01b1-9efc-445a-bb29-d9cfe20f1209

Service ID: aa6f58db-5c5c-4811-9dbe-57b17e46579b

Environment: production (0e2dfb76-5fdd-41f2-8b4f-71778539c66a)

Builder: production-builderv3-us-west1-bbns

Issue: All builds fail on this builder — either immediately after loading Dockerfile, or partway through (non-deterministic). Build fails at various stages: sometimes at [internal] load build definition, sometimes at [builder 6/6] RUN pnpm build. No error output shown, just "Deploy failed".

Last successful deploy: May 18

Attempts: 10+ retries, always assigned to same broken builder

radarcontroller

HOBBY

2 months ago

Disregard.

fizzysoftware

HOBBY

2 months ago

Project ID: 5d363616-92f4-4aac-8817-ef93e2b385fc

Postgres Service ID: f3b38dda-13b9-4a09-b3e9-ff346f9743a2

Volume ID: cdeb2601-3dae-4a8f-8958-357a16d10a8e

Issue: Filesystem corruption (lost+found directory present, initdb fails)

Error: "catatonit: failed to exec pid1: No such file or directory"

Data: Critical business data - needs recovery

NEED HELP

onyenso

PRO

2 months ago

My services were restored, but no subsequent deployments are working essentially freezing the state of all services.

I see this from deployment logs: ERROR (catatonit:2): failed to exec pid1: No such file or directory

Project ID: 8c79578d-2224-4ce1-acc8-d8ff14075386

Service ID: 6a7dae5e-588a-44a1-bd3e-c84798386a68

aura-pix

FREE

2 months ago

My project was deleted and I was downgraded from the Hobby plan to the free trial (which has expired). I'm currently handicapped. I can easily redeploy since I'm still in production and beta testing, but I need my working subscription first.

In reply to vitinx:

rafaramosbc

HOBBY

2 months ago

I am facing the exact same issue here. My backend service shows as "Online" on the dashboard, but all requests are returning a 502 Bad Gateway error (timing out after 15s). I need help as soon as possible because my clients are currently unable to access the application. Any help would be greatly appreciated.

jjlmoya

HOBBY

2 months ago

Database not connecting at all:

projectid: abf7ae95-9d16-443a-9e4b-b08e082778d9

serviceId: 67165f3d-c583-4c65-82af-678f903863eb

In reply to rafaramosbc:

jtcorrin

HOBBY

2 months ago

I have the same 502 problem on a number of my services

yugoru

HOBBY

2 months ago

If my database disappeared during all these problems, what should I do?! How can I recover the data? And yes, even an empty database won't deploy project 27361879-8048-4379-aa11-c9c2a6bf9085.

ajkfly40

HOBBY

2 months ago

Hi — I'm in the situation described in the Postgres FAQ entry.

Project ID: b7c0bb89-dc0b-4d89-8fb5-4eb488f37c3f

Service ID: 3da1b629-8f13-4a8c-aaf6-88e9dbc19642

Postgres has been in the catatonit: failed to exec pid1 crash loop since 2026-05-20 04:11 UTC. I've redeployed several times over the last ~24 hours with no recovery. Please move the volume to a healthy node when you have a moment.

Thanks!

michaelbaumgarn

HOBBY

2 months ago

Hi — affected by this incident. Need volume recovery if possible.

Project: ling-academy (e1e043a9-cf29-4be4-9419-8b56977a432f), env production, region europe-west4

Service: Postgres (467e5a6b-16f1-4a5d-8731-6980f238d292)

Volume: postgres-volume (dbe0f9fc-30ed-440f-b8cb-ed6f1a25b3b6), mount /var/lib/postgresql/data

Failover deployment: a2f93435-c88e-4899-bf88-0bbe36b51362, 2026-05-20 08:15:14 UTC, reason=failover. The deployment immediately before was c4d3894a from 2026-04-03, which ran continuously with no user-initiated redeploys or volume changes since.

Symptom: after the failover, the container mounted the volume but found only an empty lost+found/. It entered an initdb crash loop for ~29h until I diagnosed it.

Important / full disclosure: I have already partially recovered. I set PGDATA=/var/lib/postgresql/data/pgdata, let initdb run, and restored a ~155 MB local pg_dump from May 14 09:53 UTC. Prod is back up, but I'm missing May 14 19:03 → May 20 08:17 UTC of user data (~6 days of practice sessions, articles, signups). If the original disk is recoverable on an unhealthy node and can be made readable (even as a side-mount I can pg_dump from), I want it. I'll do the merge myself.

Let me know what you need from my side.

nicoturelli

My Postgres is stuck in a crash loop with catatonit: failed to exec pid1. Already tried redeploy, still crashing. Project ID: e65d3d2a-0d54-4520-aac4-921ae9d7452b Service ID: 566c8d91-0606-4898-b667-930f3dd0d59e Please move the volume to a healthy node.

nicoturelli

HOBBY

2 months ago

Still broken. Postgres shows "Completed" but backend gets ConnectionResetError: [Errno 104] Connection reset by peer when trying to connect on startup. The volume was NOT successfully moved.

Project ID: e65d3d2a-0d54-4520-aac4-921ae9d7452b

Service ID: 566c8d91-0606-4898-b667-930f3dd0d59e

Please escalate — this has been down for 3 days.

fitsumaf

HOBBY

2 months ago

Project ID: 24945270-e631-45d5-9f6f-f1f5381179e7

Service ID (Postgres): 8ebf5df4-a5a0-4c3c-ab5c-38c4e687c0cd

Postgres is stuck in a crash loop with catatonit: failed to exec pid1: No such file or directory. I have tried to redeploy many times. The volume may need to be moved to a healthy node.

justindesrosiers

HOBBY

2 months ago

This outage happened at a terrible time for me, during an opportunity to demo my product to people with serious interest, and it was horribly embarrassing. If this is what being hosted on Railway is like, I need to take my product elsewhere. I liked the product, but I can't tolerate incidents like that.

eocapital

HOBBY

2 months ago

Hi Angelo - same "no space left on device" build failures here since yesterday morning. None of the FAQ cases match exactly.

Project ID: 5432aef0-30a1-4fd4-9d7f-ad9ff76ac5da (artistic-heart)

Service ID: 105cc3e7-294a-40fe-9b70-192ea50b7253 (dealflow-pro)

Environment: production

Plan: Hobby

Recent failed deployments:

39b08224 (commit cafe9f5) - 30 min ago
a489dfba (commit 09e6e7d) - 1 hour ago
(commit 778d2aa) - earlier today
Earlier failures since May 20 ~13:00 EDT (after we added an env var)

Error pattern: pnpm install --frozen-lockfile --prod=false --force returns success, but tsc build fails because @types packages (express, compression, cookie-parser, nodemailer, jsonwebtoken) are not present in node_modules.

Same lockfile, same nixpacks.toml, same tsconfig as last successful deployment (c93424a1 / commit bfca418, May 18). Already tried:

Plain redeploy
Cache-bust comment in nixpacks.toml
--force flag on pnpm install

Production is still serving from the May 18 deployment, but I cannot deploy new code. Looks like the assigned build host has a corrupted pnpm store from the disk space incident.

Could you reclaim disk on our assigned builder or route us to a healthy host? Thanks!

hafizkanji

My Postgres service is still offline after redeploy. Project ID: 3c81147d-d082-4cc9-bd54-cdc5cbb14e3f Service ID: 4ff06ede-c82c-4016-9be3-f5d89ca8b601 The service is stuck in a crash loop with: ERROR (catatonit:2): failed to exec pid1: No such file or directory It looks like the volume may need to be moved to a healthy node. Please recover the Postgres service/volume. Do not delete the volume. Thank you

hafizkanji

HOBBY

2 months ago

It's been 14 hours, and I haven't received any update. Is anyone else in this thread getting updates or fixes?

Project ID: 3c81147d-d082-4cc9-bd54-cdc5cbb14e3f

Service ID: d50f83e2-8812-4428-bc71-22d2e489f9f5 (Node.js app)

Environment ID: dc3b13ea-7973-4268-96be-6b5907b92802

Volume ID: vol_o4qx1ufced17vtpg

Issue: PostgreSQL service is crash-looping with 'catatonit: failed to exec pid1: No such file or directory.' The volume mounts successfully but the container cannot start. The app service is also down as a result since it cannot connect to the DB.

Steps already tried: Manual restart of the PostgreSQL service — same error repeats on every attempt.


Anonymous

PRO

2 months ago

I'm still down with the same catatonit: failed to exec pid1 error as you.

hafizkanji

ducitymp

HOBBY

2 months ago

Same here, customers have been locked out of their data for more than 24hrs at this point..

nipdog001

bruniiinoliveira57

HOBBY

2 months ago

I'm having the same problem. I have a production application and my clients can't access it; it returns a 502 Bad Gateway error. I don't know what to do.

bruniiinoliveira57

HOBBY

2 months ago

I have a production application and my clients can't access it; it returns a 502 Bad Gateway error. I don't know what to do.

Project ID: ef34fcb0-24b9-4f4b-b4d3-800c5e795455

crisin

HOBBY

2 months ago

Project b4a93fdf-c094-4d14-83fd-35f827676014

Service 23c93a3a-4e3c-4e1b-957f-724ccfd0ab3e

Restarting/Redeploying the service was not sufficient.

Would be great if you could move my deployment to a healthy node, thanks in advance <3

Anonymous

PRO

2 months ago

I was able to fix it: I clicked Postgres, pressed Ctrl+K, then clicked "Redeploy source image," and I'm back up. I'm not sure whether you lose anything, though; I seem to be fine.

vinrata

HOBBY

2 months ago

project: 99fee704-d77c-4003-9c3e-e3b50aeff6ff

service: dff60f19-8cc5-4680-8a10-0b220d24734f

Thank you so much!!

kyledorchester

HOBBY

2 months ago

Following up: this production Postgres service has been down almost 24 hours with the known catatonit pid1 crash after the GCP outage. We already tried redeploying. Please move the attached volume to a healthy node or advise ETA. This is blocking production recovery.

vinrata

Issue: PostgreSQL service is crash-looping with 'catatonit: failed to exec pid1: No such file or directory.' The volume mounts successfully, but the container cannot start. The app service is also down as a result, since it cannot connect to the DB. project: 99fee704-d77c-4003-9c3e-e3b50aeff6ff service: dff60f19-8cc5-4680-8a10-0b220d24734f Thank you so much!!

vinrata

HOBBY

2 months ago

Manually redeploying seemed to fix it!

rowgregory

PRO

2 months ago

Postgres crash-looping since May 19 ~20:55 UTC. Logs show collation version mismatch warnings (2.36 → 2.41) then server closed connection on any client query. Tried redeploy, no recovery. Project ID: 53e9dfc3-52a2-46da-9cb7-74c153164e91, Service ID: c6bb0b8c-f187-4bb9-9724-d90633427a08. Need volume moved to a healthy node per the FAQ.

nickeast12

PRO

2 months ago

Issue: ERROR (catatonit:2): failed to exec pid1: No such file or directory

project: 58976afa-7d70-4f40-be36-25090072a4fe

service: f810ce1e-092e-4e7b-9e59-b018232df391


nickeast12

PRO

2 months ago

For anyone facing the ERROR (catatonit:2): failed to exec pid1: No such file or directory error, try:

Database service > latest deployment > command palette Ctrl+K > Redeploy source image

nipdog001

fasara

HOBBY

a month ago

Same. I'm on a hobby plan, and getting a 502. And I haven't deployed any changes since March.

cheonglol

PRO

a month ago

I forgive you railway

styreep

PRO

a month ago

Postgres data loss post-outage — same pattern as @knarxPRO's case.

Symptom: postgres-entrypoint runs initdb against the existing PGDATA volume on every restart. Volume contains only lost+found/ directory (timestamp matches the volume's provisioning date 2026-03-05). All Postgres writes from March–May 2026 are missing.

Project / Service / Volume IDs:

Project (Vikunja): 0bf5c608-b259-48f6-876a-d0b8bff576be
Postgres Service: 0be4cd2b-4150-46c2-8379-68f423b0aee8
Environment: a9bc8090-3c89-4e49-8115-968a778ee3dc
Volume (postgres-volume, the real data volume): b9a0ce59-d1e6-44e3-9c20-64bc215d8b5b
VolumeInstance: aa5c6313-80e9-4b0c-82fa-3895d638c07e
Region: us-east4-eqdc4a

Timeline:

2026-03-05 — Postgres deployed (image: postgres:16, user-managed, not template Postgres).
2026-03-05 → 2026-05-19 — Vikunja in active production use, writes daily.
2026-05-19 22:20 UTC → 2026-05-20 ~06:14 UTC — GCP outage. Owner confirmed Vikunja recovered with full data the morning of May 20.
2026-05-22 ~15:15 UTC — Railway automatic snapshot of the volume taken (returned an empty filesystem when inspected — see below).
2026-05-22 15:21 UTC — Automatic Postgres redeploy fires (reason: redeploy, no human action documented from our side; we have no audit-log access). Deploy FAILED in 3 seconds due to a duplicate volumeMounts: ["/var/lib/postgresql/data", "/var/lib/postgresql/data"] config (legacy from a March 5 dual-volume mistake that had been latent for 2.5 months).
2026-05-22 16:50 UTC — Owner-team first investigated the issue.

Diagnostics performed:

Restored snapshot 99454ae3-6fa8-4150-a1e8-7a05bd709cd1 (auto-snapshot from 15:15 UTC, referencedMB: 1057) to a fresh volume.
Overrode startCommand to sleep infinity to avoid postgres-entrypoint touching the data.
ls -laR /var/lib/postgresql/data/ returned: only lost+found/ (16 KB, Mar 5 20:37 timestamp — the volume's provisioning date).
No PG_VERSION, no base/, no pg_wal/, no Postgres cluster files at all.
referencedMB: 1057 in the snapshot appears to be filesystem overhead / orphan blocks, not Postgres data.
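
For anyone who wants to repeat this check on a restored snapshot, here is a minimal sketch in Python. It assumes the volume is mounted somewhere readable and that Postgres is not running against it (for example with the start command overridden as above). The function name inspect_data_dir and the marker list are illustrative choices, not anything Railway provides.

```python
from pathlib import Path

# Files and directories an initialized Postgres cluster normally contains.
# This list is an illustrative subset, not an exhaustive definition.
CLUSTER_MARKERS = ("PG_VERSION", "base", "pg_wal")


def inspect_data_dir(data_dir: str) -> dict:
    """Report which cluster markers exist and whether the mount looks wiped."""
    root = Path(data_dir)
    entries = sorted(p.name for p in root.iterdir())
    present = [m for m in CLUSTER_MARKERS if (root / m).exists()]
    return {
        "entries": entries,
        "markers_present": present,
        # "Wiped" here means: nothing besides lost+found (or nothing at all).
        "looks_wiped": set(entries) <= {"lost+found"},
    }
```

Calling inspect_data_dir("/var/lib/postgresql/data") on a mount like the one described above should report no markers and looks_wiped set to True. It only lists the top level, so it never writes to the volume.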

Pattern matches @knarxPRO: "data on disk is from this service's initial provisioning on 2026-02-23 — three months of production writes are missing." Same here: volume root dir timestamps point to the 2026-03-05 provisioning date, with all subsequent writes missing.

Volume backups available:

99454ae3-6fa8-4150-a1e8-7a05bd709cd1 — auto-snapshot, 2026-05-22T15:15:31Z, referencedMB: 1057 (empty when restored)
ed4d325b-8139-45ba-9151-bf4d4781081e — manual, 2026-05-22T16:57:10Z, referencedMB: 1128
f3deeb6a-ee33-46c9-8df4-4abd728a7fc1 — manual, 2026-05-22T17:06:33Z, referencedMB: 1128

(All taken after the wipe was already in effect; none contain recoverable data when restored.)

Per the FAQ thread: "The volume may need to be moved to a healthy node." Requesting:

Volume move to a healthy node — please don't reprovision empty. Volume contains 2.5 months of Vikunja production state.
If you have block-storage snapshot or pgBackRest-style recovery state from before the 2026-05-22 wipe, please restore.
PITR is not available for this Postgres (user-managed postgres:16 image, not template Postgres) — but if there's an alternative recovery path Railway support can apply, please advise.

We had no Postgres backup cron in place (gap from our own post-outage runbook that wasn't closed in time). Acknowledging this is partly on us. But the wipe event itself appears to be the same systemic post-recovery issue affecting many users in this thread.
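
For others in the same position, a minimal sketch of the kind of scheduled dump that was missing here. It assumes pg_dump is installed wherever the script runs and that a connection string and a writable backup directory are available; the function names and the backup file naming are hypothetical, and scheduling (cron or otherwise) is left out.

```python
import subprocess
from datetime import datetime, timezone


def build_pg_dump_command(database_url: str, backup_dir: str, now: datetime) -> list:
    """Build a pg_dump invocation that writes a timestamped custom-format dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outfile = f"{backup_dir}/backup-{stamp}.dump"
    return [
        "pg_dump",
        "--format=custom",            # compressed archive, restorable with pg_restore
        f"--file={outfile}",
        f"--dbname={database_url}",   # pg_dump accepts a connection string here
    ]


def run_backup(database_url: str, backup_dir: str) -> str:
    """Run pg_dump once and return the path of the dump file it was asked to write."""
    cmd = build_pg_dump_command(database_url, backup_dir, datetime.now(timezone.utc))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pg_dump fails
    return cmd[2].split("=", 1)[1]
```

Keeping the command construction separate from the subprocess call makes it easy to check the output path without a live database. The dump should be stored somewhere other than the volume it protects.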

Available for any diagnostic you need.

Thanks.

aura-pix

FREE

a month ago

I somehow found out that my project at https://railway.com/project/5a1291d5-0735-4f57-8e89-c33ee7a6eb49?environmentId=33573da6-ffb0-4a6a-927d-b9ed6472c97f was actually not deleted, because my pushes are still deploying, but I can't even access the project and I'm still locked out of my Hobby subscription. This is a frustrating first-time experience, I must admit. I was already certain I had found my solution, and then this! It's a good thing I'm still in beta and haven't gone live yet.

otavio939

HOBBY

a month ago

SOLVED for me

Had to move the website to the same project as the database.

ducitymp

HOBBY

a month ago

For people who have the "failed to exec pid1: No such file or directory" error. I changed the region of my postgres database and then changed it back to make it work again.

jnalewabau

HOBBY

a month ago

Issue: ERROR (catatonit:2): failed to exec pid1: No such file or directory

project: 95d88071-5b8d-4f9b-b47b-1d250221f6b4

service: bada3e06-08d5-4ae0-9906-905be7dde67a

imanoel01

HOBBY

a month ago

Project ID: cbf9bd2f-326f-4b3d-b98d-1cc54221289d

Service: Postgres (production-3e4b)

Volume: postgres-volume (vol_j3lopoqdd2zj8hck)

The symptom: catatonit pid1 failure since 2026-05-20 04:50 UTC, volume mounts but entrypoint never runs

Postgres Copy with the same image works fine

classebasse

PRO

a month ago

pid1 failure

Project ID: c035fd08-804d-4e9d-8ffd-28b719a832e0

Service ID: 950c7ca6-7203-4a1c-9780-87fad0f75468

mtobeid

PRO

a month ago

Regarding what's been said here:

'On compensation: outage credits are part of Enterprise SLAs and are not offered on Hobby or Pro plans. We know that is disappointing given the impact, and we are sorry. The postmortem covers what we are doing to prevent a recurrence.'

That's fine that there's no compensation, but what are you actually doing to prevent a recurrence? This is most important to me; I'd like to be able to reassure clients that this service is reliable and worth keeping.

fasara

HOBBY

a month ago

Hi,

The incident was declared resolved in the post-mortem

https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage,

but my app is still not accessible, and returning a 502.

I tried a re-deployment, but it didn't get triggered (not sure if it's due to the potential high volume of users trying to re-deploy).

Here's my Project ID 56e37f0b-af54-46a3-9a28-f9a1b2d8fefa

I have a frontend, backend, and MySQL DB service. They are all online in the dashboard


fasara

HOBBY

a month ago

Service ID: 4a558b4f-e0aa-4948-8ff0-4426a3b2087d


fasara

HOBBY

a month ago

Hi Railway team,

Here are the details:

Backend service ID: 4a558b4f-e0aa-4948-8ff0-4426a3b2087d
Database service ID: 1e79a625-8bec-42d8-a525-59ab91dd3716
Request ID from latest curl: UilmFBbTTSyR_AJwEuroCg

Steps I've taken:

Triggered a manual redeploy — no new deployment was queued (the last deploy visible was pre-outage)
Logged in using the railway CLI and managed to trigger a re-deploy
Performed a hard redeploy from the dashboard

Current state:

App boots successfully — logs show "Nest application successfully started."
curl still returns HTTP 502 with x-railway-fallback: true and "Application failed to respond"
Dashboard shows "Networking info temporarily unavailable" when loading domain/TCP proxy details

The app appears healthy on my end, but Railway's edge proxy does not seem to be routing traffic to the container. Looks like a networking/routing issue on the infrastructure side.
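
A minimal sketch of how the two cases could be told apart from a response. It assumes, based only on the curl output above, that a 502 carrying x-railway-fallback: true was answered by the platform rather than by the application; that interpretation and the function name classify_response are assumptions, not documented behavior.

```python
def classify_response(status: int, headers: dict) -> str:
    """Classify an HTTP response as 'edge-fallback', 'app-error', or 'ok'."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {k.lower(): str(v).strip().lower() for k, v in headers.items()}
    if status == 502 and normalized.get("x-railway-fallback") == "true":
        # Assumed meaning: the proxy responded because it could not reach the app.
        return "edge-fallback"
    if status >= 500:
        return "app-error"
    return "ok"
```

With the response described above, classify_response(502, {"x-railway-fallback": "true"}) returns "edge-fallback", which is consistent with the app booting normally while traffic never reaches it.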

bohachevskyy

HOBBY

a month ago

I have been receiving a 509 for a web application with a custom domain since May 19. Redeploying doesn't help. The last deployment happened on May 15 and ran with no issues until May 19. Metrics show no traffic since.

Service ID: 8e1482ff-86d6-4831-b319-0e38961e268d

mixalloff

HOBBY

a month ago

Project ID: 1294f691-6e62-43e3-8786-3d346b1880c3

Service ID: 1d86cc48-5fdd-4f52-8d01-bfd450fd95ed

Environment ID: 1d70f780-9b9c-4298-aa4b-81064809c106

My Postgres is stuck in a crash loop with catatonit: failed to exec pid1: No such file or directory.

Could you please move it to healthy node?

eocapital

HOBBY

a month ago

Project ID: 5432aef0-30a1-4fd4-9d7f-ad9ff76ac5da

Service: 105cc3e7-294a-40fe-9b70-192ea50b7253

Plan: Hobby

Latest failed deploy: cafe9f5 (just retried 5 min ago - same error)

Could someone please reclaim disk on our assigned builder or route us to a healthy host? Production is fine on the May 18 deployment but we cannot ship any new code.

Thanks!

eocapital

Bumping this - still blocked from deploys 6 days later. Builds still failing with same "no space left on device" → @types missing → tsc errors. Already tried plain redeploy, cache-bust comment, and --force flag on pnpm install. Project ID: 5432aef0-30a1-4fd4-9d7f-ad9ff76ac5da Service: 105cc3e7-294a-40fe-9b70-192ea50b7253 Plan: Hobby Latest failed deploy: cafe9f5 (just retried 5 min ago - same error) Could someone please reclaim disk on our assigned builder or route us to a healthy host? Production is fine on the May 18 deployment but we cannot ship any new code. Thanks!

angelo-railway

EMPLOYEEOP

a month ago

Recovery update for everyone in this thread.

Builders and deploys are fully restored. If you were seeing "Deploys have been paused temporarily" or build failures with "no space left on device," please retry now.

For Postgres or Redis crash loops with "catatonit: failed to exec pid1: No such file or directory," a normal redeploy does not re-pull the container image. Use Ctrl+K (or Cmd+K on Mac) to open the command palette on your database service, then select "Redeploy source image." Multiple users in this thread have confirmed this resolves the crash loop with data intact. If that does not work, try changing the service's region to a different one, then changing it back, which forces a volume migration to a healthy node.

For 502 Bad Gateway errors, a standard redeploy from the dashboard should restore routing. If you are still seeing 502s after redeploying, or if your dashboard shows "Networking info temporarily unavailable," please reply with your project and service IDs so we can investigate the routing individually.

For those who reported data loss or volumes showing only a lost+found directory with data reverted to the initial provisioning date, we have all your IDs logged and are working these cases individually. If you reported a wiped database but have not yet shared your project and service IDs, please reply with them.

If your projects appear missing from your dashboard, check the workspace switcher in the top-left corner. The outage caused some users to land in a freshly-created empty workspace. Your projects should still be in your original workspace.

The postmortem covering what happened and what we are doing to prevent a recurrence is here: https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage

We are sorry for the impact, and we know that does not undo the disruption to your users and your work.

As an aside, our support queues are now in a really good shape with everyone getting a response within or below the response time objectives. We appreciate you all working with us.
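
For anyone triaging several services at once, the guidance in this update can be sketched as a small lookup in Python. The symptom strings come from this thread; the table name RECOVERY_STEPS, the function suggest_step, and the substring matching are illustrative assumptions, not a Railway tool.

```python
from typing import Optional

# (symptom substring, suggested first step), checked in order.
RECOVERY_STEPS = [
    ("failed to exec pid1",
     'Open the command palette (Ctrl+K / Cmd+K) on the database service and '
     'select "Redeploy source image"; if that fails, change the region and change it back.'),
    ("no space left on device", "Retry the build; builders are reported as restored."),
    ("deploys have been paused", "Retry the deploy; deploys are reported as restored."),
    ("networking info temporarily unavailable",
     "Reply in the thread with your project and service IDs."),
    ("502", "Redeploy from the dashboard; if 502s persist, reply with project and service IDs."),
    ("lost+found", "Reply with your project and service IDs; these cases are handled individually."),
]


def suggest_step(message: str) -> Optional[str]:
    """Return the first matching suggestion for an error message, or None."""
    text = message.lower()
    for needle, step in RECOVERY_STEPS:
        if needle in text:
            return step
    return None
```

Passing a log line such as "ERROR (catatonit:2): failed to exec pid1: No such file or directory" returns the "Redeploy source image" step; a message that matches nothing returns None, which is the cue to post IDs and wait for Support.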


jtcorrin

HOBBY

a month ago

Finally a response! If only you had acknowledged sooner. I now have accounts with render and flyio.

363color

FREE

20 days ago

Hi, my case doesn't match the FAQ exactly. My Postgres service shows as "Active" in Deployments after a manual restart, but the Database tab still shows:

"We are unable to connect to the database via SSH" / "Connection terminated unexpectedly"

This is consistent even after restarting the deployment. My app's connection from outside also fails with ECONNRESET.

Project ID: 8ab95573-44b7-4752-98ea-1a89329eac9f

Service ID: 6cc5afea-e158-4f20-9603-4253232eff2b

Service name: Postgres (project: zucchini-celebration)

No backups enabled (Hobby plan), so I'm hoping the volume itself is intact and just needs to be moved to a healthy node as mentioned in the FAQ. Thanks for the help.
