3 hours ago
Full post-mortem still in-progress.
Hello all,
As part of our commitment to be as transparent as possible about the recent outage (https://status.railway.com/), and about any incident on the platform, we are summarizing what we know so far, along with answers to some common questions we have fielded. We plan to update this thread with our official post-mortem once we have the details.
As it stands right now, we are working on getting everyone recovered. If your workload has not recovered yet, you can now redeploy and we will route your code to a healthy machine. Our whole support team is working on restoring workloads on Google Cloud hosts. Keep in mind that we are being rate-limited by GH while our build pipeline returns to full health.
What we know:
Around 22:20 UTC, our Google Cloud account was placed into a "restricted" status, which removed all of our cloud overflow VMs, our CloudSQL instance, and our API. Losing our API removed a central dependency, which affected all GCP host workloads and then, after our network route cache expired, all workloads hosted on the Railway platform.
We don't yet fully know why our account was suspended automatically. We contacted the GCP engineering team at the start of the incident and remain in contact with them as we root-cause the issue.
FAQ:
Q: "Doesn't Railway run its own hardware?"
A: Yes, Railway runs hardware at 8 sites across 4 locations around the world. At the start of 2026, due to demand on our systems, we burst back onto the cloud on AWS and GCP.
A subset of non-latency-sensitive customers and Enterprise customers are using a public cloud for their hosts. However, when we migrated fully onto Metal in Mar. 2025, we kept our API and our DB on GCP, as we felt that leaving that workload there was well within our risk model. (Candidly: we didn't expect to get our cloud account to get removed via automated enforcement.)
Q: "Why does your API being down mean that my workload went down?"
A: Railway's API talks to our distributed edge, which is a suite of proxies that we run all around the world. Each edge location maintains a DNS routing table; however, the network team was still in the middle of fully distributing that routing table so that each region would be fully independent. Once the routing cache at the edge expired, all workloads were affected, not just GCP workloads. This is due to be rectified.
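To illustrate why workloads stayed up for a while and then dropped together, here is a minimal sketch of a TTL-based route cache at an edge proxy. This is not Railway's actual implementation: the class `EdgeRouteCache`, the `fetch_route` callback, and the TTL value are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EdgeRouteCache:
    """Hypothetical edge-side cache mapping a hostname to an upstream address."""

    def __init__(
        self,
        fetch_route: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_route stands in for a call to a central API (an assumption).
        self._fetch_route = fetch_route
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, hostname: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(hostname)
        # Serve from cache while the entry is still fresh.
        if entry is not None and now < entry[1]:
            return entry[0]
        # Entry is missing or expired: ask the central API again.
        try:
            upstream = self._fetch_route(hostname)
        except ConnectionError:
            # Central API unreachable and nothing fresh in cache:
            # the request can no longer be routed.
            return None
        self._entries[hostname] = (upstream, now + self._ttl)
        return upstream
```

In this sketch, a cached route keeps working after the central API disappears, but only until its TTL runs out. A design that keeps serving the last known route when a refresh fails would behave differently, which is one way to read "each region fully independent".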
Q: "My workload is still down. What gives?"
A: We're seeing recovery in our API, builds, and deployments. If your service is having an issue, please try redeploying it. We'll publish a public postmortem covering what happened once we're fully recovered.
64 Replies
3 hours ago
Thank you guys for keeping us in the loop
Wishing you guys the best with the final recovery
3 hours ago
Is my MongoDB crash related to this or not?
"MongoDB cannot start: Linux kernel versions 6.19 and newer has a known incompatibility with this version of MongoDB. See https://jira.mongodb.org/browse/SERVER-121912 for more information."
It started happening now.
3 hours ago
For any technical issues, please open your own thread explaining any remaining issues.
3 hours ago
"Candidly: we didn't expect to get our cloud account to get removed via automated enforcement." very politely put haha
3 hours ago
These things happen. Thanks for keeping us in the loop and working hard to get us all back up and running.
yeferson59
My Postgres database crashed; I think it's a problem with the GCP network.
3 hours ago
Crashed or timed out? Mine is the same. I suspect an issue with the server talking to the DB over Railway's VPC.
3 hours ago
Thanks for the updates! Stuff like this is always interesting to follow haha, thought I broke something
3 hours ago
I sincerely hope Google pays for this. This was a resume generating event for sure
3 hours ago
My question is, this isn't an isolated issue with service instability. I am seeing more and more issues daily; most don't affect us, but they're becoming more numerous. The failure point seems to have stemmed from a sudden need for more capacity, and if Railway hadn't been nearing capacity, this wouldn't have happened, since GCP wouldn't have been needed.
3 hours ago
Thank you for keeping us informed.
3 hours ago
redeploying kicked us back online... I wish they told us to try that earlier. Thanks for the mid-post mortem still!
ricks-yfwi
redeploying kicked us back online... I wish they told us to try that earlier. Thanks for the mid-post mortem still!
2 hours ago
Very likely you wouldn't have been able to, because after we got services online, we had to re-bootstrap the build fleet. We are back. Thanks again.
2 hours ago
MongoDB cannot start: Linux kernel versions 6.19 and newer has a known incompatibility with this version of MongoDB. See https://jira.mongodb.org/browse/SERVER-121912 for more information.
2 hours ago
What's the point of having multiple 'Replicas' (that I am paying for) when they are all going to be hosted by GCP? How do you not have a redundancy system for this? Like using AWS or something? This downtime has cost me thousands of dollars as it happened during the morning in my timezone.
edoswald
My question is, this isn't an isolated issue with service instability. I am seeing more and more issues daily; most don't affect us, but they're becoming more numerous. The failure point seems to have stemmed from a sudden need for more capacity, and if Railway hadn't been nearing capacity, this wouldn't have happened, since GCP wouldn't have been needed.
2 hours ago
Agreed. I really enjoy Railway as a platform but am starting to consider other options given recent stability issues.
edoswald
My question is, this isn't an isolated issue with service instability. I am seeing more and more issues daily; most don't affect us, but they're becoming more numerous. The failure point seems to have stemmed from a sudden need for more capacity, and if Railway hadn't been nearing capacity, this wouldn't have happened, since GCP wouldn't have been needed.
2 hours ago
This is a fair read. That said, I will say that the uptime crunchiness from Feb. to Mar. was due to GH and then capacity. You don't have to take my word for it, but each outage was a unique way to stress test our systems, which, knock on wood, we've been able to manage so far.
Until this one, recent outages were usually tied to host failures, which we are working to mitigate with "live migrations" (an in-progress feature that will come with VMs). That said, this one was egregiously bad because it was a single and expected point of failure, like a cloud account getting removed. Still, we own our uptime and it affected everyone, so everyone has a right to be mad because we did impact businesses for 6 or so hours.
The good I will take away from this is that we have acted on your feedback on comms and were on the ball with information delivery. Now we just need to land the rest of the reliability work so that the platform is anti-fragile. For those who feel the need to migrate, it's been an honor to serve your business.
2 hours ago
Recently, I’ve seen quite a few tutorials on YouTube explaining how to use the Railway service to set up a “free” VPN. These users are taking advantage of the platform’s promotional offers for their own convenience, but they’re disrupting the system’s balance. I wonder if Google has detected abnormal network requests due to the sheer number of these “free” users and made an erroneous judgment as a result?
white
Recently, I’ve seen quite a few tutorials on YouTube explaining how to use the Railway service to set up a “free” VPN. These users are taking advantage of the platform’s promotional offers for their own convenience, but they’re disrupting the system’s balance. I wonder if Google has detected abnormal network requests due to the sheer number of these “free” users and made an erroneous judgment as a result?
2 hours ago
I would like to hope that it is, but we have no word yet on why GCP caught us in the automated "cull". To say we are livid is an understatement.
2 hours ago
That was a double dose of "welcome to API risk". I hope we get a full accounting of why Google killed access. My deployment crashed as a result, and I need to restart it manually. I need a mechanism to auto-restart and keep retrying until it succeeds, in case it happens at nighttime for me.
Fortunately, I'm still in alpha test deploy mode (Hence Hobby plan currently) so not live yet, but this is a bit of a rude awakening to me. I have to reassess my deployment options now.
banggsatga
MongoDB cannot start: Linux kernel versions 6.19 and newer has a known incompatibility with this version of MongoDB. See https://jira.mongodb.org/browse/SERVER-121912 for more information.
2 hours ago
This is related to your DB version not being pinned on redeploy; you need to pin it to the correct version of Mongo.
What's the point of having multiple 'Replicas' (that I am paying for) when they are all going to be hosted by GCP? How do you not have a redundancy system for this? Like using AWS or something? This downtime has cost me thousands of dollars as it happened during the morning in my timezone.
2 hours ago
Can you respond to this? We may as well just go direct with Google at this rate.
Can you respond to this? We may as well just go direct with Google at this rate.
2 hours ago
Not to be dismissive, but Q: "Why does your API being down mean that my workload went down?" should be the answer that you are looking for. As for your hosts, the records show that you are indeed on AWS/GCP/Metal, but the networking being tied to the API is what got you and others. A mitigation for this shortcoming is in progress.
2 hours ago
You should add a feature that allows customers to deploy in replica mode across AWS, Google Cloud, and Azure. I’m willing to pay extra for this. For example, if Google Cloud goes down today, my instances running on AWS and Azure would still remain operational.
santidevi
You should add a feature that allows customers to deploy in replica mode across AWS, Google Cloud, and Azure. I’m willing to pay extra for this. For example, if Google Cloud goes down today, my instances running on AWS and Azure would still remain operational.
2 hours ago
That's a good suggestion 👍
santidevi
You should add a feature that allows customers to deploy in replica mode across AWS, Google Cloud, and Azure. I’m willing to pay extra for this. For example, if Google Cloud goes down today, my instances running on AWS and Azure would still remain operational.
2 hours ago
That should be the default! All our replicas being on the same host is insane. This downtime cost me thousands. Is Railway not made for serious work? I get that outages happen, but no redundancy is mind-blowing.
2 hours ago
Our Postgres service has been down since the GCP outage and is now stuck in a crash loop. The volume mounts successfully, but the container immediately fails with "catatonit: failed to exec pid1: No such file or directory". The Postgres binary appears to be missing from the container layer. We've tried restarting multiple times, with the same result every time. Is anyone else having this issue?
What's the point of having multiple 'Replicas' (that I am paying for) when they are all going to be hosted by GCP? How do you not have a redundancy system for this? Like using AWS or something? This downtime has cost me thousands of dollars as it happened during the morning in my timezone.
2 hours ago
I feel like AI has made us all dumber. If you host an app with a load balancer on AWS, do you assume replicas would be hosted on external services like GCP? No, that's not the point of replicas. Replicas reduce strain on a single system. What people are saying about "just host it on multiple services (AWS, GCP, etc.)" is like building the whole service pipeline again. What's the point, if you're building on a system intended to have quadruple 9's uptime?
Ultimately, every system has bottlenecks regardless of its size. Anger should not be pointed at the Railway team; these are circumstances even the best devops team cannot plan for.
2 hours ago
I'm still a beginner and don't understand much. I'm already thinking about a plan B. It will increase my costs, but I need a plan B so that my operation keeps running when this happens.
yeferson59
My Postgres database crashed; I think it's a problem with the GCP network.
2 hours ago
Redeploying Postgres fixed the issue for me
2 hours ago
Hey, my service keeps crashing immediately after restarting. Is there anything I can do? My complete app is down now
progrennis
Hey, my service keeps crashing immediately after restarting. Is there anything I can do? My complete app is down now
2 hours ago
The same: ERROR (catatonit:2): failed to exec pid1: No such file or directory
2 hours ago
Just started scaling up on Railway and this happened. Hilarious. I will be waiting to know what measures Railway will take to prevent these sorts of outages in the future. Also, why should I redeploy the services myself? It is a PaaS after all, or have I missed anything?
2 hours ago
My PostgreSQL has the same problem. How can I recover the data and redeploy?
asadullahjan
Is my MongoDB crash related to this or not? "MongoDB cannot start: Linux kernel versions 6.19 and newer has a known incompatibility with this version of MongoDB. See https://jira.mongodb.org/browse/SERVER-121912 for more information." It started happening now.
2 hours ago
Yes, I have the same issue. I think it's because they are still not fully recovered.
mika9339
Redeploying Postgres fixed the issue for me
2 hours ago
How did you redeploy the PG DB? Did you recover all the data?
2 hours ago
I redeployed my services and it worked. Thanks for letting us know about every update on this.
2 hours ago
people that are complaining, why did you rely on a single provider yourself? you can plan for redundancy at every level and something will still take you down.
Just started scaling up on Railway and this happened. Hilarious. I will be waiting to know what measures Railway will take to prevent these sorts of outages in the future. Also, why should I redeploy the services myself? It is a PaaS after all, or have I missed anything?
2 hours ago
Same.
2 hours ago
Our production Postgres service crashed with 'ERROR (catatonit:2): failed to exec pid1: No such file or directory' and won't restart—the volume appears corrupted. We need help recovering the data without losing it. Project ID: 5beb36b2-c52f-43a5-be5b-29c953a7d463
frostykdev
Our production Postgres service crashed with 'ERROR (catatonit:2): failed to exec pid1: No such file or directory' and won't restart—the volume appears corrupted. We need help recovering the data without losing it. Project ID: 5beb36b2-c52f-43a5-be5b-29c953a7d463
2 hours ago
If you can open a new support thread, we would love to get you going.
2 hours ago
Redeploying again seems to have fixed it
2 hours ago
Ah! The problem was resolved after making the reconfiguration.
brody
For any technical issues, please open your own thread explaining any remaining issues.
2 hours ago
My database IS DOWN (crashed for like 8 to 9 hours). Postgres is down, and my clients can't log in or do anything. When should we expect this to be live?
2 hours ago
psql: error: connection to server at "monorail.proxy.rlwy.net" , port 30194 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Are the issues responsible for this too?
brody
For any technical issues, please open your own thread explaining any remaining issues.
an hour ago
I'm still getting bad gateway 502 error. Pls help my app is not accessible through cloudflare domain
an hour ago
Yup, for those with Postgres issues, I would redeploy and check that you have matching databases.
an hour ago
I get the feeling Railway team will grow wayyyy more robust from this, weary geeks on the verge. While I thought I broke the whole thing when adding an env var; I'm glad to have taken some time off the screen. Live and learn @railway; a crash like this is worthy of a bountiful bounce back!
angelo-railway
Yup, for those with Postgres issues, I would redeploy and check that you have matching databases.
an hour ago
Do I have to do anything from my side??
doryza
I get the feeling Railway team will grow wayyyy more robust from this, weary geeks on the verge. While I thought I broke the whole thing when adding an env var; I'm glad to have taken some time off the screen. Live and learn @railway; a crash like this is worthy of a bountiful bounce back!
an hour ago
Appreciate it. We still disappointed (understatement) a lot of our customers; we'll keep on working for you all and improving the system.
an hour ago
Hello,
Could you please provide an estimated timeline for when this issue is expected to be resolved? Having a tentative timeframe would greatly help us manage our expectations and plan accordingly.
brody
For any technical issues, please open your own thread explaining any remaining issues.
an hour ago
My app is still down. On your status page you say: Monitoring
Railway services have fully recovered. Some workloads may still need a redeploy, we're automatically redeploying any we detect as unhealthy. If your service isn't responding correctly, please trigger a redeploy from the dashboard or CLI.
We're sorry for the disruption. A detailed postmortem will follow once we've confirmed stability.
Yet after restarting my app it immediately crashes without further info
progrennis
My app is still down. On your status page you say: Monitoring Railway services have fully recovered. Some workloads may still need a redeploy, we're automatically redeploying any we detect as unhealthy. If your service isn't responding correctly, please trigger a redeploy from the dashboard or CLI. We're sorry for the disruption. A detailed postmortem will follow once we've confirmed stability. Yet after restarting my app it immediately crashes without further info
an hour ago
Note that "monitoring" doesn't mean "resolved" for this exact reason; we are working case by case to get you and others in a good spot.
an hour ago
Good job with comms. Just to confirm, will redeploying my DB instance wipe my data? Our DB is still down and I want to see what works.
an hour ago
In our case, a redeployment fixed the Postgres issue without any data loss.
santidevi
You should add a feature that allows customers to deploy in replica mode across AWS, Google Cloud, and Azure. I’m willing to pay extra for this. For example, if Google Cloud goes down today, my instances running on AWS and Azure would still remain operational.
an hour ago
this would be a cool feature
jithinzac
In our case, a redeployment fixed the Postgres issue without any data loss.
an hour ago
It worked, No data loss after the redeploy.
an hour ago
I understand outages happen, but what I don’t understand is why my deployment wasn’t automatically restarted afterward. The incident happened overnight in Europe, so I woke up to 8+ hours of downtime reports from clients.
If instances had been automatically redeployed after the outage, the impact could likely have been reduced significantly, probably closer to 4–6 hours instead of 8+.
bogk9
I understand outages happen, but what I don’t understand is why my deployment wasn’t automatically restarted afterward. The incident happened overnight in Europe, so I woke up to 8+ hours of downtime reports from clients. If instances had been automatically redeployed after the outage, the impact could likely have been reduced significantly, probably closer to 4–6 hours instead of 8+.
39 minutes ago
We are indeed rolling through redeployments; however, it's a queued system to avoid back pressure. Absolutely heard on the feedback.
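For readers wondering what a queued rollout might look like in general terms, here is a minimal sketch of draining a redeploy queue with a cap on how many redeploys run at once. It is an illustration only, not Railway's system: `drain_redeploy_queue`, the `redeploy` callback, and `max_in_flight` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterable, List


def drain_redeploy_queue(
    service_ids: Iterable[str],
    redeploy: Callable[[str], None],
    max_in_flight: int = 2,
) -> List[str]:
    """Redeploy every service, never running more than max_in_flight at once."""
    pending: "queue.Queue[str]" = queue.Queue()
    for service_id in service_ids:
        pending.put(service_id)

    finished: List[str] = []
    finished_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                service_id = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            # redeploy is a stand-in for the real (assumed) redeploy call.
            redeploy(service_id)
            with finished_lock:
                finished.append(service_id)

    # The number of worker threads is the concurrency cap.
    workers = [threading.Thread(target=worker) for _ in range(max_in_flight)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    return finished
```

The cap trades recovery speed for stability: services at the back of the queue wait longer, but the build and deploy systems downstream are not flooded all at once.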
24 minutes ago
Good morning, I am still without a DB and cannot restart - need help!
13 minutes ago
Same for me, multiple services cannot be restarted
11 minutes ago
Good morning, I also have a problem with my servers - please help!
asadullahjan
Is my MongoDB crash related to this or not? "MongoDB cannot start: Linux kernel versions 6.19 and newer has a known incompatibility with this version of MongoDB. See https://jira.mongodb.org/browse/SERVER-121912 for more information." It started happening now.
7 minutes ago
Yoo, my mongodb image keeps on crashing 😿
4 minutes ago
This is frustrating. Anyway, I ate a burger from yesterday for breakfast 😋












