a month ago
Now that Railway is back online, the question is not just “what broke?” It is “why was Railway’s Google Cloud account blocked, why did that affect unrelated customer workloads, and what has changed to prevent recurrence?”
At 3:30 PM PST I went into a meeting with working projects. During that meeting, those projects failed in real time. For production-facing work, that is a material reliability issue.
This does not appear to be a general Google Cloud outage. During the outage, I was able to stand up equivalent projects directly on Google Cloud without reproducing the failure.
Railway’s status page indicated that Google Cloud had blocked Railway’s account. That needs a real root cause analysis (RCA). Was this billing-related? Abuse-related? Automated enforcement? Account verification? A policy trigger? Something else entirely?
Customers should not have to speculate.
A “please don’t host these kinds of sites” agreement is not a technical mitigation. If a workload class or account-level action can destabilize unrelated customer projects, the platform needs stronger blast-radius control, workload isolation, quota enforcement, abuse detection, automated containment, and provider-level failover planning.
The support path is also part of the incident. If X was the only effective way to reach someone during an active outage, that is not acceptable for a production hosting platform. There should be a documented escalation channel, incident comms, status updates, and support routing that do not depend on a public social network.
Railway should immediately publish a postmortem covering:
- Why Google Cloud blocked the account
- Whether billing, abuse, policy enforcement, or automated controls were involved
- Why customer workloads shared the blast radius
- Which isolation or containment controls failed
- What mitigations are now in place
- What the official incident escalation path is
Customers do not need perfection. They need resilience, containment, clear escalation, and a credible RCA.
1 Reply
a month ago
Totally agree. Google Cloud as a whole was up and running. This was a problem exclusive to Railway, and unfortunately we still have no idea what the real cause was.
