2 months ago
Hey folks.
We've published a postmortem of the CDN caching incident on 30th March, 2026:
https://blog.railway.com/p/incident-report-march-30-2026-accidental-cdn-caching
Affected users should have received an email by now.
We're available here if you have any questions.
47 Replies
2 months ago
Hello! Do all users in the workspace get an email or just workspace owner? Thanks
2 months ago
Our app became unusable for hours during this ordeal last night - normal requests that should have taken a few seconds were taking 1-10+ minutes or hanging completely, affecting enterprise customers. Nothing on the status page describing the CDN situation indicated this could be related, so I was debugging from my end. I posted a support thread where the AI support agent indicated this WAS caused by the CDN issue (still no human response), so I urgently moved ourselves to another provider and pulled an all-nighter to get ourselves back online and to do a damage control audit, investigating for potential security and data breaches due to Railway's error. I have not received an email from Railway, although they said they emailed all affected users hours ago.
ramadanomar
Hello! Do all users in the workspace get an email or just workspace owner? Thanks
2 months ago
Just the workspace owner.
Our app became unusable for hours during this ordeal last night - normal requests that should have taken a few seconds were taking 1-10+ minutes or hanging completely, affecting enterprise customers. Nothing on the status page describing the CDN situation indicated this could be related, so I was debugging from my end. I posted a support thread where the AI support agent indicated this WAS caused by the CDN issue (still no human response), so I urgently moved ourselves to another provider and pulled an all-nighter to get ourselves back online and to do a damage control audit, investigating for potential security and data breaches due to Railway's error. I have not received an email from Railway, although they said they emailed all affected users hours ago.
2 months ago
The performance impact you faced wasn't tied to this incident unfortunately. If you haven't gotten an email, you weren't impacted.
angelo-railway
The performance impact you faced wasn't tied to this incident unfortunately. If you haven't gotten an email, you weren't impacted.
2 months ago
In that case, there's an additional critical error that Railway experienced last night (and may still be experiencing) which you guys aren't aware of? Which is almost more worrying: https://station.railway.com/support/as-of-1-2-hours-ago-extremely-slow-netw-1792e047?action=null&rrid=null
I still haven't received a human response to this
Our web app immediately recovered once we performed an emergency migration to Render
angelo-railway
Just the workspace owner.
2 months ago
I never received an email as the workspace owner, and I only opened a support ticket because we started seeing authenticated user information leaking between customers. At the time, I did not know what was happening, but I later realized this was caused by the CDN issue.
Some form of proactive notification would have been greatly appreciated. Our project was definitely affected, and this may have damaged trust with a high-value customer we were expecting to convert. Our data is sensitive, so seeing customer information leak across accounts was honestly shocking.
I would really appreciate hearing directly from someone at Railway.
th3impal3r
I never received an email as the workspace owner, and I only opened a support ticket because we started seeing authenticated user information leaking between customers. At the time, I did not know what was happening, but I later realized this was caused by the CDN issue. Some form of proactive notification would have been greatly appreciated. Our project was definitely affected, and this may have damaged trust with a high-value customer we were expecting to convert. Our data is sensitive, so seeing customer information leak across accounts was honestly shocking. I would really appreciate hearing directly from someone at Railway.
2 months ago
Same here!
2 months ago
Will you provide compensation for this matter?
angelo-railway
The performance impact you faced wasn't tied to this incident unfortunately. If you haven't gotten an email, you weren't impacted.
2 months ago
In another private thread you just contradicted what you wrote above - you wrote the following to me:
Apologies for the critical performance degradation your app experienced. We had a CDN caching incident on 2026-03-30 from 11:00–11:30 UTC that affected request routing and caused severe latency for web applications.
The incident is now resolved. Your app should be performing normally again.
Can you confirm that your requests are back to normal speed? If you're still seeing degraded performance, please let us know and we can investigate further.
Just documenting this here in case other people have also experienced performance degradation, severe latency, etc., didn't receive an email, and are being told they were unaffected.
th3impal3r
I never received an email as the workspace owner, and I only opened a support ticket because we started seeing authenticated user information leaking between customers. At the time, I did not know what was happening, but I later realized this was caused by the CDN issue. Some form of proactive notification would have been greatly appreciated. Our project was definitely affected, and this may have damaged trust with a high-value customer we were expecting to convert. Our data is sensitive, so seeing customer information leak across accounts was honestly shocking. I would really appreciate hearing directly from someone at Railway.
2 months ago
Same! We were affected but did not receive an email.
2 months ago
No notification was sent to us, despite actual customer data being exposed to other users. Not having full visibility into who is affected at this point is unacceptable.
jpmin7
No notification was sent to us, despite actual customer data being exposed to other users. Not having full visibility into who is affected at this point is unacceptable.
2 months ago
Damn.
2 months ago
We believe our custom domain issue is related to this CDN incident. We configured hotnovelas.com on March 30th, 2026 (during the incident window) and are still getting x-railway-fallback: true as of April 1st.
Details:
- Project ID: 6ee79a19-e39f-4508-a0f0-b6d6a77f07da
- Service ID: 62bcf496-079d-4f22-a611-5c6f9d5bc964
- Domain: hotnovelas.com CNAME x1a9l4d9.up.railway.app (Cloudflare proxied)
- www.hotnovelas.com CNAME 5y6p4b5m.up.railway.app (Cloudflare proxied)
- Railway API: syncStatus=ACTIVE, DNS_RECORD_STATUS_PROPAGATED
- App works fine on hotnovelas-app-production.up.railway.app
The domain was created and deleted multiple times during the incident window, which likely left stale Fastly CDN routing entries. Could you please flush or re-provision CDN routing for these two domains? Thank you.
2 months ago
sorry
2 months ago
sorry
2 months ago
sorry
2 months ago
sorry
2 months ago
sorry
2 months ago
sorry. I let AI come here and explain the problem, and I think she went too far! 😄
jpmin7
No notification was sent to us, despite actual customer data being exposed to other users. Not having full visibility into who is affected at this point is unacceptable.
2 months ago
We were also definitely affected - but no notification or response at all so far, almost 48 hours later. If the response from Railway is going to work this way, I think it is better to assume being affected, until getting confirmation of a negative.
2 months ago
@ray-chen no response?
2 months ago
We were impacted too. We got many complaints from customers.
We still have received no email from Railway about this. Waiting for an explanation.
2 months ago
Hello all,
I am going to be leading the communications about the impact and timelines. Forgive the radio silence on this public thread; we've been prioritizing communication over the private support threads and emails that many of you have raised with us.
We have updated the root post with additional information. Unfortunately, we have confirmed that the fault was with an upstream provider but as the principal point of contact between your business and Railway, we have to own the response.
If you haven't, I encourage you to open a private thread or email us at support@railway.com, and we will provide as much information about impact as possible.
angelo-railway
Hello all, I am going to be leading the communications about the impact and timelines. Forgive the radio silence on this public thread; we've been prioritizing communication over the private support threads and emails that many of you have raised with us. We have updated the root post with additional information. Unfortunately, we have confirmed that the fault was with an upstream provider but as the principal point of contact between your business and Railway, we have to own the response. If you haven't, I encourage you to open a private thread or email us at [support@railway.com](mailto:support@railway.com), and we will provide as much information about impact as possible.
2 months ago
Any ETA on when we will get a reply? Stakeholders are putting pressure on us
2 months ago
Unfortunately, we have confirmed that the fault was with an upstream provider but as the principal point of contact between your business and Railway, we have to own the response.
How was the fault with the upstream provider?
pg
> Unfortunately, we have confirmed that the fault was with an upstream provider but as the principal point of contact between your business and Railway, we have to own the response. How was the fault with the upstream provider?
2 months ago
The full technical breakdown is in the published postmortem at https://blog.railway.com/p/incident-report-march-30-2026-accidental-cdn-caching. After our joint investigation, we will provide more information.
In short, a configuration update we pushed to the CDN edge layer to enable per-domain cache identification inadvertently overrode the check that skips caching for domains without CDN enabled, contrary to the behavior we were expecting from the published spec, IETF RFC 9111. Unfortunately, we are under NDA with that upstream provider, so we can't share specifics of the upstream failure mode, only the mitigations done after.
ramadanomar
Any ETA on when we will get a reply? Stakeholders are putting pressure on us
2 months ago
If you have expedited time reporting requirements, you can bump or make a private thread and we can provide information there; I have full access to our network logs.
2 months ago
Seems like you guys are now deflecting blame. The initial incident page was clear it was user error, never mind the rollout/staged deployment blunder, downplaying of the blog post before the edits after being questioned etc. so sorry if I don't believe you.
The excuse of a service not following an RFC spec strictly is pretty embarrassing. I don't think I've seen an RFC spec implemented perfectly and be usable. That'd be like if I started blaming a language's JSON parser because it didn't follow RFC 8259 (none do https://seriot.ch/software/parsing%5Fjson.html) when I just didn't do my homework properly.
theden
Seems like you guys are now deflecting blame. The initial incident page was clear it was user error, never mind the rollout/staged deployment blunder, downplaying of the blog post before the edits after being questioned etc. so sorry if I don't believe you. The excuse of a service not following an RFC spec strictly is pretty embarrassing. I don't think I've seen an RFC spec implemented perfectly and be usable. That'd be like if I started blaming a language's JSON parser because it didn't follow RFC 8259 (none do <https://seriot.ch/software/parsing%5Fjson.html>) when I just didn't do my homework properly.
2 months ago
To be clear: the configuration change was ours, and we own that. The postmortem says that, the incident page said that, and nothing we've edited changes that.
That said, I'd push back on your analogy. A JSON parser being loose with trailing commas is an entirely different universe from a CDN layer silently serving cached responses for domains that never opted into caching. The severity of a spec deviation matters: when the failure mode is data exposure, "don't think I've seen an RFC spec implemented perfectly and be usable" isn't really a comparable situation. There are clear expectations when it comes to respecting cache control, and it was a reasonable expectation on our side that said setting wouldn't cause such extreme downstream pain.
We should have had our own safeguards to prevent this regardless, and they are now in place. But noting that a critical component didn't behave as specified is not "deflecting blame" when many of the people affected, esp. those who are reporting for legal/GDPR disclosures, have a mandated requirement to know how this happened.
In addition, the postmortem edits were additive based on community questions, not revisions to shift blame. We're constrained in what we can share about the upstream provider, but we're being as transparent as those constraints allow. That said, I understand the frustration (and your statement about our credibility, which is fair), believe me.
angelo-railway
To be clear: the configuration change was ours, and we own that. The postmortem says that, the incident page said that, and nothing we've edited changes that. That said, I'd push back on your analogy. A JSON parser being loose with trailing commas is an entirely different universe from a CDN layer silently serving cached responses for domains that never opted into caching. The severity of a spec deviation matters: when the failure mode is data exposure, "don't think I've seen an RFC spec implemented perfectly and be usable" isn't really a comparable situation. There are clear expectations when it comes to respecting cache control, and it was a reasonable expectation on our side that said setting wouldn't cause such extreme downstream pain. We should have had our own safeguards to prevent this regardless, and they are now in place. But noting that a critical component didn't behave as specified is not "deflecting blame" when many of the people affected, esp. those who are reporting for legal/GDPR disclosures, have a mandated requirement to know how this happened. In addition, the postmortem edits were additive based on community questions, not revisions to shift blame. We're constrained in what we can share about the upstream provider, but we're being as transparent as those constraints allow. That said, I understand the frustration (and your statement about our credibility, which is fair), believe me.
2 months ago
Fair on the trailing comma and cache-control being different beasts, but frankly, relying on commercial third-party software to perfectly implement an RFC spec is horribly naive. Anyone who's spent real time in infrastructure knows that you never assume a third party handles edge cases the way the spec says they should. The RFC is a specification not a guarantee. Treating it as one when the failure mode is data exposure is not a reasonable assumption, it's a gap in your process.
If the safeguards you've now added would have prevented this regardless of spec compliance, then the spec non-compliance is context at best, and misdirection at worst.
Calling those edits "additive based on community questions" is a generous way to describe what most of us experienced as backpedaling under pressure. The subpar comms and framing around this incident has consistently felt like it's doing more work to protect Railway's reputation than to inform affected users.
Let's be real, the NDA shield isn't helping your case. "We can't share specifics about the upstream failure mode" paired with putting the fault on them is not a combination that rebuilds trust. You're asking the community to take your word on the one part of the story you can't let anyone verify. I understand contractual constraints are real, but you have to see how that lands from the outside, especially when you've already majorly fractured user trust.
theden
Fair on the trailing comma and cache-control being different beasts, but frankly, relying on commercial third-party software to perfectly implement an RFC spec is horribly naive. Anyone who's spent real time in infrastructure knows that you never assume a third party handles edge cases the way the spec says they should. The RFC is a specification not a guarantee. Treating it as one when the failure mode is data exposure is not a reasonable assumption, it's a gap in your process. If the safeguards you've now added would have prevented this regardless of spec compliance, then the spec non-compliance is context at best, and misdirection at worst. Calling those edits "additive based on community questions" is a generous way to describe what most of us experienced as backpedaling under pressure. The subpar comms and framing around this incident has consistently felt like it's doing more work to protect Railway's reputation than to inform affected users. Let's be real, the NDA shield isn't helping your case. "We can't share specifics about the upstream failure mode" paired with putting the fault on them is not a combination that rebuilds trust. You're asking the community to take your word on the one part of the story you can't let anyone verify. I understand contractual constraints are real, but you have to see how that lands from the outside, especially when you've already majorly fractured user trust.
2 months ago
If the safeguards you've now added would have prevented this regardless of spec compliance, then the spec non-compliance is context at best, and misdirection at worst.
Well, it was from us not using said vendor anymore, so I do get where you are coming from. But the big issue was that there was a functionality that was benign: the docs said it was benign, the spec said it would be benign, and then we canaried it, and boom, 3,000 people hit. I truly apologize if it came off as us trying to deflect blame.
As for your other concern on "relying on commercial third-party software to perfectly implement an RFC", you have a point, but they are a reputable publicly listed company. With the MSAs, processes, ...and yes, contracts, one would think that said countermeasures weren't necessary, and if you had to build them anyway, that defeats the purpose of the vendor.
In the same way, our customers expect that a deploy on Railway just... deploys. (...and yes, protects data.) Lesson learned, albeit at a massive cost to us and our reputation, and it's one we have to eat with a smile. The larger concern I held and still hold was all of the customers, esp. the ones you see in this very thread, being put in a massively horrible situation with their own businesses.
I know you are arguing in good faith, I know you (and others here) want us to do well, else you wouldn't be here engaging. (And you have the option to hold us to task in Slack, yet you are here- so I appreciate that.)
Which leads me to my next segue, communication. I agree, we should have done better. Sharing my point of view, it's a balance between timeliness and accuracy. We posted a status page update as soon as we confirmed impact. There were many reports that required corroboration. From the outside, it looks like we ignored them, but it's hard to both navigate and communicate. I know this situation wasn't helped by prior outages related to scale, but I do promise that we have taken the communication cadence extraordinarily seriously. So much so that we built a new status page from the ground up to help flag instances where community reports may be ahead of alerting (example: if Telstra IX doesn't notify us that an undersea cable is cut).
Where we did fail was in following up on communicating our investigation: we were focused on meeting BAA/GDPR/California Privacy timelines while also coordinating with said vendor, trying to get the accountability on our side. I apologize. I am all ears on how we can improve.
The last thing I'd like to speak to, which is my personal view applied to this matter is that this incident wouldn't be as painful as it was if there wasn't a brand new failure mode in the experience with Railway. I absolutely am empathetic with how absolutely hectic it must have been from a user's perspective when you have builds go, GH web hooks go, and then 10+ DDoSes. It's totally not lost on me that I am here after all that saying "trust me bro" and how one would be totally past it. (Deservedly so) Although the caching incident is separate in terms of how this transpired, I can totally understand why someone would be led to believe that we don't have a culture of safety.
At the end of the day, the job is simple, and the team is willing to take on as much pain as possible to make a simple trade: you ship on Railway, and Railway doesn't make that your problem. We made it your problem. The only thing that fixes that is time and execution, and we owe you and our customers both. I will pay that debt back for the rest of my life.
2 months ago
Hey everyone who is affected, have you been offered any monetary/credits compensation from Railway? I know for a fact that the initially identified domain owners have received credits from Railway for their pains.
My domain was very much affected, but we have not received any compensation yet.
Is someone else similarly affected?
haksonzvakson
Hey everyone who is affected, have you been offered any monetary/credits compensation from Railway? I know for a fact that the initially identified domain owners have received credits from Railway for their pains. My domain was very much affected, but we have not received any compensation yet. Is someone else similarly affected?
2 months ago
I'm also affected, with documentation from customers. Railway acknowledged that it happened, but no reply for 3 days or talk of compensation.
2 months ago
Yes, still nothing, even though we've received numerous complaints from customers.
2 months ago
Still have not received any comms or refund about the larger deployment & database issue a week ago. What is wrong with you folks?
2 months ago
We shouldn't rely on a sinking ship. I'm currently planning our migration to AWS Lightsail, as it offers the reliability and predictability we need. At this point, Railway's platform is simply too underdeveloped and immature.
harshjk
We shouldn't rely on a sinking ship. I'm currently planning our migration to AWS Lightsail, as it offers the reliability and predictability we need. At this point, Railway's platform is simply too underdeveloped and immature.
2 months ago
Hi all, we've been responding to individual threads over the past few days with domain-specific traffic data and incident details. If you haven't received a response yet, please check your thread or open a private one at station.railway.com and we'll follow up directly.
We're handling each case individually as the impact varies per domain. If you open a private thread at station.railway.com we can gather your specific traffic data from the incident window.
2 months ago
I've been with Railway for 2 months, and the number of times the app has crashed is not professional. I'm thinking of transferring everything. Really bad experience from my side.
2 months ago
Same case here, they sent an email that they have 'applied a $17 credit' to my account but so far there is nothing.
angelo-railway
Hi all, we've been responding to individual threads over the past few days with domain-specific traffic data and incident details. If you haven't received a response yet, please check your thread or open a private one at [station.railway.com](http://station.railway.com) and we'll follow up directly. We're handling each case individually as the impact varies per domain. If you open a private thread at [station.railway.com](http://station.railway.com) we can gather your specific traffic data from the incident window.
2 months ago
Can you please help me with my deployment? My website is down while I'm running dozens of ads on Facebook. I just tried to redeploy an old version and it's still stuck.
angelo-railway
Hi all, we've been responding to individual threads over the past few days with domain-specific traffic data and incident details. If you haven't received a response yet, please check your thread or open a private one at [station.railway.com](http://station.railway.com) and we'll follow up directly. We're handling each case individually as the impact varies per domain. If you open a private thread at [station.railway.com](http://station.railway.com) we can gather your specific traffic data from the incident window.
2 months ago
Interesting lack of mention of compensation.
I have received their response via the channel Angelo is mentioning, and they just confirmed that the incident happened and affected my domain lol.
Why did some Railway customers receive compensation and some did not? How did you make this distinction?
smartsvgai
I've been with Railway for 2 months, and the number of times the app has crashed is not professional. I'm thinking of transferring everything. Really bad experience from my side.
2 months ago
As unfortunate as that is, that is not material to the OP. Please raise a new issue so we can investigate it.
zfermi
Same case here, they sent an email that they have 'applied a $17 credit' to my account but so far there is nothing.
2 months ago
In your case, it's likely that it was sent to an admin of the workspace. For some people in this thread, it turns out that the email was sent solely to the admins in the first wave.
angelo-railway
In your case, it's likely that it was sent to an admin of the workspace. For some people in this thread, it turns out that the email was sent solely to the admins in the first wave.
a month ago
As the administrator of the workspace, I have not received any emails.
a month ago
No emails were sent, I don't think.
a month ago
No emails as well
a month ago
No emails for me either. Initially, all my applications could not connect to my database. After I restarted the databases, my users could see each other's data. I got to know about this when one of my clients informed me. He could say exactly how much is being charged and how much of a payout my executive is getting. I had to immediately shut off everything. I was planning to scale to the enterprise package, but now I'm really worried. Can we rely on Railway, or do we need to move to AWS?





