2 months ago
Hey,
So: I really like Railway. As a solo dev, it makes devops borderline pleasant. However, since I started using it ~1 month ago, there have been a lot of incidents that made it either a) partially usable or b) unusable.
My questions:
- What is the common root cause of all these incidents?
- Can the cause be addressed if it hasn't already?
- Given how frequently this happens, and the fact that Railway is far from cheap, I think a partial refund of the subscription charges on a monthly basis would not be unreasonable
Looking forward to hearing more from Railway staff.
EDIT: I used to work at Datadog during a crazy-fast growth phase. When a series of incidents permanently eroded user trust, we saw a ton of customers leaving, to the point that a "code red" project was internally initiated by the CTO. Reliability is #1 when it comes to infra-related stuff, I'm sure you guys at Railway know this. Make the right tradeoffs. You're onto something as a business - don't end it prematurely.
16 Replies
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Agree with that. I'm not only spending more $ on Railway than on my previous hosting provider, but on top of that I'm losing money every time there is an outage.
We need a postmortem, Railway.
2 months ago
Was going to write the same question - thanks!
Would love to see Railway taking accountability. I keep seeing blog posts about new features and how Railway is growing. Why not one about what's going to be done about the degradation of the services? At this point I'm going to call CI/CD: Continuous Degradation.
The deployments are not just "slow" as the incident page mentions. They're effectively down.
2 months ago
Heya, we can confirm that deploys are going through at the moment.
And yes, we're growing pretty fast at the moment which is putting a lot of stress on our systems. Not an excuse though! Our team is working hard to prevent these from happening again.
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
If you are growing pretty fast and losing functionality for the most core features daily, that's not growth, it's disarray.
It's to the point where paying customers of my application are asking me to look at other options because of the downtime in pushing fixes and the occasional complete downtime of the service.
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Yeah bumping this...we can't possibly be expected to continue using Railway under these conditions. I can't be the only one thinking about moving everything off Railway considering this level of instability. It's just impossible to get anything done with this happening as often as it does.
2 months ago
Thanks. Can you address each of my questions?
2 months ago
https://x.com/JustJake/status/2031202549190242632?s=20 a faster horse falls harder when it trips!
2 months ago
Agreed. Having only used Railway for the last month or so, and seeing issues like this happen continually, I have already started to think about alternatives.
2 months ago
Thank you, this is exactly the post I came here to make. I really enjoy Railway when it works, but there has barely been a day without an outage of some sort this past month. We had a critical client demo fail, and had to send out email blasts to clients explaining why services are down. This period of outages has negatively impacted my business.
As much as I love Railway's developer-first approach, it must be understood that 2 features that work are better than 100 that don't - particularly when there is no shortage of competitors with impressive track records of reliability. When it comes time to cut something, DX is going to go before reliability 10 times out of 10.
I really hope that Railway is able to sort this out soon. I know I'm not the biggest account in the world, but I hope it means something to them that I am strongly considering leaving because of the reliability issues.
2 months ago
seems like they should have a waitlist. that would be responsible if they can't handle the load.
2 months ago
All heard there, we hit a new record day today. To provide some color, we had an unexpected build spike (and sign-up spike) that overwhelmed the queue.
To protect the Pro experience, we have now separated build resources per plan, which is why we were able to keep some of the builds going and weren't out entirely.
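For readers wondering what "separating build resources per plan" can look like in practice, here is a minimal sketch. It is not Railway's implementation, which the thread does not describe; the class name, the "free" plan name, and the capacities are all hypothetical. It only illustrates the general idea that giving each plan its own bounded queue keeps a spike on one plan from consuming capacity reserved for another.

```python
from collections import deque


class PlanIsolatedBuildQueues:
    """Hypothetical sketch: one bounded build queue per plan.

    A spike on one plan can only fill that plan's queue, so other
    plans keep accepting builds. Names and limits are illustrative.
    """

    def __init__(self, capacity_per_plan):
        self.capacity = dict(capacity_per_plan)
        self.queues = {plan: deque() for plan in self.capacity}

    def submit(self, plan, build_id):
        """Return True if the build was queued, False if that plan's queue is full."""
        queue = self.queues[plan]
        if len(queue) >= self.capacity[plan]:
            return False
        queue.append(build_id)
        return True

    def next_build(self, plan):
        """Pop the oldest queued build for a plan, or None if its queue is empty."""
        queue = self.queues[plan]
        return queue.popleft() if queue else None


# Assumed plan names and capacities, chosen only for the example.
queues = PlanIsolatedBuildQueues({"free": 2, "pro": 2})

# A spike of five builds on the "free" plan fills only its own queue.
spike_results = [queues.submit("free", f"free-{i}") for i in range(5)]

# The "pro" queue is unaffected and still accepts work.
pro_accepted = queues.submit("pro", "pro-0")
```

In this sketch the overflow builds are simply rejected; a real system would more likely delay or retry them, and would also isolate the workers that run the builds, not just the queues.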
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
Are more resources going to be allocated to this issue? Will there be a proper postmortem? I don't care much about external features like MCP/skills/AI diagnosis of build errors if critical infrastructure is not working.
Status changed to Awaiting Railway Response Railway • 2 months ago
Status changed to Awaiting User Response nico • 2 months ago
2 months ago
Why was the status changed to "Status changed to Awaiting User Response itsrems" when I am still waiting for answers to 2 of my questions:
- Can the cause be addressed if it hasn't already?
- Given how frequently this happens, and the fact that Railway is far from cheap, I think a partial refund of the subscription charges on a monthly basis would not be unreasonable
and other users' questions (such as benanthony961's) also haven't been addressed?
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Apologies for the premature status change. To address your remaining questions: the root cause of recent incidents has been build queue congestion from rapid growth in sign-ups and deploy volume. This is actively being addressed, and further infrastructure changes are underway. For a refund, you can submit a request directly from your workspace billing page by following the steps at https://docs.railway.com/reference/pricing/refunds#requesting-a-refund.
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • about 2 months ago
a month ago
To provide an update to everyone: it took me ~30 minutes using Claude Code to move to Hetzner with a nice custom domain. Cost slashed in half, with deployment and monitoring set up using Claude. More setup overhead now, but more control later on. Win-win for me.
Status changed to Awaiting Railway Response Railway • about 1 month ago
Status changed to Solved Railway • about 1 month ago




