Constant unreliability - root causes and plans to address
slazien
HOBBYOP

2 months ago

Hey,

So: I really like Railway. As a solo dev, it makes devops borderline pleasant. However, since I started using it ~1 month ago, there have been so many incidents that left it either a) only partially usable or b) unusable.

My questions:

  1. What is the common root cause of all these incidents?
  2. Can the cause be addressed if it hasn't already?
  3. Given how frequently this happens, and the fact that Railway is far from cheap, I think a partial refund of the subscription charges on a monthly basis would not be unreasonable

Looking forward to hearing more from Railway staff.

EDIT: I used to work at Datadog during a crazy-fast growth phase. When a series of incidents permanently eroded user trust, we saw a ton of customers leaving, to the point that a "code red" project was internally initiated by the CTO. Reliability is #1 when it comes to infra-related stuff, I'm sure you guys at Railway know this. Make the right tradeoffs. You're onto something as a business - don't end it prematurely.

Solved

16 Replies

Status changed to Awaiting Railway Response Railway 2 months ago


ivryb
PRO

2 months ago

Agree with that. Not only am I spending more $ on Railway than on my previous hosting provider, I'm also losing money every time there's an outage.

We need a postmortem, Railway.


eluchsinger

2 months ago

Was going to write the same question - thanks!

Would love to see Railway take accountability. I keep seeing blog posts about new features and how Railway is growing. Why not one about what's going to be done about the degradation of the services? At this point I'm going to call CI/CD "Continuous Degradation".

The deployments are not just "slow" as the incident page mentions. They're effectively down.


nico

2 months ago

Heya, we can confirm that deploys are going through at the moment.

And yes, we're growing pretty fast at the moment, which is putting a lot of stress on our systems. Not an excuse though! Our team is working hard to prevent these from happening again.


Status changed to Awaiting User Response Railway 2 months ago


benanthony961
PRO

2 months ago

if you are growing pretty fast and losing functionality for the most core features daily, that's not growth it is disarray.

it's to the point where paying customers of my application are asking me to look at other options due to the downtime in pushing fixes and occasionally complete downtime of the service.


Status changed to Awaiting Railway Response Railway 2 months ago


adro21
HOBBY

2 months ago

Yeah bumping this...we can't possibly be expected to continue using Railway under these conditions. I can't be the only one thinking about moving everything off Railway considering this level of instability. It's just impossible to get anything done with this happening as often as it does.


davidptrssvea
HOBBY

2 months ago

Quite worried too about this! Happens every day at the moment!


slazien
HOBBYOP

2 months ago

Thanks. Can you address each of my questions?


benanthony961
PRO

2 months ago

https://x.com/JustJake/status/2031202549190242632?s=20 a faster horse falls harder when it trips!


genxer24
HOBBY

2 months ago

Agreed. Having only used Railway for the last month or so, and seeing issues like this happen continually, I have already started to think about alternatives.


smoke3785
PRO

2 months ago

Thank you, this is exactly the post I came here to make. I really enjoy Railway when it works, but there has barely been a day without an outage of some sort this past month. We had a critical client demo fail, and had to send out email blasts to clients explaining why services are down. This period of outages has negatively impacted my business.

As much as I love Railway's developer-first approach, it must be understood that 2 features that work are better than 100 that don't - particularly when there is no shortage of competitors with impressive track records of reliability. When it comes time to cut something, DX is going to go before reliability 10 times out of 10.

I really hope that Railway is able to sort this out soon. I know I'm not the biggest account in the world, but I hope it means something to them that I am strongly considering leaving because of the reliability issues.


benanthony961
PRO

2 months ago

seems like they should have a waitlist. that would be responsible if they can't handle the load.


All heard there, we hit a new record day today. To provide some color, we had an unexpected build spike (and sign-up spike) that overwhelmed the queue.

What we have done to protect the Pro experience is separate the build resources per plan, which is why we were able to keep some of the builds going and weren't out entirely.


Status changed to Awaiting User Response Railway 2 months ago


benanthony961
PRO

2 months ago

Are more resources going to be dedicated to this issue? Will there be a proper postmortem? I don't care much about external features like MCP/skills/AI diagnosis of build errors if critical infrastructure is not working.


Status changed to Awaiting Railway Response Railway 2 months ago


Status changed to Awaiting User Response nico 2 months ago


slazien
HOBBYOP

2 months ago

Why was the status changed to "Status changed to Awaiting User Response itsrems" when I am still waiting for answers to 2 of my questions:

  2. Can the cause be addressed if it hasn't already?
  3. Given how frequently this happens, and the fact that Railway is far from cheap, I think a partial refund of the subscription charges on a monthly basis would not be unreasonable

and other users' questions (such as benanthony961's) also haven't been addressed?


Status changed to Awaiting Railway Response Railway 2 months ago


2 months ago

Apologies for the premature status change. To address your remaining questions: the root cause of recent incidents has been build queue congestion from rapid growth in sign-ups and deploy volume. This is actively being addressed, and further infrastructure changes are underway. For a refund, you can submit a request directly from your workspace billing page by following the steps at https://docs.railway.com/reference/pricing/refunds#requesting-a-refund.


Status changed to Awaiting User Response Railway 2 months ago


Railway
BOT

2 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway about 2 months ago


slazien
HOBBYOP

a month ago

To provide an update to everyone: it took me ~30 mins using Claude Code to move to Hetzner with a nice, custom domain. Cost slashed in half, deployment and monitoring set up using Claude. More setup overhead up front, but more control later on. Win-win for me.


Status changed to Awaiting Railway Response Railway about 1 month ago


Status changed to Solved Railway about 1 month ago

