Why wasn’t I notified before the service disruption?
henryo
PRO OP

2 months ago

I received an email regarding a service disruption; however, the service had already been disrupted before the notification was sent.

It would be more helpful if disruption notices were sent in advance, rather than after the issue has already occurred. This would allow users to prepare accordingly.

I would also like to understand why this maintenance or disruption was scheduled during peak hours, as this significantly impacts usability.

At the moment, the timing and communication around this disruption do not make much sense from a user perspective.

I would appreciate clarification and hope this can be improved in the future.

73 Replies

planning1234
HOBBY

2 months ago

Agree, I have the same issue. My service has been down for 60+ minutes.

Received the same information after 30 minutes of downtime.

Upgraded to Pro for support, yet no answers.

Personal highlight is the agent suggesting to delete my database as a solution.


sejodjoseon
PRO

2 months ago

I completely agree with @henryo and @planning1234. We are currently facing a total service stoppage across our environments due to these persistent network timeouts.

Despite upgrading to a Paid tier for reliability, we are seeing 60+ minutes of downtime with zero proactive communication. Our entire operation is currently halted because neither the internal Service Mesh nor the Public TCP Proxies are responding.

Performing unscheduled maintenance or allowing such disruptions during peak hours without prior notice is unacceptable for a production-grade hosting service.

We expect a rapid resolution and, more importantly, a commitment to better communication protocols. We shouldn't have to find out about infrastructure failures through forum posts while our services are already dark.


zielgestalt
PRO

2 months ago

Same for us.


chandrika
EMPLOYEE

2 months ago

Hi folks, I completely understand the frustration. This wasn't a scheduled maintenance; what happened was that the physical host running your services went down unexpectedly. As soon as we detected it, our infra team jumped on it to recover the services. I was paged as support on-call and I sent out the notifications you received. The notification was us letting you know as quickly as we could that something was wrong, not advance notice of planned work.

I 100% also know that distinction doesn't make the disruption any less impactful, and I'm really sorry for the inconvenience (especially during peak hours for some of you). Unfortunately with unexpected outages like this, we don't get to choose the timing either.

The affected infrastructure has been recovered and your services should be back online or very close to it.

I also genuinely appreciate the feedback on communication and timing; it's something I'll raise internally.


joelmoss
PRO

2 months ago

Why does your status page say all is ok then?


zed077
PRO

2 months ago

This is unacceptable. Clients were testing on the environment that went down. The point of a Cloud provider is that this does not happen.


henryo
PRO OP

2 months ago

Thanks for the clarification. I understand unexpected host failures happen.

My main concern was that the notification came quite a while after the outage started. Faster incident communication would really help users understand what’s going on.


chandrika
EMPLOYEE

2 months ago

@joelmoss: Fair question. This issue was localized to a single host, not a platform-wide outage, which is why it didn't trigger a status page update. That said, I hear you — if your services are down, that distinction doesn't really matter from your perspective and this is valid feedback.

@zed077: I understand, and I'm really sorry your clients were impacted. No excuses — hardware can and does fail, and our job is to recover as fast as possible and communicate clearly when it does. We're working on both.

@planning1234 re: "Personal highlight is the agent suggesting to delete my database as a solution." I'll make sure to raise this internally with the team working on that feature. Sorry the agent said that; we're still working on making it excellent.


zielgestalt
PRO

2 months ago

Are your services online again? I'm still waiting...


chrisangele
PRO

2 months ago

It is still down for me, is the data corrupted? I did not have any backups :-(


maximilian-schwarz
PRO

2 months ago

I'm still waiting, too.


azuyah
PRO

2 months ago

Still nothing. Google even rejected our app because they "couldn't log in". This was sent to us about 40 minutes ago, so now I have to submit the app again and possibly wait another week.


directsyndikat
PRO

2 months ago

same


joshibbotson
PRO

2 months ago

My Redis instance is still down and my customers are emailing me as we speak. When will this be fixed? The email said half an hour; it's been over an hour now.


zed077
PRO

2 months ago

Can we spin another server from our backups? Or are the backups on the same machine?


pooyahrtn
HOBBY

2 months ago

Same here. It's totally fine if you consider and communicate it as an incident, but framing it as

> Your Railway service will be affected by an infrastructure maintenance event.

makes it look really unprofessional, as if you wanted to avoid affecting your SLA. The fact that you didn't mention it on the status page didn't help either, since this thread isn't the first place anyone would think to look.


maxipizzasa
PRO

2 months ago

Same problem, my PostgreSQL service is down. Any updates?


jouleetech
PRO

2 months ago

Same for me. @railway, what's wrong?


dalmeidat
PRO

2 months ago

Sorry to say, but you basically lied to your customers by saying it was a maintenance thing, and when you say it's all back up again, it turns out it is not. This makes the whole thing look very unprofessional, apart from the outage itself.


chrisangele
PRO

2 months ago

Please give us an update or fix ASAP


henryo
PRO OP

2 months ago

It's over an hour now, would appreciate an update on recovery ETA.


jaspspain
PRO

2 months ago

Same for me... down for more than 1h 30min.....


sjotie
PRO

2 months ago

Can we get an update, please? This has been taking a long time and I need to plan around this.


chandrika
EMPLOYEE

2 months ago

Hey all, thank you for your patience, and I hear every single one of you.

A quick update: the recovery is still in progress. Services are coming back online but some take a little longer than others, especially those running databases. Your data is safe, this was not a data loss event.

To give some context on the scope: this affected a small subset of all workloads on Railway, which is why we sent targeted notifications to the specific users affected rather than a public status page update. I completely understand the feedback from @pooyahrtn and others that the "infrastructure maintenance event" framing felt wrong. You're right, this was not a scheduled maintenance, and that's fair criticism. I'll raise this internally.

@chrisangele: your data should be intact, this was not a data loss event.

@zed077: backups are stored separately and are not affected.

@azuyah: I'm really sorry about the Google review timing. If there's anything we can provide to help with the resubmission, please let us know.

@joshibbotson: your service should come back shortly as things recover. If it doesn't, let me know and I'll look into it directly.

For anyone still waiting, I'm here monitoring the recovery with the infrastructure team and will update this thread. I'll also be sending out another wave of notification emails as the recovery is taking longer than expected.

Again, genuinely sorry for the disruption. All the feedback here on communication, framing, and the status page is noted, and I'll raise it.


jaspspain
PRO

2 months ago

It's unacceptable to leave customers stranded with applications used by hundreds of people, without even giving them time to prepare. Even if the hardware fails, we're paying for a premium service that isn't cheap, and we expect a much faster response, not more than an hour and a half of downtime. Every server I've worked on has redundant services that can be activated when the primary ones fail. I don't know why Railway doesn't have this system.

I think that's reason enough to drop all the projects and switch to a more reliable provider. I expect a proper apology and a price discount or a bonus because I've made a very bad impression on several clients.


jouleetech
PRO

2 months ago

@chandrika, well, I didn't get any notification, so that's also not true.


zed077
PRO

2 months ago

@chandrika, if the backups are stored separately, then please tell us how to spin up another server and restore our backups to it.


henryo
PRO OP

2 months ago

I agree with @jaspspain on this.


zed077
PRO

2 months ago

Agreed with @jaspspain.


r-1-6
PRO

2 months ago

Please bring the Postgres DBs up ASAP, lots of customers are wondering wtf is going on.


gzorzi
PRO

2 months ago

Any news? Any updates? My production service has been offline for over an hour now. This is becoming critical.


pawsty
PRO

2 months ago

Just received another email that says Duration: 1 hour from time of this notification, so 2.5h from the initial notification that said this will be a 30min downtime. This is insane and unacceptable.


sjotie
PRO

2 months ago

This poor reliability lately has really left a sour taste for me. I'm strongly considering switching too. Love the platform and how everything works, but reliability is the foundation of it all.


melulekig
PRO

2 months ago

I just received another email:

Hello,

Your Railway service will be affected by an infrastructure maintenance event.

  • Duration: 1 hour from time of this notification
  • Impact: Service will be offline

1 hour!!! No, no, this is unacceptable.


gzorzi
PRO

2 months ago

same here


zed077
PRO

2 months ago

Agreed with @sjotie. There isn't a week without an issue.


jaspspain
PRO

2 months ago

First, I received an email saying my server would be down for about 30 minutes... then two hours later I received another email saying it would be down for another hour... seriously? Are you kidding me or what's going on here?


jaspspain
PRO

2 months ago

Incredible....


jouleetech
PRO

2 months ago

At least you got a notification... our service with 500+ users is down, and we are losing money...

And yet we have no solution for this. This is sad.... we can't even....


planning1234
HOBBY

2 months ago

Well, the notification is useless. I just got another one saying there is again a "scheduled maintenance" for 1 hour. What the hell.


jouleetech
PRO

2 months ago

Well, at least, can we spin up another service from a backup so we don't lose any of our business?


r-1-6
PRO

2 months ago

I just wish they'd put something on the service status page. As soon as it went down I checked it, and it didn't have any problems reported, so I dived in and started tinkering to try to fix it, thinking it was my own problem.


Anonymous
HOBBY

2 months ago

Same for me. Going back to a real cloud probably. Lost trust completely.


zed077
PRO

2 months ago

YES to what @jouleetech asked. Can Railway please tell us how to start another database from backups?


pooyahrtn
HOBBY

2 months ago

I wish we could just download the volume/backup and switch to another provider already.


azuyah
PRO

2 months ago

This might be the breaking point for us. The past month has not been good, and this multi-hour unexpected downtime during peak hours just can't happen. Once every 4 years, yeah, maybe, but we haven't even been using Railway for 4 months and there have already been countless issues, though nothing of this magnitude. We're thinking about switching before it gets worse and our company and services are too big to switch comfortably.


zielgestalt
PRO

2 months ago

2 and a half hours now... they updated the status page.


henryo
PRO OP

2 months ago

I am actually considering switching too.


jouleetech
PRO

2 months ago

This is something we are also considering right now. I can't understand how this is possible for such a big host.


chandrika
EMPLOYEE

2 months ago

!t


chandrika
EMPLOYEE

2 months ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response chandrika 2 months ago


henryo
PRO OP

2 months ago

So it was not escalated before? I am confused


2 months ago

It was; we were already on it. That command just escalates the Discord thread itself so we can keep track of impact. You can ignore that


Status changed to Awaiting User Response Railway 2 months ago


daniel-ddtech
PRO

2 months ago

So what's the ETA for a solution?


Status changed to Awaiting Railway Response Railway 2 months ago


2 months ago

No ETA yet, sorry. We're still working on it.


Status changed to Awaiting User Response Railway 2 months ago


chandrika
EMPLOYEE

2 months ago

As Ray mentioned, we don't have a specific ETA yet. The incident is still being actively worked on and we've got several engineers on a call resolving this. I'll keep the incident updated and you can track live updates at https://status.railway.com/incident/cmmui0c7z012icp7ebcd1a3zv


pooyahrtn
HOBBY

2 months ago

Just to be clear, I think it started way earlier than what's mentioned on the status page.


Status changed to Awaiting Railway Response Railway 2 months ago


chandrika
EMPLOYEE

2 months ago

Unfortunately, I'm unable to update the existing one but we're also working on a project to improve our status page: https://station.railway.com/feedback/improved-status-page-experience-78854616


Status changed to Awaiting User Response Railway 2 months ago


joshibbotson
PRO

2 months ago

My server failing is more of an issue on me not having adequate horizontal distribution across multiple regions.

The response to this issue however I find unacceptable. The status page not showing as down, and then having to find answers here is insane.

I'm paying a premium to Railway for things to be smooth, I'll definitely need to move to something like AWS after this. Extremely disappointed


Status changed to Awaiting Railway Response Railway 2 months ago


Anonymous
PRO

2 months ago

Same here. I am about to go live with my service. Now I think about evaluating other providers. Why does it take so long to just run the images on a new machine?


chrismarsden1
HOBBY

2 months ago

I was in a meeting about to demo our new app to a customer and couldn't get into the app. That didn't go down well!! Too many issues lately; I've never known a service to be down so much.


joshibbotson
PRO

2 months ago

Had 10 new users sign up to the app since this occurred. None of them will have been able to complete their initial call to action, which relies on my Redis instance, so none have had their "magic moment". At best they'll probably never use my app again; at worst they'll leave a review saying it doesn't work.


chandrika
EMPLOYEE

2 months ago

> The response to this issue however I find unacceptable. The status page not showing as down, and then having to find answers here is insane.

We've updated our incident page here: https://station.railway.com/feedback/improved-status-page-experience-78854616, and I'm keeping it updated. I'm in a meeting with the infrastructure engineers who are working to resolve this as we speak.


coffeeforadoctor
PRO

2 months ago

@chrismarsden1 I can imagine the pain. I just went live with some serious marketing, and the next day, you know, everything is down 😄


jouleetech
PRO

2 months ago

We are now at 3H+, and we have lost a significant amount of money. We lack access to the backups, which prevents us from spinning up another service without losing data. The most frustrating part is that our beta environment and database are up and running, while only our production database is affected. What on earth happened? At some point, we must demand a substantial apology to continue holding you accountable. This is simply unacceptable; other companies will face legal consequences for such negligence.


coffeeforadoctor
PRO

2 months ago

I guess the migration process will be long and painful, but well worth it.


henryo
PRO OP

2 months ago

I am guessing that what they will come up with is an apology and no form of compensation for us.

I expect compensation and a clear assurance that this will be handled properly moving forward.


Status changed to Awaiting User Response Railway 2 months ago


chandrika
EMPLOYEE

2 months ago

We've resolved the incident https://status.railway.com/cmmui0c7z012icp7ebcd1a3zv. If your service has not automatically recovered, please try redeploying. If you're still experiencing issues after that, please let us know here and we'll help. Again, sorry for the disruption.


brent
HOBBY

2 months ago

I just now received an email:

[NOTICE] Temporary Service Disruption

Hello,

Your Railway service will be affected by an infrastructure maintenance event.

....

On my production environment! And it's not the first time I've received this mail! How can a hosting service send an email to let me know my production environment is down without any heads-up? wtf? If I get one more unplanned service disruption, I will leave Railway for good. This is really unacceptable.


Status changed to Awaiting Railway Response Railway 2 months ago


georgecollier-nqu
PRO

2 months ago

Same as above from Brent ^

Really bad


gadatos
PRO

2 months ago

It’s honestly frustrating that there was no clear communication from Railway regarding today’s disruptions—especially considering how severe the database issues were.

According to your status page, everything appeared mostly operational, yet in reality, it was practically impossible to get any meaningful work done throughout the entire day. The graphs and metrics shown there feel completely disconnected from the actual experience on the ground.

Transparency matters a lot in situations like this. Even a brief acknowledgment of ongoing issues would have made a big difference and helped set realistic expectations. Right now, it just feels misleading rather than informative.

Could you clarify what actually happened today and why it wasn’t properly reflected on the status page?


mickiestorm
PRO

2 months ago

Try monitoring your service with something like UptimeRobot; then you'll realize how unreliable Railway is. Great for development and testing, but useless for any serious production use. Such a shame, as it's honestly the easiest and best experience for rapid deployments, but the infrastructure is simply too poor. This month alone we filed and handled 16 major IT incidents in our ISMS caused by Railway. Zero from AWS.


callmefredcom
PRO

2 months ago

Unrelated to this specific incident but since you are an employee you may have an answer: could you share with us some high-level explanations of the recurring downtimes experienced on Railway recently?

