2 months ago
I received an email regarding a service disruption; however, the service had already been disrupted before the notification was sent.
It would be more helpful if disruption notices were sent in advance, rather than after the issue has already occurred. This would allow users to prepare accordingly.
I would also like to understand why this maintenance or disruption was scheduled during peak hours, as this significantly impacts usability.
At the moment, the timing and communication around this disruption do not make much sense from a user perspective.
I would appreciate clarification and hope this can be improved in the future.
73 Replies
2 months ago
Agreed, I have the same issue. My service has been down for 60+ minutes.
Received the same information after 30m downtime.
Upgraded to Pro for support, yet no answers.
Personal highlight is the agent suggesting to delete my database as a solution.
2 months ago
I completely agree with @henryo and @planning1234. We are currently facing a total service stoppage across our environments due to these persistent network timeouts.
Despite upgrading to a Paid tier for reliability, we are seeing 60+ minutes of downtime with zero proactive communication. Our entire operation is currently halted because neither the internal Service Mesh nor the Public TCP Proxies are responding.
Performing unscheduled maintenance or allowing such disruptions during peak hours without prior notice is unacceptable for a production-grade hosting service.
We expect a rapid resolution and, more importantly, a commitment to better communication protocols. We shouldn't have to find out about infrastructure failures through forum posts while our services are already dark.
2 months ago
Same for us.
2 months ago
Hi folks, completely understand the frustration. This wasn't scheduled maintenance; what happened was that the physical host running your services went down unexpectedly. As soon as we detected it, our infra team jumped on it to recover the services. I was paged as support on-call and I sent out the notifications you received. The notification was us letting you know as quickly as we could that something was wrong, not advance notice of planned work.
I 100% also know that distinction doesn't make the disruption any less impactful, and I'm really sorry for the inconvenience (especially during peak hours for some of you). Unfortunately with unexpected outages like this, we don't get to choose the timing either.
The affected infrastructure has been recovered and your services should be back online or very close to it.
I also genuinely appreciate the feedback on communication and timing, it's something I'll raise internally.
2 months ago
Why does your status page say all is ok then?
2 months ago
This is unacceptable. Clients were testing on the environment that went down. The point of a Cloud provider is that this does not happen.
2 months ago
Thanks for the clarification. I understand unexpected host failures happen.
My main concern was that the notification came quite a while after the outage started. Faster incident communication would really help users understand what’s going on.
2 months ago
@joelmoss: Fair question. This issue was localized to a single host, not a platform-wide outage, which is why it didn't trigger a status page update. That said, I hear you — if your services are down, that distinction doesn't really matter from your perspective and this is valid feedback.
@zed077: I understand, and I'm really sorry your clients were impacted. No excuses — hardware can and does fail, and our job is to recover as fast as possible and communicate clearly when it does. We're working on both.
@planning1234 re: "Personal highlight is the agent suggesting to delete my database as a solution." I'll make sure to raise this internally with the team working on that feature. Sorry the agent said that; we're still working on making it excellent.
2 months ago
It is still down for me, is the data corrupted? I did not have any backups :-(
2 months ago
I'm still waiting, too.
2 months ago
Still nothing. Google even denied our app because they "couldn't log in". This was sent to us about 40 minutes ago, so now I have to submit the app again and possibly wait another week.
2 months ago
same
2 months ago
My Redis instance is still down and my customers are emailing me as we speak. When will this be fixed? The email said half an hour; it's been over an hour now.
2 months ago
Can we spin another server from our backups? Or are the backups on the same machine?
2 months ago
Same here. It would be totally fine if you treated and communicated it as an incident, but framing it as
> Your Railway service will be affected by an infrastructure maintenance event.
makes it look really unprofessional, as if you don't want it to count against your SLA. The fact that you didn't mention it on the status page didn't help either, since this thread isn't the first place anyone would think to look.
2 months ago
Same for me. @railway, what's wrong?
2 months ago
Sorry to say but you basically lied to your customers by saying it was a maintenance thing and when you say it's all back up again it turns out it is not. This makes the whole thing look very unprofessional apart from the outage itself
2 months ago
Please give us an update or fix ASAP
2 months ago
It's over an hour now, would appreciate an update on recovery ETA.
2 months ago
Can we get an update please? This has been taking a long time and I need to plan around this
2 months ago
Hey all, thank you for your patience. I hear every single one of you.
A quick update: the recovery is still in progress. Services are coming back online but some take a little longer than others, especially those running databases. Your data is safe, this was not a data loss event.
To give some context on the scope: this affected a small subset of all workloads on Railway, which is why we sent targeted notifications to the specific users affected rather than a public status page update. I completely understand the feedback from @pooyahrtn and others that the "infrastructure maintenance event" framing felt wrong, you're right, this was not a scheduled maintenance, and that's fair criticism. I'll raise this internally.
@chrisangele: your data should be intact, this was not a data loss event.
@zed077: backups are stored separately and are not affected.
@azuyah: I'm really sorry about the Google review timing. If there's anything we can provide to help with the resubmission, please let us know.
@joshibbotson: your service should come back shortly as things recover. If it doesn't, let me know and I'll look into it directly.
For anyone still waiting, I'm here monitoring the recovery with the infrastructure team and will update this thread. I'll also be sending out another wave of notification emails as the recovery is taking longer than expected.
Again, genuinely sorry for the disruption. All the feedback here on communication, framing, and the status page is noted and I'll raise it.
2 months ago
It's unacceptable to leave customers stranded with applications used by hundreds of people, without even giving them time to prepare. Even if the hardware fails, we're paying for a premium service that isn't cheap, and we expect a much faster response, not more than an hour and a half of downtime. Every server I've worked on has redundant services that can be activated when the primary ones fail. I don't know why Railway doesn't have this system.
I think that's reason enough to drop all the projects and switch to a more reliable provider. I expect a proper apology and a price discount or a bonus because I've made a very bad impression on several clients.
2 months ago
@chandrika, well, I didn't get any notification, so that's also not true.
2 months ago
So please tell us how to spin another server and restore our backups to that.
2 months ago
I agree with you on this.
2 months ago
Agreed
2 months ago
Please bring the Postgres DBs up ASAP, lots of customers are wondering wtf is going on.
2 months ago
Any news? Any updates? My production service has been offline for over an hour now. This is becoming critical.
2 months ago
Just received another email that says Duration: 1 hour from time of this notification, so 2.5h from the initial notification that said this will be a 30min downtime. This is insane and unacceptable.
2 months ago
This poor reliability lately has really left a sour taste for me. I'm strongly considering switching too. Love the platform and how everything works, but reliability is the foundation of it all.
2 months ago
I just received another email:
Hello,
Your Railway service will be affected by an infrastructure maintenance event.
- Duration: 1 hour from time of this notification
- Impact: Service will be offline
1 hour !!! No No this is unacceptable
2 months ago
same here
2 months ago
Agreed. There is not a week without an issue.
2 months ago
First, I received an email saying my server would be down for about 30 minutes... then two hours later I received another email saying it would be down for another hour... seriously? Are you kidding me or what's going on here?
2 months ago
At least you got a notification ... our service with 500 users + is down, we are losing money ...
And yet we have no solution for this. This is sad .... we can't even ....
2 months ago
Well the notification is useless, just got another one that there is again a "scheduled maintenance" for 1 hour. What the hell
2 months ago
Well, at the very least, can we spin up another service from a backup so we don't lose any more business?
2 months ago
I just wish they had put something on the service status page. As soon as it went down I went to check it, and it didn't have any problems reported, so I dived in and started tinkering with things to try to fix it, thinking it was my own problem.
2 months ago
YES. Can Railway please tell us how to start another database from backups
2 months ago
I wish we could just download the volume/backup and switch to another provider already.
2 months ago
This might be the breaking point for us. The past month has not been good, and this multi-hour unexpected downtime during peak hours just can't happen. Once every 4 years, yeah, maybe, but we haven't even been using Railway for 4 months and there have already been countless issues, though nothing of this magnitude. Thinking about switching before it gets worse and our company and services are too big to comfortably switch.
2 months ago
I am actually considering doing this too.
2 months ago
This is something we are also considering right now. I can't understand how this is possible for such a big host.
2 months ago
This thread has been escalated to the Railway team.
Status changed to Awaiting Railway Response chandrika • 2 months ago
2 months ago
It was; we were already on it. That command just escalates the Discord thread itself so we can keep track of impact. You can ignore that
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
So what's the ETA for a solution?
Status changed to Awaiting Railway Response Railway • 2 months ago
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
As Ray mentioned, we don't have a specific ETA yet. The incident is still being actively worked on and we've got several engineers on a call resolving this. I'll keep the incident updated and you can track live updates at https://status.railway.com/incident/cmmui0c7z012icp7ebcd1a3zv
2 months ago
Just to be clear, I think it started way earlier than what's mentioned on the status page.
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Unfortunately, I'm unable to update the existing one but we're also working on a project to improve our status page: https://station.railway.com/feedback/improved-status-page-experience-78854616
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
My server failing is more of an issue on me not having adequate horizontal distribution across multiple regions.
The response to this issue however I find unacceptable. The status page not showing as down, and then having to find answers here is insane.
I'm paying a premium to Railway for things to be smooth, I'll definitely need to move to something like AWS after this. Extremely disappointed
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Same here. I am about to go live with my service. Now I think about evaluating other providers. Why does it take so long to just run the images on a new machine?
2 months ago
I was in a meeting about to demo our new app to a customer and couldn't get on to the app. That didn't go down well!! Too many issues lately, I've never known a service to be down so much.
2 months ago
Had 10 new users sign up to the app since this occurred. None will have been able to do their initial call to action, which is reliant on my Redis instance, so none have had their "magic moment". At best they'll probably never use my app again; at worst they'll leave a review saying it does not work.
2 months ago
Can imagine the pain. I just went live with some serious marketing, and the next day, you know, everything is down 😄
2 months ago
We are now at 3H+, and we have lost a significant amount of money. We lack access to the backups, which prevents us from spinning up another service without losing data. The most frustrating part is that our beta environment and database are up and running, while only our production database is affected. What on earth happened? At some point, we must demand a substantial apology to continue holding you accountable. This is simply unacceptable; other companies will face legal consequences for such negligence.
2 months ago
I guess migration process will be long and painful but well worth it
2 months ago
I am guessing that what they will come up with is an apology and no form of compensation for us.
I expect compensation and a clear assurance that this will be handled properly moving forward.
2 months ago
The response to this issue however I find unacceptable. The status page not showing as down, and then having to find answers here is insane.
We've updated our incident page here: https://station.railway.com/feedback/improved-status-page-experience-78854616 and I'm keeping it updated. I'm also in a meeting with the infrastructure engineers who are working to resolve this as we speak.
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
We've resolved the incident https://status.railway.com/cmmui0c7z012icp7ebcd1a3zv. If your service has not automatically recovered, please try redeploying. If you're still experiencing issues after that, please let us know here and we'll help. Again, sorry for the disruption.
2 months ago
I just now received an email:
[NOTICE] Temporary Service Disruption
Hello,
Your Railway service will be affected by an infrastructure maintenance event.
....
On my production environment! And it's not the first time I've received this mail! How can a hosting service send an email to let me know my production environment is down without any heads up. wtf? If I get 1 more unplanned service disruption I will leave Railway for good. This is really unacceptable.
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Same as above from Brent ^
Really bad
2 months ago
It’s honestly frustrating that there was no clear communication from Railway regarding today’s disruptions—especially considering how severe the database issues were.
According to your status page, everything appeared mostly operational, yet in reality, it was practically impossible to get any meaningful work done throughout the entire day. The graphs and metrics shown there feel completely disconnected from the actual experience on the ground.
Transparency matters a lot in situations like this. Even a brief acknowledgment of ongoing issues would have made a big difference and helped set realistic expectations. Right now, it just feels misleading rather than informative.
Could you clarify what actually happened today and why it wasn’t properly reflected on the status page?
2 months ago
Try monitoring your service with something like UptimeRobot, then you'll realize how unreliable Railway is. Great for development and testing, but useless for any serious production use. Such a shame, as it's honestly the easiest and best experience for rapid deployments, but the infrastructure is simply too bad. This month alone we filed and handled 16 major IT incidents from Railway in our ISMS. Zero from AWS.
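For anyone who would rather script a basic check than sign up for a hosted monitor, here is a minimal sketch in Python of the kind of external probing suggested above. It only assumes that your service exposes some HTTP endpoint you can poll; the names `Probe`, `check_once`, and `outages` are made up for illustration, and nothing here uses a Railway or UptimeRobot API.

```python
import http.client
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    timestamp: float  # when the check ran, in seconds
    ok: bool          # True if the endpoint answered without an error status


def check_once(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # HTTP error statuses, refused connections, and timeouts all count as down.
        return False


def outages(probes: List[Probe]) -> List[Tuple[float, float]]:
    """Collapse runs of failed probes into (start, end) windows.

    `start` is the first failed probe of a run. `end` is the first successful
    probe after it, or the last failed probe if the service is still down.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_fail: Optional[float] = None
    for p in probes:
        if not p.ok:
            if start is None:
                start = p.timestamp
            last_fail = p.timestamp
        elif start is not None:
            windows.append((start, p.timestamp))
            start = None
    if start is not None and last_fail is not None:
        windows.append((start, last_fail))
    return windows
```

In practice you would call `check_once` on a timer from a machine outside the platform being monitored, append a `Probe` each time, and alert when `outages` reports a window longer than you can tolerate. The measured window can only be as precise as the polling interval.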
2 months ago
Unrelated to this specific incident but since you are an employee you may have an answer: could you share with us some high-level explanations of the recurring downtimes experienced on Railway recently?




