340 Replies
a month ago
Team is aware and looking into it
I noticed this in my mysql db's logs:
> 2026-02-11T14:55:06.361425Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 9.4.0).
Is this mainly related to internal networking? If so, I'll start rerouting via the public network.
a month ago
It seems related to processes being terminated, team is looking into it, but no harm in trying.
a month ago
Can you share your service URL? The one that appears in the URL bar (the one that starts with railway.com…) when you have the service in focus.
a month ago
Seeing this too with my Caddy server. Keeps getting shut down
This MongoDB indeed seems to have been shut down: https://railway.com/project/e843ddc7-071d-430c-be13-52ab4fe102c0/service/d130e10f-dd87-45a2-aa5e-cc3ce0cd8798?groupId=6e585a96-f33a-482d-817b-417a7dd8eb42&environmentId=40e37c2c-937f-4965-89a2-76389262b957&id=7d81b9d3-3be6-40bd-9517-db468ec6a018#deploy
I've tried redeploying, restarting, and different instances; nothing fixes it
a month ago
Same here, the same day we launched a campaign :(
a month ago
Attachments
if I restart my mysql db it seems to work for a minute or two, then it's shut down again
I'm getting massive response times suddenly, nothing seems to connect to my services either
a month ago
Also having issues
networking looks bust and we've randomly got containers that have been terminated.
a month ago
same problem here.
a month ago
🚨|incidents has been updated, we just have to be patient while the team fixes it
a month ago
None yet, and for updates be sure to look into the incident: https://status.railway.com/cmli5y9xt056zsdts5ngslbmp.
I have an Uptime Kuma app with a SQLite DB for monitoring, and it's going down too, so it looks like it's related to volume mounting.
a month ago
For reference, the issue seems related to processes being killed randomly, not related to volumes.
a month ago
None at the moment, sorry.
a month ago
all our databases are down and our clients can't access systems
Each time I stick with Railway cuz I like the service, but I can't fucking trust it if it keeps going down every month
We're experiencing similar issues with our internal networking when connecting to both MongoDB and Postgres. All instances were running without any problems until approximately two hours ago, when we started encountering connection timeouts.
Same here!
Attachments
a month ago
It's not related to internal networking.
Is your postgres server actually running though? In all of my services it doesn't seem to be networking related, it's that the services are receiving a shutdown command after a few minutes of being up.
now the services just crash randomly
for some reason they crash, just crash
without any logs
a month ago
The team will post updates on here: https://status.railway.com/cmli5y9xt056zsdts5ngslbmp. Really sorry for that.
a month ago
It truly is, again really sorry for that
Yeah, the status update is deceptive. All our services are hard down, period.
fair. i would say that builds failing doesn't fully describe the issue. some existing services went down as well
which is why i saw the status page and assumed it didn't impact us
Yeah, this is not just deployments, this is a major service outage affecting my customers
a month ago
Can agree here, asked the team if they can update it
a month ago
There's an incident here: https://status.railway.com/cmli5y9xt056zsdts5ngslbmp. Updates will be posted there.
a month ago
A new update has been posted, they're still looking into that.
since June last year, after migrating to the Metal stuff,
it's been having outages on and off,
averaging about one incident per month
a month ago
From the issues I've been observing, it seems related to the new influx of users to the platform. It's been a while since we had a major outage that affected running workloads (not builds).
a month ago
i just restarted my MySQL db and it appears to be online again.
yeah this one is different, it affects any services, even the already-running ones.
A bit more than a month ago, all my services went offline: Node.js, Mongo, etc.
Internal routing is down; nominally my app is up but it can't route to the DB. Nothing to do with deployments
a month ago
Team is trying their hardest to fix it and I can agree that it's unacceptable, they'll be posting a post-mortem on this too.
I believe the services have been restored. Please 👍🏻 or 👎🏻 this message if it works for you or not
Things happen, as long as it gets fixed and there is an understanding on it and how to avoid it again, all is good - at least from my pov
a month ago
It's not related to internal networking, the processes themselves seem to be killed randomly.
What a joke. Responding quickly is not enough. It's one outage a month at this point. Not at all usable.
Ah! correct, my db was showing online but clicking into it shows deployment is offline - my node service then can't route to it for obvious reasons!
Can you please update the status. "We're investigating issues impacting deployments" is very incorrect. This is a complete outage. It's deceptive to write it this way when it's clear to everyone the impact is far wider than deployments.
Apparently the problem is only with the Metal build. I disabled it and my systems went back to working normally, just by disabling Use Metal Build.
been thinking whether it's worth migrating to the Metal thing. before this it was just fine. lolz
a month ago
I've passed this down to the team to update it, they're looking into it!
it has become a major outage
Attachments
Railway's postmortems are normally pretty good! Certainly more detailed than many other providers cough AWS cough.
December 16 outage was exactly the same @ThallesComH 'minor outage' - when everything was down. This is systematic dishonesty.
if it helps anybody, my servers are on Legacy, not on Metal! And they're no longer dying every 5 mins.
What GPT spotted in the logs was that 270GB of physical storage was assigned to the service, so I do believe that as all our projects grow, Railway will occasionally run out of storage
a month ago
I remember Railway having an issue with elevated networking latency, but nothing as critical as this. I can understand your frustration if that affected your workload, sorry for that.
damn, I still remember that,
connections suddenly went to 3s after the migration,
super laggy, damn scary
I have an email from December 16 saying the following, and it definitely wasn't just build failures. Perhaps it was because of European time, but everything was down:
"Hello,
We recently experienced a major outage that impacted your Railway workloads. We're very sorry for the disruption this caused.
Our goal at Railway is to provide a best-in-class experience, and this incident fell short of that standard. To help make things right, we've applied a credit based on your average bill from the last three months (20) to your workspace, Projects. You can view your available credits here.
For a detailed breakdown of what happened and the steps we're taking to prevent this in the future, you can read the full post-mortem here."
Allow us to download database backups; we need an alternative to get back to work
a month ago
For reference, we run our entire company on Railway and we weren't affected by that issue (to the point of our services going down). I apologize if I gave a different impression, it was not my intention to be dishonest.
really, it doesn't have that. sad
Attachments
can this be changed at least? it's certainly not caused by the application, it's the hosting provider's fault. 🙁
Attachments
a month ago
Every workload on Railway runs on metal, there's no "non-metal" option.
a month ago
That's great feedback, will check with the team!
I don't think we even received credits, even though all our workspaces were down then
when there's a major outage like this, it would be great to at least be notified by railway so I can at least push messaging to my client notifying paying users about our service outage
my services are back now. I'm not using Metal, but my apps weren't connecting to my database; now they are able to
yes please, a link pointing to the status page would be great. that will at least mitigate the damage for end users a bit, at least they know that the fault is happening on the hosting provider rather than the developers.
a month ago
You can subscribe to incidents on the status page and, in case you use Discord, you can follow the #🚨|incidents channel.
We've got clients calling who are pretty annoyed that it's the second time in the past months. It's looking like we might have to change providers in good faith.
Even subscribing to it wouldn't do much, clients will probably call before that
a month ago
This is a monthly occurrence now. Your customers deserve a clear write up of what happened, and what steps you're taking to fix it.
a month ago
I can understand your frustration. Our company also runs on Railway, and we're just as upset as you are about this situation. However, we also recognize that outages are inevitable in any service and we love Railway way too much to leave.
We genuinely don't care if some stuff on your timeline takes longer to get added, we just don't want to have a major outage each month
yes, I did that once I found out about the incident. but I only found out about the incident 30min after it started because I happened to open my site to test something
a month ago
I can guarantee that the team will prioritize stability over features at any time. For reference, the build issues we were having were their top priority.
Do you have a plan for resolving the problem? Will the data be accessible? Answers are needed; several critical systems are currently down. I need to know if the data is secure.
The benefit of Railway to a developer is amazing, but the client just doesn't want their website to go down, and since we provide client services our opinions are less important. It makes it difficult to justify to a client why they should continue using an unstable service just because I personally like it.
a month ago
Our Redis service shut down and therefore got wiped.
a month ago
Can agree here, but I don't think we would have gotten our product to the point it is now without Railway. Every time we don't have to worry about infrastructure is time we get to ship. I can guarantee you that Railway is actively working on improving stability.
RESOLUTION turn this OFF.
DONE. 🫰
Attachments
it seems to be working again, now I have to deal with a lot of angry customers :|
a month ago
I don't think that's guaranteed, since some folks (including myself) saw improvement turning it on. In general, I'd use the Metal build environment
a month ago
Have you restarted it?
> 2026-02-11T15:52:23.923886Z 1 [ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error: 11
> 2026-02-11T15:52:24.081102Z 1 [ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error: 11
a month ago
Pretty sure that disabling metal build isn't doing much here, as your service received a redeploy which seems to be temporarily fixing the issues for some users. Please keep metal build enabled, it's much faster.
I don't doubt that the team is working on fixing issues on Railway but the fact is that major outages are occurring every month
Railway can of course fail, like every service. But at least they hear out and actively fix our concerns. AWS would just put all of us on voicemail…
a month ago
I can assure you they're aware. We have an internal chat with the team and receive updates on any issues they're actively fixing. One example was the GitHub authentication problem, where they told us what was happening and the steps they were taking to fix it.
And when it does it's so bad that our clients are more accepting of it not being our fault
That's what I did in my database and applications, and all 6 had Metal enabled for testing. After disabling it, they all started working again right now. I'm telling you what I did and it worked. What you think based on guesswork without testing is just guesswork.
This can be re-enabled again after the problem is solved, and the problem is with Metal; it's on the resolution page!
Attachments
a month ago
Every workload runs on Railway Metal, which is Railway's own server infrastructure. Service builds were previously running on GCP, and now they're experimenting with running these builds on their own servers as well. Whether your server is built on Metal or GCP should not affect the outage.
a month ago
Unfortunately we have to wait until the incident is fixed.
Excuse me, I don't understand. I don't have any Metal services; they're all legacy, and my MySQL database won't start. Is anyone else having problems with the database?
a month ago
A one-hour outage is unacceptable...when will the problem be fixed and will a detailed incident report be released?
a month ago
Have you run railway services on GCP? Metal's much more stable.
a month ago
"Legacy" services are still running on Metal iirc, it's just the runtime that's legacy
What I did worked, and I shared it. I'm fine, do whatever you want. For me, it's already resolved. I don't care if it's just guesswork.
Yes, but only the applications on metal were affected. I disabled them until they fix it; it's a temporary solution. It doesn't matter what's best right now, what matters is what's working.
a month ago
Hey, just got confirmation from the team that redeploying the affected services should fix the issue for some users (this explains why @Gandalf service is working, which is not related to Metal Builds). Just doing a redeployment on each service for example your API and database should fix the issue for some.
a month ago
Not a great solution, I can agree, but the team is still working on a fix for everyone else
I did a simple "restart" of the already deployed SQL service and everything came back for me
a month ago
Yep, restart also works
I redeployed, however I am still running into 502s with my Caddy proxy to my services
a month ago
after restarting the MySQL service it started working again
a month ago
All my apps are down; servers and job applications are out. @Railway, please address this promptly.
a month ago
Hey, can you try redeploying each of your services as mentioned above? That should fix for some users.
I can't redeploy the database, it would erase it… can only restart it, I presume.
I did restart it
Attachments
a month ago
same here. databases down
a month ago
If you have a volume attached to your service, that shouldn't happen. If you don't know what a volume is, it's this small "card" displayed below your service.
Attachments
a month ago
Hey, can you try redeploying each of your services as mentioned above? That should fix for some users.
for me the data wasn't corrupted or removed. I'm still checking the integrity, but based on what I see - data seems to be okay
this? @ThallesComH
Attachments
a month ago
Stuck with the development Postgres server not accepting connections from the backend. Prod still up.
Database Connection
Attempting to connect to the database...
a month ago
Yep, you can redeploy it no problem.
a month ago
Hey, can you try redeploying each of your services as mentioned above? That should fix for some users.
Ok let's go
Attachments
Whoever has a database and the data in a single deployment should be able to SSH into it, right? Though no one should be doing that, ever.
a month ago
Hey, can you try redeploying each of your services as mentioned above? That should fix for some users.
a month ago
Maybe, but I don't see why you would need to SSH now though.
Just had three client websites crash and the automatic restart policies didn't work at all.
Just an idea, but many people who use Railway are people who aren't comfortable with infrastructure or don't fully understand it. Railway creates this abstraction and makes it visually easy to understand. But unlike Vercel, this doesn't mean you can have no knowledge of infrastructure at all; it's still involved.
https://svelte.dev/tutorial/svelte/welcome-to-svelte
Svelte has this really nice interactive tutorial… Imagine if Railway had this.
a month ago
For people still having issues after restart/redeploying, please open a #✋|help thread and I'll escalate it to the team.
Confirming that redeploying my Postgres and app services brought them back online. Thanks @ThallesComH 👍
I don't think https://discord.com/channels/713503345364697088/1471157274314539098/1471171555592638698 is the resolution at all.
Can you please confirm we can safely redeploy a Postgres DB without affecting it?
a month ago
Railway has an SSH option via the CLI; just type railway ssh.
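For example, from a terminal (a quick sketch; the login/link steps are interactive):
```sh
npm i -g @railway/cli   # install the CLI if you don't have it
railway login           # authenticate in the browser
railway link            # pick the project/environment/service
railway ssh             # open a shell inside that service's container
```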
a month ago
Can you share a screenshot of your service on the canvas?
I just did and it's fine - the service mounts a persistent volume that survives the redeploy.
a month ago
You can blur it out but shouldn't be a problem.
a month ago
or temporarily renaming it
redeploying the web app part
Attachments
a month ago
You can redeploy it, no problem.
a month ago
Yep
a month ago
Application error: a client-side exception occurred while loading station.railway.com (check the browser console for more information).
Help, I can't redeploy
Attachments
a month ago
Hey, can you try switching to railpack instead of nixpacks? You can change that by going to your service settings
Still
Attachments
This is really bad, it's affecting all our services that use MySQL. Tried restarting MySQL, which seems to fix the problem. Any updates? Please focus on reliability; I really love Railway and don't want to migrate to another platform. Clients are starting to notice that things are not working
can i turn on my redis again already? it was really slow/refusing connection.
But it shows that the service is still online
Attachments
It has been more than 10 minutes since I hit redeploy. Aborting and trying again. What a mess.
a month ago
Something like this is unacceptable, I've been offline for 2 hours
@ThallesComH Still down, cannot redeploy the DB
Attachments
a month ago
I'm already losing customers, and they're very angry. What is Railway going to do about this? How am I going to win them back?
a month ago
Hey folks! Can you all try triggering redeploys on your services or anything dependent that's running into this? Say you have an API that depends on Redis and Postgres—try triggering a redeploy on Postgres, Redis, and the API service. That should fix it.
To make sure we get back to all of you, can you please create your own station thread? Want to make sure we don't miss anybody and everyone gets to a good place
It has been like that for ages
Attachments
a month ago
For reference, Noah will be replying to this thread from now on, I really need to get back to work, sorry if I didn't reply to someone before!
Cannot redeploy a mysql DB located here
Attachments
Attachments
a month ago
I am still down. My Postgres DB is unreachable in 2 of my 3 environments. A major client of mine is offline..... Restart doesn't work and a redeploy just hangs..
My services are back up but I'm seeing this, which isn't true; they're back up. Though that's just a UI thing.
Attachments
Attachments
I don't have a MySQL DB myself, only a Redis service. I managed to redeploy it fine.
a month ago
Ya, lots stuck for me too (stuck ones seem to be ones with a pre-deploy script)
Attachments
Is there some official rep from Railway in this thread who can update us on the actual issue?
I'm having the same problem: both services are online but unable to communicate with each other, and there are no logs, nothing, except the DB says
Attachments
Attachments
a month ago
on "Creating Containers"? That normally means a busy queue from people redeploying after the incident, it will eventually deploy.
Attachments
any help with this?
https://discord.com/channels/713503345364697088/1471181106668765204
a month ago
Yep, sorry for the inconvenience. This can happen after an incident; people spam builds and the queue gets crowded
a month ago
Please use restart, not redeploy.
Yeah, I've been restarting and redeploying for like 2 and a half hours. Still not working
Can anyone help me? I have the Pro plan
Attachments
a month ago
Restart and Redeploy not working for my MySQL. 2+ hours offline!!!!!
FAILED after 17 minutes
Attachments
a month ago
Nope no need to delete.
Can you please select the stuck/offline service, abort any running "redeploys" or "builds", press "command/ctrl + K", and select "Deploy latest commit" or, if it's a Docker image, "Redeploy source image"
a month ago
That will deploy a fresh variation.
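If you'd rather do that from the CLI, something like this should be roughly equivalent (a sketch; assumes a recent CLI version where `railway redeploy` is available):
```sh
railway link        # select the affected project/environment/service
railway redeploy    # redeploy the service's latest deployment
```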
a month ago
can you link?
a month ago
restarting the affected services worked for me. Thanks for the fix.
a month ago
If you select that deploy and copy the URL that is in your browser's search bar, that'd be what I'm looking for
a month ago
So sorry y'all hit this. We'll be providing a very detailed post mortem of what happened
a month ago
I tried to redeploy and restart, but nothing happened
a month ago
Restart and redeploy not working for MYSQL!
3 hours offline.. bye bye railway
a month ago
Frustrating. When is it going to be fixed?
This may be the thing to push people over the edge. This is the second such outage in 12 weeks. Especially frustrating that Railway have recently raised £100 MILLION in Series B funding. I can't build a business around these kinds of failures. Appreciate all Railway are doing to mitigate, but this is costing folks their livelihoods.
a month ago
Regardless of the redeploy, apps don't work. This s4cks. Cannot be happening. I need to launch something extremely important and you guys are down.
a month ago
Nothing has been working for three hours now, and this situation occurs consistently once a month. What is the problem? I am losing customers.
a month ago
Restart and redeploy aren't working.. My clients are all DOWN. This isn't resolved.
a month ago
At least one of my services is still down, and I can't redeploy (have tried restart multiple times).
It fails whenever it needs to talk to its Postgres
https://railway.com/project/5dfc53a8-dea9-46af-b0a0-536e51368af3/service/eca68ad2-7ea4-4e75-9918-6597e05b4fca/database?groupId=b412cd06-ef5e-487f-82a1-e12ff2af0268&environmentId=2d2473e5-9346-4932-9ab3-7c2c014f8ba6
a month ago
People that are still having this issue, please open your own separated help thread and provide affected service links.
For anyone using Directus who tried to redeploy their Directus service from a Docker image with :latest: I ran into a failed migration. Rookie mistake! Redis wouldn't start up, Directus wouldn't redeploy.
To fix:
1. Remove redis from the Directus variables.
2. Change the Directus service cache variables from 'redis' to 'memory'. This removes Redis from Directus so it doesn't need it to redeploy.
3. Optionally fix the botched migration (solution below).
4. When you confirm the Directus service is up, redeploy Redis.
5. When you confirm Redis is back up, add back the Directus service cache variables. Add back the redis variable too.
6. Redeploy Directus.
To optionally fix any botched migrations:
You'll need to manually mark the migration as complete in Directus.
Run in psql:
INSERT INTO directus_migrations (version, name, timestamp) VALUES ('20251014A', 'Add Project Owner', NOW());
I did this using the Railway CLI: railway login -> railway link -> project/service/postgres -> psql 'railway postgres url'
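In case it helps, here's that flow spelled out (a sketch; the connection URL is a placeholder, grab the real one from your Postgres service's variables):
```sh
railway login    # authenticate the CLI
railway link     # link to your project, then pick the Postgres service
# Mark the failed migration as complete (values from the steps above):
psql 'postgresql://user:password@host:port/railway' -c \
  "INSERT INTO directus_migrations (version, name, timestamp) VALUES ('20251014A', 'Add Project Owner', NOW());"
```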
a month ago
Hey, can you open your own help thread? I will help you there, I maintain a Directus template.
a month ago
Or are you just sharing a how-to
Sharing how-to. Sorry if that wasn't clear! And used your template before… great work!
a month ago
Thanks!
a month ago
Not yet.
Just want to say that despite the understandable anxiety we have all experienced today, I still highly value Railway team members for their availability, seriousness and kindness. In a world with so many nasty figures dominating the news, it is refreshing to deal with people who truly care. Thank you.
a month ago
@Celengan Babi ^
a month ago
In my opinion, Railway takes incidents very seriously. From what I’ve seen, this is the first incident in over a year that has affected running deployments. Most other incidents tend to involve slow builds, upstream provider outages, or related issues that don’t impact already running services.
It also seems like the number of instances affected in this incident was relatively small, and the team responded quickly to users.
Think my take from this is that, first, Railway staff never seemed to rise once to anyone's frustrations, including my own! That has to be acknowledged. Secondly, as administrators (new and seasoned) we should be building products that account for infra failures gracefully (at least as far as is possible; if it's down, it's just down). I've made some pretty big changes to a few codebases belonging to high-stakes clients in the last few hours. Finally, I was among the many who panicked (likely because of clients shouting down the phone!), but once I relaxed and dissected the problem and thought about mitigation properly afterwards, things settled down. My knee-jerk reaction was to redeploy, redeploy, redeploy. Big mistake. Tech is always always going to fail… at some point. No matter the provider. Question is, do you prefer someone who takes shit seriously and apologises and changes policy on the spot to stop it happening, or someone who will just tell you to put up and shut up? I'm staying.
Yeh, I have a backup version of one project on Northflank. Much more expensive and the DX is pretty complex, but it's there if needed. I also separate my frontend with Vercel and Netlify to keep concerns separate for public-facing websites and some apps. For all the backend stuff though, you can't beat Railway for DX, simplicity and price. On AWS you'd need a million or two to run the stuff I run! Oooffft!
I'm sure as Railway use their series b funding to expand (what it's there for) we'll see even greater things and more stability.
a month ago
Hello Railway Team,
I am writing to request immediate assistance with a critical production outage for my project "Javi Ride." My database service has shut down, and I currently have no live environment serving my customers.
The Issue:
Database Shutdown: My production database crashed/shutdown unexpectedly.
Restore Failure: I have attempted to restore from yesterday’s backup snapshot multiple times. Although the restore process completes, the resulting database is empty and contains no tables.
Data Loss Risk: Because the database is down, I am unable to access today's transactions. I urgently need to know if there is a way to recover the data from today (pre-crash) or if you can investigate why the yesterday's backup is appearing as an empty volume.
justus-otundo
Hello Railway Team,I am writing to request immediate assistance with a critical production outage for my project "Javi Ride." My database service has shut down, and I currently have no live environment serving my customers.The Issue:Database Shutdown: My production database crashed/shutdown unexpectedly.Restore Failure: I have attempted to restore from yesterday’s backup snapshot multiple times. Although the restore process completes, the resulting database is empty and contains no tables.Data Loss Risk: Because the database is down, I am unable to access today's transactions. I urgently need to know if there is a way to recover the data from today (pre-crash) or if you can investigate why the yesterday's backup is appearing as an empty volume.
a month ago
Please open your own thread!
medim
Please open your own thread!
a month ago
Done
Thanks for sharing this. There are a few red flags here that are not addressed, and that honestly concerns me deeply:
"It was dry run in production and showed correct and accurate abuse identification. Only when turned on, via staged rollout in production, did false positives end up being observed." - This is basically like writing "Dunno, it worked in dev". Perhaps this is coming later, but I am very curious to hear why this was not caught in dry-run.
There's no mention here of a rollback strategy. Was one devised and tested before the change was rolled out? Was it followed?
[Biggest Concern] The incident report says "<3% of our fleet was impacted during this staged rollout." and also says "then initiated a staged, fleet-wide rollout." and "After the rollout was complete". While I understand that battling fraud is hard, and speed is of the essence, I am concerned with how quickly and broadly this was rolled out. It appears to have been rolled out with no isolation between regions, or any other dimension (all regions were impacted). No bake periods in the rollout? No automated blockers for the rollout driven by metrics/canaries/alarms?
I see the incident report says "Staged rollout, by tier," as an improvement, but honestly, this scares me. This is operational excellence 101. Given Railway's scale, I just assumed OE practices like this were SOP. Reading this makes me question what other OE practices are lacking.
I know you all are very busy, and I don't expect answers here. I do appreciate the work you all do. It really has been a ton of fun working with Railway in my spare time.
Also, thank you for being transparent with your incident report culture. This does help earn trust.
I hope you accept this as constructive feedback.
a month ago
I just fixed my application by redeploying my backend and restarting all services including the db. Agreed with the above that transparency is appreciated.
a month ago
Haven't been helped, and business is running out
justus-otundo
Haven't been helped, and business is running out
a month ago
Can you share your thread link here?
medim
Can you share your thread link here?
a month ago
I also don't understand why this part took so long
Attachments
Thank you for the update and the transparency. I really hope this does not happen again in the future.
To be fair, this incident was the 1st major outage Railway had affecting running services. I just wish that it could have been prevented by thorough testing when you guys switched from dry run to live run.
On the postmortem page it says:
> After the rollout was complete, engineers noticed the enforcement logic was overly broad in its targeting criteria. Rather than isolating only the intended workloads, the system incorrectly matched certain legitimate user processes, including some databases and application services. As a result, the enforcement system sent SIGTERM signals to legitimate user workloads.
Why wasn't it detected during the dry run? The dry run should have produced the exact same output as the live run.
a month ago
Abuse actors have paid accounts too.
Status changed to Solved medim • about 1 month ago
