There is a disk I/O performance problem. My app is currently down in prod. I can't even restart or spin up a new DB service.

My Postgres DB is down. - Railway Central Station

paddyfink

PRO OP

4 months ago

Postgres deployment stuck on CREATE_CONTAINER for 10+ minutes

Checkpoint sync times are 18.7 seconds (should be <1s)

Started around 2026-03-16 09:20 UTC

Project: 476842f9-1c85-4fd0-9ca0-af9d9249b5b9

zed077

PRO

4 months ago

Our postgres DB is down too. Does not want to redeploy either

sjotie

PRO

4 months ago

Experiencing the exact same issue, has been down for close to an hour now and no way to redeploy. Extremely annoying.

directsyndikat

PRO

4 months ago

issue is not on your side

momoragno

PRO

4 months ago

mine too....

programmerdavvy

HOBBY

4 months ago

mine too is down

r-1-6

PRO

4 months ago

mine also, lots of angry customers emailing in

pawsty

PRO

4 months ago

This is insane. I received a [NOTICE] Temporary Service Disruption email saying my DB would be down for 30 minutes; it's now been an hour and a half and my service is still down because the DB is failing. How is this remotely OK?

chandrika

EMPLOYEE

4 months ago

Hi everyone, I hear you and I'm really sorry you're dealing with this.

What happened was that a host went down unexpectedly, which affected a subset of services on Railway, including yours. This was not scheduled maintenance. As soon as we detected it, our infra team jumped on it to recover the host. The notifications you received were us letting you know as quickly as we could that something was wrong, not advance notice of planned work.

Unfortunately with unexpected outages like this, we don't get to choose the timing either, and I know that doesn't make the disruption any less impactful.

Database services in particular take a bit longer to come back, as they need to safely initialize before accepting connections (your data is safe).
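
On the application side, a client can ride out the window in which a database is still initializing by retrying its connection with exponential backoff instead of failing on the first refusal. The following is only a minimal sketch of that idea, not something Railway prescribes in this thread. The `connect` callable, the function name, and the default delays are all hypothetical, and it assumes failures surface as the standard library's `ConnectionError`; a real database driver usually raises its own exception type, so the `except` clause would need adapting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical zero-argument callable that returns a
    connection object or raises ConnectionError while the database is
    not yet accepting connections.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            # Double the wait each time, but never exceed max_delay.
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without actually waiting. A retry loop like this only smooths over the startup window; it does not help while the host itself is down.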

budivoogt

PRO

4 months ago

Hi Chandrika, thank you for the update. For production grade applications this is unacceptable. What guidance can you provide to make sure that we are not affected by such downtime by having redundancy? Would it be to deploy a service to different regions or to have more replicas? I presume replicas on the same server? Regions might be clustered together on the same set of servers and thus be affected by downtime equally.

pawsty

PRO

4 months ago

It is a shitty situation, but our applications depend on the resiliency of your infrastructure. An expected 2.5h+ of downtime because databases can't be brought back online quickly shouldn't be the case. This is something you should anticipate and prepare actions for.

tankilevitch

PRO

4 months ago

My DB isn't available either. It fails to start, which leaves the application non-functional.

chandrika

EMPLOYEE

4 months ago

We've called an incident regarding this here: https://status.railway.com/cmmui0c7z012icp7ebcd1a3zv

chandrika

EMPLOYEE

4 months ago

!t

chandrika

EMPLOYEE

4 months ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response chandrika • 4 months ago

Status changed to Awaiting User Response ray-chen • 4 months ago

chandrika

EMPLOYEE

4 months ago

Just a quick note: if your service does not have a volume attached, please try re-deploying it

coffeeforadoctor

PRO

4 months ago

But Postgres cannot be scaled to other regions, as far as I can see.

Status changed to Awaiting Railway Response Railway • 4 months ago

chandrika

EMPLOYEE

4 months ago

Quick incident update: we've identified the issue as a hardware failure on a single host in EU West. The affected infrastructure is being brought back online and workloads are recovering; some services have already been restored. Services with databases and attached storage may take a bit longer to fully come back. We'll continue to provide updates as recovery progresses.

Status changed to Awaiting User Response Railway • 4 months ago

chandrika

EMPLOYEE

4 months ago

We've resolved the incident https://status.railway.com/cmmui0c7z012icp7ebcd1a3zv. If your service has not automatically recovered, please try redeploying. If you're still experiencing issues after that, please let us know here and we'll help. Again, sorry for the disruption.

Railway

BOT

3 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • 3 months ago
