22 Replies

efstajas
PRO · OP

3 months ago

Our Uptime Kuma monitor is also failing to ping several services via private networking due to ECONNREFUSED. Most are fine, though.


3 months ago

Hey, can you try redeploying the service in question?


3 months ago

For people having issues, please create your own help thread and we'll help you there.


kenchoong
PRO

3 months ago

Yes, me too.

RabbitMQ is all down, unable to connect. I'm getting this:

SIGTERM received - shutting down


efstajas
PRO · OP

3 months ago

Did that. Redeploys get stuck on migrations in the pre-deploy step. It seems it cannot connect to the DB (PG) instance either.



ashakibp
PRO

3 months ago

Yes, me too. Network is borked atm.


3 months ago

Maybe the PG service is the problem here? Any chance you could try redeploying your PG instance? That will cause some downtime, so no problem if you can't, but it would help us debug the problem.


efstajas
PRO · OP

3 months ago

Definitely not. It's 3 different PG instances and 2 Redis instances that cannot be reached by different services atm, plus our Uptime Kuma cannot ping a number of servers via private networking.


3 months ago

Team is aware and looking into it


efstajas
PRO · OP

3 months ago

It seems like all our services are now affected and completely down.


efstajas
PRO · OP

3 months ago

Correction: almost all of them.



efstajas
PRO · OP

3 months ago

After the outage almost everything recovered, most after a manual restart.

Unfortunately, we now urgently need to deploy a script to replay missed webhooks on a critical service, and the deployment has been stuck on "Running pre-deploy command" for 21+ minutes, even though the command logged success.

https://railway.com/project/56cafcfa-394c-46c9-a811-dc3207bad3dc/service/8c317411-97a8-4123-a568-83113fff997f?groupId=be2427f6-4308-48fd-b5a2-991a2eb8cf18&environmentId=ec032274-ccb3-4372-907d-acc4f1ca967f


3 months ago

Are you able to abort your deployment and then create a new one?


efstajas
PRO · OP

3 months ago

Already tried that; the second one has been stuck for 6 minutes as well. Applying migrations on this service usually takes less than 15 seconds.


3 months ago

cc @Noah can you take a look into this?


3 months ago

This is most likely due to the ongoing incident. We're tracking it and will get back to you as soon as we have more info.


efstajas
PRO · OP

3 months ago

Going to take the desperate measure of temporarily removing the pre-deploy command to push this through, since the change does not contain a migration, and hope it doesn't make things worse.


efstajas
PRO · OP

3 months ago

By the way, this is not a problem for us, but in case it helps you diagnose on your end: we have a cron job that has been shown as "running" for over 20 minutes even though it actually finished. Here it is: https://railway.com/project/56cafcfa-394c-46c9-a811-dc3207bad3dc/service/29f1876c-1230-4779-8d8b-265f7b71aaff?groupId=be2427f6-4308-48fd-b5a2-991a2eb8cf18&environmentId=ec032274-ccb3-4372-907d-acc4f1ca967f&id=cc16a24c-deee-40d4-b9f6-ad1818eb1346&start=2026-02-11T16%3A00%3A24.970Z&returnTo=cron-schedule

Seems like it might be the same root cause, with process exits somehow being missed 🤷


efstajas
PRO · OP

3 months ago

This fixed our acute problem and we're back in sync 🙌 Thank you all for the assistance, and good luck with the fallout 🦾


3 months ago

So sorry you hit this and glad you're back!

