a year ago
Are you experiencing any issues? We're in Singapore. Just checking the public channel since private support isn't responding. Builds aren't working and 5 different environments are down.
3c08e827-8d73-4a37-bbe9-9af9757bd354
21 Replies
a year ago
Same here - nothing is getting deployed at the moment
a year ago
please check <#846875565357006878> for updates
a year ago
Thanks david, adding some context where I have it in case it helps debugging
a year ago
We came back online ~30 mins ago. Now we're back offline as of ~4 mins ago
a year ago
Our apps and services are still down as well, tried migrating to US region, no luck.
a year ago
Please help, I can't connect to my Postgres DB anymore
a year ago
Still down for us too.
a year ago
Do you have a backup? I'm thinking of migrating the database to another provider
a year ago
Don't be too hasty – this should be resolved soon (given how long it took last time), though I'm not aware of your requirements. At a certain point migrating would have to be an option, but we won't be doing that just yet.
a year ago
Thanks partbot, trying to redeploy but no luck as yet. I'll also check in when we're up
a year ago
Update: Partial recovery, 50% of capacity restored. Actively working on the rest. Thanks for your patience, on-call team working as swiftly as possible to restore service.
a year ago
Thanks David and team
Time for another update? Just a reminder that people have production infrastructure that is affected.
I just deployed services in the Singapore region and encountered a similar issue. I was unable to deploy the service successfully
a year ago
Still down. I'm regularly trying to redeploy, to no avail
The level of communication from Railway on this incident is totally unacceptable. I hope processes can be improved as a result of the post-mortem. Even just a “we are continuing to work on it” would give some confidence an on-call team is actually working on this…
My production systems have been down for 4 hours in this outage, and 6 hours 15 mins in total today. So far
a year ago
Update: The core issue has been identified and a resolution is in progress to restore service. The on-call team is working to roll it out.
a year ago
4 out of my 5 services redeployed properly. One still hasn't recovered. It might take a while longer for the fix to be rolled out
Almost 11pm here, going to be a nervous night's sleep given the day of issues.
Thanks for getting it resolved, team. Echoing jtechbit – not enough comms given the severity
a year ago
Thanks for the feedback, acknowledged. That's on me personally for not communicating more. We've had the full on-call team on this (with several additional engineers joining) for as many hours as service has been down.
a year ago
We've published a full incident retro here: https://blog.railway.app/p/2024-05-04-incident-report