2 months ago
I'm not able to deploy. Stuck in the queue for 5+ minutes.
71 Replies
2 months ago
Same here, infinite queue (more than 10 minutes in one of my apps)
2 months ago
My deploy got picked up after 13 minutes, but looks like the first step is almost stuck (so no good news so far)
2 months ago
Same here, I've been waiting for 25+ minutes
2 months ago
Same here
2 months ago
same
2 months ago
Already looking into it. Apologies for the inconvenience.
2 months ago
same
2 months ago
same!
2 months ago
Same
Guys, this is happening a lot. This breaks our CI every time it happens. Metal builds really don't seem reliable at all. We've had quite a few issues over the last few months. The suggestion you all gave me last time was to move away from Metal builds. Apologies for the frustration here, but I am frustrated.
I'm not even using Metal builds and still get deployment problems almost every second day...
The last big outage caused us to lose one investor, and I'm getting hit with a CI issue now preventing me from deploying to our beta stage to run integ for a critical change we need to push for prod before a launch of a new service. My confidence in Railway really is low these days. I know issues happen, but we've not even gotten a single bill credit for the large outage last time (compare that to Supabase who gave bill credits to everyone -- we had a small experiment running there when they had an outage and we got an automatic bill credit). I know issues happen. I've been in this industry for 20+ years and we've got stuff across Vercel, Supabase, Cloudflare, AWS, Railway, and our own K8s. Railway has been the most problematic for us 🙁
Even took the time to leave some detailed feedback on their last RCA and no responses: https://discord.com/channels/713503345364697088/1471157274314539098/1471259076355686442 -- I'm trying here.
Railway has had this issue for the past few weeks. Why is this still happening with this frequency?
2 months ago
Railway folks, what is the deal here? We need reliability. You need to get this under control.
2 months ago
over an hour on my deploy..
Same here. For the last few weeks they have had this issue with deployments; they solve it, and the next week there are outages again.
My business partners and I are pretty sure our next move here is to move away from Railway. This is painful. We suffered through the last massive outage (Feb 11), tons of deploy-related issues, private networking issues, recent Postgres connectivity issues, and build issues again now. Railway has had more operational incidents than any service I think I've ever used. My laptop has a higher uptime and reliability. We can't do this anymore. We have one large release with a few investors hosted out of Railway, but I don't think this will be long-lived. I am honestly terrified. Occasional issues are one thing; this is constant. And yeah, very little communication from the team.
Railway Team, can you explain why your platform has been so unstable lately?
2 months ago
2 months ago
What should we do now with our stuck deploys?
I don't know of any alternatives on the market with that level of product quality. Unfortunately, I'm stuck with Railway because the product idea is kind of unique.
2 months ago
Is there anyone from Support around here?
callmefredcom
Is there anyone from Support around here?
2 months ago
Ongoing incident, see https://status.railway.com/cmmjqqstk0013kv669biadfa4.
There's no ETA currently. This incident should not impact your currently running workloads.
2 months ago
It is impacting my patience. I've been waiting on 2 deploys for 1h and on 2 others for 45+ minutes.
callmefredcom
Is there anyone from Support around here?
2 months ago
The Railway team is actively working on the incident which is likely why there hasn't been any response from the Railway team
2 months ago
On Feb 11, I lost 4h of activity and 3h of my time on the incident. And did not get a penny in compensation...
dev
The Railway team is actively working on the incident which is likely why there hasn't been any response from the Railway team
2 months ago
They do not have a single person in a dedicated customer support role?
2 months ago
I love the elegance of the service but it's been far too unstable lately.
2 months ago
At this time of the day, I should be sleeping, not waiting for 2 stuck deploys to complete.
callmefredcom
They do not have a single person in a dedicated customer support role?
2 months ago
They do. They're likely hands-on in Slack channels with companies right now. 2 million users are a lot to handle for Railway's small team, and it'll take a second to get to everyone.
2 months ago
Deployments stuck: "We have identified the incident."
But it all looks resolved on the GitHub side, yet I am still unable to deploy because my Hobby plan was paused.
2 months ago
Should we abort deploys and redeploy?
callmefredcom
Should we abort deploys and redeploy?
2 months ago
No, likely case is a surge of deployments came through and so now they have to slowly work through that surge to clear the queue, redeploying will either not do anything or place you in the back of the queue again
dev
No, likely case is a surge of deployments came through and so now they have to slowly work through that surge to clear the queue, redeploying will either not do anything or place you in the back of the queue again
2 months ago
Too much slop on the web these days probably... By the way, I have just found out that polsia reads aislop when read backwards...
2 months ago
Some outages are ok if they happen once a month, but every week having the same issue is unacceptable. Why is this unstable?
2 months ago
Alleluia, unstuck (1/2, the other one Building)
mateo
Some outages are ok if they happen once a month, but every week having the same issue is unacceptable. Why is this unstable?
2 months ago
Good question. Would love to know.
2 months ago
From what I can understand, the instability mostly comes from scale. Railway has had very steep growth recently, and it's difficult to scale at the same pace as the growth. For example, I believe roughly a third of the incidents this year were from Railway hitting the GitHub rate-limit ceiling, causing new deployments to fail. That is a direct result of Railway's scale.
I imagine once Railway's growth steadies and they catch up to their scale then they'll be much more stable.
Yeah, for sure. We as developers understand better than anyone that problems can happen at any time.
For example, I believe 28% of the downtime this year was from Railway hitting the GitHub rate-limit ceiling causing deployments to fail, that is a direct result of Railway's scale.
LOL no, that's not how operational resilience works. You don't blame your provider's limits. You plan for them. Totally understandable if one sneaks by that was perhaps not well documented. But, if this is true, and this has happened before, then there's honestly no excuse for this.
For our serious products, we track how we're tracking against limits and set alarms at 50% and 75%. At 75%, our on-call gets paged. At 50%, we get a ticket. We don't just magically get to 100% without knowing we're on our way there.
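For anyone wondering what that kind of limit tracking looks like in practice, here is a minimal sketch, not anyone's actual setup. It assumes you can read a `used` count and a `limit` from the provider (for example, from rate-limit response headers). The function name `classify_usage` and both constants are hypothetical and only mirror the 50% and 75% figures from the post above.

```python
from typing import Optional

# Hypothetical thresholds mirroring the post above:
# open a ticket at 50% of the limit, page on-call at 75%.
TICKET_THRESHOLD = 0.50
PAGE_THRESHOLD = 0.75


def classify_usage(used: int, limit: int) -> Optional[str]:
    """Return "page", "ticket", or None for the current usage of a limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    # Check the higher threshold first so 75%+ pages instead of ticketing.
    if ratio >= PAGE_THRESHOLD:
        return "page"
    if ratio >= TICKET_THRESHOLD:
        return "ticket"
    return None


if __name__ == "__main__":
    # Example: 3,900 of 5,000 requests used in the current window (78%).
    print(classify_usage(3900, 5000))
```

You would run something like this on a schedule against every external limit you depend on and route the result to your ticketing or paging system. The point is that the limit gets noticed well before 100%.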
2 months ago
Sorry for the confusion there friend, I don't work for Railway, it's not my place to say how things should and shouldn't be - I only say how they are. The fact is that growth impacts Railway's stability. The extent of it and whether it's avoidable is unfortunately not something you or I can know.
What I do know: Railway does have alarms where their on-call personnel get paged, they do have extensive monitoring, and I don't think it's their intention to "magically" get anywhere. Every incident reveals a bottleneck that gets patched or fixed, which brings Railway systematically closer to being stable.
2 months ago
not fixed
2 months ago
this is nuts. how much longer can we stay with railway if this keeps happening every few days?
2 months ago
Why does this happen so much?
2 months ago
44 minutes for a deployment... and still counting
2 months ago
Railway posted a mini-retro here for those interested:
