a month ago
Hi Railway team,
I have a worker service in production where old containers remain alive and connected to Redis even after multiple redeploys, and even after fully deleting and recreating the service from the dashboard.
The dashboard shows 1 worker service, 1 replica, and 1 active deployment, but Redis CLIENT LIST shows 2 BullMQ Worker processes from different internal IPs (10.159.251.112 is the new one and 10.253.170.90 is an orphan). Both are actively long-polling the queue with BRPOPLPUSH, with idle times around 4 seconds. The orphan's connection has been alive for about 17 minutes since my last forced CLIENT KILL, but earlier today I had orphans that were 65+ hours old. Deleting the entire worker service from the dashboard did not kill the orphan containers: they are still on the same Railway internal network and responding to TCP.
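A minimal Python sketch of the check described above. It assumes you have saved the raw output of `redis-cli CLIENT LIST` to a file. It splits each line into its `key=value` fields, groups connections whose last command (the `cmd` field) is the queue-polling command by source IP, and reports IPs that are not in the set you expect for the current deployment. The function names and the `expected_ips` set are hypothetical. The default `brpoplpush` comes from the post; other BullMQ versions may block on a different command, so adjust `command` to whatever you see in the `cmd` field.

```python
from collections import defaultdict


def parse_client_list(raw: str) -> list[dict[str, str]]:
    """Parse the text returned by Redis CLIENT LIST into one dict per connection."""
    clients = []
    for line in raw.strip().splitlines():
        fields = {}
        for token in line.split():
            # Each field is "key=value"; values such as name= may be empty.
            key, _, value = token.partition("=")
            fields[key] = value
        clients.append(fields)
    return clients


def pollers_by_ip(clients, command="brpoplpush"):
    """Group connections whose last command is `command` by source IP."""
    by_ip = defaultdict(list)
    for c in clients:
        if c.get("cmd") == command:
            # addr is "ip:port"; drop the port to get the source IP.
            ip = c["addr"].rpartition(":")[0]
            by_ip[ip].append(c)
    return dict(by_ip)


def unexpected_ips(clients, expected_ips, command="brpoplpush"):
    """Return sorted IPs that are polling the queue but are not in `expected_ips`."""
    return sorted(set(pollers_by_ip(clients, command)) - set(expected_ips))
```

For example, `unexpected_ips(parse_client_list(open("clients.txt").read()), {"10.159.251.112"})` should list every polling IP other than the new worker's, which makes an orphan easy to spot after each redeploy.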
This is causing incorrect job routing in production because the orphan runs older code that consumes BullMQ jobs alongside the new worker. Could you please force-terminate the orphaned containers in our worker service, and investigate why deleting the service and redeploying does not clean them up?
I can share Redis CLIENT LIST output, the orphan IPs, and our docker-entrypoint.sh setup if helpful.
Thanks!
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
This is a known platform issue we're tracking — service deletion should unconditionally terminate associated containers, and it's not doing so in some cases. We'll investigate your stacker to force-terminate the orphan containers.
In the meantime, your workarounds (direct exec entrypoint, queue rename, fingerprinting) are solid. One note: the original issue was partly caused by npx not forwarding SIGTERM to the child process, which you've already fixed — good catch. We'll follow up once the orphans are cleaned up.
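One way the fingerprinting workaround could look, sketched in Python under stated assumptions: each worker tags its Redis connections with a client name of the hypothetical form `worker:<deployment id>`, and a diagnostic script flags tagged connections that belong to a different deployment. Where the deployment ID comes from is an assumption (for example an environment variable such as `RAILWAY_DEPLOYMENT_ID`, if your platform provides one). Redis refuses client names containing spaces and similar characters, so the sketch conservatively replaces anything outside a small safe set.

```python
import re

# Hypothetical naming scheme: "<role>:<deployment_id>".
# Anything outside this conservative character set is replaced with "_",
# because Redis rejects client names containing spaces and similar characters.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.:-]")


def make_client_name(role: str, deployment_id: str) -> str:
    """Build a CLIENT SETNAME-safe fingerprint for this process."""
    return _UNSAFE.sub("_", f"{role}:{deployment_id}")


def stale_connections(clients, role, current_deployment_id):
    """Return connections tagged with `role` whose fingerprint names another deployment.

    `clients` is a list of dicts with at least a "name" field, as parsed from CLIENT LIST.
    """
    prefix = make_client_name(role, "")  # e.g. "worker:"
    current = make_client_name(role, current_deployment_id)
    return [
        c for c in clients
        if c.get("name", "").startswith(prefix) and c["name"] != current
    ]
```

Pass the resulting name to your Redis client when it connects (for example, redis-py's `client_name` argument or ioredis's `connectionName` option). Connections with no name at all, such as those opened by code deployed before the fingerprint was added, are not flagged by this check, so comparing source IPs is still worthwhile.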
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Thanks. If you could share an update here, or post a general broadcast on the status page, that would be great.
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Hi, we are also having the same problem with one of our services. It's still answering even though we killed it 😄
a month ago
Any updates?
a month ago
I mean, how long can this take, man? It has been 5 hours and our zombie container is still alive somewhere. We cannot deploy anything new.
a month ago
More customers facing this here: https://station.railway.com/questions/orphaned-deployment-support-portal-dow-6d387ee6
a month ago
Still
a month ago
What is this bullshit? I don't get any fucking reply, and I pay for this service. IT'S BEEN 2 DAYS
a month ago
And I'm even paying for the processing power of the stale instance you can't manage to kill.
a month ago
Hi there, apologies for the delay here. I've escalated this issue to our platform team to investigate.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Any update on the issue? Have all the zombie worker containers been stopped and killed?
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Thanks for your patience here, and apologies for the delay.
I've checked your current deployment and can only see one active container for your worker service. Could you run CLIENT LIST on your Redis again and let us know if you're still seeing multiple worker connections from different IPs? That will help us confirm whether the orphan containers have been cleaned up or if we need to track them down.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Hey Sam-a,
Unfortunately we confirmed the orphan containers are still running. After killing their Redis connections via CLIENT KILL, they reconnected within seconds (age=0.0h), proving the containers are alive and actively reconnecting.
Current deployment (healthy):
- 10.133.60.187: our worker service, 10 connections
- 10.171.94.247: our web service, 4 connections

Orphan containers (should be terminated):
- 10.244.2.119: 3 connections, from a deployment about 6 days ago
- 10.253.170.90: 5 connections, from a deployment about 6 days ago, still listening on a renamed/deprecated queue
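This reconnect evidence can be checked mechanically. A minimal Python sketch, assuming you have already parsed a fresh CLIENT LIST taken after CLIENT KILL into one dict per connection (with the `addr` and `age` fields kept as strings): it counts connections per source IP and reports which killed IPs have reappeared with young connections. CLIENT LIST reports `age` in seconds; the 60-second threshold and the function names are arbitrary choices for illustration.

```python
from collections import Counter


def connections_per_ip(clients):
    """Count connections per source IP (the addr field is "ip:port")."""
    return Counter(c["addr"].rpartition(":")[0] for c in clients)


def reconnected_ips(clients_after_kill, killed_ips, max_age_seconds=60):
    """Return sorted IPs that were killed but show fresh connections again.

    A connection younger than `max_age_seconds` from a killed IP means some
    process at that address is still running and reconnecting.
    """
    fresh = set()
    for c in clients_after_kill:
        ip = c["addr"].rpartition(":")[0]
        if ip in killed_ips and int(c["age"]) <= max_age_seconds:
            fresh.add(ip)
    return sorted(fresh)
```

Running `connections_per_ip` before and after a kill gives the same kind of per-IP summary as the list above, and a non-empty `reconnected_ips` result is the "age=0" signal that the container itself is still alive.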
Could you please terminate the containers at those two orphan IPs?
Eray
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
We've removed the orphaned containers and are working on a long-term fix for the underlying cleanup issue. If you're still seeing orphaned containers, please open a new thread so we can investigate your specific case.
Status changed to Awaiting User Response Railway • about 1 month ago
Status changed to Solved brody • about 1 month ago
