Old worker containers not killed after deploy + service delete/recreate

eraysaltik

PRO • OP

3 months ago

Hi Railway team,

I have a worker service in production where old containers remain alive and connected to Redis even after multiple redeploys, and even after fully deleting and recreating the service from the dashboard.

The dashboard shows 1 worker service, 1 replica, 1 active deployment, but Redis CLIENT LIST shows 2 BullMQ Worker processes from different internal IPs (10.159.251.112 is the new one and 10.253.170.90 is an orphan), both actively long-polling the queue with BRPOPLPUSH idle around 4 seconds. The orphan has been alive about 17 minutes since my last forced CLIENT KILL, but earlier today I had orphans that were 65+ hours old. Deleting the entire worker service from the dashboard did not kill the orphan containers — they are still in the same Railway internal network responding to TCP.
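For anyone who wants to run the same check, the comparison above can be sketched in Python. This is only an illustration: it assumes the CLIENT LIST text has already been fetched (for example with redis-cli), and the function names, sample output, and ports are made up for the example. Only the two IPs come from this post.

```python
from collections import defaultdict


def parse_client_list(raw: str) -> list[dict[str, str]]:
    """Parse the text reply of Redis CLIENT LIST into one dict per connection."""
    clients = []
    for line in raw.strip().splitlines():
        fields = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:  # keep only key=value tokens
                fields[key] = value
        if fields:
            clients.append(fields)
    return clients


def connections_by_ip(clients: list[dict[str, str]]) -> dict[str, list[dict[str, str]]]:
    """Group connections by the IP part of their addr field (the port is dropped)."""
    grouped = defaultdict(list)
    for client in clients:
        ip = client.get("addr", "").rsplit(":", 1)[0]
        grouped[ip].append(client)
    return dict(grouped)


def unexpected_ips(clients: list[dict[str, str]], expected_ips: set[str]) -> list[str]:
    """Return the IPs holding connections that are not in the expected set."""
    return sorted(set(connections_by_ip(clients)) - set(expected_ips))


# Hypothetical sample: IPs are from the post, every other value is invented.
SAMPLE = """\
id=11 addr=10.159.251.112:40112 name= age=120 idle=4 cmd=brpoplpush
id=12 addr=10.253.170.90:51234 name= age=1020 idle=4 cmd=brpoplpush
"""

if __name__ == "__main__":
    clients = parse_client_list(SAMPLE)
    print(unexpected_ips(clients, {"10.159.251.112"}))
```

Passing the IP of the current deployment as the expected set leaves only the connections that should not exist.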

This is causing incorrect job routing in production because the orphan runs older code that consumes BullMQ jobs alongside the new worker. Could you please force-terminate the orphaned containers in our worker service, and investigate why service delete plus redeploy is not cleaning them up?

I can share Redis CLIENT LIST output, the orphan IPs, and our docker-entrypoint.sh setup if helpful.

Thanks!

Solved

15 Replies

Status changed to Awaiting Railway Response Railway • 3 months ago

angelo-railway

EMPLOYEE

3 months ago

This is a known platform issue we're tracking — service deletion should unconditionally terminate associated containers, and it's not doing so in some cases. We'll investigate your stacker to force-terminate the orphan containers.

In the meantime, your workarounds (direct exec entrypoint, queue rename, fingerprinting) are solid. One note: the original issue was partly caused by npx not forwarding SIGTERM to the child process, which you've already fixed — good catch. We'll follow up once the orphans are cleaned up.
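The thread does not spell out how the fingerprinting was done, so the following is only a guess at one possible shape: tag each Redis connection name with the deployment it belongs to, so that CLIENT LIST shows which deployment owns each connection. The environment variable name RAILWAY_DEPLOYMENT_ID, the `service:deployment:pid` naming scheme, and both function names are assumptions for illustration. How the name is applied to the connection (for example via CLIENT SETNAME) depends on the client library and is not shown.

```python
import os
from typing import Mapping, Optional


def connection_fingerprint(service: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a connection name of the form service:deployment:pid.

    The deployment ID is read from RAILWAY_DEPLOYMENT_ID (assumed variable name);
    "unknown" is used when it is not set.
    """
    env = os.environ if env is None else env
    deployment = env.get("RAILWAY_DEPLOYMENT_ID", "unknown")
    return f"{service}:{deployment}:{os.getpid()}"


def is_stale(client_name: str, current_deployment: str) -> bool:
    """Treat a connection as stale if its name is malformed or names another deployment."""
    parts = client_name.split(":")
    return len(parts) != 3 or parts[1] != current_deployment


if __name__ == "__main__":
    name = connection_fingerprint("worker", {"RAILWAY_DEPLOYMENT_ID": "dep-new"})
    print(name, is_stale(name, "dep-new"))
```

Note that this sketch treats unnamed connections as stale, which is a deliberate but debatable choice: it flags anything that cannot prove which deployment it came from.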

Status changed to Awaiting User Response Railway • 3 months ago

eraysaltik

PRO • OP

3 months ago

Thanks. If you could share an update here, or post a general notice on the status page, that would be great.

Status changed to Awaiting Railway Response Railway • 3 months ago

Anonymous

PRO

3 months ago

Hi, we are also having the same problem in one of our services. It's still answering even though we killed it 😄

Anonymous

PRO

3 months ago

Any updates?

erhanvarlik

PRO

3 months ago

I mean, how long can it take, man? It has been 5 hours and our zombie container is still living somewhere. We cannot deploy anything new.

PRO

3 months ago

Why is this not on the status page yet?

PRO

3 months ago

More customers facing this here: https://station.railway.com/questions/orphaned-deployment-support-portal-dow-6d387ee6

Anonymous

PRO

3 months ago

Still

clashtradecoc10

PRO

3 months ago

What is this bullshit? I'm not getting any fucking reply, and I pay for this service. IT'S BEEN 2 DAYS.

clashtradecoc10

PRO

3 months ago

And I'm even paying for the compute used by the stale instance you can't manage to kill.

chandrika

EMPLOYEE

3 months ago

Hi there, apologies for the delay here. I've escalated this issue to our platform team to investigate.

Status changed to Awaiting User Response Railway • 3 months ago

eraysaltik

PRO • OP

3 months ago

Any update on the issue? Have all the zombie worker containers been stopped and killed?

Status changed to Awaiting Railway Response Railway • 3 months ago

sam-a

EMPLOYEE

3 months ago

Thanks for your patience here, and apologies for the delay.

I've checked your current deployment and can only see one active container for your worker service. Could you run CLIENT LIST on your Redis again and let us know if you're still seeing multiple worker connections from different IPs? That will help us confirm whether the orphan containers have been cleaned up or if we need to track them down.

Status changed to Awaiting User Response Railway • 3 months ago

eraysaltik

PRO • OP

3 months ago

Hey Sam-a,

Unfortunately we confirmed the orphan containers are still running. After killing their Redis connections via CLIENT KILL, they reconnected within seconds (age=0.0h), proving the containers are alive and actively reconnecting.

Current deployment (healthy):
- 10.133.60.187: our worker service, 10 connections
- 10.171.94.247: our web service, 4 connections

Orphan containers (should be terminated):
- 10.244.2.119: 3 connections, from a deployment about 6 days ago
- 10.253.170.90: 5 connections, from a deployment about 6 days ago, still listening on a renamed/deprecated queue
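In case it helps reproduce the reconnect check, here is a rough Python sketch of it. It assumes the CLIENT LIST text was captured shortly after running CLIENT KILL against the suspect IPs. The function names, the 60-second threshold, the ports, and the sample ages are invented for illustration; only the IPs come from this thread.

```python
import re
from collections import defaultdict

# Match addr=/age= only at the start of a token, so laddr= is not picked up.
ADDR_RE = re.compile(r"(?<!\S)addr=(\S+):\d+")
AGE_RE = re.compile(r"(?<!\S)age=(\d+)")


def ages_by_ip(raw: str) -> dict[str, list[int]]:
    """Map each client IP to the ages (in seconds) of its connections."""
    ages = defaultdict(list)
    for line in raw.splitlines():
        addr = ADDR_RE.search(line)
        age = AGE_RE.search(line)
        if addr and age:
            ages[addr.group(1)].append(int(age.group(1)))
    return dict(ages)


def report(raw: str) -> dict[str, str]:
    """Summarize connection count and youngest connection age (hours) per IP."""
    return {
        ip: f"{len(ages)} conn, youngest age={min(ages) / 3600:.1f}h"
        for ip, ages in sorted(ages_by_ip(raw).items())
    }


def recently_reconnected(raw: str, suspect_ips: set[str], max_age_s: int = 60) -> list[str]:
    """Return suspect IPs that hold at least one connection younger than max_age_s."""
    ages = ages_by_ip(raw)
    return sorted(
        ip for ip in suspect_ips
        if any(age < max_age_s for age in ages.get(ip, []))
    )


# Hypothetical snapshot taken just after CLIENT KILL on the orphan IP.
SAMPLE = """\
id=21 addr=10.133.60.187:40001 name= age=7200 idle=1 cmd=brpoplpush
id=22 addr=10.244.2.119:40002 name= age=3 idle=3 cmd=brpoplpush
id=23 addr=10.244.2.119:40003 name= age=5 idle=2 cmd=brpoplpush
"""

if __name__ == "__main__":
    print(report(SAMPLE))
    print(recently_reconnected(SAMPLE, {"10.244.2.119", "10.253.170.90"}))
```

A suspect IP that shows up again with a very young connection right after the kill means the process behind it is still alive and reconnecting, which is the situation described above.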

Could you please terminate the containers at those two orphan IPs?

Eray

Status changed to Awaiting Railway Response Railway • 3 months ago

brody

EMPLOYEE

3 months ago

We've removed the orphaned containers and are working on a long-term fix for the underlying cleanup issue. If you're still seeing orphaned containers, please open a new thread so we can investigate your specific case.

Status changed to Awaiting User Response Railway • 3 months ago

Status changed to Solved brody • 3 months ago
