Ghost Container
zanewithspoon
PROOP

a month ago

It seems there's an old deployment of one of my services that's invisible from the dashboard.

I believe it's an orphaned container at IP fd12:ac45:f77c:0:a000:39:8689:5594 (container hostname 613b964be9c3) that has been running for 5+ days.

It seems to be one of my old celery beat processes, still connected to my Redis, and it's now duplicating all of my beat tasks.

Solved

3 Replies

Status changed to Awaiting Railway Response Railway about 1 month ago


sam-a
EMPLOYEE

a month ago

Hey! I searched across our infrastructure for that container (613b964be9c3) but wasn't able to locate it on the hosts I checked.

A couple of questions to help track this down: is the duplicate beat behavior still happening right now? If so, could you share a recent log snippet showing it? Also, how did you identify that container hostname and IP - was it from your celery logs, Redis CLIENT LIST, or somewhere else?

If you're able to run CLIENT LIST on your Redis and that IP (fd12:ac45:f77c:0:a000:39:8689:5594) still shows as connected, the container is definitely still running and I can dig deeper. If it's gone, the orphan may have already been cleaned up.
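To check this from a script instead of by eye, here is a minimal sketch that parses raw CLIENT LIST text (for example the output of `redis-cli CLIENT LIST`) and keeps only the clients whose `addr` host matches a given IP. It assumes one client per line with space-separated `key=value` fields and Redis's `[ipv6]:port` address form. The helper names and the made-up `id=7` sample line are illustrative. If you use redis-py, `Redis.client_list()` already returns parsed dicts, and the same `addr` filter applies.

```python
import ipaddress

def parse_client_list(raw: str) -> list[dict[str, str]]:
    """Parse raw CLIENT LIST text: one client per line, space-separated key=value fields."""
    clients = []
    for line in raw.strip().splitlines():
        fields = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:  # ignore anything that is not key=value
                fields[key] = value
        if fields:
            clients.append(fields)
    return clients

def addr_host(addr: str) -> str:
    """Strip the port: '[fd12::1]:45538' -> 'fd12::1', '10.0.0.5:6379' -> '10.0.0.5'."""
    if addr.startswith("["):
        return addr[1:addr.index("]")]
    return addr.rsplit(":", 1)[0]

def connections_from(raw: str, ip: str) -> list[dict[str, str]]:
    """Return clients connected from `ip`; comparing parsed addresses ignores IPv6 spelling differences."""
    target = ipaddress.ip_address(ip)
    matches = []
    for client in parse_client_list(raw):
        try:
            if ipaddress.ip_address(addr_host(client.get("addr", ""))) == target:
                matches.append(client)
        except ValueError:
            continue  # e.g. unix-socket clients or malformed addr fields
    return matches

GHOST_IP = "fd12:ac45:f77c:0:a000:39:8689:5594"
# Sample modeled on the output in this thread; the id=7 line is made up.
sample = """\
id=4 addr=[fd12:ac45:f77c:0:a000:39:8689:5594]:45538 age=41147 idle=46 cmd=lpush
id=7 addr=[fd12:ac45:f77c:0:a000:39:1111:2222]:50000 age=120 idle=0 cmd=client|list
id=269 addr=[fd12:ac45:f77c:0:a000:39:8689:5594]:44426 age=40188 idle=46 cmd=unsubscribe sub=1
"""
for c in connections_from(sample, GHOST_IP):
    print(c["id"], c["addr"], c["age"])
```

An empty result would mean nothing is connected from that IP any more.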


Status changed to Awaiting User Response Railway about 1 month ago


zanewithspoon
PROOP

a month ago

Yes, the duplicate beat is still actively happening right now.

Here's a fresh log snippet from our celery workers showing health_check (scheduled every 2 minutes) being executed twice — once from our legitimate beat and once from the ghost:

14:16:47 health_check[0dd1de1d...] — ghost beat (:47 offset)

14:17:03 health_check[1b75bee0...] — our beat (:03 offset)

14:18:47 health_check[03d87d22...] — ghost beat

14:19:03 health_check[a21e41cf...] — our beat

Two different task IDs, ~16 seconds apart.
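The fixed ~16-second gap is the telltale sign: each scheduler fires at its own phase within the 2-minute period. Here is a minimal sketch that groups health_check log lines by their offset within the period. It assumes lines start with `HH:MM:SS task_name[task_id]` as in the snippet above, and that each beat keeps a stable phase. More than one group suggests more than one beat process. The function name and log shape are assumptions, not part of Celery.

```python
import re
from collections import defaultdict

# Assumed log shape: "14:16:47 health_check[0dd1de1d...]" (time, task name, task id).
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(\w+)\[([^\]]+)\]")

def dispatch_phases(lines, task="health_check", period_s=120):
    """Map each offset (seconds into the period) to the task ids seen at that offset."""
    phases = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group(4) != task:
            continue
        h, mi, s = (int(g) for g in m.group(1, 2, 3))
        offset = (h * 3600 + mi * 60 + s) % period_s
        phases[offset].append(m.group(5))
    return dict(phases)

log = [
    "14:16:47 health_check[0dd1de1d...]",
    "14:17:03 health_check[1b75bee0...]",
    "14:18:47 health_check[03d87d22...]",
    "14:19:03 health_check[a21e41cf...]",
]
for offset, ids in sorted(dispatch_phases(log).items()):
    print(f"offset {offset:3d}s: {ids}")
```

For the four lines above this gives two phases, 47 s and 63 s into each 2-minute period, 16 s apart. Real schedulers can drift by a second or so, so a sturdier version would cluster offsets with a small tolerance instead of requiring exact matches.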

The ghost IP is still connected to Redis right now.

Here's the CLIENT LIST output showing its two active connections:

id=4 addr=[fd12:ac45:f77c:0:a000:39:8689:5594]:45538 age=41147s (11.4 hours) idle=46s cmd=lpush

id=269 addr=[fd12:ac45:f77c:0:a000:39:8689:5594]:44426 age=40188s (11.2 hours) idle=46s cmd=unsubscribe sub=1

These connection ages (11+ hours) correspond to a Redis restart we did last night. The container reconnected immediately after the restart — it has valid Redis credentials and auto-reconnects when killed.
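Redis reports `age` as the number of seconds since the connection was opened. Working back from it gives the connect time, which you can line up against the restart. Below is a small sketch of that arithmetic. `listed_at` would be the time CLIENT LIST was run, and the helper names are illustrative.

```python
from datetime import datetime, timedelta

def age_hours(age_s: int) -> float:
    """Connection age in hours, rounded to one decimal place."""
    return round(age_s / 3600, 1)

def connected_at(listed_at: datetime, age_s: int) -> datetime:
    """When the connection was opened, given when CLIENT LIST was run."""
    return listed_at - timedelta(seconds=age_s)

# Ages from the CLIENT LIST output above.
for client_id, age_s in [(4, 41147), (269, 40188)]:
    print(client_id, age_hours(age_s))  # 11.4 and 11.2 hours
```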

How we identified it:

We traced the container hostname 613b964be9c3 from Celery error logs on the worker side. When the ghost beat dispatches drop_check_ready_batches (a task the current workers don't have registered), the worker logs the full task headers, including 'origin': 'gen1@613b964be9c3'. The gen1@ prefix is Celery beat's process identifier. We then correlated the ghost's dispatch timing with CLIENT LIST to identify the IP.

We've tried CLIENT KILL multiple times, and it reconnects within seconds. We also redeployed all three app services (backend, celery-workers, celery-beat) and restarted Redis; the ghost survived all of it.
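This sketch rests on two assumptions. First, Celery stamps messages published from a non-worker process, such as beat, with an `origin` of `gen` + the sender's PID + `@` + hostname. Second, the platform keeps Docker's default of using the 12-character short container ID as the hostname. Under those assumptions, `gen1@613b964be9c3` reads as PID 1 on container `613b964be9c3`. A minimal sketch that splits such a value follows. `parse_origin` is a hypothetical helper, not a Celery API.

```python
import re

# Assumed shape: "<prefix><pid>@<hostname>", e.g. "gen1@613b964be9c3".
ORIGIN_RE = re.compile(r"^(?P<prefix>[A-Za-z]+)(?P<pid>\d+)@(?P<host>\S+)$")
SHORT_CONTAINER_ID = re.compile(r"[0-9a-f]{12}")

def parse_origin(origin: str):
    """Split a task 'origin' header value; return None if it doesn't match the assumed shape."""
    m = ORIGIN_RE.match(origin)
    if m is None:
        return None
    host = m["host"]
    return {
        "prefix": m["prefix"],
        "pid": int(m["pid"]),
        "host": host,
        # Heuristic only: a 12-hex-char hostname often means a Docker short container id.
        "looks_like_container_id": SHORT_CONTAINER_ID.fullmatch(host) is not None,
    }

print(parse_origin("gen1@613b964be9c3"))
```

One way to spot a sender that shouldn't exist is to compare the host part of every origin in the logs against the hostnames of your current containers.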


Status changed to Awaiting Railway Response Railway about 1 month ago


sam-a
EMPLOYEE

a month ago

Good news - I tracked down the ghost container. It was an orphaned deployment instance (09a3f174-03a4-49bb-890f-4cf6f2c48dc8) from April 6 that was running on production-stacker-dc4a-metal-016-03. It should have been terminated when newer deployments came up, but the cleanup never happened - it kept running with valid Redis credentials for 10 days.

The container stopped about 4 hours ago and is no longer present on the stacker. Can you check if the duplicate beats have stopped? You should see only one health_check task per 2-minute interval now, and CLIENT LIST on your Redis should no longer show connections from fd12:ac45:f77c:0:a000:39:8689:5594.
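One way to confirm the fix from the worker logs is to count health_check runs per 2-minute window and list any window with more than one. The sketch below does that. It assumes the same `HH:MM:SS task_name[task_id]` line shape as the log snippet earlier in the thread, with windows aligned to the start of the day. A window boundary can occasionally split or merge runs when a scheduler drifts, so look for a repeating pattern rather than a single flagged window. The function name is illustrative.

```python
import re
from collections import Counter

# Assumed log shape: "14:16:47 health_check[...]" (time, task name, task id).
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(\w+)\[")

def crowded_windows(lines, task="health_check", window_s=120):
    """Return {window start 'HH:MM:SS': count} for windows with more than one run of `task`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group(4) != task:
            continue
        h, mi, s = (int(g) for g in m.group(1, 2, 3))
        counts[(h * 3600 + mi * 60 + s) // window_s] += 1
    result = {}
    for window, n in sorted(counts.items()):
        if n > 1:
            start = window * window_s
            result[f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}"] = n
    return result

before = [
    "14:16:47 health_check[0dd1de1d...]",
    "14:17:03 health_check[1b75bee0...]",
    "14:18:47 health_check[03d87d22...]",
    "14:19:03 health_check[a21e41cf...]",
]
print(crowded_windows(before))  # {'14:16:00': 2, '14:18:00': 2}
```

With only the legitimate beat running, the same check on fresh logs should return an empty dict.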

If the duplicates are still happening, let me know and I'll dig deeper - but based on our metrics the orphan appears to be gone.


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 30 days ago

