Old containers not terminated on deploy — parallel execution despite numReplicas=1
anuj-hrcg

a month ago

Project: HRCG Tech (7b7f699e-deed-4626-bfd5-6a22259b4c4d)

Service: Scraper | Environment: production | Region: europe-west4-drams3a

Active deployment: 43e6e8bb-30f3-4cf6-977e-086564c90ed7 (2026-04-12 10:24 UTC)

Configured: numReplicas: 1

Problem

Old deployments are marked REMOVED in your control plane, but their containers are still alive and running our code. Despite numReplicas: 1, at least two containers are executing our nightly cron in parallel. This is a platform-level issue; we've exhausted application-level fixes.

Evidence

1. Two simultaneous scraper.run.completed events with different durations (Better Stack, 2026-04-13 23:42:45 UTC):

23:42:45.000 scraper_articles_scraped=117 run_duration=2564.99s errors=1

23:42:45.000 scraper_articles_scraped=36 run_duration=2568.27s errors=1

The two events share a timestamp but report different article totals and different durations, which is only possible if two independent Node processes executed the cron.

2. Every portal emitted completion twice with different durations:

23:41:00 energy-storage.news articles=26 dur=971.24 | articles=6 dur=969.36

23:41:30 balkangreenenergynews.com articles=0 dur=2.14 | articles=0 dur=2.53

23:42:15 ess-news.com articles=6 dur=7.79 | articles=6 dur=9.54
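
The pairing above can be checked mechanically. The following is a minimal sketch, assuming the completion events have been exported into plain objects; the `CompletionEvent` shape and its field names are hypothetical and are not the Better Stack schema. It groups events by timestamp and portal and reports any key with more than one completion.

```typescript
// Hypothetical record shape; field names are assumptions, not the log provider's schema.
interface CompletionEvent {
  timestamp: string; // e.g. "23:41:00"
  portal: string;
  articles: number;
  durationSec: number;
}

interface DuplicateGroup {
  key: string; // "<timestamp>|<portal>"
  events: CompletionEvent[];
}

// Group completion events by (timestamp, portal) and keep groups with more than one event.
function findDuplicateCompletions(events: CompletionEvent[]): DuplicateGroup[] {
  const groups = new Map<string, CompletionEvent[]>();
  for (const ev of events) {
    const key = `${ev.timestamp}|${ev.portal}`;
    const group = groups.get(key) ?? [];
    group.push(ev);
    groups.set(key, group);
  }
  return Array.from(groups.entries())
    .filter(([, evs]) => evs.length > 1)
    .map(([key, evs]) => ({ key, events: evs }));
}

// Values copied from the per-portal evidence in this post.
const observed: CompletionEvent[] = [
  { timestamp: "23:41:00", portal: "energy-storage.news", articles: 26, durationSec: 971.24 },
  { timestamp: "23:41:00", portal: "energy-storage.news", articles: 6, durationSec: 969.36 },
  { timestamp: "23:41:30", portal: "balkangreenenergynews.com", articles: 0, durationSec: 2.14 },
  { timestamp: "23:41:30", portal: "balkangreenenergynews.com", articles: 0, durationSec: 2.53 },
  { timestamp: "23:42:15", portal: "ess-news.com", articles: 6, durationSec: 7.79 },
  { timestamp: "23:42:15", portal: "ess-news.com", articles: 6, durationSec: 9.54 },
];

const duplicates = findDuplicateCompletions(observed);
for (const group of duplicates) {
  const durations = group.events.map((e) => e.durationSec).join(" vs ");
  console.log(`${group.key}: ${group.events.length} completions, durations ${durations}`);
}
```

With a single replica, every key would be expected to appear once; each of the three portals listed here appears twice.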

3. Active deployment's stdout shows only ONE cron start per day:

2026-04-12T23:00:07.931Z scraper.run.started

2026-04-13T23:00:02.807Z scraper.run.started

The second scraper.run.completed in Evidence #1 has no matching scraper.run.started in this deployment's log stream, so the second run must be executing in a container outside the active deployment.
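
To attribute each event to a specific container rather than inferring it from a missing log line, every log record can carry the identity of the process that emitted it. This is a minimal sketch, not the scraper's actual logger. It assumes the platform injects `RAILWAY_DEPLOYMENT_ID` and `RAILWAY_REPLICA_ID` as environment variables (check the platform's variables reference before relying on these names); `logEvent` and `instanceIdentity` are hypothetical helpers.

```typescript
import * as os from "os";

type Env = Record<string, string | undefined>;

// Identity of the process emitting a log line.
interface InstanceIdentity {
  deploymentId: string;
  replicaId: string;
  hostname: string;
  pid: number;
}

// Env var names are assumptions; missing values fall back to "unknown".
function instanceIdentity(env: Env = process.env): InstanceIdentity {
  return {
    deploymentId: env.RAILWAY_DEPLOYMENT_ID ?? "unknown",
    replicaId: env.RAILWAY_REPLICA_ID ?? "unknown",
    hostname: os.hostname(),
    pid: process.pid,
  };
}

// Emit one JSON log line tagged with the emitting instance, and return it.
function logEvent(event: string, fields: Record<string, unknown> = {}, env: Env = process.env): string {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    event,
    ...instanceIdentity(env),
    ...fields,
  });
  console.log(line);
  return line;
}

logEvent("scraper.run.started");
```

If two scraper.run.completed events then arrive with different deploymentId values, the stray container is identified directly by the deployment it belongs to.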

4. Deployment history — many REMOVED, numReplicas=1:

43e6e8bb-... SUCCESS 2026-04-12T10:24Z (active)

a24521d6-... REMOVED 2026-04-12T07:36Z

e0659296-... REMOVED 2026-04-12T06:16Z commit: "redeploy: kill zombie container"

efe8c5e5-... REMOVED 2026-04-11T15:49Z

f01f3d7f-... REMOVED 2026-04-11T15:38Z

(10+ more REMOVED entries)

Our own commit "redeploy: kill zombie container" confirms this has happened before.

Already tried (all failed to kill the zombie)

- SIGTERM handlers with graceful shutdown

- Running node as PID 1

- MongoDB distributed lock with TTL + lock stealing

- Cron de-duplication guards

- Forced cache-busting redeploys
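
For readers unfamiliar with the lock approach listed above: the post does not show its implementation, so the following is only a generic sketch of a lease with TTL and lock stealing. An in-memory `Map` stands in for the MongoDB collection, and `LeaseLock`, `Lease`, and the owner names are hypothetical. A real distributed version would need a single atomic conditional update in the database instead of the read-then-write shown here.

```typescript
// Hypothetical lease record; in a real setup this would be a database document.
interface Lease {
  owner: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseLock {
  private leases = new Map<string, Lease>();

  // Try to take the lease. Succeeds if it is free, already ours, or expired (stealing).
  acquire(name: string, owner: string, ttlMs: number, now: number = Date.now()): boolean {
    const current = this.leases.get(name);
    if (current && current.owner !== owner && current.expiresAt > now) {
      return false; // another owner holds a lease that has not expired
    }
    this.leases.set(name, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Release only if the caller still owns the lease.
  release(name: string, owner: string): boolean {
    const current = this.leases.get(name);
    if (!current || current.owner !== owner) return false;
    this.leases.delete(name);
    return true;
  }
}

const lock = new LeaseLock();
console.log(lock.acquire("nightly-cron", "container-a", 60000, 0));     // true: lease was free
console.log(lock.acquire("nightly-cron", "container-b", 60000, 1000));  // false: a's lease is live
console.log(lock.acquire("nightly-cron", "container-b", 60000, 61000)); // true: a's lease expired
```

A lease like this only excludes a second runner while the lease is live, so the holder has to renew it (by calling `acquire` again as the same owner) for the whole length of a long run.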

Ask

1. Forcibly terminate all non-active containers for this service now.

2. Confirm only the active deployment's container remains.

3. Investigate why REMOVED deployments leave their containers running — this is the platform bug.

0 Replies
