Project: HRCG Tech (7b7f699e-deed-4626-bfd5-6a22259b4c4d)
Service: Scraper | Environment: production | Region: europe-west4-drams3a
Active deployment: 43e6e8bb-30f3-4cf6-977e-086564c90ed7 (2026-04-12 10:24 UTC)
Configured: numReplicas: 1
Problem
Old deployments are marked REMOVED in your control plane, but their containers are still alive and running our code. Despite numReplicas: 1, at least two containers are executing our nightly cron in parallel. This is a platform-level issue: we have exhausted application-level fixes.
Evidence
1. Two simultaneous scraper.run.completed events with different durations (Better Stack, 2026-04-13 23:42:45 UTC):
23:42:45.000 scraper_articles_scraped=117 run_duration=2564.99s errors=1
23:42:45.000 scraper_articles_scraped=36 run_duration=2568.27s errors=1
Different article counts and different durations reported in the same second: this is only possible if two independent Node processes executed the cron.
2. Every portal emitted its completion event twice, with different durations (dur in seconds, first | second):
23:41:00 energy-storage.news articles=26 dur=971.24 | articles=6 dur=969.36
23:41:30 balkangreenenergynews.com articles=0 dur=2.14 | articles=0 dur=2.53
23:42:15 ess-news.com articles=6 dur=7.79 | articles=6 dur=9.54
3. Active deployment's stdout shows only ONE cron start per day:
2026-04-12T23:00:07.931Z scraper.run.started
2026-04-13T23:00:02.807Z scraper.run.started
The second scraper.run.completed in Evidence #1 has no matching scraper.run.started in this deployment's log stream, so the second run must be executing in a container outside the active deployment.
4. Deployment history — many REMOVED, numReplicas=1:
43e6e8bb-... SUCCESS 2026-04-12T10:24Z (active)
a24521d6-... REMOVED 2026-04-12T07:36Z
e0659296-... REMOVED 2026-04-12T06:16Z commit: "redeploy: kill zombie container"
efe8c5e5-... REMOVED 2026-04-11T15:49Z
f01f3d7f-... REMOVED 2026-04-11T15:38Z
(10+ more REMOVED entries)
Our own commit "redeploy: kill zombie container" confirms this has happened before.
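
For anyone wanting to reproduce the check behind Evidence #1 and #2, here is a minimal sketch of how duplicate completions could be flagged from exported log events. The CompletionEvent shape and the findDuplicateCompletions name are hypothetical (the real log schema may differ); the sample values are copied from the evidence above.

```typescript
// One completion event as it might appear in the exported log stream.
// Field names are hypothetical; adapt them to the real log schema.
interface CompletionEvent {
  timestamp: string;   // e.g. "23:42:45.000"
  source: string;      // portal name, or "run" for the whole-run event
  articles: number;
  durationSec: number;
}

// Groups events by (timestamp, source) and returns the groups that contain
// more than one distinct duration, which suggests that more than one
// process reported completion for the same work.
function findDuplicateCompletions(events: CompletionEvent[]): CompletionEvent[][] {
  const groups = new Map<string, CompletionEvent[]>();
  for (const e of events) {
    const key = `${e.timestamp}|${e.source}`;
    const group = groups.get(key) ?? [];
    group.push(e);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter(
    (g) => new Set(g.map((e) => e.durationSec)).size > 1,
  );
}

// Sample values taken from Evidence #1 and the first row of Evidence #2.
const events: CompletionEvent[] = [
  { timestamp: "23:42:45.000", source: "run", articles: 117, durationSec: 2564.99 },
  { timestamp: "23:42:45.000", source: "run", articles: 36, durationSec: 2568.27 },
  { timestamp: "23:41:00", source: "energy-storage.news", articles: 26, durationSec: 971.24 },
  { timestamp: "23:41:00", source: "energy-storage.news", articles: 6, durationSec: 969.36 },
];

const duplicates = findDuplicateCompletions(events);
console.log(`groups with conflicting durations: ${duplicates.length}`);
```

A single process would report one duration per (timestamp, source) pair, so any group returned here points to a second reporter.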
Already tried (all failed to kill the zombie)
- SIGTERM handlers with graceful shutdown
- Running node as PID 1
- MongoDB distributed lock with TTL + lock stealing
- Cron de-duplication guards
- Forced cache-busting redeploys
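
To illustrate the limits of the lock-based attempt, here is a minimal in-memory sketch of a TTL lease with stealing. It is not our actual MongoDB code: LeaseStore, the container names, and the 10-minute TTL are all assumptions made for illustration. In MongoDB, the acquire step would have to be a single atomic update.

```typescript
// In-memory stand-in for the lock collection so the sketch runs without a database.
interface Lease {
  owner: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseStore {
  private leases = new Map<string, Lease>();

  // Acquire or steal: succeeds when no lease exists, the lease has expired,
  // or the caller already owns it.
  tryAcquire(name: string, owner: string, ttlMs: number, now: number): boolean {
    const current = this.leases.get(name);
    if (current && current.expiresAt > now && current.owner !== owner) {
      return false;
    }
    this.leases.set(name, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // True only while the caller still holds an unexpired lease.
  stillHeldBy(name: string, owner: string, now: number): boolean {
    const current = this.leases.get(name);
    return !!current && current.owner === owner && current.expiresAt > now;
  }
}

const store = new LeaseStore();
const ttlMs = 10 * 60 * 1000; // hypothetical 10-minute TTL
const t0 = 0;

const firstGot = store.tryAcquire("nightly-cron", "container-A", ttlMs, t0);
const secondBlocked = store.tryAcquire("nightly-cron", "container-B", ttlMs, t0 + 1000);
// Once the TTL has passed without renewal, B can steal the lease
// even though A's process may still be running.
const secondStole = store.tryAcquire("nightly-cron", "container-B", ttlMs, t0 + ttlMs + 1);
const firstStillOwner = store.stillHeldBy("nightly-cron", "container-A", t0 + ttlMs + 1);

console.log({ firstGot, secondBlocked, secondStole, firstStillOwner });
```

Even in the best case, a lock like this only keeps a second container from starting work. It cannot terminate a container that the platform has left running, which is why the request below is addressed to the platform.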
Ask
1. Forcibly terminate all non-active containers for this service now.
2. Confirm only the active deployment's container remains.
3. Investigate why REMOVED deployments leave their containers running — this is the platform bug.
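
To support the investigation in item 3, every log event could carry an identifier for the container that emitted it. The sketch below is a suggestion only: the DEPLOYMENT_ID variable name, instanceTag, and logEvent are hypothetical, and the platform may expose the deployment ID under a different name.

```typescript
type Env = Record<string, string | undefined>;

// Builds an identity string for the current container.
// DEPLOYMENT_ID is a hypothetical variable name; substitute whatever
// the platform actually injects into the container environment.
function instanceTag(env: Env, pid: number, hostname: string): string {
  const deployment = env["DEPLOYMENT_ID"] ?? "unknown-deployment";
  return `${deployment}/${hostname}/${pid}`;
}

// Emits one JSON log line that always includes the instance tag.
function logEvent(tag: string, event: string, fields: Record<string, unknown> = {}): string {
  const line = JSON.stringify({ instance: tag, event, ...fields });
  console.log(line);
  return line;
}

// In the real service the arguments would come from process.env,
// process.pid and os.hostname(); fixed values keep the sketch self-contained.
const tag = instanceTag({ DEPLOYMENT_ID: "43e6e8bb" }, 1, "example-host");
const line = logEvent(tag, "scraper.run.started");
```

With a tag like this on both scraper.run.started and scraper.run.completed, a completion event from a container outside the active deployment would identify itself directly in the log stream.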