5 days ago
Summary
New backend deployments build OK, container starts (Railway logs "Starting Container"), but the application produces zero stdout afterwards. Healthcheck fails for 10 min ("service unavailable") and the deploy is marked failed. The previously deployed container (running since 2026-05-17 04:17 AM GMT-5) is still Online and serving production normally.
Smoking gun: The same image with the same env vars boots fully when I run it locally via railway run --service backend -- node dist/main.js. So the bug is not the code or env vars — it appears to be how the new container instance is being created/run on Railway's side.
Evidence
- Failed deployment: 6d185348-0dd3-4780-9de9-78e923966d1f
- Image digest: sha256:b61b8093814ba998d3e497dfd48bdbd79c22357db77f13b4c6bd5168c803818f
- Container instance status: EXITED
- Captured Deploy Logs (full): 2026-05-17 15:36:40 Starting Container (nothing else for 10 minutes)
- startCommand: echo '>>> starting nest (no migrate)' ; node dist/main.js (the echo line never appears either)
- Build Logs: build succeeds end-to-end (Docker multi-stage, image exported and pushed).
- Healthcheck: 18 attempts, all "service unavailable", retry window 10 min.
Railway's auto-diagnose said "process stayed alive but idle" — based on stale Coupon-table errors in old container logs (those migrations have since been applied to the DB; not the cause).
Local proof the binary works
$ railway run --service backend -- node dist/main.js
[Boot] main.ts top — node version=v20.18.1
[Boot] instrument.ts top / done
[Boot] >>> NestFactory.create
[Nest] LOG [NestFactory] Starting Nest application...
[Nest] LOG [InstanceLoader] AppConfigModule dependencies initialized
... (50+ modules all OK)
[Boot] >>> NestFactory.create OK
[Boot] >>> app.listen 4949
[Nest] LOG [PrismaService] Prisma connected · multi-tenant middleware activo
[Nest] LOG [QueueService] BullMQ habilitado contra redis://...@tramway.proxy.rlwy.net:54113
[Nest] LOG [NestApplication] Nest application successfully started
$ curl http://localhost:4949/api/health
{"ok":true,...}
Same image, same env vars (verified via railway variables --service backend --kv). The binary boots fine outside Railway's runtime; it doesn't boot inside.
What we already tried (no effect)
- Reset DATABASE_URL using template refs: postgresql://${{Postgres-Nq8w.PGUSER}}:${{Postgres-Nq8w.POSTGRES_PASSWORD}}@${{Postgres-Nq8w.RAILWAY_PRIVATE_DOMAIN}}:${{Postgres-Nq8w.PGPORT}}/${{Postgres-Nq8w.PGDATABASE}}
- Switched DATABASE_URL to public proxy: ${{Postgres-Nq8w.DATABASE_PUBLIC_URL}} → resolves to tramway.proxy.rlwy.net:39155.
- Switched REDIS_URL to public proxy: ${{Redis.REDIS_PUBLIC_URL}} → resolves to tramway.proxy.rlwy.net:54113.
- Applied all pending Prisma migrations to DB (MenuTranslation, Card.minAmountPerStamp, drop Coupon, QuotePlan enum).
- TCP reachability check from outside: nc -zv tramway.proxy.rlwy.net 39155 → ok; nc -zv tramway.proxy.rlwy.net 54113 → ok.
- No env var in backend service references *.railway.internal (other than RAILWAY_PRIVATE_DOMAIN, which is the backend's own private name).
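For anyone repeating the last check, here is a minimal sketch of how it could be scripted instead of eyeballed. It assumes the output of railway variables --service backend --kv is one KEY=VALUE pair per line; parseKv and findInternalRefs are hypothetical helper names, not part of the Railway CLI, and the sample values in the usage note are made up.

```typescript
// Hypothetical helper: turn KEY=VALUE lines (assumed --kv output format)
// into a plain object. Lines without "=" are ignored.
function parseKv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const eq = line.indexOf("=");
    // Split on the first "=" only, since URLs may contain "=" later on.
    if (eq > 0) env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Hypothetical helper: list variable names whose values mention a private
// *.railway.internal hostname. `selfVar` is skipped because it holds the
// service's own private domain.
function findInternalRefs(
  env: Record<string, string | undefined>,
  selfVar: string = "RAILWAY_PRIVATE_DOMAIN",
): string[] {
  const internalHost = /[a-z0-9-]+\.railway\.internal/i;
  return Object.keys(env)
    .filter((name) => name !== selfVar)
    .filter((name) => internalHost.test(env[name] ?? ""))
    .sort();
}
```

Feeding it a dump where REDIS_URL still pointed at something like redis.railway.internal would return ["REDIS_URL"]; an empty array corresponds to the result reported above.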
What's not the cause
- App code: local proves it.
- DB schema: all migrations applied, verified.
- Env vars: identical to what local uses successfully.
- Public proxies: reachable from outside Railway.
- Image: builds cleanly, same digest works locally.
Questions for Railway
- Why does the container for 6d185348-0dd3-4780-9de9-78e923966d1f produce zero stdout after "Starting Container"? Is the process actually running? Is its stdout connected to the log pipe?
- Is there an egress / network policy preventing this service's new containers from reaching tramway.proxy.rlwy.net:39155 and :54113?
- Should we recreate the backend service from scratch with new IDs? If so, can the api.soyclubify.com custom domain be moved without downtime?
- Anything in your kernel / container runtime logs that would explain a silent stall when our process boots in ~5 seconds locally?
Production is not affected (old container, uptime ~20h, still serving). We can wait — but the new code (which depends on these migrations and includes a translation feature) can only ship once we get a new container up.
Thanks!
2 Replies
Status changed to Open Railway • 5 days ago
5 days ago
Your application won't be started until the healthcheck passes. That's why you don't see any logs.
The most common cause of a healthcheck "service unavailable" error is the application not listening on the PORT variable, or the PORT variable being omitted when target ports are in use.
You can read more about it here: https://docs.railway.com/deployments/healthchecks#configure-the-healthcheck-port
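As a rough illustration of that advice, here is a minimal sketch of reading the port from the PORT variable in a Node backend. resolvePort is a hypothetical helper, and the 4949 fallback is only taken from the local boot log in the question; whether the real main.ts already does something like this is not known from the thread.

```typescript
// Hypothetical helper: pick the port to listen on from the PORT variable,
// falling back to a fixed default when PORT is unset or not a valid port.
function resolvePort(
  env: Record<string, string | undefined>,
  fallback: number,
): number {
  const raw = (env["PORT"] ?? "").trim();
  // Accept plain decimal digits only; anything else uses the fallback.
  if (!/^\d+$/.test(raw)) return fallback;
  const port = Number(raw);
  return port >= 1 && port <= 65535 ? port : fallback;
}

// Assumed usage in a NestJS main.ts (not taken from the thread):
//   await app.listen(resolvePort(process.env, 4949), "0.0.0.0");
```

Binding to 0.0.0.0 rather than localhost is an assumption here too; the point is only that the port the app listens on and the port the healthcheck probes have to agree.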
5 days ago
The most likely causes are a stuck container runtime, a broken stdout attachment, or a networking/init failure during container startup. Recreating the backend service (new service ID/container lineage) is probably the fastest path while Railway investigates deployment 6d185348-0dd3-4780-9de9-78e923966d1f at the platform level.
This is a Railway runtime/container issue rather than an application problem.