Deployment shows SUCCESS but container silently dies with zero runtime logs — restart always fixes it
im-el-bigote
PRO OP

15 days ago

Summary

  Every deployment to my service builds successfully, reports SUCCESS status, but the container immediately becomes unresponsive (502) with absolutely zero runtime logs captured. Running railway restart on the exact same deployment always brings the app back perfectly. This is 100% reproducible across 15+ consecutive deploys, two different regions, and multiple Dockerfile configurations.

Project Details

  - ProjectID: ad064a9e-1014-45d4-8a18-01a19ab78ffc

  - Service: concierge (service ID: a02a89ba-84fc-4cee-9ee4-0bcc663dd174)

  - Plan: Pro

  - Builder: Dockerfile (Python 3.11-slim, multi-stage)

  - App: Python FastAPI / uvicorn

  - Runtime memory: ~141 MB (flat/declining, well within limits)

Reproduction Steps

  1. git push origin main (triggers auto-deploy)

  2. Build completes successfully (~5-35s depending on cache)

  3. Deployment status shows SUCCESS

  4. All requests to the service return 502

  5. railway logs shows zero runtime output — not a single line

  6. Run railway restart --yes

  7. App comes up healthy within 60 seconds, logs appear normally

  This cycle repeats on every single deploy.
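The manual cycle above can be scripted while the bug persists. This is only a sketch: the helper names are hypothetical, and `railway restart --yes` is the command from step 6.

```python
import subprocess


def deploy_looks_dead(status: str, runtime_logs: list[str]) -> bool:
    """Heuristic for the failure signature described above: the platform
    reports SUCCESS but not a single runtime log line was ever emitted."""
    return status == "SUCCESS" and len(runtime_logs) == 0


def restart_service() -> None:
    """Apply the manual workaround (step 6) non-interactively."""
    subprocess.run(["railway", "restart", "--yes"], check=True)
```

Checking the deployment status and fetching runtime logs would still have to go through the Railway CLI or API; the heuristic only encodes the "SUCCESS but silent" signature.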

Evidence That This Is a Platform Issue

  1. Memory is not the issue: Runtime usage is ~141 MB, flat/declining. No spike visible in metrics during deploy. Pro plan with plenty of headroom.

  2. The application code works perfectly: railway restart proves the exact same Docker image runs fine. Same container, same dependencies, same env vars — just a process restart vs. a container swap.

  3. Zero logs = process killed before the entrypoint runs: My startup script (start.py) prints as its very first action with flush=True, and I have ENV PYTHONUNBUFFERED=1 in the Dockerfile. The first print statement never appears in deploy logs, meaning the process is SIGKILL'd before the Python interpreter executes a single line.

  4. Deploy vs. restart is the only variable: Deploy creates a NEW container from the built image. Restart restarts the process within the EXISTING container. The container creation/swap step is where the failure occurs.

  5. Not region-specific: Reproduced in both us-west1 and us-east4.

  6. Not a build issue: Build always succeeds. Docker layers are valid. The image is the same one that runs successfully after railway restart.

What I've Tried (none resolved it)

  - Multi-stage Dockerfile (separate build/runtime stages to reduce runtime memory)

  - Single-stage Dockerfile

  - Adding overlapSeconds=15 and drainingSeconds=10 to railway.toml

  - Removing overlap settings (back to defaults)

  - Adding a 3-second startup delay before Python imports

  - Changing healthcheck to return 503 until DB is ready

  - Reverting healthcheck to always return 200

  - Adding EXPOSE 8080 to Dockerfile

  - Removing duplicate config files (Procfile, extra railway.toml, extra Dockerfile)

  - Deleting orphan services from the project (went from 3 services to 1)

  - Switching region from us-west1 to us-east4

Current Configuration

railway.toml:

  [build]

  builder = "dockerfile"

  dockerfilePath = "Dockerfile"

  [deploy]

  healthcheckPath = "/health"

  healthcheckTimeout = 60

  restartPolicyType = "ON_FAILURE"

  restartPolicyMaxRetries = 5

  drainingSeconds = 10

Dockerfile:

  FROM python:3.11-slim AS builder

  WORKDIR /app

  COPY backend/requirements.txt .

  RUN pip install --no-cache-dir -r requirements.txt

  FROM python:3.11-slim

  ENV PYTHONUNBUFFERED=1

  WORKDIR /app

  COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages

  COPY --from=builder /usr/local/bin /usr/local/bin

  COPY backend/ .

  EXPOSE 8080

  CMD ["python", "start.py"]

start.py (first lines):

  import time, os

  if os.getenv("RAILWAY_ENVIRONMENT"):

      print("Railway detected, waiting 3s for container setup...", flush=True)

      time.sleep(3)

  print("=== PBJ Start Wrapper ===", flush=True)

  # ... imports and uvicorn.run()

  None of these print statements appear in deploy logs. They all appear after railway restart.
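For context, a minimal version of such a wrapper, with no sleeps before binding, could look like this (resolve_port and the app.main:app module path are illustrative, not the actual project layout):

```python
import os


def resolve_port(default: int = 8080) -> int:
    """Return the port injected via $PORT, falling back to a default."""
    raw = os.environ.get("PORT", "")
    return int(raw) if raw.isdigit() else default


def main() -> None:
    # Bind immediately: no sleeps, no pre-import "container setup" waits.
    import uvicorn  # imported here so resolve_port stays usable without uvicorn installed

    uvicorn.run("app.main:app", host="0.0.0.0", port=resolve_port())
```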

Request

  Could you investigate the container-level events (OOM kills, cgroup limits, SIGKILL signals) for the deployment IDs

  listed above? Specifically:

  1. What is killing the process before the entrypoint executes during a fresh deploy?

  2. Why does the deployment report SUCCESS if the container immediately dies?

  3. Is there a difference in resource allocation between deploy-created containers and restart-recycled containers?

  4. Are there any known issues with the Dockerfile builder and container swap that could explain this?

  The current workaround (railway restart after every deploy) works but is not sustainable. Happy to provide any

  additional information needed.

Solved · $20 Bounty

Pinned Solution

pavankumar2812
FREE

13 days ago

I think the issue may be caused by the container failing the health check before the application finishes starting.

During a fresh deploy Railway creates a new container and immediately begins health checks. If the FastAPI/uvicorn process hasn't bound to the PORT yet, Railway may mark the container unhealthy and terminate it before any logs are flushed. This would explain why no runtime logs appear and why railway restart works — the restart happens after the container environment is already initialized so startup completes faster.

A few things to verify:

1. Ensure uvicorn binds to Railway's dynamic port:

uvicorn app:app --host 0.0.0.0 --port $PORT

2. Increase startup tolerance or temporarily disable health checks to confirm:

healthcheckTimeout = 120

3. Confirm the container is not exiting due to missing runtime dependencies. In multi-stage builds it's safer to copy the full Python environment:

COPY --from=builder /usr/local /usr/local

instead of copying only site-packages.

4. Another option is running uvicorn directly as the container entrypoint instead of a Python wrapper script so the server binds earlier.

If the container is being terminated by the platform health check before the process binds to the port, this behavior would match exactly what is happening here.
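Point 2 above, applied to the railway.toml shown earlier in the thread, might look like this (a sketch; commenting out healthcheckPath disables the check entirely for the test):

```
  [deploy]
  # Temporarily disable the check to confirm the hypothesis:
  # healthcheckPath = "/health"
  healthcheckTimeout = 120
```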

4 Replies

ray-chen

15 days ago

Can you please explain the issue in one sentence without using an LLM?


Status changed to Awaiting User Response Railway 15 days ago


ray-chen

Can you please explain the issue in one sentence without using an LLM?

im-el-bigote
PRO OP

14 days ago

It crashes when I deploy and I have to manually restart every time.


Status changed to Awaiting Railway Response Railway 14 days ago


Railway
BOT

14 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 14 days ago



im-el-bigote
PRO OP

12 days ago

Root cause: Container was being killed by Railway health checks before uvicorn could bind to the PORT.

Changes made:

  1. Removed startup delays — start.py had a 3-second sleep and verbose import checks that delayed port binding. Stripped it down to just running uvicorn.

  2. Dockerfile CMD binds to $PORT directly — Switched to shell-form CMD (uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}) so the server reads Railway's dynamic PORT immediately.

  3. Fixed multi-stage build — Changed COPY --from=builder to copy the full /usr/local directory instead of just site-packages and /usr/local/bin, which was missing some Python binaries.

  4. Increased healthcheck timeout — Bumped healthcheckTimeout from 60s to 120s in railway.toml.

  5. Removed blocking startup work — Supabase client is created on startup but no longer runs a test query that blocks the server from being ready. Health endpoint always returns healthy so Railway's liveness check passes immediately.

  6. Removed the deploy-restart.yml GitHub Actions workaround — No longer needed since deploys now succeed on their own.

Result: Deploy succeeds without needing railway restart.
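Change 5 can be sketched as a background-initialization pattern (names are illustrative; the real service initializes a Supabase client):

```python
import threading
import time

# Readiness flag the health endpoint could expose separately from liveness.
ready = threading.Event()


def init_backend() -> None:
    """Stand-in for slow startup work (e.g., a database connectivity
    check) moved off the startup path so the server can bind first."""
    time.sleep(0.05)  # placeholder for the real initialization
    ready.set()


# Start initialization in the background; /health can return 200
# immediately while ready.is_set() reports whether the backend is up.
worker = threading.Thread(target=init_backend, daemon=True)
worker.start()
```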


Status changed to Solved sam-a 12 days ago

