502 Connection Dial Timeout

3 months ago

Problem: Application returns 502 Bad Gateway errors when deployed to Railway, despite successful Docker build.

  Symptoms:
  - Docker container builds successfully
  - Container appears to start but Railway returns 502 errors
  - Multiple deployment attempts with different fixes haven't resolved the issue

  Root Cause Analysis

  After investigating the deployment configuration, we identified several issues that could cause the container to
  crash immediately after startup:

  1. Django Static Files Circular Reference

  The Django settings had STATICFILES_DIRS = [BASE_DIR / 'static'] configured unconditionally. During the Docker
  build, collectstatic copies files to this directory, but then Django tries to collect from it again at runtime,
  potentially causing:
  - File permission errors
  - Circular reference during static file collection
  - Django initialization failures

  2. Django Setup Test Causing Early Crash

  The entrypoint.sh script was running django.setup() as a health check before starting Gunicorn. This test:
  - Requires full Django initialization including database connection
  - Can fail if any app has import errors or missing dependencies
  - Causes the container to exit before Gunicorn even starts
  - Resulted in Railway seeing the container as "crashed" and returning 502

  3. Missing Railway-Specific Configuration

  - ALLOWED_HOSTS didn't include .railway.app in defaults
  - No fallback handling for Railway's PORT environment variable in some paths
  - Potential database connection issues if Railway environment variables weren't properly configured

  Technical Details

  Stack:
  - Python 3.12
  - Django 5.2.8
  - Gunicorn 23.0.0
  - MySQL database
  - WhiteNoise for static files

  Deployment Method:
  - Dockerfile-based deployment (not Nixpacks)
  - Custom entrypoint script handling migrations and Gunicorn startup

  Expected Behavior:
  Container should:
  1. Run database migrations
  2. Start Gunicorn on Railway's provided $PORT
  3. Serve the Django application

  Actual Behavior:
  Container crashes during startup, before Gunicorn binds to the port, causing Railway to return 502 errors.

  Changes Made to Fix

  1. Fixed static files configuration - Made STATICFILES_DIRS conditional to avoid conflicts
  2. Simplified entrypoint.sh - Removed problematic django.setup() test that could crash the container
  3. Added Railway domain to ALLOWED_HOSTS - Included .railway.app in defaults
  4. Improved error handling - Migrations now continue on failure instead of crashing
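
The settings changes listed above could look roughly like the sketch below. The post does not show the actual settings file, so the helper names (build_allowed_hosts, build_staticfiles_dirs), the comma-separated ALLOWED_HOSTS environment variable, and the "only if the directory exists" condition are all assumptions made for illustration.

```python
import os
from pathlib import Path


def build_allowed_hosts(env):
    """Merge hosts from a comma-separated ALLOWED_HOSTS variable (assumed name)
    with defaults that include Railway's domain."""
    raw = env.get("ALLOWED_HOSTS", "")
    hosts = [h.strip() for h in raw.split(",") if h.strip()]
    # A leading dot makes Django match the domain and all of its subdomains.
    for default in ("localhost", "127.0.0.1", ".railway.app"):
        if default not in hosts:
            hosts.append(default)
    return hosts


def build_staticfiles_dirs(base_dir):
    """Return the extra static directory only when it exists (assumed condition)."""
    static_dir = Path(base_dir) / "static"
    return [static_dir] if static_dir.is_dir() else []


# In settings.py this would be used along the lines of:
#   ALLOWED_HOSTS = build_allowed_hosts(os.environ)
#   STATICFILES_DIRS = build_staticfiles_dirs(BASE_DIR)
```

With these defaults, a host such as web-production-5d1e4.up.railway.app is accepted because it is a subdomain of .railway.app.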

Solved ($10 Bounty)

Pinned Solution

This ^

From logs, the script seems to have exited or crashed before reaching the part where it execs gunicorn.
I'd test locally and perhaps add more echo in script to see where exactly it crashed during deployment on Railway.

13 Replies


3 months ago

Hey, would you be able to share any logs you're seeing in the Railway service before Gunicorn fails to bind? Does your container run after building it on your development machine?


3 months ago

Build logs:
[Region: us-west1]
=========================
Using Detected Dockerfile
=========================
context: 3q1w-7JcQ
[internal] load build definition from Dockerfile (0ms)
[internal] load metadata for docker.io/library/python:3.12-slim (136ms)
[internal] load .dockerignore (0ms)
[ 1/10] FROM docker.io/library/python:3.12-slim@sha256:b43ff04d5df04ad5cabb80890b7ef74e8410e3395b19af970dcd52d7a4bff921 (11ms)
[internal] load build context (0ms)
1 RUN chmod +x /app/entrypoint.sh cached (0ms)
[ 9/10] RUN python manage.py collectstatic --noinput --clear || echo "Static collection skipped" cached (0ms)
[ 8/10] RUN mkdir -p /app/static /app/staticfiles cached (0ms)
[ 7/10] COPY . /app/ cached (0ms)
[ 6/10] RUN pip install -r requirements.txt cached (0ms)
[ 5/10] RUN pip install --upgrade pip cached (0ms)
[ 4/10] COPY requirements.txt /app/ cached (0ms)
[ 3/10] RUN apt-get update && apt-get install -y gcc default-libmysqlclient-dev pkg-config && rm -rf /var/lib/apt/lists/* cached (0ms)
[ 2/10] WORKDIR /app cached (0ms)
[auth] sharing credentials for production-us-west2.railway-registry.com (0ms)
Build time: 9.01 seconds

Deploy logs:
Nov 24, 2025, 11:29 AM
Starting Container
Operations to perform:
  Apply all migrations: admin, attendance, auth, contenttypes, django_celery_beat, sessions
Running migrations:
  No migrations to apply.

HTTP logs:
requestId: "EO_ccizYRf-Fitimn6XIxQ"
timestamp: "2025-11-24T00:31:29.072204063Z"
method: "GET"
path: "/"
host: "web-production-5d1e4.up.railway.app"
httpStatus: 502
upstreamProto: ""
downstreamProto: "HTTP/2.0"
responseDetails: "Retried single replica"
totalDuration: 15021
upstreamAddress: ""
clientUa: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36"
upstreamRqDuration: 15002
txBytes: 4682
rxBytes: 739
srcIp: "144.137.30.102"
edgeRegion: "us-west2"
upstreamErrors: "[{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5002,"error":"connection dial timeout"},{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5000,"error":"connection dial timeout"},{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5000,"error":"connection dial timeout"}]"
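
For readers trying to interpret the upstreamErrors field above, here is a small sketch that parses it as JSON and summarizes the attempts. The function name summarize_upstream_errors is made up for this example; the input string is copied from the log entry in this thread.

```python
import json

# The upstreamErrors value from the HTTP log entry above, as a JSON string.
UPSTREAM_ERRORS = (
    '[{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5002,"error":"connection dial timeout"},'
    '{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5000,"error":"connection dial timeout"},'
    '{"deploymentInstanceID":"bbc3c5bd-1c3a-455b-996f-c7b9a83f2707","duration":5000,"error":"connection dial timeout"}]'
)


def summarize_upstream_errors(raw):
    """Count the attempts, add up their durations, and list the distinct errors."""
    attempts = json.loads(raw)
    return {
        "attempts": len(attempts),
        "total_ms": sum(a["duration"] for a in attempts),
        "errors": sorted({a["error"] for a in attempts}),
    }


summary = summarize_upstream_errors(UPSTREAM_ERRORS)
print(summary)
```

The three attempts add up to 15002 ms, which matches upstreamRqDuration in the same entry: the edge tried the same instance three times and could not open a connection on any attempt, which is consistent with nothing listening on the target port.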


Make sure that the exposed ports are the ports your app is listening to.


0x5b62656e5d

Make sure that the exposed ports are the ports your app is listening to.

3 months ago

Yes I have configured them correctly on 8000


Did you ensure that your entrypoint.sh actually starts/executes gunicorn?


3 months ago

# Start Gunicorn on port 8000 (matches EXPOSE in Dockerfile)
echo "=== STARTING GUNICORN ON PORT 8000 ==="
exec gunicorn attendance_system.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 3 \
    --timeout 120 \
    --access-logfile - \
    --error-logfile - \
    --log-level info
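
As a side note on the hardcoded port: the original post mentions starting Gunicorn on Railway's provided $PORT, while the command above always binds to 8000. One possible way to cover both cases is a Gunicorn config file, which is plain Python. The sketch below assumes a file named gunicorn.conf.py and a PORT environment variable with 8000 as the fallback; bind_address is a helper invented for this example and is not part of Gunicorn.

```python
import os


def bind_address(env=os.environ, default_port="8000"):
    """Build the bind address from PORT, falling back to the port used in this thread."""
    return f"0.0.0.0:{env.get('PORT', default_port)}"


# Gunicorn settings, mirroring the command-line flags shown above.
bind = bind_address()
workers = 3
timeout = 120
accesslog = "-"   # access log to stdout
errorlog = "-"    # error log to stderr
loglevel = "info"
```

The script would then end with something like exec gunicorn -c gunicorn.conf.py attendance_system.wsgi:application, and the port configured for the domain in Railway's Networking section still has to match whatever port Gunicorn ends up binding.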


0x5b62656e5d

Did you ensure that your entrypoint.sh actually starts/executes gunicorn?

I assume this means that the URLs/domains configured in the Networking section are routed to port 8000 as well?

Also, from your deployment logs, it doesn't seem like Gunicorn is starting correctly. The line === STARTING GUNICORN ON PORT 8000 === isn't present. I'd check your script to see if there are any execution errors or any crashes.


0x5b62656e5d

I assume this means that the URLs/domains configured in the Networking section are routed to port 8000 as well? Also, from your deployment logs, it doesn't seem like Gunicorn is starting correctly. The line === STARTING GUNICORN ON PORT 8000 === isn't present. I'd check your script to see if there are any execution errors or any crashes.

3 months ago

It is something like this

I am not sure why I can't see === STARTING GUNICORN ON PORT 8000 ===
The other deployment for another app I did, had no such consequences.

Attachments


tyrafero

It is something like this. I am not sure why I can't see === STARTING GUNICORN ON PORT 8000 ===. The other deployment for another app I did, had no such consequences.

3 months ago

Ah, I'm pretty sure you have your networking setup wrong!

You don't need the TCP proxy. You should be able to remove that.

Then click "edit" (the square with the pencil in it) on the domain option. If any process is listening on a port, you should be able to select it from there, and it'll bind the port to your domain.

If there's nothing there, I don't think Gunicorn is even starting, and you'll need to diagnose why your Docker image isn't running properly past the migrations step.


mykal

Ah, I'm pretty sure you have your networking setup wrong! You don't need the TCP proxy. You should be able to remove that. Then click "edit" (the square with the pencil in it) on the domain option. If any process is running on a port you should be able to select it from there and it'll bind the port to your domain. If there's nothing there, I don't think gunicorn is even starting and you'll need to diagnose why your docker image isn't running properly past the migrations step

This ^

From logs, the script seems to have exited or crashed before reaching the part where it execs gunicorn.
I'd test locally and perhaps add more echo in script to see where exactly it crashed during deployment on Railway.
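
The thread's fix was adding echo lines to the shell script. For anyone who prefers a Python startup wrapper, the same idea (print a marker before and after every step so the deploy logs show where things stop) could be sketched as follows. This is a hypothetical alternative, not what the original poster used; run_step and the example commands are assumptions.

```python
import subprocess
import sys


def run_step(name, argv):
    """Run one startup step, printing flushed markers so they reach the deploy logs."""
    print(f"=== START {name} ===", flush=True)
    result = subprocess.run(argv)
    print(f"=== END {name} (exit code {result.returncode}) ===", flush=True)
    return result.returncode


# Example usage in a startup script (commands are illustrative):
#   run_step("migrate", [sys.executable, "manage.py", "migrate", "--noinput"])
#   print("=== STARTING GUNICORN ===", flush=True)
#   os.execvp("gunicorn", ["gunicorn", "attendance_system.wsgi:application"])
```

If none of the markers show up in the deploy logs, that suggests the script is not the thing being executed at all, which narrows the search to the start command rather than the script's contents.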


0x5b62656e5d

This ^ From logs, the script seems to have exited or crashed before reaching the part where it execs gunicorn. I'd test locally and perhaps add more echo in script to see where exactly it crashed during deployment on Railway.

3 months ago

Agreed with this diagnosis. My first step if I was in your shoes would be to validate that the built image runs just fine on your dev machine with the same environment variables / database connections that you're using on Railway
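
When running the image locally, a quick way to confirm that something is actually listening on the expected port is a plain TCP connection attempt. A minimal sketch, assuming the container's port 8000 is published on localhost (is_listening is a name made up for this example):

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example: after `docker run -p 8000:8000 <image>`, check the published port.
#   print(is_listening("127.0.0.1", 8000))
```

A False result right after the migrations finish would point to the same problem seen on Railway: the process that should bind the port never started.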


3 months ago

Hi guys,
This has been solved. I had both an entrypoint.sh and a Procfile in my project. Adding echo to the script helped, as it was showing nothing in the deploy logs. That's how I figured it out.

Thanks for your feedback.


Status changed to Solved by brody 3 months ago
