Internal Hostname Resolution Failure in Django Project (captivating-encouragement) - Name or service not known

tintin2021git
FREE

8 days ago

Hello Railway Team,

I am experiencing persistent issues with internal service communication in my Django project, captivating-encouragement, deployed on Railway.

Project Overview:

  • Django application (Python 3.12)

  • Three services deployed on Railway: web, worker, beat

  • Using PostgreSQL for the database and Redis for cache/broker

Current Problem:

  • While the web service is active, both worker and beat services are unable to connect to the database (PostgreSQL) and Redis.

  • The specific error indicates a failure to resolve internal hostnames (e.g., postgres.railway.internal, redis.railway.internal).

Service Status and Errors:

  • web service: Currently Active.

  • worker service:

    • Continuously logs the following error, indicating a failure to connect to Redis: consumer: Cannot connect to redis://default:**@redis.railway.internal:6379//: Error -2 connecting to redis.railway.internal:6379. Name or service not known.

  • beat service:

    • Currently Crashed.

    • Continuously logs the following error, indicating a failure to connect to PostgreSQL: django.db.utils.OperationalError: could not translate host name "postgres.railway.internal" to address: Name or service not known
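One way to tell a DNS-level failure apart from an application-level one is to run a small resolution check from inside the affected container, for example as a temporary start command. The sketch below uses only the standard library. The script name check_dns.py and port 5432 for PostgreSQL are assumptions, not values taken from the logs above.

```python
import socket
import sys


def resolve(hostname, port):
    """Return (True, [addresses]) if hostname resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same failure class as "Name or service not known" in the logs.
        return False, str(exc)
    # info[4] is the socket address tuple; its first element is the IP.
    return True, sorted({info[4][0] for info in infos})


def main(targets):
    for target in targets:
        host, _, port = target.rpartition(":")
        if not host or not port.isdigit():
            print(f"skipping {target!r}: expected host:port")
            continue
        ok, detail = resolve(host, int(port))
        print(f"{target}: {'OK' if ok else 'FAILED'} {detail}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Possible usage: python check_dns.py postgres.railway.internal:5432 redis.railway.internal:6379. If the same command succeeds in web but fails in beat, the difference is in the container's environment rather than in Django or Celery.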

Troubleshooting Steps Taken (all changes pushed to GitHub and reflected on Railway):

  1. Environment Variables:

    • DATABASE_URL and REDIS_URL have been manually set as environment variables for all services (web, worker, beat) in the Railway dashboard.

    • A screenshot of the web service's Variables tab (Screenshot from 2025-06-18 20-17-36.png) is attached.

  2. settings.py Adjustments:

    • Configured DATABASE_URL and REDIS_URL to be read from environment variables using os.environ.get(), with internal Railway URLs as fallback default values.

    • conn_health_checks for DATABASES is set to False.

    • CELERY_WORKER_STATE_DB is also configured.

  3. Procfile Adjustments:

    • Added python /home/tintin/prog/py/kokkai/manage.py migrate --noinput and python /home/tintin/prog/py/kokkai/manage.py collectstatic --noinput to the web service startup command to ensure they run on deploy.

    • Added -b ${REDIS_URL} option to Celery startup commands for worker and beat services to explicitly specify the broker URL.

  4. Redeployment Operations:

    • Initiated automatic deployments via git push after every code change.

    • Performed "Rebuild Cache" and "Redeploy" for the web service from the dashboard.
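A side effect of giving os.environ.get() an internal URL as the fallback (step 2) is that a service whose variable is missing fails with the same hostname error as a service with a genuine DNS problem. A minimal sketch of a stricter lookup follows. The helper name require_env is hypothetical, and raising RuntimeError is just one reasonable choice.

```python
import os


def require_env(name, environ=os.environ):
    """Return the named variable, or fail loudly instead of falling back."""
    value = environ.get(name, "").strip()
    if not value:
        # No fallback URL: a missing variable is reported as exactly that.
        raise RuntimeError(f"{name} is not set for this service")
    return value


# Hypothetical use in settings.py:
# DATABASE_URL = require_env("DATABASE_URL")
# REDIS_URL = require_env("REDIS_URL")
```

With this in place, a service that was never given DATABASE_URL stops at startup with an explicit message rather than attempting to reach a default host.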

Problem Summary: All services, the PostgreSQL database, and Redis are within the same Railway project. Despite this, service containers are unable to resolve each other's internal hostnames. This strongly suggests an underlying issue with Railway's internal DNS resolution or network routing.

Could you please assist in investigating and resolving this issue?

Thank you for your time and support.

$10 Bounty

3 Replies

chandrika
EMPLOYEE

8 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open chandrika 8 days ago


quadstrikesecurity
FREE

8 days ago

Try this to fix your problem:

Immediate Solutions

1. Use Service References (Recommended)

Replace manual environment variables with Railway's service references:

In Railway Dashboard → Variables:

    # Remove manual DATABASE_URL and REDIS_URL
    # Add service references instead:
    DATABASE_URL=${{Postgres.DATABASE_URL}}
    REDIS_URL=${{Redis.REDIS_URL}}

2. Update Django Settings

    # settings.py
    import dj_database_url
    import os

    # Database
    DATABASES = {
        'default': dj_database_url.parse(
            os.environ.get('DATABASE_URL', 'sqlite:///db.sqlite3')
        )
    }

    # Redis/Celery
    REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379')
    CELERY_BROKER_URL = REDIS_URL
    CELERY_RESULT_BACKEND = REDIS_URL

3. Fix Procfile

    web: python manage.py migrate --noinput && python manage.py collectstatic --noinput && gunicorn kokkai.wsgi
    worker: celery -A kokkai worker --loglevel=info
    beat: celery -A kokkai beat --loglevel=info
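To confirm that each service really receives the value a reference expands to, one option is to log the scheme, host and port of each URL at startup, leaving out the credentials. This is a sketch only. The function name describe_target is hypothetical, and the example password below is a placeholder.

```python
from urllib.parse import urlsplit


def describe_target(url):
    """Summarise a connection URL as scheme://host:port without credentials."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else "default"
    return f"{parts.scheme}://{parts.hostname}:{port}"


# Placeholder credentials; in settings.py the URL would come from the environment.
print(describe_target("redis://default:placeholder@redis.railway.internal:6379//"))
```

Comparing this line across the web, worker and beat logs shows whether all three services point at the same host.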


tintin2021git
FREE

8 days ago

Hello team,

I am still experiencing the "OperationalError: could not translate host name 'postgres.railway.internal' to address: Name or service not known" for my beat service. Here's an update on the current situation and what I've tried after your last advice:

  • Current Service Status: web and worker services are now running successfully (green checkmark). They are able to connect to PostgreSQL and Redis respectively.

  • Issue Scope: The problem is now exclusively with the beat service, which consistently crashes due to the PostgreSQL hostname resolution failure.

  • Troubleshooting Attempted:

    • Verified DATABASE_URL and REDIS_URL in Railway dashboard are set to direct internal URLs (postgresql://..., redis://...) for all services.

    • Updated settings.py to robustly parse DATABASE_URL and REDIS_URL using os.environ.get() as discussed.

    • Added a sleep 5 to the beat service command in the Procfile (as seen in the latest logs, the sleep executed, but the error persisted).

    • Confirmed that worker service now appears to be connecting to Redis successfully.

It seems beat service specifically struggles with postgres.railway.internal while other services in the same project can connect. This suggests a deeper platform-level network/DNS issue affecting only the beat container or its interaction with PostgreSQL.

Could you please re-evaluate this persistent issue, as all application-level configurations recommended have been applied, and the problem seems to be infrastructure-related?

Thank you for your continued patience and support.
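A fixed sleep 5 only helps if the hostname becomes resolvable within those five seconds. An alternative is to poll until resolution succeeds and give up after a bounded number of attempts. The sketch below is an assumption about what might help, not a confirmed fix. The name wait_for_host and the default values are hypothetical.

```python
import socket
import time


def wait_for_host(hostname, port, attempts=10, delay=1.0,
                  resolver=socket.getaddrinfo, sleep=time.sleep):
    """Retry DNS resolution; return the attempt number that succeeded.

    Re-raises socket.gaierror if the final attempt also fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            resolver(hostname, port)
            return attempt
        except socket.gaierror:
            if attempt == attempts:
                raise
            # Wait before the next try instead of sleeping once up front.
            sleep(delay)
```

It could be called before Celery beat starts, for example wait_for_host("postgres.railway.internal", 5432), where 5432 is assumed to be the PostgreSQL port. If resolution still fails after every attempt, the timing explanation can be ruled out.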


sim
FREE
Top 5% Contributor

5 days ago

Can you try restarting the beat service and see if it picks up postgres.railway.internal?