Web service deploy stuck on "Starting Container" after networking outage: both private and public DB URLs time out
hwhelchel
PRO OP

a month ago

Hi Railway team,

My web service deploys have been failing since today's private networking/traffic outage (incident started ~12:24 PM). The builds complete successfully but deploys hang indefinitely at "Starting Container" with no further output.

Project details:

  • Region: us-east4

  • Service: Web (Rails app using Dockerfile.web)

  • Database: Railway-managed PostgreSQL

What's happening:

  • The container starts and runs bin/rails db:migrate via the Docker entrypoint

  • The migration step attempts to connect to Postgres and the TCP connection hangs forever — no timeout error, no output, just silence

  • The build step completes fine (9 seconds cached, 331 seconds uncached)
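
One way to turn the silent hang described above into a fast, visible failure is to probe the database endpoint with a bounded TCP connect before the entrypoint runs bin/rails db:migrate. The following is a minimal sketch, not part of the original app: the helper names `db_endpoint` and `tcp_reachable?` are hypothetical, and the 5-second default is an arbitrary assumption.

```ruby
require "socket"
require "uri"

# Hypothetical helper: extract [host, port] from a Postgres-style URL.
# Falls back to 5432 when the URL does not specify a port.
def db_endpoint(database_url)
  uri = URI.parse(database_url)
  [uri.hostname, uri.port || 5432]
end

# Hypothetical helper: true if a TCP connection to host:port opens
# within `timeout` seconds, false on timeout, refusal, or DNS failure.
def tcp_reachable?(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # SystemCallError covers Errno::ETIMEDOUT, ECONNREFUSED, EHOSTUNREACH, etc.
  false
end
```

An entrypoint could call `tcp_reachable?(*db_endpoint(ENV.fetch("DATABASE_URL")))` and exit non-zero with a log line when it returns false, so the deploy log shows a clear connectivity error instead of stalling at "Starting Container".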

What I've tried:

  1. Restarted the Postgres service (shows healthy/online)

  2. Switched DATABASE_URL to the private internal URL (fd12:...) — hangs

  3. Switched DATABASE_URL to ${{database.DATABASE_PUBLIC_URL}} ([redacted].proxy.rlwy.net:[port]) — also hangs

  4. Multiple redeploys after each change

  5. Waited 10+ minutes per deploy attempt

Error from earlier deploy (when it did eventually timeout on private URL):

ActiveRecord::ConnectionNotEstablished: connection to server at "fd12:xxxx:xxxx:...", port 5432 failed: Connection timed out
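
The long wait before this error is consistent with the connection attempt running until the operating system's TCP timeout. libpq accepts a `connect_timeout` connection parameter (in seconds), which can be given as a query parameter on the connection URL. Below is a minimal sketch of appending it. The helper name `with_connect_timeout` and the 5-second value are assumptions for illustration, and whether the app's Rails and pg versions pass the parameter through unchanged should be checked.

```ruby
require "uri"

# Hypothetical helper: return database_url with connect_timeout=<seconds>
# set in the query string, replacing any existing connect_timeout value.
def with_connect_timeout(database_url, seconds = 5)
  uri = URI.parse(database_url)
  params = URI.decode_www_form(uri.query.to_s)
              .reject { |key, _value| key == "connect_timeout" }
  params << ["connect_timeout", seconds.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

With this in place, an unreachable database should surface as a connection error after roughly the configured number of seconds rather than after several minutes.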

Timeline:

  • App was last deployed successfully ~3 months ago and was running fine

  • No code changes — this started during today's networking outage

  • Status page shows the incident as resolved, but connectivity is still broken for my project on both private and public endpoints

It appears the networking outage recovery didn't fully restore connectivity for my project/region. Could someone look into this? Happy to provide project ID or any other details via DM.

Thanks!

Solved

5 Replies

hikieadmin
PRO

a month ago

I have the same problem with MongoDB: the connection times out.


hwhelchel
PRO OP

a month ago

Update 2: Attempted to create a new web service as a workaround. Received error:

> Error updating private network - automatic repair triggered

This confirms private networking is still broken for my project despite the incident being marked resolved. The automatic repair has not succeeded — deploys remain stuck.


hwhelchel
PRO OP

a month ago

Update 3: Created a brand new web service as a workaround. Build fails immediately with:

> Cache mount ID is not prefixed with cache key

This is a Railway build infrastructure error on Docker BuildKit cache mounts. Combined with the "Error updating private network - automatic repair triggered" message when creating the service, it appears my project's infrastructure (networking + build cache) is still degraded from the earlier outage.

Project region: us-east4

This is blocking all deployments. Any help expediting the repair would be appreciated.


chandrika
EMPLOYEE

a month ago

Hi there, I'm sorry you had to attempt all those workarounds. I'm seeing that the services are up and green now. The web service is currently sleeping, and I was able to access it at its URL (it was sleeping at first).

Are you still running into any issues?


Status changed to Awaiting User Response by Railway, about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved by Railway, 26 days ago


hwhelchel
PRO OP

16 days ago

Experiencing a similar issue right now — deploys failing with 'no permission to execute start command' on worker and pre-deploy failure on web. Commands work fine inside running containers. Started ~18 hours ago, no code changes to Dockerfiles or permissions. Could this be related?

https://station.railway.com/questions/deploy-failing-we-don-t-have-permissi-48db9af1


Status changed to Awaiting Railway Response by Railway, 16 days ago


Status changed to Solved by brody, 15 days ago

