Production site is not working

Anonymous
PRO | OP

17 days ago

The Production environment on Railway is currently not working, while the QA environment is functioning correctly.

To debug the issue, I attempted to connect both environment databases to my local setup:

  • The QA database connects successfully and works as expected.

  • The Production database fails to connect from the local environment.

However, when connecting the Production database via pgAdmin, the connection is successful and queries execute without issues.

This indicates that the problem is likely not with the Production database itself, but possibly related to network access, environment configuration, or connection settings used by the Production service or local environment.

Solved | $20 Bounty

12 Replies

17 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody 17 days ago


monuit
PRO | Top 10% Contributor

17 days ago

hey, did you test any potential mismatch between qa/prod related to SSL settings? (e.g., sslmode=require vs verify-full or missing CA)

when you say fails to connect from local env - do you have any logs? is it a connection refusal, is it timing out, etc?
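
A minimal sketch for telling a timeout apart from a refusal before involving psql at all. It assumes bash (for its /dev/tcp redirection) and GNU `timeout` are available locally; the function names `classify_tcp_status` and `tcp_probe` are illustrative and not from this thread.

```shell
#!/usr/bin/env bash
# Map the exit status of the probe below to a short label.
# GNU timeout exits with 124 when the time limit is hit; any other
# non-zero status means the connect failed quickly (refused, DNS error, ...).
classify_tcp_status() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timeout" ;;
    *)   echo "refused-or-unreachable" ;;
  esac
}

# tcp_probe HOST PORT: try to open a TCP connection for up to 5 seconds.
tcp_probe() {
  local host="$1" port="$2" status=0
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || status=$?
  classify_tcp_status "$status"
}

# Example call (hypothetical host and port):
# tcp_probe db.example.com 5432
```

A "timeout" result usually points at routing or a firewall dropping packets, while "refused-or-unreachable" means something answered quickly (or the name did not resolve), so the two cases call for different next steps.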


Anonymous
PRO | OP

17 days ago

Locally, I am getting a connection timeout.


Anonymous
PRO | OP

17 days ago

We deployed to production yesterday, and everything was working fine last night during our sanity testing. However, since this morning, none of the APIs are responding—they take a long time to process and eventually time out.

To investigate, we first tried connecting the production database from the local environment, but the connection also timed out. Interestingly, the production database works fine when accessed via pgAdmin.

Next, we checked the QA environment, and it’s working as expected. The QA database connection from local is also successful.

Additionally, I created a duplicate of the production backend server but configured it to use the QA database, and it worked fine as well.

Based on these tests, the most likely cause seems to be an issue with the production database, which is a PostgreSQL instance deployed on Railway. We’re continuing to debug to identify the exact root cause.


monuit
PRO | Top 10% Contributor

17 days ago

did you try just redeploying your latest commits on prod for the services? i'm thinking it could be that (since the duplicated backend service worked fine)


Anonymous
PRO | OP

17 days ago

Yes, I redeployed the server, but it didn't work.


monuit
PRO | Top 10% Contributor

17 days ago

i think you'd likely have to pin down exactly what differs between qa and prod. if you run:

psql "$PROD_DATABASE_URL" -c "select inet_server_addr(), inet_server_port(), now();"

and it winds up timing out, it's a network/DNS issue. if it returns, something else on the client side is different. you can also check whether your db has connection pool exhaustion by running something like

SELECT state, count(*) FROM pg_stat_activity GROUP BY 1;

and either fix pooling or raise max_connections (if the count is near the limit).

check the I/O as well: select * from pg_stat_bgwriter; shows checkpoint and buffer write activity. if queries are what's making the db slow (e.g. because of table bloat), you'd need to vacuum the tables.

if it happens to be a network/DNS issue, you can proxy through a TCP port if it's going over the public network (if a proxy is already there, i'd disable it and re-configure it), or force IPv4 with hostaddr by doing something like:

dig +short prod-db.internal.railway

psql "postgresql://postgres:mysecret@prod-db.internal.railway:5432/mydb?sslmode=require&hostaddr=123.45.67.89"
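
The hostaddr step above can be sketched as a small helper, assuming bash, dig, and psql are installed locally. `with_hostaddr` is a hypothetical name, and the host in the usage comment is a placeholder rather than a real Railway hostname.

```shell
#!/usr/bin/env bash
# with_hostaddr URL IP: append hostaddr=<IP> to a PostgreSQL URI,
# choosing "?" or "&" depending on whether the URI already has a query string.
with_hostaddr() {
  local url="$1" ip="$2"
  case "$url" in
    *\?*) printf '%s&hostaddr=%s\n' "$url" "$ip" ;;
    *)    printf '%s?hostaddr=%s\n' "$url" "$ip" ;;
  esac
}

# Usage (placeholder host; assumes the first line dig prints is an IPv4 address):
# ip="$(dig +short A db.example.com | head -n 1)"
# psql "$(with_hostaddr "postgresql://user@db.example.com:5432/mydb?sslmode=require" "$ip")"
```

Keeping the hostname in the URI while adding hostaddr lets libpq skip the DNS lookup but still use the name for TLS verification, which is why the helper appends rather than replaces.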


Anonymous
PRO | OP

17 days ago

I noticed another issue: on Railway, within the database architecture view, the database itself isn’t connecting — it just keeps showing a loading spinner.

Attachments


Anonymous
PRO | OP

17 days ago

I checked the connection counts, and they show:
current_connections: 21
max_connections: 500

Statistics related to the background writer:
checkpoints_timed: 30
checkpoints_req: 1
checkpoint_write_time: 186537
checkpoint_sync_time: 199
buffers_checkpoint: 1869
buffers_clean: 0
maxwritten_clean: 0
buffers_backend: 4
buffers_backend_fsync: 0
buffers_alloc: 7088
stats_reset: 2025-10-31 06:56:22.187314+00

The above results look fine, I guess.

New finding:
I am able to connect to the prod DB via pgAdmin and with the connection string from a terminal.
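
As a rough check on the connection numbers reported above, a tiny bash helper can turn them into a percentage of the limit. `usage_percent` is a hypothetical name; the commented psql line is one assumed way to fetch both values, using the `$PROD_DATABASE_URL` variable mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# usage_percent CURRENT MAX: integer percentage of max_connections in use
# (bash arithmetic truncates toward zero).
usage_percent() {
  local current="$1" max="$2"
  echo $(( current * 100 / max ))
}

# One way to fetch both numbers in a single round trip (prints e.g. "21|500"):
# psql "$PROD_DATABASE_URL" -Atc "select count(*), current_setting('max_connections') from pg_stat_activity;"

usage_percent 21 500   # prints 4
```

At about 4% of the limit, connection exhaustion does not look like the cause here.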


caullenomdahl
HOBBY

17 days ago

The loading spinner in Railway's dashboard is the smoking gun - Railway's health check is failing to reach your database. This breaks private network routing, which is why your backend services time out but external tools (pgAdmin, terminal) work fine.

Quick fix: Switch your backend to use DATABASE_PUBLIC_URL instead of DATABASE_PRIVATE_URL temporarily, then restart the database service from Railway dashboard. The health check should recover and private networking will come back.

If that doesn't work, check if you have a custom healthcheck configured that's pointing to the wrong port or path.
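
One way to make that temporary switch easy to undo is a small start-up helper that picks the URL from a flag. This is only a sketch: `USE_PUBLIC_DB` is a hypothetical flag invented for illustration, and it assumes both `DATABASE_PUBLIC_URL` and `DATABASE_PRIVATE_URL` are set on the service as named above.

```shell
#!/usr/bin/env bash
# choose_db_url: print the public URL when USE_PUBLIC_DB=1 (hypothetical flag)
# and a public URL is actually set; otherwise fall back to the private URL.
choose_db_url() {
  if [ "${USE_PUBLIC_DB:-0}" = "1" ] && [ -n "${DATABASE_PUBLIC_URL:-}" ]; then
    printf '%s\n' "$DATABASE_PUBLIC_URL"
  else
    printf '%s\n' "${DATABASE_PRIVATE_URL:-}"
  fi
}

# Usage in a start script (the backend command is a placeholder):
# DATABASE_URL="$(choose_db_url)" ./start-backend
```

Once private networking recovers, unsetting the flag sends traffic back over the private network without touching the code again.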


16 days ago

Hello!
That was a bug that should have been fixed as of now.


Status changed to Awaiting User Response Railway 16 days ago


Railway
BOT

9 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 9 days ago
