Database service locked up

10 months ago

Unreachable over the private or public network?

10 months ago

over private networking, it seems, but weirdly enough I can reach it over Tailscale

10 months ago

does this help?

[attachment]

10 months ago

What is the source service that is trying to access the database?

https://railway.com/project/357a7a66-a372-47f0-b2ae-2d8e2b6f1f32/service/8812873e-b82c-46c5-82d3-e121bbc8580e?environmentId=86853589-64b5-48b3-9eda-2174a7ce26b2

10 months ago

10 months ago

Can the data tab access it if you add a TCP proxy?

10 months ago

Yeah I'm able

10 months ago

moving it over to us-west also didn't make any difference, healthchecks don't even go through

10 months ago

I'm not seeing other reports, and the database and backend aren't using the beta IPv4 networking, so I'm not sure of the issue.

I'm also not seeing any errors in the logs besides the failing health check?

10 months ago

let me try railway ssh

10 months ago

psql is able to connect, yeah might be our fault

10 months ago

will investigate more

10 months ago

psql over the private network?

10 months ago

did ssh into the service container, installed psql there and a connection was made
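
For anyone repeating this check without installing psql, here is a minimal sketch of a plain TCP probe that could be run from inside the service container. It only shows whether the port accepts connections, not whether Postgres authenticates, and the function name `can_reach` is made up for illustration.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

Called as `can_reach("postgres.railway.internal", 5432)`, where the hostname and port are placeholders for your own database's private address, a `False` result would suggest the problem sits in networking rather than in the application code.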

10 months ago

just weird that we're getting these errors from the database

[attachment]

10 months ago

even though no deploy was made and the postgres metrics are normal

10 months ago

What is your timeout set to?

10 months ago

whatever typeorm uses by default

10 months ago

i'll try increasing it but doubt it's that

10 months ago

even satellite services, with totally different source code than ours, are also unable to connect to our database

ghaithzamrik

PRO

10 months ago

I don't know if it's related, but I'm seeing something similar. One of my services stopped working, and when I restart it the deploy fails on the health check. It looks like it can't connect to the PG database I have running, yet I can connect to that database over the public network (maybe a private network issue?). Nothing has changed in the service in the last few days: no new deployments, no changes. Any support would be appreciated.

10 months ago

same here, still unable to debug

10 months ago

seems like some connections go through

10 months ago

now the problem is also affecting our other project, completely unrelated

ghaithzamrik

PRO

10 months ago

One strange thing I noticed: the "Architecture" UI for a PG DB usually shows how much of the DB storage is used. It does for the project that is still running fine, but no longer does for the one that is having the problem.

see the difference in the screenshots

[two screenshots: the Architecture view of the working database and of the affected one]

10 months ago

ohh same here

10 months ago

wish I could detach the volume and re-attach it to another service

10 months ago

tried to do a backup and restore from it, still having issues

10 months ago

all of our major providers are still up with no issues whatsoever

godiexk

PRO

10 months ago

I have the same problem. I can access it internally from my Node app, but it's inaccessible from an external app. It's not possible to access it from DBeaver or a Java connection.

10 months ago

can anyone from the Railway team confirm that they're looking into it? would keep me calm

godiexk

PRO

10 months ago

HELP!! Railway team, connection not found

10 months ago

dumping the database and restoring it into another service solved my issue for one of my projects

volume size appears ok without any problems

10 months ago

When did you all first see errors?

10 months ago

14:30-14:50 Brazilian time

10 months ago

my only issue now is with this database:

10 months ago

I gotta start asking for timestamps in UTC
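
A small sketch of the conversion being asked for here, assuming "Brazilian time" means Brasília time at a fixed UTC-3 offset (Brazil has other time zones too). The helper name `brt_to_utc` is hypothetical, and the caller has to supply the date, since the thread does not give one.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: Brasília time, a fixed UTC-3 offset with no daylight saving.
BRT = timezone(timedelta(hours=-3), "BRT")


def brt_to_utc(day: date, local: time) -> datetime:
    """Interpret `local` on `day` as Brasília time and return the same instant in UTC."""
    return datetime.combine(day, local, tzinfo=BRT).astimezone(timezone.utc)
```

Under that assumption, the reported 14:30-14:50 window corresponds to 17:30-17:50 UTC.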

10 months ago

in your timezone:

postgres-production-5471.up.railway.app

10 months ago

Please provide a direct link to your database.

godiexk

PRO

10 months ago

monorail.proxy.rlwy.net:50215

postgresql://postgres:e5aC6EAaBGGeFG22fDB6e32EDbf13cgf@monorail.proxy.rlwy.net:50215/railway

10 months ago

I'm sorry but that's not quite what I asked for, please provide the URL of your browser's omni bar while opened to the database.

Railway

BOT

10 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but, our engineering team will take a look and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!

https://railway.com/project/9ad07647-f3d4-4040-b3c5-a8d4fd9e7f64/service/2c00e877-9da2-4a5e-ac4c-16c54454df8c/database?environmentId=ed012bfc-147a-4f70-a11f-f17a625850cb&state=table&table=persona

10 months ago

I've raised this to the infra team.

godiexk

PRO

10 months ago

Please, my job depends on this, I have clients working who can't use the service.

10 months ago

for people who depend heavily on their service: do a pg_dump and pg_restore into another service. I'm in the middle of doing it for another project of ours.

10 months ago

also, use an ubuntu container and railway ssh for a faster dump
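
The dump-and-restore workaround described above could be scripted roughly like this. It is a sketch under assumptions, not the exact commands used in this thread: the function names, the archive file name, and both connection URLs are placeholders, and it expects the `pg_dump` and `pg_restore` binaries to be installed where it runs.

```python
import subprocess
from typing import List


def dump_command(source_url: str, dump_path: str) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={dump_path}",
        f"--dbname={source_url}",
    ]


def restore_command(target_url: str, dump_path: str) -> List[str]:
    """Build a pg_restore invocation that loads the archive into the target."""
    return [
        "pg_restore",
        "--no-owner",  # do not try to restore the original object ownership
        f"--dbname={target_url}",
        dump_path,
    ]


def migrate(source_url: str, target_url: str, dump_path: str = "backup.dump") -> None:
    """Dump the source database, then restore it into the target database."""
    subprocess.run(dump_command(source_url, dump_path), check=True)
    subprocess.run(restore_command(target_url, dump_path), check=True)
```

Running `migrate(...)` from a container on the same private network as both databases, as suggested above, keeps the transfer off the public proxy.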

godiexk

PRO

10 months ago

How do you connect? I can't connect.

10 months ago

just did a pg_dump and pg_restore for both of our databases and they're back up again, feel free to do anything to those services (well, as long as you don't delete them)

10 months ago

make sure to increase your connections count to a really high value and then try to connect

10 months ago

our connections were piling up and thus we were getting "too many clients" errors
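
To see how close a database is to the "too many clients" limit, something like the following could help. The three statements are standard Postgres that could be run from psql; the `headroom` helper and the value 500 are purely illustrative, and how the limit was actually raised in this thread is not stated.

```python
# Standard Postgres statements, kept as strings so they can be pasted into psql.
SHOW_LIMIT = "SHOW max_connections;"
COUNT_BY_STATE = "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"
# Takes effect only after a server restart; 500 is an arbitrary example value.
RAISE_LIMIT = "ALTER SYSTEM SET max_connections = 500;"


def headroom(in_use: int, max_connections: int, reserved: int = 3) -> int:
    """Connections still available to ordinary clients.

    `reserved` mirrors superuser_reserved_connections, which defaults to 3.
    """
    return max(0, max_connections - reserved - in_use)
```

A large number of `idle` rows in the per-state count would fit the pile-up described here, where clients keep opening connections that never get released.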

10 months ago

We are actively looking into the cause.

10 months ago

and obviously, run a railway backup just to be sure

godiexk

PRO

10 months ago

it already works!! thanks

9 months ago

Hi, can I know what happened?

9 months ago

A host's networking locked up.

Railway

BOT

9 months ago

✅ The ticket Database performance issue has been marked as completed.

9 months ago

great to know, would a high availability pg cluster prevent that from happening in the future or was that happening on the service itself? looking for ways to prevent that from happening again.

9 months ago

Unlikely, since something could go wrong with the pooler service, there's still a single point of failure.

9 months ago

there's probably some way to replicate that. for the service, would replicas do the trick? I don't know if they're deployed to the same host

9 months ago

They are not deployed on the same host, but then your own code would have to handle fallback to another pooler if one isn't available, since we don't handle that on the private network
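
The client-side fallback mentioned here might look roughly like this: try each pooler address in order and use the first one that accepts a TCP connection. This is a sketch under assumptions, not a Railway feature; `first_reachable` and the endpoint list are hypothetical, and a real application would still need to reconnect when the chosen pooler drops later.

```python
import socket
from typing import Iterable, Optional, Tuple

Endpoint = Tuple[str, int]


def first_reachable(endpoints: Iterable[Endpoint], timeout: float = 2.0) -> Optional[Endpoint]:
    """Return the first endpoint that accepts a TCP connection, or None if all fail."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Unreachable, refused, or timed out: move on to the next pooler.
            continue
    return None
```

The returned host and port would then be handed to the database client, and the probe repeated whenever a connection attempt fails.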

9 months ago

probably i would also need an API gateway to automatically fail over in case a service replica goes down, damn HA is hard 💀

9 months ago

fair enough, will look into ways, thanks brody