Bug with query statistics feature in Priority Boarding
craftzman7
PROOP

a month ago

When I enable query statistics from the new metrics panel, the database restarts and gets stuck in a boot loop. I found a previous help thread where someone triggered this manually, and a Railway engineer had to intervene to resolve it: https://discord.com/channels/713503345364697088/1119792911819751566/1119792911819751566

Solved

70 Replies

craftzman7
PROOP

a month ago

Project ID is 750a6ff7-b90f-4885-88b9-e69f7ba105be


a month ago

Hmmm, I wonder if this is now something you could technically change yourself using SSH with the Railway CLI.


a month ago

@Crafter could you try to SSH into your service and run cat /var/lib/postgresql/data/pgdata/postgresql.auto.conf and see if timescaledb,pg_stat_statements is listed there?
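
For anyone following along, here is a minimal sketch of what that check looks for, assuming the file uses the usual `name = 'value'` layout. The sample contents and the `preloaded_libraries` helper are hypothetical and only illustrate the idea; the real file on the volume may look different.

```python
import re

# Hypothetical sample of postgresql.auto.conf contents (an assumption, not
# copied from the affected service).
SAMPLE_CONF = """\
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
shared_preload_libraries = 'timescaledb,pg_stat_statements'
"""


def preloaded_libraries(conf_text: str) -> list[str]:
    """Return the libraries named by the last shared_preload_libraries line."""
    libs: list[str] = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*shared_preload_libraries\s*=\s*'(.*)'\s*$", line)
        if match:
            # A later line overrides an earlier one, so keep overwriting.
            libs = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return libs


print(preloaded_libraries(SAMPLE_CONF))
```

If both `timescaledb` and `pg_stat_statements` show up in the returned list, the setting was written as expected.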


a month ago

I am also going to escalate this so that it doesn't happen with other users.


a month ago

!t


a month ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response uxuz about 1 month ago


Noted. I am going to fast-follow with a "reset" option in the metrics panel so that you (and others) can recover. For our information, can you tell me when the boot loop started relative to when you enabled the feature?


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but our engineering team will take a look, and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


craftzman7
PROOP

a month ago

Unable to SSH


craftzman7
PROOP

a month ago

the container crashes entirely


craftzman7
PROOP

a month ago

Another note: I changed the image from the old Railway Timescale image, which was formerly the default, to the new one to see if that fixed it, but it did not.


craftzman7
PROOP

a month ago

The boot loop happened as soon as I confirmed that the container would be restarted


a month ago

One way to SSH into the container would be to attach the volume to another running service that doesn't have a volume and SSH into the container this way. (You might as well just use the file browser template at this point.)


craftzman7
PROOP

a month ago

[attachment]


a month ago

Oh, I see the issue. Really sorry that happened. Please try removing the inner quotes and it should work again. I will fix it.
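
As a rough illustration of the fix, the sketch below rewrites a setting line so that its value is quoted only once. The broken line was only shown in an attachment, so the `broken` example and the `strip_inner_quotes` helper are assumptions about what it might have looked like, not the actual contents.

```python
def strip_inner_quotes(line: str) -> str:
    """Rewrite a `name = value` line so the value carries one outer quote pair."""
    name, sep, value = line.partition("=")
    if not sep:
        return line  # not a "name = value" line; leave it untouched
    # Drop every single or double quote, then wrap the bare value once.
    bare = value.strip().replace("'", "").replace('"', "")
    return f"{name.strip()} = '{bare}'"


# Hypothetical broken line (assumed shape; the real one is not in the thread).
broken = "shared_preload_libraries = '\"timescaledb,pg_stat_statements\"'"
print(strip_inner_quotes(broken))
```

This deliberately removes all quotes inside the value, which is only reasonable for a plain comma-separated list of library names.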


craftzman7
PROOP

a month ago

🫡


a month ago

I asked in the PB thread, but how did you get timescale in the binary and preloaded in the db if you use ghcr.io/railwayapp-templates/postgres-ssl:15 as the source?


a month ago

That should not be possible


a month ago

Or at least not without some source trickery and a very old service


craftzman7
PROOP

a month ago

This is quite an old service



craftzman7
PROOP

a month ago

The former default was the timescale image


a month ago

But did you change the source?


a month ago

That's really risky tbh


a month ago

Ahh


craftzman7
PROOP

a month ago

Yep


a month ago

Honestly, surprised it worked


craftzman7
PROOP

a month ago

I think I deployed this in 2023?


craftzman7
PROOP

a month ago

It's been on Railway for quite a while. I've kind of been in limbo in terms of updating the database because it's so damn old.


craftzman7
PROOP

a month ago

Looks like I have to revert back now as it can't find Timescale


a month ago

Yeah, the new source doesn't have it


a month ago

What was the old source?


craftzman7
PROOP

a month ago

lemme find it


craftzman7
PROOP

a month ago

It's on the Railway app templates Github


craftzman7
PROOP

a month ago

ghcr.io/railwayapp-templates/timescale-postgis-ssl:pg15-ts2.12


a month ago

Because Timescale shouldn't have the Data UI working; at least, new deployments don't have it (although I would love to support it, it's not something we test)


a month ago

interesting


craftzman7
PROOP

a month ago

I don't actually use any of the Timescale features


craftzman7
PROOP

a month ago

If that matters


a month ago

so you can just remove the timescale from preloaded libraries tbh


a month ago

actually, don't do that


a month ago

I'm not sure how Timescale affects the data folder, so let's try to decrease entropy and return to the original state



craftzman7
PROOP

a month ago

Uhhhh


craftzman7
PROOP

a month ago

the UI won't let me change the image


craftzman7
PROOP

a month ago

I press enter and it doesn't update


craftzman7
PROOP

a month ago

I'm stupid, nevermind


craftzman7
PROOP

a month ago

erm

[attachment]


craftzman7
PROOP

a month ago

Any clue on how to fix that 😅


a month ago

i think you mounted the wrong path


a month ago

but honestly unsure

[attachment]


craftzman7
PROOP

a month ago

/var/lib/postgresql/data


a month ago

:hmm:


craftzman7
PROOP

a month ago

It said it was used 6 months ago


craftzman7
PROOP

a month ago

¯_(ツ)_/¯


craftzman7
PROOP

a month ago

I will note, this only happened after changing the image back


a month ago

Sorry, I'm not sure what is going on here. I don't have a lot of details on how the Timescale template worked, so I may have to leave it for on-call.


a month ago

You could try removing Timescale from the preloaded libraries, but honestly, please take a backup before doing any risky operation on your volume


craftzman7
PROOP

a month ago

Should I just download the entire data directory? I'm not on Pro so I don't have backups


a month ago

Changing the source has a high risk of corrupting the data


craftzman7
PROOP

a month ago

I could also just upgrade tbf


craftzman7
PROOP

a month ago

Is the "create backup" button under the database service sufficient?


a month ago

Yes! That said, it won't automatically recover the service config.

The backup will work with the specific image source and volume mount path it was made with; if you change things further, it is only guaranteed to work if you revert.


craftzman7
PROOP

a month ago

ah okay


craftzman7
PROOP

a month ago

Alrighty, I removed Timescale from the shared libraries and switched to the newer image. I had to purge the certs directory because it was causing permission issues; Postgres regenerated the certs.
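
For reference, removing one entry from the preloaded-library list amounts to something like the sketch below. The `drop_library` helper and the input value are hypothetical, it only computes the new string and does not touch any config file, and a backup should be taken first, as mentioned earlier in the thread.

```python
def drop_library(value: str, unwanted: str) -> str:
    """Return a comma-separated library list without `unwanted`."""
    kept = [
        lib.strip()
        for lib in value.split(",")
        if lib.strip() and lib.strip() != unwanted
    ]
    return ",".join(kept)


# Assumed starting value, based on the libraries named earlier in the thread.
print(drop_library("timescaledb,pg_stat_statements", "timescaledb"))
```

The resulting string is what would go inside the quotes of the `shared_preload_libraries` setting.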


craftzman7
PROOP

a month ago

!s


craftzman7
PROOP

a month ago

oh is that not a thing anymore


craftzman7
PROOP

a month ago

P.S. Does this make me eligible for Bug Basher 🥺


a month ago

Ahhh makes sense that the certs were conflicting between the two sources


a month ago

Great to know it's fixed, I've updated the logic to prevent that


a month ago

And Brody gave you the role!


Status changed to Solved brody about 1 month ago

