Bug with query statistics feature in Priority Boarding
craftzman7
PROOP

4 months ago

When I enable query statistics from the new metrics panel, the database restarts and throws itself into a boot loop. I found a previous help thread where someone triggered this manually, and a Railway engineer had to intervene to resolve it: https://discord.com/channels/713503345364697088/1119792911819751566/1119792911819751566

Solved

70 Replies

craftzman7
PROOP

4 months ago

Project ID is 750a6ff7-b90f-4885-88b9-e69f7ba105be


4 months ago

Hmmm, I wonder if this is now something you could technically change yourself using SSH with the Railway CLI.


4 months ago

@Crafter could you try to SSH into your service and run cat /var/lib/postgresql/data/pgdata/postgresql.auto.conf and see if timescaledb,pg_stat_statements is listed there?
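
For anyone following along, here is a minimal Python sketch of the check being asked for above. It assumes the file uses the plain `name = 'value'` lines that Postgres writes into postgresql.auto.conf, with no trailing comments; the function name preload_libraries is made up for illustration and is not part of any Railway or Postgres tooling.

```python
import re


def preload_libraries(conf_text: str) -> list[str]:
    """Return the libraries named by the last shared_preload_libraries line."""
    libs: list[str] = []
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        match = re.match(r"shared_preload_libraries\s*=\s*(.*)$", line)
        if match:
            value = match.group(1).strip()
            # Drop one layer of surrounding single quotes, if present.
            if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
                value = value[1:-1]
            # A later line overrides an earlier one, so overwrite libs.
            libs = [part.strip() for part in value.split(",") if part.strip()]
    return libs


sample = (
    "# Do not edit this file manually!\n"
    "shared_preload_libraries = 'timescaledb,pg_stat_statements'\n"
)
print(preload_libraries(sample))
```

If both timescaledb and pg_stat_statements come back as separate entries, the setting is at least well formed.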


4 months ago

I am also going to escalate this so that it doesn't happen with other users.


4 months ago

!t


4 months ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response uxuz 4 months ago


Noted. I am going to fast-follow with a "reset" option in the metrics panel so you (and others) can recover. For our information, can you tell me when the boot loop started: was it right when you enabled it?


Status changed to Awaiting User Response Railway 4 months ago


Railway
BOT

4 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but our engineering team will take a look and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


craftzman7
PROOP

4 months ago

Unable to SSH


craftzman7
PROOP

4 months ago

the container crashes entirely


craftzman7
PROOP

4 months ago

Another note: I did change the image from the old Railway Timescale image that was formerly the default to the new one to see if that fixed it, but it did not.


craftzman7
PROOP

4 months ago

The boot loop happened as soon as I confirmed that the container would be restarted


4 months ago

One way to SSH into the container would be to attach the volume to another running service that doesn't have a volume and SSH into the container this way. (You might as well just use the file browser template at this point.)


craftzman7
PROOP

4 months ago

[attachment]


4 months ago

Oh I see the issue, really sorry that happened, please try to remove the inner quotes and it should work again. I will fix it
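
The exact broken line is not shown in the thread, so the following is only a sketch under an assumption: that the setting ended up with double quotes nested inside the single-quoted value, for example '"timescaledb,pg_stat_statements"'. The helper name strip_inner_quotes is hypothetical. Take a backup of the volume before editing any file on it.

```python
def strip_inner_quotes(line: str) -> str:
    """Rewrite a shared_preload_libraries line without nested quotes.

    Any other line is returned unchanged.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "shared_preload_libraries":
        return line
    # Remove the outer single quotes, then any quotes left inside the value.
    cleaned = value.strip().strip("'").replace('"', "").replace("'", "")
    names = [name.strip() for name in cleaned.split(",") if name.strip()]
    joined = ",".join(names)
    return f"shared_preload_libraries = '{joined}'"


# Assumed shape of the broken line; the real one was only in a screenshot.
broken = "shared_preload_libraries = '\"timescaledb,pg_stat_statements\"'"
print(strip_inner_quotes(broken))
```

Lines for other settings pass through untouched, so the function can be mapped over every line of the file.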


craftzman7
PROOP

4 months ago

🫡


4 months ago

I asked in the PB thread, but how did you get timescale in the binary and preloaded in the db if you use ghcr.io/railwayapp-templates/postgres-ssl:15 as the source?


4 months ago

That should not be possible


4 months ago

Or at least not without some source trickery and a very old service


craftzman7
PROOP

4 months ago

This is quite an old service



craftzman7
PROOP

4 months ago

The former default was the timescale image


4 months ago

But did you change the source?


4 months ago

That's really risky tbh


4 months ago

Ahh


craftzman7
PROOP

4 months ago

Yep


4 months ago

Honestly, surprised it worked


craftzman7
PROOP

4 months ago

I think I deployed this in 2023?


craftzman7
PROOP

4 months ago

It's been on Railway for quite a while. I've kind of been in limbo in terms of updating the database because it's so damn old.


craftzman7
PROOP

4 months ago

Looks like I have to revert back now as it can't find Timescale


4 months ago

Yeah, the new source doesn't have it


4 months ago

What was the old source?


craftzman7
PROOP

4 months ago

lemme find it


craftzman7
PROOP

4 months ago

It's on the Railway app templates Github


craftzman7
PROOP

4 months ago

ghcr.io/railwayapp-templates/timescale-postgis-ssl:pg15-ts2.12


4 months ago

Because Timescale shouldn't have the Data UI working, at least new deployments don't (although I would love to support it, it's not something we test)


4 months ago

interesting


craftzman7
PROOP

4 months ago

I don't actually use any of the Timescale features


craftzman7
PROOP

4 months ago

If that matters


4 months ago

so you can just remove Timescale from the preloaded libraries tbh


4 months ago

actually don't do that


4 months ago

I'm not sure how Timescale affects the data folder, so let's try to decrease entropy and return to the original state



craftzman7
PROOP

4 months ago

Uhhhh


craftzman7
PROOP

4 months ago

the UI won't let me change the image


craftzman7
PROOP

4 months ago

I press enter and it doesn't update


craftzman7
PROOP

4 months ago

I'm stupid, never mind


craftzman7
PROOP

4 months ago

erm

[attachment]


craftzman7
PROOP

4 months ago

Any clue on how to fix that 😅


4 months ago

i think you mounted the wrong path


4 months ago

but honestly unsure

[attachment]


craftzman7
PROOP

4 months ago

/var/lib/postgresql/data


4 months ago

:hmm:


craftzman7
PROOP

4 months ago

It said it was used 6 months ago


craftzman7
PROOP

4 months ago

¯_(ツ)_/¯


craftzman7
PROOP

4 months ago

I will note, this only happened after changing the image back


4 months ago

Sorry I'm not sure what is going on here, I don't have a lot of details on how the timescale template worked, so I may have to leave it for oncall


4 months ago

You could try removing timescale from preloaded, but honestly please take a backup before doing any risky operation on your volume


craftzman7
PROOP

4 months ago

Should I just download the entire data directory? I'm not on Pro so I don't have backups


4 months ago

Changing the source has a high risk of corrupting the data


craftzman7
PROOP

4 months ago

I could also just upgrade tbf


craftzman7
PROOP

4 months ago

Is the "create backup" button under the database service sufficient?


4 months ago

Yes! That said, it won't automatically recover the service config.

So that backup will work with the specific image source and volume mount path it was made with. If you change things further, the backup is only guaranteed to work once you revert to that configuration.


craftzman7
PROOP

4 months ago

ah okay


craftzman7
PROOP

4 months ago

Alrighty, I removed Timescale from the shared libraries and switched to the newer image. I had to purge the certs directory as it was causing permission issues; Postgres regenerated them.
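
A rough Python sketch of the "remove Timescale from the shared libraries" step, assuming it was done by editing the shared_preload_libraries line in the config file (the thread does not say exactly how it was done). The helper name drop_preload_library is hypothetical. As advised earlier in the thread, create a backup first.

```python
def drop_preload_library(line: str, name: str) -> str:
    """Remove one library from a shared_preload_libraries line.

    Any other line is returned unchanged.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "shared_preload_libraries":
        return line
    names = [item.strip() for item in value.strip().strip("'").split(",")]
    kept = [item for item in names if item and item != name]
    return "shared_preload_libraries = '" + ",".join(kept) + "'"


before = "shared_preload_libraries = 'timescaledb,pg_stat_statements'"
print(drop_preload_library(before, "timescaledb"))
```

This only edits the preload list. Whether a given database still depends on the extension is a separate question that the sketch does not check.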


craftzman7
PROOP

4 months ago

!s


craftzman7
PROOP

4 months ago

oh is that not a thing anymore


craftzman7
PROOP

4 months ago

P.S. does this make me eligible for Bug Basher 🥺


4 months ago

Ahhh makes sense that the certs were conflicting between the two sources


4 months ago

Great to know it's fixed, I've updated the logic to prevent that


4 months ago

And Brody gave you the role!


Status changed to Solved brody 4 months ago

