Bug with query statistics feature in Priority Boarding
craftzman7
PROOP

12 days ago

When enabling query statistics from the new metrics panel, the database restarts and throws itself into a boot loop. I found a previous help thread where someone managed to trigger this manually and to resolve it, a Railway engineer had to intervene. https://discord.com/channels/713503345364697088/1119792911819751566/1119792911819751566

Solved

70 Replies

craftzman7
PROOP

12 days ago

Project ID is 750a6ff7-b90f-4885-88b9-e69f7ba105be


uxuz
MODERATOR

12 days ago

Hmmm, I wonder if this is now something you could technically change yourself using SSH with the Railway CLI.


uxuz
MODERATOR

12 days ago

@Crafter could you try to SSH into your service and run cat /var/lib/postgresql/data/pgdata/postgresql.auto.conf and see if timescaledb,pg_stat_statements is listed there?
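For anyone who copies the file out of the volume rather than reading it over SSH, the check above can be sketched in Python. This is a minimal sketch, not Railway tooling: the function name `preload_libraries` and the sample file contents are assumptions made for illustration, and the parser only handles the simple `name = 'value'` form.

```python
import re


def preload_libraries(conf_text: str) -> list[str]:
    """Return the names listed in shared_preload_libraries, or [] if unset."""
    libs: list[str] = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"shared_preload_libraries\s*=\s*'(.*)'\s*$", line)
        if match:
            # If the setting appears more than once, the last entry wins.
            libs = [n.strip() for n in match.group(1).split(",") if n.strip()]
    return libs


# Hypothetical file contents, shaped like the value asked about above.
sample = (
    "# Do not edit this file manually!\n"
    "shared_preload_libraries = 'timescaledb,pg_stat_statements'\n"
)
print(preload_libraries(sample))  # ['timescaledb', 'pg_stat_statements']
```

Names are returned exactly as written, so stray quote characters inside the value would show up in the output instead of being hidden.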


uxuz
MODERATOR

12 days ago

I am also going to escalate this so that it doesn't happen with other users.


uxuz
MODERATOR

12 days ago

!t


uxuz
MODERATOR

12 days ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response uxuz 12 days ago


Noted, I am going to quickly fast follow so that we can have a "reset" in the metrics panel so you (and others) can recover. For our information, can you tell me when the bootloop happened, when you enabled it?


Status changed to Awaiting User Response Railway 12 days ago


Railway
BOT

12 days ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but, our engineering team will take a look and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


craftzman7
PROOP

12 days ago

Unable to SSH


craftzman7
PROOP

12 days ago

the container crashes entirely


craftzman7
PROOP

12 days ago

Another note, I did change the image from the old railway timescale image that was formerly the default to the new one to see if that fixed it but it did not


craftzman7
PROOP

12 days ago

The boot loop happened as soon as I confirmed that the container would be restarted


uxuz
MODERATOR

12 days ago

One way to SSH into the container would be to attach the volume to another running service that doesn't have a volume and SSH into the container this way. (You might as well just use the file browser template at this point.)


craftzman7
PROOP

12 days ago

[image attachment]


paulo
EMPLOYEE

12 days ago

Oh I see the issue, really sorry that happened, please try to remove the inner quotes and it should work again. I will fix it
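The exact contents of the broken file are not shown in the thread, so the following is only a sketch of what "remove the inner quotes" could look like. It assumes the setting was written with double quotes nested inside the single-quoted value, and that none of the library names actually needs double quotes; `strip_inner_quotes` and the sample line are hypothetical.

```python
import re

# Matches a simple "shared_preload_libraries = '...'" line.
SETTING = re.compile(r"^(\s*shared_preload_libraries\s*=\s*)'(.*)'\s*$")


def strip_inner_quotes(conf_text: str) -> str:
    """Drop double quotes nested inside the shared_preload_libraries value."""
    fixed = []
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match:
            value = match.group(2).replace('"', "")
            line = f"{match.group(1)}'{value}'"
        fixed.append(line)
    return "\n".join(fixed) + "\n"


# Assumed shape of the broken line; the real one is not shown in the thread.
broken = "shared_preload_libraries = '\"timescaledb,pg_stat_statements\"'\n"
print(strip_inner_quotes(broken), end="")
# prints: shared_preload_libraries = 'timescaledb,pg_stat_statements'
```

Other lines in the file pass through unchanged.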


craftzman7
PROOP

12 days ago

🫡


paulo
EMPLOYEE

12 days ago

I asked in the PB thread, but how did you get timescale in the binary and preloaded in the db if you use ghcr.io/railwayapp-templates/postgres-ssl:15 as the source?


paulo
EMPLOYEE

12 days ago

That should not be possible


paulo
EMPLOYEE

12 days ago

Or at least not without some source trickery and a very old service


craftzman7
PROOP

12 days ago

This is quite an old service



craftzman7
PROOP

12 days ago

The former default was the timescale image


paulo
EMPLOYEE

12 days ago

But did you change the source?


paulo
EMPLOYEE

12 days ago

That's really risky tbh


paulo
EMPLOYEE

12 days ago

Ahh


craftzman7
PROOP

12 days ago

Yep


paulo
EMPLOYEE

12 days ago

Honestly, surprised it worked


craftzman7
PROOP

12 days ago

I think I deployed this in 2023?


craftzman7
PROOP

12 days ago

It's been on Railway for quite a while. I've kind of been in limbo in terms of updating the database because it's so damn old.


craftzman7
PROOP

12 days ago

Looks like I have to revert back now as it can't find Timescale


paulo
EMPLOYEE

12 days ago

Yeah, the new source doesn't have it


paulo
EMPLOYEE

12 days ago

What was the old source?


craftzman7
PROOP

12 days ago

lemme find it


craftzman7
PROOP

12 days ago

It's on the Railway app templates Github


craftzman7
PROOP

12 days ago

ghcr.io/railwayapp-templates/timescale-postgis-ssl:pg15-ts2.12


paulo
EMPLOYEE

12 days ago

Because Timescale shouldn't have the Data UI working, at least new deployments don't (although I would love to support it, it's not something we test)


paulo
EMPLOYEE

12 days ago

interesting


craftzman7
PROOP

12 days ago

I don't actually use any of the Timescale features


craftzman7
PROOP

12 days ago

If that matters


paulo
EMPLOYEE

12 days ago

so you can just remove the timescale from preloaded libraries tbh


paulo
EMPLOYEE

12 days ago

actually don't do that


paulo
EMPLOYEE

12 days ago

I'm not sure how Timescale affects the data folder, so let's try to decrease entropy and return to the original state



craftzman7
PROOP

12 days ago

Uhhhh


craftzman7
PROOP

12 days ago

the UI won't let me change the image


craftzman7
PROOP

12 days ago

I press enter and it doesn't update


craftzman7
PROOP

12 days ago

I'm stupid, nevermind


craftzman7
PROOP

12 days ago

erm

[image attachment]


craftzman7
PROOP

12 days ago

Any clue on how to fix that 😅


paulo
EMPLOYEE

12 days ago

i think you mounted the wrong path


paulo
EMPLOYEE

12 days ago

but honestly unsure

[image attachment]


craftzman7
PROOP

12 days ago

/var/lib/postgresql/data


paulo
EMPLOYEE

12 days ago

:hmm:


craftzman7
PROOP

12 days ago

It said it was used 6 months ago


craftzman7
PROOP

12 days ago

¯_(ツ)_/¯


craftzman7
PROOP

12 days ago

I will note, this only happened after changing the image back


paulo
EMPLOYEE

12 days ago

Sorry I'm not sure what is going on here, I don't have a lot of details on how the timescale template worked, so I may have to leave it for oncall


paulo
EMPLOYEE

12 days ago

You could try removing timescale from preloaded, but honestly please take a backup before doing any risky operation on your volume
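As a rough illustration of removing Timescale from the preloaded list, here is a minimal Python sketch that edits the text of a config file. The helper name `drop_preload_library` and the sample line are assumptions, it only handles the simple `name = 'value'` form, and it is no substitute for the volume backup recommended above. A change to shared_preload_libraries only takes effect after Postgres restarts.

```python
import re

# Matches a simple "shared_preload_libraries = '...'" line.
SETTING = re.compile(r"^(\s*shared_preload_libraries\s*=\s*)'(.*)'\s*$")


def drop_preload_library(conf_text: str, library: str) -> str:
    """Return conf_text with `library` removed from shared_preload_libraries."""
    out = []
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match:
            names = [n.strip() for n in match.group(2).split(",")]
            kept = ",".join(n for n in names if n and n != library)
            line = f"{match.group(1)}'{kept}'"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical starting value, matching the libraries named earlier in the thread.
before = "shared_preload_libraries = 'timescaledb,pg_stat_statements'\n"
print(drop_preload_library(before, "timescaledb"), end="")
# prints: shared_preload_libraries = 'pg_stat_statements'
```

The function returns new text rather than writing to disk, so the original file stays untouched until the result has been reviewed.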


craftzman7
PROOP

12 days ago

Should I just download the entire data directory? I'm not on Pro so I don't have backups


paulo
EMPLOYEE

12 days ago

Changing the source has a high risk of corrupting the data


craftzman7
PROOP

12 days ago

I could also just upgrade tbf


craftzman7
PROOP

12 days ago

Is the "create backup" button under the database service sufficient?


paulo
EMPLOYEE

12 days ago

yes! that said, it won't auto recover service config

so that backup will work on the specific image source + volume mount path it was made to work on. If you change things further, the backup is only guaranteed to work if you revert


craftzman7
PROOP

12 days ago

ah okay


craftzman7
PROOP

12 days ago

Alrighty I removed Timescale from the shared libraries and switched to the newer image. I had to purge the certs directory as it was causing permission issues, Postgres regenerated them.


craftzman7
PROOP

12 days ago

!s


craftzman7
PROOP

12 days ago

oh is that not a thing anymore


craftzman7
PROOP

12 days ago

P.S does this make me eligible for Bug Basher 🥺


paulo
EMPLOYEE

10 days ago

Ahhh makes sense that the certs were conflicting between the two sources


paulo
EMPLOYEE

10 days ago

Great to know it's fixed, I've updated the logic to prevent that


paulo
EMPLOYEE

10 days ago

And Brody gave you the role!


Status changed to Solved brody 9 days ago

