5 months ago
Every day at 10 am AEST (00:00 UTC), my PostgreSQL database locks up and all queries take 5 to 10x longer.
The metrics graph doesn't show any significant load. When I check for long-running or active queries, there are none. I'm not sure what is placing a load on the system.
Is there something running on the Railway management end that would cause this?
Alternatively, can we turn on query logging to diagnose the issue?
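For reference, on a stock PostgreSQL instance with superuser access, slow-query and checkpoint logging can be switched on roughly as below. This is a sketch rather than Railway-specific guidance: the 500 ms threshold is an arbitrary example value, and whether the platform permits `ALTER SYSTEM` or exposes the server log is an assumption to check.

```sql
-- Log every statement that runs longer than 500 ms (example threshold).
ALTER SYSTEM SET log_min_duration_statement = '500ms';

-- Log each checkpoint together with its write and sync timings.
ALTER SYSTEM SET log_checkpoints = on;

-- Log sessions that wait on a lock for longer than deadlock_timeout.
ALTER SYSTEM SET log_lock_waits = on;

-- Apply the new settings without restarting the server.
SELECT pg_reload_conf();
```

If the slowdown at 00:00 UTC comes from the host's storage rather than from a query, the log should show ordinary statements suddenly exceeding the threshold, with no single heavy query standing out.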
23 Replies
5 months ago
Hey there! We've found the following might help you get unblocked faster:
🧵 "database system was not properly shut down" error when re-deploying Postgres service
🧵 Urgent Assistance Required - Data Recovery for MySQL Service (Project: natural-respect)
If you find the answer from one of these, please let us know by solving the thread!
5 months ago
Hello,
Thank you for bringing this to our attention.
I do indeed see spikes of IO wait on the host that runs your database every day at 0:00 UTC, as you say.
I am going to escalate this to our infra team to see if we can figure out what is causing these spikes and what we can do to stop it.
Best,
Brody
Status changed to Awaiting User Response Railway • 5 months ago
5 months ago
Hello!
We've escalated your issue to our engineering team.
We aim to provide an update within 1 business day.
Please reply to this thread if you have any questions!
5 months ago
Hi Railway,
It's been nearly 24 hours. Any update?
Status changed to Awaiting Railway Response Railway • 5 months ago
5 months ago
Hello,
We are still investigating on our end and will update you when we have found the cause.
Status changed to Awaiting User Response Railway • 5 months ago
5 months ago
When I run:
-- WAL/commit and IO context
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;
checkpoints_timed 3685
checkpoints_req 24
checkpoint_write_time 159684913
checkpoint_sync_time 28993712
I'm told (by ChatGPT) that this means the storage subsystem is spending a lot of time writing checkpoint data and syncing it.
Is there a way to improve the storage throughput, or to increase buffer sizes?
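To put those totals in context: both time columns are cumulative milliseconds since the statistics were last reset, so dividing by the number of checkpoints gives a per-checkpoint average. With the values above, 159684913 / (3685 + 24) ≈ 43,053 ms of write time and 28993712 / 3709 ≈ 7,817 ms of sync time per checkpoint. A sketch of the same calculation in SQL, plus a look at the settings that govern checkpoint pacing, follows. It assumes a PostgreSQL version before 17, where these counters still live in `pg_stat_bgwriter` (later versions moved them to `pg_stat_checkpointer`).

```sql
-- Average write and sync time per checkpoint, in milliseconds.
-- NULLIF avoids a division by zero right after a statistics reset.
SELECT checkpoints_timed + checkpoints_req AS total_checkpoints,
       round(checkpoint_write_time
             / NULLIF(checkpoints_timed + checkpoints_req, 0)) AS avg_write_ms,
       round(checkpoint_sync_time
             / NULLIF(checkpoints_timed + checkpoints_req, 0)) AS avg_sync_ms,
       stats_reset
FROM pg_stat_bgwriter;

-- Current values of the settings that control checkpoint frequency and pacing.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout', 'checkpoint_completion_target',
               'max_wal_size', 'shared_buffers');
```

Because these are lifetime averages, they cannot isolate a spike that happens once a day; comparing snapshots taken shortly before and after 00:00 UTC would be more telling.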
Status changed to Awaiting Railway Response Railway • 5 months ago
5 months ago
Hello!
We're acknowledging your issue and attaching a ticket to this thread.
We don't have an ETA for it, but our engineering team will take a look, and you will be updated as we update the ticket.
Please reply to this thread if you have any questions!
5 months ago
Ticketed. We are working on something internally here. We are seeing potential for improvements on some graphs and will get back to you once this is solved.
Status changed to Awaiting User Response Railway • 5 months ago
5 months ago
Hey Jake,
Any updates you can share with me?
Status changed to Awaiting Railway Response Railway • 5 months ago
Status changed to Awaiting User Response Railway • 5 months ago
4 months ago
Hey Jake,
How about now?
Status changed to Awaiting Railway Response Railway • 5 months ago
4 months ago
✅ The ticket Performance issues at midnight UTC has been marked as completed.
4 months ago
Hi there,
We've made some changes and your host should have better latency as of today. Please let us know if you run into this issue again.
Status changed to Awaiting User Response Railway • 5 months ago
4 months ago
Thank you! Are you able to give a summary of what you changed?
Status changed to Awaiting Railway Response Railway • 4 months ago
4 months ago
Unfortunately not; it's some internal configuration that we cannot disclose.
Status changed to Awaiting User Response Railway • 4 months ago
4 months ago
✅ The ticket Performance issue with disk operations on metal has been marked as completed.
4 months ago
🛠️ The ticket Performance issue with disk operations on metal has been marked as in progress.
4 months ago
✅ The ticket Performance issue with disk operations on metal has been marked as completed.
4 months ago
🛠️ The ticket Performance issue with disk operations on metal has been marked as in progress.
4 months ago
✅ The ticket Performance issue with disk operations on metal has been marked as completed.
3 months ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 3 months ago