Database Performance Issues - Slow Storage I/O Causing Extended Checkpoint

diffted
PRO

a month ago

Hi Railway Support,

I'm experiencing severe database performance issues with my PostgreSQL service that appear to be related to storage I/O throttling/limitations.

Current Setup:

  • Service: PostgreSQL

  • Plan: 32 vCPU, 32GB RAM

  • Application: Medusa.js e-commerce platform

Issue: Database operations (user registration, cart operations) have become extremely slow over the past 2 weeks. PostgreSQL checkpoint logs show concerning patterns:

Checkpoint Performance:

  • Normal checkpoints taking 85-180+ seconds (should be <30 seconds)

  • One checkpoint took 855 seconds (14+ minutes)

  • Write times consistently 80-800+ seconds

  • Sync times are normal (<0.1 seconds)

Example Log Entry:

2025-07-07 10:30:28.228 UTC [27] LOG: checkpoint complete: wrote 857 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.812 s, sync=0.028 s, total=85.867 s; sync files=215, longest=0.013 s, average=0.001 s; distance=6178 kB, estimate=145656 kB
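For reference, the timing fields can be pulled out of a checkpoint line mechanically; a minimal shell sketch (assuming GNU grep), using the log entry above:

```shell
# Sketch: extract the write/sync/total timings from a checkpoint log line.
# The sample line is the one from the report above.
line='2025-07-07 10:30:28.228 UTC [27] LOG: checkpoint complete: wrote 857 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.812 s, sync=0.028 s, total=85.867 s; sync files=215, longest=0.013 s, average=0.001 s; distance=6178 kB, estimate=145656 kB'
write=$(printf '%s' "$line" | grep -o 'write=[0-9.]*' | cut -d= -f2)
sync=$(printf '%s' "$line" | grep -o 'sync=[0-9.]*' | cut -d= -f2)
total=$(printf '%s' "$line" | grep -o 'total=[0-9.]*' | cut -d= -f2)
echo "write=${write}s sync=${sync}s total=${total}s"
```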

Already Completed:

  • Optimized PostgreSQL configuration (checkpoint_timeout, wal_buffers, etc.)

  • Database maintenance (VACUUM, ANALYZE)

  • Query optimization

Questions:

  1. What are the current storage I/O limits (IOPS, throughput) for my plan?

  2. Are there any I/O throttling alerts or metrics showing my service hitting limits?

  3. What storage upgrade options are available to improve write performance?

  4. Can you see any storage-related performance metrics for my database service?

The issue appears to be infrastructure-level storage performance rather than database configuration, as the extremely long write times indicate I/O bottlenecking.

Awaiting User Response

50 Replies

diffted
PRO

a month ago

Settings:
max_connections: 200

shared_buffers: 8GB

effective_cache_size: 24GB

maintenance_work_mem: 2GB

checkpoint_completion_target: 0.95

wal_buffers: 64MB

default_statistics_target: 100

random_page_cost: 1

effective_io_concurrency: 300

work_mem: 4MB

huge_pages: try

min_wal_size: 1GB

max_wal_size: 4GB
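Assuming the default 8 kB PostgreSQL page size, the "wrote N buffers" figures in the checkpoint logs can be related back to these settings with quick arithmetic; a sketch:

```shell
# With shared_buffers = 8GB and the default 8 kB page size,
# shared_buffers holds 8*1024*1024 kB / 8 kB = 1,048,576 pages.
pages=$((8 * 1024 * 1024 / 8))
# A checkpoint that "wrote 857 buffers" therefore flushed 857 * 8 kB:
kb=$((857 * 8))
echo "pages=${pages} wrote=${kb}kB"
```

857 of 1,048,576 pages is ~0.08%, which matches the "(0.1%)" figure the server prints, so these checkpoints are flushing only a few MB each.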


Railway
BOT

a month ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 27 days ago


diffted
PRO

a month ago

The checkpoint logs show the issue is getting worse. I'm seeing some alarming patterns:

Sync Performance Degradation

  • 04:30:52: sync=9.035s (was <0.1s before)

  • 04:06:11: sync=7.462s

  • 03:30:43: sync=8.759s

Sync times should be under 0.1 seconds. When sync times spike to 6-9 seconds, it indicates the storage system is severely struggling to flush data to disk.

One Extremely Bad Checkpoint

  • 04:06:11: 416 seconds total (nearly 7 minutes)

  • 408 seconds write time + 7.4 seconds sync time

2025-07-08 04:45:41.255 UTC [27] LOG: checkpoint complete: wrote 695 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=77.143 s, sync=3.848 s, total=85.668 s; sync files=94, longest=2.621 s, average=0.041 s; distance=4901 kB, estimate=8421 kB; lsn=C/918DEF38, redo lsn=C/91857DA8

2025-07-08 04:30:52.487 UTC [27] LOG: checkpoint complete: wrote 718 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=84.770 s, sync=9.035 s, total=96.636 s; sync files=75, longest=6.358 s, average=0.121 s; distance=4970 kB, estimate=8812 kB; lsn=C/91457D50, redo lsn=C/9138E950

2025-07-08 04:15:40.751 UTC [27] LOG: checkpoint complete: wrote 813 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.978 s, sync=0.033 s, total=84.816 s; sync files=107, longest=0.016 s, average=0.001 s; distance=6355 kB, estimate=9239 kB; lsn=C/90F95E30, redo lsn=C/90EB3F90

2025-07-08 04:06:11.835 UTC [27] LOG: checkpoint complete: wrote 3977 buffers (0.4%); 0 WAL file(s) added, 0 removed, 1 recycled; write=408.466 s, sync=7.462 s, total=416.800 s; sync files=213, longest=3.860 s, average=0.036 s; distance=9559 kB, estimate=9559 kB; lsn=C/90C09B78, redo lsn=C/9087F090

2025-07-08 03:45:27.945 UTC [27] LOG: checkpoint complete: wrote 619 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=72.841 s, sync=0.732 s, total=73.918 s; sync files=93, longest=0.294 s, average=0.008 s; distance=4438 kB, estimate=9532 kB; lsn=C/8FF49BE0, redo lsn=C/8FF29128

2025-07-08 03:30:43.928 UTC [27] LOG: checkpoint complete: wrote 806 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.758 s, sync=8.759 s, total=90.479 s; sync files=158, longest=6.599 s, average=0.056 s; distance=5590 kB, estimate=10099 kB; lsn=C/8FBE3AB8, redo lsn=C/8FAD3908

2025-07-08 03:16:44.351 UTC [27] LOG: checkpoint complete: wrote 1378 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=138.379 s, sync=0.023 s, total=151.299 s; sync files=117, longest=0.011 s, average=0.001 s; distance=10600 kB, estimate=10600 kB; lsn=C/8F74DF78, redo lsn=C/8F55E038

2025-07-08 03:00:19.955 UTC [27] LOG: checkpoint complete: wrote 663 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=66.284 s, sync=0.071 s, total=67.004 s; sync files=79, longest=0.040 s, average=0.001 s; distance=4839 kB, estimate=8028 kB; lsn=C/8F1D41A0, redo lsn=C/8EB03FF0

2025-07-08 02:46:00.852 UTC [27] LOG: checkpoint complete: wrote 907 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.083 s, sync=0.020 s, total=108.015 s; sync files=200, longest=0.007 s, average=0.001 s; distance=6261 kB, estimate=8382 kB; lsn=C/8E72CA38, redo lsn=C/8E64A298

2025-07-08 02:30:26.739 UTC [27] LOG: checkpoint complete: wrote 615 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=69.092 s, sync=4.309 s, total=73.801 s; sync files=49, longest=2.570 s, average=0.088 s; distance=4538 kB, estimate=8617 kB; lsn=C/8E0E1B80, redo lsn=C/8E02CC98
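To triage dumps like the one above, the worst checkpoints can be ranked by write time; a minimal sketch (assuming GNU sed; the two sample lines stand in for a real log file):

```shell
# Rank checkpoint lines by write time, worst first.
# The two sample lines below stand in for a real server log.
cat > /tmp/ckpt.log <<'EOF'
2025-07-08 04:06:11 UTC LOG: checkpoint complete: wrote 3977 buffers; write=408.466 s, sync=7.462 s, total=416.800 s
2025-07-08 04:15:40 UTC LOG: checkpoint complete: wrote 813 buffers; write=80.978 s, sync=0.033 s, total=84.816 s
EOF
grep 'checkpoint complete' /tmp/ckpt.log \
  | sed -E 's/.*write=([0-9.]+) s.*/\1\t&/' \
  | sort -rn \
  | cut -f2- \
  | head -n 3
```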


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Hi Ted.

I've gone ahead and applied a config that I hope will resolve this. Would you mind letting me know how it performs over the next 5 minutes (and then circling back)?

Apologies you're running into this!


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

Hi, thank you for responding.

2025-07-08 05:17:59.168 UTC [27] LOG: checkpoint complete: wrote 2234 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=223.564 s, sync=0.008 s, total=223.576 s; sync files=226, longest=0.001 s, average=0.001 s; distance=14723 kB, estimate=14723 kB; lsn=C/92C8E7D8, redo lsn=C/92B8C7D0

2025-07-08 05:19:02.419 UTC [25344] LOG: duration: 2444.116 ms statement: SELECT "id", "workflow_id", "transaction_id", "execution", "context", "state", "created_at", "updated_at", "deleted_at", "retention_time", "run_id" FROM "public"."workflow_execution" ORDER BY "created_at" DESC NULLS FIRST, "workflow_id" DESC, "transaction_id" DESC, "run_id" DESC LIMIT 1000

Good news: Sync time back to normal (0.008s) Bad news: Write time still extremely high (223 seconds / 3.7 minutes)
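Alongside the checkpoint lines, the slow-statement "duration:" entries can be summarized the same way; a sketch (the two sample lines stand in for real log output):

```shell
# Count the logged slow statements and find the worst duration.
cat > /tmp/slow.log <<'EOF'
2025-07-08 05:19:02 UTC [25344] LOG: duration: 2444.116 ms statement: SELECT ...
2025-07-08 09:23:26 UTC [2343] LOG: duration: 1976.771 ms statement: COMMIT;
EOF
grep -o 'duration: [0-9.]* ms' /tmp/slow.log \
  | awk '{ n++; if ($2 > max) max = $2 } END { printf "count=%d max=%.3f ms\n", n, max }'
```

For the sample lines this prints `count=2 max=2444.116 ms`.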


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Alright, that's good. Would you mind checking for me one more time over the next 5 minutes?


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 05:30:41.914 UTC [27] LOG: checkpoint complete: wrote 861 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=86.640 s, sync=0.004 s, total=86.647 s; sync files=185, longest=0.001 s, average=0.001 s; distance=5961 kB, estimate=13847 kB; lsn=C/932515B8, redo lsn=C/9315EDB8

This checkpoint shows the consistent pattern continuing: 86+ seconds is still far too slow for normal operations.

Application severely impacted - user registration, cart operations failing.


Status changed to Awaiting Railway Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 05:45:40.227 UTC [27] LOG: checkpoint complete: wrote 841 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=84.205 s, sync=0.006 s, total=84.214 s; sync files=207, longest=0.001 s, average=0.001 s; distance=5580 kB, estimate=13020 kB; lsn=C/93790B18, redo lsn=C/936D1EC0


diffted
PRO

a month ago

2025-07-08 06:05:59.364 UTC [27] LOG: checkpoint complete: wrote 4024 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=403.031 s, sync=0.005 s, total=403.038 s; sync files=93, longest=0.001 s, average=0.001 s; distance=8670 kB, estimate=12585 kB; lsn=C/94294F10, redo lsn=C/93F499E8


a month ago

We're looking into this this week as it's deeply important. However, getting to the bottom of it may take a bit of time. Apologies I don't have better news here.


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

This is a production database, with live users trying to use the project. Not sure what to do; we can't afford a week of lost registrations and orders.


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Gotchya. I can temporarily move you back to our old cloud machines to see if that resolves it.


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

We are using static IPs and don't have a fast way to change them with our payment providers. And this move requires changing the static IPs, yes?


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Yes it would. I can attempt to move you regions, and then back, which would land you on a different host (this one seems to be a bit more temperamental)


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

I can try to get the IPs sorted after the move. What downtime are we looking at?


Status changed to Awaiting Railway Response Railway 26 days ago



a month ago

Let’s attempt the cross-regional reselection first, which will prevent the need for altering the IPs

I’m quite confident that we can resolve it by doing the above


Status changed to Awaiting User Response Railway 26 days ago


a month ago

As for downtime, it should be about 30-60s


diffted
PRO

a month ago

OK, we can proceed. Is anything needed from our side?


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

I’ll simply need you to confirm the integrity of the database once moved to the new region

I’ll move it to US East first, then back to Amsterdam after confirming


Status changed to Awaiting User Response Railway 26 days ago


a month ago

Copy process in progress. It should complete in less than 2 minutes from now


a month ago

Completed AMS -> US east

Please confirm everything looks good, and I’m happy to move it back (to another instance)


diffted
PRO

a month ago

All good


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Confirmed. Moving from US East to Europe on a new host


Status changed to Awaiting User Response Railway 26 days ago


a month ago

Worth noting the speeds in the US East. We will look for this once this lands in Europe

Attachments



diffted
PRO

a month ago

wow


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

We appear to be nominal in Europe again. Please confirm data and speeds

Attachments


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

I will keep monitoring throughout the day and update in a few hours. Thanks for this! Really.


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

You're very welcome. I'm sorry we caused your business undue harm. I've applied a $100 credit to your account for this inconvenience.


Status changed to Awaiting User Response Railway 26 days ago


a month ago

(Please reopen this if you once again run into any issues)


Status changed to Solved jake 26 days ago


diffted
PRO

a month ago

Although the first checkpoint after the migration looked promising, from the second one onward we have been seeing the same problem as before:

2025-07-08 07:46:08.363 UTC [27] LOG: checkpoint complete: wrote 977 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=97.873 s, sync=0.019 s, total=97.930 s; sync files=186, longest=0.007 s, average=0.001 s; distance=6117 kB, estimate=6117 kB; lsn=C/9744A1A0, redo lsn=C/97341958

2025-07-08 08:01:06.419 UTC [27] LOG: checkpoint complete: wrote 958 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.888 s, sync=0.030 s, total=95.956 s; sync files=212, longest=0.015 s, average=0.001 s; distance=6066 kB, estimate=6112 kB; lsn=C/97A2A438, redo lsn=C/9792E218

2025-07-08 08:15:54.591 UTC [27] LOG: checkpoint complete: wrote 839 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=84.031 s, sync=0.018 s, total=84.073 s; sync files=205, longest=0.005 s, average=0.001 s; distance=5664 kB, estimate=6067 kB; lsn=C/97FBA438, redo lsn=C/97EB6590

2025-07-08 08:30:39.841 UTC [27] LOG: checkpoint complete: wrote 690 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=69.092 s, sync=0.025 s, total=69.150 s; sync files=73, longest=0.013 s, average=0.001 s; distance=4692 kB, estimate=5930 kB; lsn=C/984109A8, redo lsn=C/9834B8D0

2025-07-08 08:45:56.431 UTC [27] LOG: checkpoint complete: wrote 853 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.431 s, sync=0.025 s, total=85.492 s; sync files=217, longest=0.007 s, average=0.001 s; distance=5639 kB, estimate=5900 kB; lsn=C/989AE8D0, redo lsn=C/988CD620


Status changed to Awaiting Railway Response Railway 26 days ago


Status changed to Solved diffted 26 days ago


diffted
PRO

a month ago

Anything else we can do?


Status changed to Awaiting Railway Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 09:00:54.299 UTC [27] LOG: checkpoint complete: wrote 837 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=83.726 s, sync=0.031 s, total=83.776 s; sync files=214, longest=0.011 s, average=0.001 s; distance=5568 kB, estimate=5867 kB; lsn=C/9953A068, redo lsn=C/98E3D6D8

2025-07-08 09:08:06.741 UTC [1992] LOG: duration: 1597.122 ms statement: COMMIT;

2025-07-08 09:17:50.683 UTC [27] LOG: checkpoint complete: wrote 1999 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=200.199 s, sync=0.032 s, total=200.285 s; sync files=244, longest=0.009 s, average=0.001 s; distance=14388 kB, estimate=14388 kB; lsn=C/99F5F9D0, redo lsn=C/99C4A7F8

2025-07-08 09:23:26.513 UTC [2343] LOG: duration: 1976.771 ms statement: COMMIT;
2025-07-08 09:28:32.296 UTC [2435] LOG: duration: 1208.386 ms statement: COMMIT;

2025-07-08 09:37:03.871 UTC [27] LOG: checkpoint complete: wrote 4524 buffers (0.4%); 0 WAL file(s) added, 0 removed, 1 recycled; write=453.007 s, sync=0.041 s, total=453.091 s; sync files=217, longest=0.014 s, average=0.001 s; distance=12982 kB, estimate=14247 kB; lsn=C/9ADB5970, redo lsn=C/9A8F8160

2025-07-08 09:46:23.821 UTC [27] LOG: checkpoint complete: wrote 1127 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=112.800 s, sync=0.020 s, total=112.850 s; sync files=210, longest=0.008 s, average=0.001 s; distance=7859 kB, estimate=13608 kB; lsn=C/9B1CA980, redo lsn=C/9B0A5058

2025-07-08 09:54:08.584 UTC [3005] LOG: duration: 2469.892 ms statement: COMMIT;

2025-07-08 09:54:08.611 UTC [3026] LOG: duration: 2454.961 ms statement: COMMIT;

2025-07-08 09:54:08.611 UTC [3011] LOG: duration: 1192.506 ms statement: COMMIT;

2025-07-08 10:01:16.156 UTC [27] LOG: checkpoint complete: wrote 1051 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=105.191 s, sync=0.018 s, total=105.236 s; sync files=208, longest=0.006 s, average=0.001 s; distance=7020 kB, estimate=12950 kB; lsn=C/9B859BA0, redo lsn=C/9B780410

2025-07-08 10:15:43.133 UTC [27] LOG: checkpoint complete: wrote 727 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=72.811 s, sync=0.018 s, total=72.878 s; sync files=100, longest=0.007 s, average=0.001 s; distance=5104 kB, estimate=12165 kB; lsn=C/9BEA71F8, redo lsn=C/9BC7C5F0

2025-07-08 10:31:22.926 UTC [27] LOG: checkpoint complete: wrote 1125 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=112.616 s, sync=0.022 s, total=112.696 s; sync files=216, longest=0.010 s, average=0.001 s; distance=7874 kB, estimate=11736 kB; lsn=C/9C5936A8, redo lsn=C/9C42CEA8

2025-07-08 10:34:56.398 UTC [3914] LOG: duration: 3045.045 ms statement: COMMIT;

2025-07-08 10:34:56.480 UTC [3935] LOG: duration: 2960.344 ms statement: COMMIT;


Railway
BOT

a month ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!


a month ago

We're gonna have another look. Sorry about that


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

25 days ago

2025-07-08 15:31:15.481 UTC [27] LOG: checkpoint complete: wrote 988 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=98.967 s, sync=0.045 s, total=99.095 s; sync files=220, longest=0.014 s, average=0.001 s; distance=6625 kB, estimate=13776 kB; lsn=C/A8C26BC8, redo lsn=C/A8B08EE0

2025-07-08 15:41:24.555 UTC [10975] LOG: duration: 1772.422 ms statement: COMMIT;

2025-07-08 15:51:39.357 UTC [11150] LOG: duration: 2499.706 ms statement: COMMIT;

2025-07-08 15:51:39.409 UTC [11176] LOG: duration: 2545.245 ms statement: COMMIT;

2025-07-08 15:53:21.289 UTC [27] LOG: checkpoint complete: wrote 5240 buffers (0.5%); 0 WAL file(s) added, 0 removed, 1 recycled; write=524.654 s, sync=0.021 s, total=524.710 s; sync files=226, longest=0.008 s, average=0.001 s; distance=15058 kB, estimate=15058 kB; lsn=C/A9D20A80, redo lsn=C/A99BDAA8

2025-07-08 16:01:28.211 UTC [27] LOG: checkpoint complete: wrote 1116 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=111.779 s, sync=0.025 s, total=111.828 s; sync files=232, longest=0.007 s, average=0.001 s; distance=7881 kB, estimate=14341 kB; lsn=C/AA29F418, redo lsn=C/AA170078

2025-07-08 16:12:08.271 UTC [11643] LOG: duration: 1027.636 ms statement: COMMIT;

2025-07-08 16:16:47.144 UTC [27] LOG: checkpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=130.770 s, sync=0.024 s, total=130.835 s; sync files=223, longest=0.009 s, average=0.001 s; distance=8958 kB, estimate=13802 kB; lsn=C/AAC75A48, redo lsn=C/AAA2FB18

2025-07-08 16:31:23.063 UTC [27] LOG: checkpoint complete: wrote 1066 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=106.749 s, sync=0.029 s, total=106.821 s; sync files=207, longest=0.010 s, average=0.001 s; distance=7275 kB, estimate=13150 kB; lsn=C/AB2A3070, redo lsn=C/AB14A820

2025-07-08 16:46:26.560 UTC [27] LOG: checkpoint complete: wrote 1102 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=110.379 s, sync=0.022 s, total=110.436 s; sync files=207, longest=0.008 s, average=0.001 s; distance=7611 kB, estimate=12596 kB; lsn=C/ABACABD0, redo lsn=C/AB8B9730

2025-07-08 16:53:07.216 UTC [12526] LOG: duration: 1242.326 ms statement: COMMIT;

2025-07-08 16:53:08.620 UTC [12526] LOG: duration: 1365.486 ms statement: COMMIT;

2025-07-08 17:01:45.981 UTC [27] LOG: checkpoint complete: wrote 1291 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=129.228 s, sync=0.039 s, total=129.323 s; sync files=221, longest=0.019 s, average=0.001 s; distance=9157 kB, estimate=12252 kB; lsn=C/ACA16748, redo lsn=C/AC1AADF0

2025-07-08 17:03:21.606 UTC [12783] LOG: duration: 2352.834 ms statement: COMMIT;

2025-07-08 17:03:21.643 UTC [12784] LOG: duration: 2385.618 ms statement: COMMIT;

2025-07-08 17:17:41.365 UTC [27] LOG: checkpoint complete: wrote 1840 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=184.238 s, sync=0.018 s, total=184.285 s; sync files=242, longest=0.007 s, average=0.001 s; distance=13658 kB, estimate=13658 kB; lsn=C/AD150C58, redo lsn=C/ACF01898

2025-07-08 17:31:21.483 UTC [27] LOG: checkpoint complete: wrote 1038 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=103.961 s, sync=0.027 s, total=104.021 s; sync files=223, longest=0.009 s, average=0.001 s; distance=7265 kB, estimate=13019 kB; lsn=C/ADDFB580, redo lsn=C/AD619FC0

2025-07-08 17:34:09.217 UTC [13490] LOG: duration: 3221.338 ms statement: COMMIT;


Status changed to Awaiting Railway Response Railway 25 days ago


25 days ago

Looking into this one once again. I've been able to modify some config to drop the timing from 500s to 50-60s.

I believe this will continue to improve as your snapshots age out. The reason is that I believe the issue is one of write amplification caused by the way we've implemented snapshots.

As such, if you need the performance ASAP, I believe you should be able to remove 3-4 of the old snapshots and see an improvement (which should hold as new backups are created, due to the config I've applied).

Once again, apologies you're running into this. Please let me know:

  1. How the current state of the application is

  2. If you're going to attempt the backup removal to increase compression (and decrease write amplification)
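To check whether the volume itself is slow to flush, a crude probe is to time an fsync-ed write with dd (a sketch: /tmp is a stand-in for the database volume path, and a proper benchmark tool such as fio would give better numbers where available):

```shell
# Crude write probe: write 64 MB and force it to disk with fsync.
# /tmp is a hypothetical stand-in; point it at the volume under test.
dd if=/dev/zero of=/tmp/iotest.bin bs=1M count=64 conv=fsync 2>/tmp/dd.log
# dd's summary line reports bytes copied, elapsed time, and throughput.
tail -n 1 /tmp/dd.log
rm -f /tmp/iotest.bin
```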


Status changed to Awaiting User Response Railway 25 days ago


diffted
PRO

25 days ago

We’ve removed four snapshots and retained two. As user activity is currently low, we’ll have better insight into the impact of this change in the coming hours. I’ll continue to monitor the application and follow up with an update here.

Thank you.


Status changed to Awaiting Railway Response Railway 25 days ago


25 days ago

Gotchya. Let's see where it lands for now. I'll be monitoring it but please let us know as well


Status changed to Awaiting User Response Railway 25 days ago


diffted
PRO

25 days ago

Hi, we're not seeing any noticeable improvement at this stage; the database is still experiencing performance issues.

2025-07-09 05:06:18.360 UTC [27] LOG: checkpoint complete: wrote 1827 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=183.078 s, sync=0.052 s, total=183.148 s; sync files=218, longest=0.036 s, average=0.001 s; distance=13781 kB, estimate=13781 kB; lsn=C/C1788800, redo lsn=C/C16872B8

2025-07-09 05:19:48.375 UTC [27] LOG: checkpoint complete: wrote 927 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=92.868 s, sync=0.031 s, total=92.916 s; sync files=212, longest=0.013 s, average=0.001 s; distance=6257 kB, estimate=13029 kB; lsn=C/C1CB0228, redo lsn=C/C1CA3A58

2025-07-09 05:34:17.605 UTC [27] LOG: checkpoint complete: wrote 620 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=62.095 s, sync=0.015 s, total=62.130 s; sync files=88, longest=0.004 s, average=0.001 s; distance=4455 kB, estimate=12171 kB; lsn=C/C20FD7F0, redo lsn=C/C20FD7B8

2025-07-09 05:49:44.538 UTC [27] LOG: checkpoint complete: wrote 865 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=86.553 s, sync=0.931 s, total=88.836 s; sync files=207, longest=0.901 s, average=0.005 s; distance=5887 kB, estimate=11543 kB; lsn=C/C26EDC20, redo lsn=C/C26BD458

2025-07-09 05:53:01.910 UTC [24515] LOG: duration: 1201.381 ms statement: COMMIT;

2025-07-09 06:04:50.755 UTC [27] LOG: checkpoint complete: wrote 949 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.070 s, sync=0.025 s, total=95.117 s; sync files=216, longest=0.006 s, average=0.001 s; distance=6657 kB, estimate=11054 kB; lsn=C/C2ECAF50, redo lsn=C/C2D3D8E8

2025-07-09 06:20:01.193 UTC [27] LOG: checkpoint complete: wrote 1052 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=105.280 s, sync=0.017 s, total=105.339 s; sync files=220, longest=0.006 s, average=0.001 s; distance=7166 kB, estimate=10665 kB; lsn=C/C34BBBF8, redo lsn=C/C343D448

2025-07-09 06:34:59.620 UTC [27] LOG: checkpoint complete: wrote 1042 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=104.276 s, sync=0.028 s, total=104.327 s; sync files=215, longest=0.006 s, average=0.001 s; distance=7322 kB, estimate=10331 kB; lsn=C/C3DDB1E0, redo lsn=C/C3B63F98


Status changed to Awaiting Railway Response Railway 25 days ago


diffted
PRO

25 days ago

2025-07-09 09:29:28.222 UTC [27] LOG: checkpoint complete: wrote 6689 buffers (0.6%); 0 WAL file(s) added, 0 removed, 1 recycled; write=670.090 s, sync=0.019 s, total=670.126 s; sync files=237, longest=0.007 s, average=0.001 s; distance=16271 kB, estimate=16271 kB; lsn=C/CAEAF4B8, redo lsn=C/CA9A3780

2025-07-09 09:34:50.906 UTC [27] LOG: checkpoint complete: wrote 924 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=92.551 s, sync=0.015 s, total=92.587 s; sync files=99, longest=0.005 s, average=0.001 s; distance=7439 kB, estimate=15388 kB; lsn=C/CB15A6D0, redo lsn=C/CB0E7560

2025-07-09 09:50:00.766 UTC [27] LOG: checkpoint complete: wrote 1015 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=101.673 s, sync=0.067 s, total=101.762 s; sync files=222, longest=0.053 s, average=0.001 s; distance=7084 kB, estimate=14558 kB; lsn=C/CB7E9230, redo lsn=C/CB7D25A8

2025-07-09 10:04:54.876 UTC [27] LOG: checkpoint complete: wrote 949 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=94.938 s, sync=0.069 s, total=95.023 s; sync files=218, longest=0.058 s, average=0.001 s; distance=6830 kB, estimate=13785 kB; lsn=C/CC0029B8, redo lsn=C/CBE7E120

2025-07-09 10:20:09.295 UTC [27] LOG: checkpoint complete: wrote 1092 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=109.282 s, sync=0.023 s, total=109.321 s; sync files=189, longest=0.008 s, average=0.001 s; distance=7558 kB, estimate=13162 kB; lsn=C/CC628750, redo lsn=C/CC5DF9D0


diffted
PRO

24 days ago

2025-07-09 17:06:15.660 UTC [27] LOG: checkpoint complete: wrote 1725 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=172.649 s, sync=0.029 s, total=172.699 s; sync files=238, longest=0.011 s, average=0.001 s; distance=12621 kB, estimate=12621 kB; lsn=C/DA5BF480, redo lsn=C/DA4DD680

2025-07-09 17:26:52.852 UTC [38296] LOG: duration: 1007.132 ms statement: COMMIT;

2025-07-09 17:29:58.443 UTC [27] LOG: checkpoint complete: wrote 6949 buffers (0.7%); 0 WAL file(s) added, 0 removed, 1 recycled; write=695.653 s, sync=0.016 s, total=695.685 s; sync files=99, longest=0.008 s, average=0.001 s; distance=14218 kB, estimate=14218 kB; lsn=C/DB7AE3A8, redo lsn=C/DB2C0240


24 days ago

We've been able to look into it and have some ideas. This occurs on machines as load increases.

However, they're gonna take a sec to implement

In the interim, we're happy to move you back to the Cloud machines (which are dead cold as we move to shut them down)

Please let us know; it's the same process we went through prior


Status changed to Awaiting User Response Railway 24 days ago


diffted
PRO

24 days ago

Hi, will you be moving the database only? Same region as we are using now?


Status changed to Awaiting Railway Response Railway 24 days ago


23 days ago

Yup, we can start with just the database for now and see if it's alleviated. If not, we can progress to moving any connected applications (which should 100% alleviate it)


Status changed to Awaiting User Response Railway 23 days ago


diffted
PRO

23 days ago

Hi, we can't risk moving before the weekend; let's look at Monday. Is there any general update on this issue? Even if we migrate off metal now, we will have to migrate back later.


Status changed to Awaiting Railway Response Railway 23 days ago


22 days ago

Noted. As for a general update: we know what the shape of the issue looks like, but we need to reproduce it reliably. Since this is not widespread on every single host, it's hard to debug.

We're still working on this however and will give you updates as they come along.


Status changed to Awaiting User Response Railway 22 days ago


diffted
PRO

20 days ago

2025-07-14 05:18:24.052 UTC [27] LOG: checkpoint complete: wrote 12500 buffers (1.2%); 0 WAL file(s) added, 0 removed, 3 recycled; write=854.887 s, sync=0.020 s, total=854.938 s; sync files=230, longest=0.007 s, average=0.001 s; distance=37089 kB, estimate=37089 kB; lsn=D/D1A3CEF8, redo lsn=D/D10E4330

2025-07-14 05:21:14.984 UTC [27] LOG: checkpoint complete: wrote 1258 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=125.849 s, sync=0.023 s, total=125.888 s; sync files=189, longest=0.011 s, average=0.001 s; distance=9594 kB, estimate=34339 kB; lsn=D/D1B88CB0, redo lsn=D/D1A42C40

2025-07-14 05:25:18.251 UTC [38439] LOG: duration: 1903.774 ms statement: COMMIT;

2025-07-14 05:36:10.861 UTC [27] LOG: checkpoint complete: wrote 1205 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=120.706 s, sync=0.047 s, total=120.779 s; sync files=200, longest=0.036 s, average=0.001 s; distance=8546 kB, estimate=31760 kB; lsn=D/D23CF7A0, redo lsn=D/D229B598


Status changed to Awaiting Railway Response Railway 20 days ago


19 days ago

Update, Ted: we have applied a number of configuration changes on the hosts that may fix the issue. Can you confirm whether you are still seeing those high p99 values?


Status changed to Awaiting User Response Railway 19 days ago


diffted
PRO

19 days ago

Hi,

As you can see from the log, it does not look good: write=315.021 s, write=154.935 s, write=285.339 s

Not sure what to do, as this has been more than a week now...

2025-07-15 02:24:34.053 UTC [27] LOG: checkpoint complete: wrote 3147 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=315.021 s, sync=0.051 s, total=315.108 s; sync files=221, longest=0.024 s, average=0.001 s; distance=10809 kB, estimate=12142 kB; lsn=E/371F1F8, redo lsn=E/35567A0

2025-07-15 02:35:23.314 UTC [27] LOG: checkpoint complete: wrote 651 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=65.119 s, sync=0.026 s, total=65.164 s; sync files=65, longest=0.009 s, average=0.001 s; distance=4803 kB, estimate=11408 kB; lsn=E/3A075D8, redo lsn=E/3A075A0

2025-07-15 02:50:40.975 UTC [27] LOG: checkpoint complete: wrote 805 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.629 s, sync=1.845 s, total=82.562 s; sync files=168, longest=1.796 s, average=0.011 s; distance=5734 kB, estimate=10841 kB; lsn=E/3FA0F58, redo lsn=E/3FA0F20

2025-07-15 03:06:07.701 UTC [60824] LOG: duration: 3536.711 ms statement: COMMIT;

2025-07-15 03:06:54.055 UTC [27] LOG: checkpoint complete: wrote 1546 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=154.935 s, sync=0.023 s, total=154.981 s; sync files=103, longest=0.013 s, average=0.001 s; distance=12035 kB, estimate=12035 kB; lsn=E/4C63BA8, redo lsn=E/4B61BC0

2025-07-15 03:20:17.730 UTC [27] LOG: checkpoint complete: wrote 585 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=58.565 s, sync=0.015 s, total=58.593 s; sync files=38, longest=0.005 s, average=0.001 s; distance=4415 kB, estimate=11273 kB; lsn=E/4FB2A70, redo lsn=E/4FB1AD0

2025-07-15 03:39:05.193 UTC [27] LOG: checkpoint complete: wrote 2850 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=285.339 s, sync=0.006 s, total=285.369 s; sync files=27, longest=0.005 s, average=0.001 s; distance=9000 kB, estimate=11045 kB; lsn=E/59CFF18, redo lsn=E/587BE70


Status changed to Awaiting Railway Response Railway 19 days ago


18 days ago

I can make it so that you can deploy back onto GCP in the meantime if that will help tide you over.


Status changed to Awaiting User Response Railway 18 days ago


Railway
BOT

5 days ago

🛠️ The ticket Disk Performance Issue on Metal has been marked as todo.