Database Performance Issues - Slow Storage I/O Causing Extended Checkpoint

diffted
PRO

a month ago

Hi Railway Support,

I'm experiencing severe database performance issues with my PostgreSQL service that appear to be related to storage I/O throttling/limitations.

Current Setup:

  • Service: PostgreSQL

  • Plan: 32 vCPU, 32GB RAM

  • Application: Medusa.js e-commerce platform

Issue: Database operations (user registration, cart operations) have become extremely slow over the past 2 weeks. PostgreSQL checkpoint logs show concerning patterns:

Checkpoint Performance:

  • Normal checkpoints taking 85-180+ seconds (should be <30 seconds)

  • One checkpoint took 855 seconds (14+ minutes)

  • Write times consistently 80-800+ seconds

  • Sync times are normal (<0.1 seconds)

Example Log Entry:

2025-07-07 10:30:28.228 UTC [27] LOG: checkpoint complete: wrote 857 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.812 s, sync=0.028 s, total=85.867 s; sync files=215, longest=0.013 s, average=0.001 s; distance=6178 kB, estimate=145656 kB
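For reference, the timing fields can be pulled out of a checkpoint line mechanically; a minimal shell sketch (assuming GNU grep), using the log entry above:

```shell
# Sketch: extract the write/sync/total timings from a checkpoint log line.
# The sample line is the one from the report above.
line='2025-07-07 10:30:28.228 UTC [27] LOG: checkpoint complete: wrote 857 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.812 s, sync=0.028 s, total=85.867 s; sync files=215, longest=0.013 s, average=0.001 s; distance=6178 kB, estimate=145656 kB'
write=$(printf '%s' "$line" | grep -o 'write=[0-9.]*' | cut -d= -f2)
sync=$(printf '%s' "$line" | grep -o 'sync=[0-9.]*' | cut -d= -f2)
total=$(printf '%s' "$line" | grep -o 'total=[0-9.]*' | cut -d= -f2)
echo "write=${write}s sync=${sync}s total=${total}s"
```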

Already Completed:

  • Optimized PostgreSQL configuration (checkpoint_timeout, wal_buffers, etc.)

  • Database maintenance (VACUUM, ANALYZE)

  • Query optimization

Questions:

  1. What are the current storage I/O limits (IOPS, throughput) for my plan?

  2. Are there any I/O throttling alerts or metrics showing my service hitting limits?

  3. What storage upgrade options are available to improve write performance?

  4. Can you see any storage-related performance metrics for my database service?

The issue appears to be infrastructure-level storage performance rather than database configuration, as the extremely long write times indicate I/O bottlenecking.

Awaiting User Response

50 Replies

diffted
PRO

a month ago

Settings:
max_connections: 200

shared_buffers: 8GB

effective_cache_size: 24GB

maintenance_work_mem: 2GB

checkpoint_completion_target: 0.95

wal_buffers: 64MB

default_statistics_target: 100

random_page_cost: 1

effective_io_concurrency: 300

work_mem: 4MB

huge_pages: try

min_wal_size: 1GB

max_wal_size: 4GB
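Assuming the default 8 kB PostgreSQL page size, the "wrote N buffers" figures in the checkpoint logs can be related back to these settings with quick arithmetic; a sketch:

```shell
# With shared_buffers = 8GB and the default 8 kB page size,
# shared_buffers holds 8*1024*1024 kB / 8 kB = 1,048,576 pages.
pages=$((8 * 1024 * 1024 / 8))
# A checkpoint that "wrote 857 buffers" therefore flushed 857 * 8 kB:
kb=$((857 * 8))
echo "pages=${pages} wrote=${kb}kB"
```

857 of 1,048,576 pages is ~0.08%, which matches the "(0.1%)" figure the server prints, so these checkpoints are flushing only a few MB each.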


Railway
BOT

a month ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 27 days ago


diffted
PRO

a month ago

The checkpoint logs show the issue is getting worse. I'm seeing some alarming patterns:

Sync Performance Degradation

  • 04:30:52: sync=9.035s (was <0.1s before)

  • 04:06:11: sync=7.462s

  • 03:30:43: sync=8.759s

Sync times should be under 0.1 seconds. When sync times spike to 6-9 seconds, it indicates the storage system is severely struggling to flush data to disk.

One Extremely Bad Checkpoint

  • 04:06:11: 416 seconds total (nearly 7 minutes)

  • 408 seconds write time + 7.4 seconds sync time

2025-07-08 04:45:41.255 UTC [27] LOG: checkpoint complete: wrote 695 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=77.143 s, sync=3.848 s, total=85.668 s; sync files=94, longest=2.621 s, average=0.041 s; distance=4901 kB, estimate=8421 kB; lsn=C/918DEF38, redo lsn=C/91857DA8

2025-07-08 04:30:52.487 UTC [27] LOG: checkpoint complete: wrote 718 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=84.770 s, sync=9.035 s, total=96.636 s; sync files=75, longest=6.358 s, average=0.121 s; distance=4970 kB, estimate=8812 kB; lsn=C/91457D50, redo lsn=C/9138E950

2025-07-08 04:15:40.751 UTC [27] LOG: checkpoint complete: wrote 813 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.978 s, sync=0.033 s, total=84.816 s; sync files=107, longest=0.016 s, average=0.001 s; distance=6355 kB, estimate=9239 kB; lsn=C/90F95E30, redo lsn=C/90EB3F90

2025-07-08 04:06:11.835 UTC [27] LOG: checkpoint complete: wrote 3977 buffers (0.4%); 0 WAL file(s) added, 0 removed, 1 recycled; write=408.466 s, sync=7.462 s, total=416.800 s; sync files=213, longest=3.860 s, average=0.036 s; distance=9559 kB, estimate=9559 kB; lsn=C/90C09B78, redo lsn=C/9087F090

2025-07-08 03:45:27.945 UTC [27] LOG: checkpoint complete: wrote 619 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=72.841 s, sync=0.732 s, total=73.918 s; sync files=93, longest=0.294 s, average=0.008 s; distance=4438 kB, estimate=9532 kB; lsn=C/8FF49BE0, redo lsn=C/8FF29128

2025-07-08 03:30:43.928 UTC [27] LOG: checkpoint complete: wrote 806 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.758 s, sync=8.759 s, total=90.479 s; sync files=158, longest=6.599 s, average=0.056 s; distance=5590 kB, estimate=10099 kB; lsn=C/8FBE3AB8, redo lsn=C/8FAD3908

2025-07-08 03:16:44.351 UTC [27] LOG: checkpoint complete: wrote 1378 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=138.379 s, sync=0.023 s, total=151.299 s; sync files=117, longest=0.011 s, average=0.001 s; distance=10600 kB, estimate=10600 kB; lsn=C/8F74DF78, redo lsn=C/8F55E038

2025-07-08 03:00:19.955 UTC [27] LOG: checkpoint complete: wrote 663 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=66.284 s, sync=0.071 s, total=67.004 s; sync files=79, longest=0.040 s, average=0.001 s; distance=4839 kB, estimate=8028 kB; lsn=C/8F1D41A0, redo lsn=C/8EB03FF0

2025-07-08 02:46:00.852 UTC [27] LOG: checkpoint complete: wrote 907 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.083 s, sync=0.020 s, total=108.015 s; sync files=200, longest=0.007 s, average=0.001 s; distance=6261 kB, estimate=8382 kB; lsn=C/8E72CA38, redo lsn=C/8E64A298

2025-07-08 02:30:26.739 UTC [27] LOG: checkpoint complete: wrote 615 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=69.092 s, sync=4.309 s, total=73.801 s; sync files=49, longest=2.570 s, average=0.088 s; distance=4538 kB, estimate=8617 kB; lsn=C/8E0E1B80, redo lsn=C/8E02CC98
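To triage dumps like the one above, the worst checkpoints can be ranked by write time; a minimal sketch (assuming GNU sed; the two sample lines stand in for a real log file):

```shell
# Rank checkpoint lines by write time, worst first.
# The two sample lines below stand in for a real server log.
cat > /tmp/ckpt.log <<'EOF'
2025-07-08 04:06:11 UTC LOG: checkpoint complete: wrote 3977 buffers; write=408.466 s, sync=7.462 s, total=416.800 s
2025-07-08 04:15:40 UTC LOG: checkpoint complete: wrote 813 buffers; write=80.978 s, sync=0.033 s, total=84.816 s
EOF
grep 'checkpoint complete' /tmp/ckpt.log \
  | sed -E 's/.*write=([0-9.]+) s.*/\1\t&/' \
  | sort -rn \
  | cut -f2- \
  | head -n 3
```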


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Hi Ted.

I've gone ahead and applied a config that I hope will resolve this. Would you mind letting me know how it performs over the next 5 minutes (and then circling back)?

Apologies you're running into this!


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

Hi, thank you for responding.

2025-07-08 05:17:59.168 UTC [27] LOG: checkpoint complete: wrote 2234 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=223.564 s, sync=0.008 s, total=223.576 s; sync files=226, longest=0.001 s, average=0.001 s; distance=14723 kB, estimate=14723 kB; lsn=C/92C8E7D8, redo lsn=C/92B8C7D0

2025-07-08 05:19:02.419 UTC [25344] LOG: duration: 2444.116 ms statement: SELECT "id", "workflow_id", "transaction_id", "execution", "context", "state", "created_at", "updated_at", "deleted_at", "retention_time", "run_id" FROM "public"."workflow_execution" ORDER BY "created_at" DESC NULLS FIRST, "workflow_id" DESC, "transaction_id" DESC, "run_id" DESC LIMIT 1000

Good news: Sync time back to normal (0.008s) Bad news: Write time still extremely high (223 seconds / 3.7 minutes)
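Alongside the checkpoint lines, the slow-statement "duration:" entries can be summarized the same way; a sketch (the two sample lines stand in for real log output):

```shell
# Count the logged slow statements and find the worst duration.
cat > /tmp/slow.log <<'EOF'
2025-07-08 05:19:02 UTC [25344] LOG: duration: 2444.116 ms statement: SELECT ...
2025-07-08 09:23:26 UTC [2343] LOG: duration: 1976.771 ms statement: COMMIT;
EOF
grep -o 'duration: [0-9.]* ms' /tmp/slow.log \
  | awk '{ n++; if ($2 > max) max = $2 } END { printf "count=%d max=%.3f ms\n", n, max }'
```

For the sample lines this prints `count=2 max=2444.116 ms`.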


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Alright, that's good. Would you mind checking for me one more time over the next 5 minutes?


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 05:30:41.914 UTC [27] LOG: checkpoint complete: wrote 861 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=86.640 s, sync=0.004 s, total=86.647 s; sync files=185, longest=0.001 s, average=0.001 s; distance=5961 kB, estimate=13847 kB; lsn=C/932515B8, redo lsn=C/9315EDB8

This checkpoint shows the consistent pattern continuing: 86+ seconds is still far too slow for normal operations.

Application severely impacted - user registration, cart operations failing.


Status changed to Awaiting Railway Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 05:45:40.227 UTC [27] LOG: checkpoint complete: wrote 841 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=84.205 s, sync=0.006 s, total=84.214 s; sync files=207, longest=0.001 s, average=0.001 s; distance=5580 kB, estimate=13020 kB; lsn=C/93790B18, redo lsn=C/936D1EC0


diffted
PRO

a month ago

2025-07-08 06:05:59.364 UTC [27] LOG: checkpoint complete: wrote 4024 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=403.031 s, sync=0.005 s, total=403.038 s; sync files=93, longest=0.001 s, average=0.001 s; distance=8670 kB, estimate=12585 kB; lsn=C/94294F10, redo lsn=C/93F499E8


a month ago

We're looking into this this week as it's deeply important. However, getting to the bottom of it may take a bit of time. Apologies I don't have better news here.


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

This is a production database, with live users trying to use the project. Not sure what to do; we can't afford a week of lost registrations and orders.


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Gotchya. I can temporarily move you back to our old cloud machines to see if that resolves it.


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

We are using static IPs and don't have a fast way to change them with our payment providers. And this move requires changing the static IPs, yes?


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Yes it would. I can attempt to move you regions, and then back, which would land you on a different host (this one seems to be a bit more temperamental)


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

I can try to get the IPs sorted after the move. What downtime are we looking at?


Status changed to Awaiting Railway Response Railway 26 days ago



a month ago

Let’s attempt the cross-regional reselection first, which will prevent the need for altering the IPs

I’m quite confident that we can resolve it by doing the above


Status changed to Awaiting User Response Railway 26 days ago


a month ago

As for downtime, it should be about 30-60s


diffted
PRO

a month ago

OK, we can proceed. Is anything needed from our side?


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

I’ll simply need you to confirm the integrity of the database once moved to the new region

I’ll move it to US East first, then back to Amsterdam after confirming


Status changed to Awaiting User Response Railway 26 days ago


a month ago

Copy process in progress. It should complete in less than 2 minutes from now


a month ago

Completed AMS -> US east

Please confirm everything looks good, and I’m happy to move it back (to another instance)


diffted
PRO

a month ago

All good


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

Confirmed. Moving from US East to Europe on a new host


Status changed to Awaiting User Response Railway 26 days ago


a month ago

Worth noting the speeds in the US East. We will look for this once this lands in Europe

Attachments



diffted
PRO

a month ago

wow


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

We appear to be nominal in Europe again. Please confirm data and speeds

Attachments


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

a month ago

I will keep monitoring throughout the day and update in a few hours. Thanks for this! Really.


Status changed to Awaiting Railway Response Railway 26 days ago


a month ago

You're very welcome. I'm sorry we caused your business undue harm. I've applied a $100 credit to your account for this inconvenience.


Status changed to Awaiting User Response Railway 26 days ago


a month ago

(Please reopen this if you once again run into any issues)


Status changed to Solved jake 26 days ago


diffted
PRO

a month ago

Although the first checkpoint after the migration looked promising, from the second one onward we have been seeing the same problem as before:

2025-07-08 07:46:08.363 UTC [27] LOG: checkpoint complete: wrote 977 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=97.873 s, sync=0.019 s, total=97.930 s; sync files=186, longest=0.007 s, average=0.001 s; distance=6117 kB, estimate=6117 kB; lsn=C/9744A1A0, redo lsn=C/97341958

2025-07-08 08:01:06.419 UTC [27] LOG: checkpoint complete: wrote 958 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.888 s, sync=0.030 s, total=95.956 s; sync files=212, longest=0.015 s, average=0.001 s; distance=6066 kB, estimate=6112 kB; lsn=C/97A2A438, redo lsn=C/9792E218

2025-07-08 08:15:54.591 UTC [27] LOG: checkpoint complete: wrote 839 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=84.031 s, sync=0.018 s, total=84.073 s; sync files=205, longest=0.005 s, average=0.001 s; distance=5664 kB, estimate=6067 kB; lsn=C/97FBA438, redo lsn=C/97EB6590

2025-07-08 08:30:39.841 UTC [27] LOG: checkpoint complete: wrote 690 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=69.092 s, sync=0.025 s, total=69.150 s; sync files=73, longest=0.013 s, average=0.001 s; distance=4692 kB, estimate=5930 kB; lsn=C/984109A8, redo lsn=C/9834B8D0

2025-07-08 08:45:56.431 UTC [27] LOG: checkpoint complete: wrote 853 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=85.431 s, sync=0.025 s, total=85.492 s; sync files=217, longest=0.007 s, average=0.001 s; distance=5639 kB, estimate=5900 kB; lsn=C/989AE8D0, redo lsn=C/988CD620


Status changed to Awaiting Railway Response Railway 26 days ago


Status changed to Solved diffted 26 days ago


diffted
PRO

a month ago

Anything else we can do?


Status changed to Awaiting Railway Response Railway 26 days ago


diffted
PRO

a month ago

2025-07-08 09:00:54.299 UTC [27] LOG: checkpoint complete: wrote 837 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=83.726 s, sync=0.031 s, total=83.776 s; sync files=214, longest=0.011 s, average=0.001 s; distance=5568 kB, estimate=5867 kB; lsn=C/9953A068, redo lsn=C/98E3D6D8

2025-07-08 09:08:06.741 UTC [1992] LOG: duration: 1597.122 ms statement: COMMIT;

2025-07-08 09:17:50.683 UTC [27] LOG: checkpoint complete: wrote 1999 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=200.199 s, sync=0.032 s, total=200.285 s; sync files=244, longest=0.009 s, average=0.001 s; distance=14388 kB, estimate=14388 kB; lsn=C/99F5F9D0, redo lsn=C/99C4A7F8

2025-07-08 09:23:26.513 UTC [2343] LOG: duration: 1976.771 ms statement: COMMIT;
2025-07-08 09:28:32.296 UTC [2435] LOG: duration: 1208.386 ms statement: COMMIT;

2025-07-08 09:37:03.871 UTC [27] LOG: checkpoint complete: wrote 4524 buffers (0.4%); 0 WAL file(s) added, 0 removed, 1 recycled; write=453.007 s, sync=0.041 s, total=453.091 s; sync files=217, longest=0.014 s, average=0.001 s; distance=12982 kB, estimate=14247 kB; lsn=C/9ADB5970, redo lsn=C/9A8F8160

2025-07-08 09:46:23.821 UTC [27] LOG: checkpoint complete: wrote 1127 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=112.800 s, sync=0.020 s, total=112.850 s; sync files=210, longest=0.008 s, average=0.001 s; distance=7859 kB, estimate=13608 kB; lsn=C/9B1CA980, redo lsn=C/9B0A5058

2025-07-08 09:54:08.584 UTC [3005] LOG: duration: 2469.892 ms statement: COMMIT;

2025-07-08 09:54:08.611 UTC [3026] LOG: duration: 2454.961 ms statement: COMMIT;

2025-07-08 09:54:08.611 UTC [3011] LOG: duration: 1192.506 ms statement: COMMIT;

2025-07-08 10:01:16.156 UTC [27] LOG: checkpoint complete: wrote 1051 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=105.191 s, sync=0.018 s, total=105.236 s; sync files=208, longest=0.006 s, average=0.001 s; distance=7020 kB, estimate=12950 kB; lsn=C/9B859BA0, redo lsn=C/9B780410

2025-07-08 10:15:43.133 UTC [27] LOG: checkpoint complete: wrote 727 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=72.811 s, sync=0.018 s, total=72.878 s; sync files=100, longest=0.007 s, average=0.001 s; distance=5104 kB, estimate=12165 kB; lsn=C/9BEA71F8, redo lsn=C/9BC7C5F0

2025-07-08 10:31:22.926 UTC [27] LOG: checkpoint complete: wrote 1125 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=112.616 s, sync=0.022 s, total=112.696 s; sync files=216, longest=0.010 s, average=0.001 s; distance=7874 kB, estimate=11736 kB; lsn=C/9C5936A8, redo lsn=C/9C42CEA8

2025-07-08 10:34:56.398 UTC [3914] LOG: duration: 3045.045 ms statement: COMMIT;

2025-07-08 10:34:56.480 UTC [3935] LOG: duration: 2960.344 ms statement: COMMIT;


Railway
BOT

a month ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!


a month ago

We're gonna have another look. Sorry about that


Status changed to Awaiting User Response Railway 26 days ago


diffted
PRO

25 days ago

2025-07-08 15:31:15.481 UTC [27] LOG: checkpoint complete: wrote 988 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=98.967 s, sync=0.045 s, total=99.095 s; sync files=220, longest=0.014 s, average=0.001 s; distance=6625 kB, estimate=13776 kB; lsn=C/A8C26BC8, redo lsn=C/A8B08EE0

2025-07-08 15:41:24.555 UTC [10975] LOG: duration: 1772.422 ms statement: COMMIT;

2025-07-08 15:51:39.357 UTC [11150] LOG: duration: 2499.706 ms statement: COMMIT;

2025-07-08 15:51:39.409 UTC [11176] LOG: duration: 2545.245 ms statement: COMMIT;

2025-07-08 15:53:21.289 UTC [27] LOG: checkpoint complete: wrote 5240 buffers (0.5%); 0 WAL file(s) added, 0 removed, 1 recycled; write=524.654 s, sync=0.021 s, total=524.710 s; sync files=226, longest=0.008 s, average=0.001 s; distance=15058 kB, estimate=15058 kB; lsn=C/A9D20A80, redo lsn=C/A99BDAA8

2025-07-08 16:01:28.211 UTC [27] LOG: checkpoint complete: wrote 1116 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=111.779 s, sync=0.025 s, total=111.828 s; sync files=232, longest=0.007 s, average=0.001 s; distance=7881 kB, estimate=14341 kB; lsn=C/AA29F418, redo lsn=C/AA170078

2025-07-08 16:12:08.271 UTC [11643] LOG: duration: 1027.636 ms statement: COMMIT;

2025-07-08 16:16:47.144 UTC [27] LOG: checkpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=130.770 s, sync=0.024 s, total=130.835 s; sync files=223, longest=0.009 s, average=0.001 s; distance=8958 kB, estimate=13802 kB; lsn=C/AAC75A48, redo lsn=C/AAA2FB18

2025-07-08 16:31:23.063 UTC [27] LOG: checkpoint complete: wrote 1066 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=106.749 s, sync=0.029 s, total=106.821 s; sync files=207, longest=0.010 s, average=0.001 s; distance=7275 kB, estimate=13150 kB; lsn=C/AB2A3070, redo lsn=C/AB14A820

2025-07-08 16:46:26.560 UTC [27] LOG: checkpoint complete: wrote 1102 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=110.379 s, sync=0.022 s, total=110.436 s; sync files=207, longest=0.008 s, average=0.001 s; distance=7611 kB, estimate=12596 kB; lsn=C/ABACABD0, redo lsn=C/AB8B9730

2025-07-08 16:53:07.216 UTC [12526] LOG: duration: 1242.326 ms statement: COMMIT;

2025-07-08 16:53:08.620 UTC [12526] LOG: duration: 1365.486 ms statement: COMMIT;

2025-07-08 17:01:45.981 UTC [27] LOG: checkpoint complete: wrote 1291 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=129.228 s, sync=0.039 s, total=129.323 s; sync files=221, longest=0.019 s, average=0.001 s; distance=9157 kB, estimate=12252 kB; lsn=C/ACA16748, redo lsn=C/AC1AADF0

2025-07-08 17:03:21.606 UTC [12783] LOG: duration: 2352.834 ms statement: COMMIT;

2025-07-08 17:03:21.643 UTC [12784] LOG: duration: 2385.618 ms statement: COMMIT;

2025-07-08 17:17:41.365 UTC [27] LOG: checkpoint complete: wrote 1840 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=184.238 s, sync=0.018 s, total=184.285 s; sync files=242, longest=0.007 s, average=0.001 s; distance=13658 kB, estimate=13658 kB; lsn=C/AD150C58, redo lsn=C/ACF01898

2025-07-08 17:31:21.483 UTC [27] LOG: checkpoint complete: wrote 1038 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=103.961 s, sync=0.027 s, total=104.021 s; sync files=223, longest=0.009 s, average=0.001 s; distance=7265 kB, estimate=13019 kB; lsn=C/ADDFB580, redo lsn=C/AD619FC0

2025-07-08 17:34:09.217 UTC [13490] LOG: duration: 3221.338 ms statement: COMMIT;


Status changed to Awaiting Railway Response Railway 25 days ago


25 days ago

Looking into this one once again. I've been able to modify some config to drop the timing from 500s to 50-60s.

I believe this will continue to improve as your snapshots age out. The reason is that I believe the issue is one of write amplification caused by the way we've implemented snapshots.

As such, if you need the performance ASAP, I believe you should be able to remove 3-4 of the old snapshots and see an improvement (which should hold as new backups are created, due to the config I've applied).

Once again, apologies you're running into this. Please let me know:

  1. How the current state of the application is

  2. If you're going to attempt the backup removal to increase compression (and decrease write amplification)
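To check whether the volume itself is slow to flush, a crude probe is to time an fsync-ed write with dd (a sketch: /tmp is a stand-in for the database volume path, and a proper benchmark tool such as fio would give better numbers where available):

```shell
# Crude write probe: write 64 MB and force it to disk with fsync.
# /tmp is a hypothetical stand-in; point it at the volume under test.
dd if=/dev/zero of=/tmp/iotest.bin bs=1M count=64 conv=fsync 2>/tmp/dd.log
# dd's summary line reports bytes copied, elapsed time, and throughput.
tail -n 1 /tmp/dd.log
rm -f /tmp/iotest.bin
```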


Status changed to Awaiting User Response Railway 25 days ago


diffted
PRO

25 days ago

We’ve removed four snapshots and retained two. As user activity is currently low, we’ll have better insight into the impact of this change in the coming hours. I’ll continue to monitor the application and follow up with an update here.

Thank you.


Status changed to Awaiting Railway Response Railway 25 days ago


25 days ago

Gotchya. Let's see where it lands for now. I'll be monitoring it but please let us know as well


Status changed to Awaiting User Response Railway 25 days ago


diffted
PRO

25 days ago

Hi, we're not seeing any noticeable improvement at this stage; the database is still experiencing performance issues.

2025-07-09 05:06:18.360 UTC [27] LOG: checkpoint complete: wrote 1827 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=183.078 s, sync=0.052 s, total=183.148 s; sync files=218, longest=0.036 s, average=0.001 s; distance=13781 kB, estimate=13781 kB; lsn=C/C1788800, redo lsn=C/C16872B8

2025-07-09 05:19:48.375 UTC [27] LOG: checkpoint complete: wrote 927 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=92.868 s, sync=0.031 s, total=92.916 s; sync files=212, longest=0.013 s, average=0.001 s; distance=6257 kB, estimate=13029 kB; lsn=C/C1CB0228, redo lsn=C/C1CA3A58

2025-07-09 05:34:17.605 UTC [27] LOG: checkpoint complete: wrote 620 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=62.095 s, sync=0.015 s, total=62.130 s; sync files=88, longest=0.004 s, average=0.001 s; distance=4455 kB, estimate=12171 kB; lsn=C/C20FD7F0, redo lsn=C/C20FD7B8

2025-07-09 05:49:44.538 UTC [27] LOG: checkpoint complete: wrote 865 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=86.553 s, sync=0.931 s, total=88.836 s; sync files=207, longest=0.901 s, average=0.005 s; distance=5887 kB, estimate=11543 kB; lsn=C/C26EDC20, redo lsn=C/C26BD458

2025-07-09 05:53:01.910 UTC [24515] LOG: duration: 1201.381 ms statement: COMMIT;

2025-07-09 06:04:50.755 UTC [27] LOG: checkpoint complete: wrote 949 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=95.070 s, sync=0.025 s, total=95.117 s; sync files=216, longest=0.006 s, average=0.001 s; distance=6657 kB, estimate=11054 kB; lsn=C/C2ECAF50, redo lsn=C/C2D3D8E8

2025-07-09 06:20:01.193 UTC [27] LOG: checkpoint complete: wrote 1052 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=105.280 s, sync=0.017 s, total=105.339 s; sync files=220, longest=0.006 s, average=0.001 s; distance=7166 kB, estimate=10665 kB; lsn=C/C34BBBF8, redo lsn=C/C343D448

2025-07-09 06:34:59.620 UTC [27] LOG: checkpoint complete: wrote 1042 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=104.276 s, sync=0.028 s, total=104.327 s; sync files=215, longest=0.006 s, average=0.001 s; distance=7322 kB, estimate=10331 kB; lsn=C/C3DDB1E0, redo lsn=C/C3B63F98


Status changed to Awaiting Railway Response Railway 25 days ago


diffted
PRO

25 days ago

2025-07-09 09:29:28.222 UTC [27] LOG: checkpoint complete: wrote 6689 buffers (0.6%); 0 WAL file(s) added, 0 removed, 1 recycled; write=670.090 s, sync=0.019 s, total=670.126 s; sync files=237, longest=0.007 s, average=0.001 s; distance=16271 kB, estimate=16271 kB; lsn=C/CAEAF4B8, redo lsn=C/CA9A3780

2025-07-09 09:34:50.906 UTC [27] LOG: checkpoint complete: wrote 924 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=92.551 s, sync=0.015 s, total=92.587 s; sync files=99, longest=0.005 s, average=0.001 s; distance=7439 kB, estimate=15388 kB; lsn=C/CB15A6D0, redo lsn=C/CB0E7560

2025-07-09 09:50:00.766 UTC [27] LOG: checkpoint complete: wrote 1015 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=101.673 s, sync=0.067 s, total=101.762 s; sync files=222, longest=0.053 s, average=0.001 s; distance=7084 kB, estimate=14558 kB; lsn=C/CB7E9230, redo lsn=C/CB7D25A8

2025-07-09 10:04:54.876 UTC [27] LOG: checkpoint complete: wrote 949 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=94.938 s, sync=0.069 s, total=95.023 s; sync files=218, longest=0.058 s, average=0.001 s; distance=6830 kB, estimate=13785 kB; lsn=C/CC0029B8, redo lsn=C/CBE7E120

2025-07-09 10:20:09.295 UTC [27] LOG: checkpoint complete: wrote 1092 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=109.282 s, sync=0.023 s, total=109.321 s; sync files=189, longest=0.008 s, average=0.001 s; distance=7558 kB, estimate=13162 kB; lsn=C/CC628750, redo lsn=C/CC5DF9D0


diffted
PRO

24 days ago

2025-07-09 17:06:15.660 UTC [27] LOG: checkpoint complete: wrote 1725 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=172.649 s, sync=0.029 s, total=172.699 s; sync files=238, longest=0.011 s, average=0.001 s; distance=12621 kB, estimate=12621 kB; lsn=C/DA5BF480, redo lsn=C/DA4DD680

2025-07-09 17:26:52.852 UTC [38296] LOG: duration: 1007.132 ms statement: COMMIT;

2025-07-09 17:29:58.443 UTC [27] LOG: checkpoint complete: wrote 6949 buffers (0.7%); 0 WAL file(s) added, 0 removed, 1 recycled; write=695.653 s, sync=0.016 s, total=695.685 s; sync files=99, longest=0.008 s, average=0.001 s; distance=14218 kB, estimate=14218 kB; lsn=C/DB7AE3A8, redo lsn=C/DB2C0240


24 days ago

We've been able to look into it and have some ideas. This occurs on machines as load increases.

However, they're gonna take a sec to implement

In the interim, we're happy to move you back to the Cloud machines (which are dead cold as we move to shut them down)

Please let us know; it's the same process we went through prior


Status changed to Awaiting User Response Railway 24 days ago


diffted
PRO

24 days ago

Hi, will you be moving the database only? Same region as we are using now?


Status changed to Awaiting Railway Response Railway 24 days ago


23 days ago

Yup, we can start with just the database for now and see if it's alleviated. If not, we can progress to moving any connected applications (which should 100% alleviate it)


Status changed to Awaiting User Response Railway 23 days ago


diffted
PRO

23 days ago

Hi, we can't risk moving before the weekend; let's look at Monday. Is there any general update on this issue? Even if we migrate off metal now, we will have to migrate back later.


Status changed to Awaiting Railway Response Railway 23 days ago


22 days ago

Noted. As for a general update: we know what the shape of the issue looks like, but we need to reproduce it reliably. Since this is not widespread on every single host, it's hard to debug.

We're still working on this however and will give you updates as they come along.


Status changed to Awaiting User Response Railway 22 days ago


diffted
PRO

20 days ago

2025-07-14 05:18:24.052 UTC [27] LOG: checkpoint complete: wrote 12500 buffers (1.2%); 0 WAL file(s) added, 0 removed, 3 recycled; write=854.887 s, sync=0.020 s, total=854.938 s; sync files=230, longest=0.007 s, average=0.001 s; distance=37089 kB, estimate=37089 kB; lsn=D/D1A3CEF8, redo lsn=D/D10E4330

2025-07-14 05:21:14.984 UTC [27] LOG: checkpoint complete: wrote 1258 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=125.849 s, sync=0.023 s, total=125.888 s; sync files=189, longest=0.011 s, average=0.001 s; distance=9594 kB, estimate=34339 kB; lsn=D/D1B88CB0, redo lsn=D/D1A42C40

2025-07-14 05:25:18.251 UTC [38439] LOG: duration: 1903.774 ms statement: COMMIT;

2025-07-14 05:36:10.861 UTC [27] LOG: checkpoint complete: wrote 1205 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=120.706 s, sync=0.047 s, total=120.779 s; sync files=200, longest=0.036 s, average=0.001 s; distance=8546 kB, estimate=31760 kB; lsn=D/D23CF7A0, redo lsn=D/D229B598


Status changed to Awaiting Railway Response Railway 20 days ago


19 days ago

Update, Ted: we have applied a number of configuration changes on the hosts that may fix the issue. Can you confirm whether you are still seeing those high p99 values?


Status changed to Awaiting User Response Railway 19 days ago


diffted
PRO

19 days ago

Hi,

As you can see from the log, it does not look good: write=315.021 s, write=154.935 s, write=285.339 s

Not sure what to do, as this has been more than a week now...

2025-07-15 02:24:34.053 UTC [27] LOG: checkpoint complete: wrote 3147 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=315.021 s, sync=0.051 s, total=315.108 s; sync files=221, longest=0.024 s, average=0.001 s; distance=10809 kB, estimate=12142 kB; lsn=E/371F1F8, redo lsn=E/35567A0

2025-07-15 02:35:23.314 UTC [27] LOG: checkpoint complete: wrote 651 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=65.119 s, sync=0.026 s, total=65.164 s; sync files=65, longest=0.009 s, average=0.001 s; distance=4803 kB, estimate=11408 kB; lsn=E/3A075D8, redo lsn=E/3A075A0

2025-07-15 02:50:40.975 UTC [27] LOG: checkpoint complete: wrote 805 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=80.629 s, sync=1.845 s, total=82.562 s; sync files=168, longest=1.796 s, average=0.011 s; distance=5734 kB, estimate=10841 kB; lsn=E/3FA0F58, redo lsn=E/3FA0F20

2025-07-15 03:06:07.701 UTC [60824] LOG: duration: 3536.711 ms statement: COMMIT;

2025-07-15 03:06:54.055 UTC [27] LOG: checkpoint complete: wrote 1546 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=154.935 s, sync=0.023 s, total=154.981 s; sync files=103, longest=0.013 s, average=0.001 s; distance=12035 kB, estimate=12035 kB; lsn=E/4C63BA8, redo lsn=E/4B61BC0

2025-07-15 03:20:17.730 UTC [27] LOG: checkpoint complete: wrote 585 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=58.565 s, sync=0.015 s, total=58.593 s; sync files=38, longest=0.005 s, average=0.001 s; distance=4415 kB, estimate=11273 kB; lsn=E/4FB2A70, redo lsn=E/4FB1AD0

2025-07-15 03:39:05.193 UTC [27] LOG: checkpoint complete: wrote 2850 buffers (0.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=285.339 s, sync=0.006 s, total=285.369 s; sync files=27, longest=0.005 s, average=0.001 s; distance=9000 kB, estimate=11045 kB; lsn=E/59CFF18, redo lsn=E/587BE70


Status changed to Awaiting Railway Response Railway 19 days ago


18 days ago

I can make it so that you can deploy back onto GCP in the meantime if that will help tide you over.


Status changed to Awaiting User Response Railway 18 days ago


Railway
BOT

5 days ago

🛠️ The ticket Disk Performance Issue on Metal has been marked as todo.