My production Postgres DB is crashing now!
texmeijin
PROOP

2 months ago

The error "could not accept SSL connection: Success" is occurring many times right NOW.

It started at 16:10 JST (07:10 UTC).

Neither I nor my web service can access the DB, so my web service is down right now. I also cannot connect from my DB client.

Log details:
2025-12-16 07:23:41.268 UTC [34] LOG: could not accept SSL connection: Success
2025-12-16 07:23:41.270 UTC [35] LOG: could not accept SSL connection: Success
2025-12-16 07:23:41.275 UTC [36] LOG: could not accept SSL connection: Success
2025-12-16 07:23:41.281 UTC [37] LOG: could not accept SSL connection: Success
2025-12-16 07:23:42.000 UTC [33] FATAL: the database system is starting up
2025-12-16 07:23:54.084 UTC [32] LOG: syncing data directory (fsync), elapsed time: 12.79 s, current path: ./pg_subtrans/006F
2025-12-16 07:23:57.865 UTC [32] LOG: database system was not properly shut down; automatic recovery in progress
2025-12-16 07:24:03.399 UTC [32] LOG: redo starts at 1E/62C4A5F8
2025-12-16 07:24:03.399 UTC [32] LOG: invalid record length at 1E/62C4A6E0: wanted 24, got 0
2025-12-16 07:24:03.399 UTC [32] LOG: redo done at 1E/62C4A6A8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-12-16 07:24:15.378 UTC [38] FATAL: the database system is not yet accepting connections
2025-12-16 07:24:15.378 UTC [38] DETAIL: Consistent recovery state has not been yet reached.
2025-12-16 07:24:18.778 UTC [30] LOG: checkpoint starting: end-of-recovery immediate wait
2025-12-16 07:24:59.525 UTC [30] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=8.686 s, sync=8.314 s, total=42.800 s; sync files=3, longest=6.472 s, average=2.772 s; distance=0 kB, estimate=0 kB
2025-12-16 07:24:59.655 UTC [5] LOG: database system is ready to accept connections
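For anyone triaging a similar outage, here is a minimal sketch that summarizes a log like the one above: it counts the "could not accept SSL connection" lines and FATAL lines, and measures the seconds from the first parsed line until "database system is ready to accept connections". The regex is an assumption inferred only from the lines in this post (not from the server's actual log_line_prefix setting), and `summarize` is a hypothetical helper, not part of Postgres or any platform tooling. It only measures the window covered by the pasted excerpt, not the full outage since 16:10 JST.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the log in this post:
# "<YYYY-MM-DD HH:MM:SS.mmm> UTC [<pid>] <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse(lines):
    """Yield (timestamp, pid, level, message) for lines matching LINE_RE; skip the rest."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, int(m["pid"]), m["level"], m["msg"]


def summarize(lines):
    """Hypothetical helper: count SSL errors and FATALs, and time until the server is ready."""
    entries = list(parse(lines))
    ssl_errors = sum(1 for e in entries if "could not accept SSL connection" in e[3])
    fatals = sum(1 for e in entries if e[2] == "FATAL")
    # First timestamp at which the server reported it accepts connections, if any.
    ready = next((e[0] for e in entries if "ready to accept connections" in e[3]), None)
    seconds = (ready - entries[0][0]).total_seconds() if ready is not None else None
    return {"ssl_errors": ssl_errors, "fatal": fatals, "seconds_until_ready": seconds}


if __name__ == "__main__":
    sample = [
        "2025-12-16 07:23:41.268 UTC [34] LOG: could not accept SSL connection: Success",
        "2025-12-16 07:23:42.000 UTC [33] FATAL: the database system is starting up",
        "2025-12-16 07:24:59.655 UTC [5] LOG: database system is ready to accept connections",
    ]
    print(summarize(sample))
```

Applied to the excerpt above, this shows that the server finished crash recovery (redo plus the end-of-recovery checkpoint) and reported it was ready roughly 78 seconds after the first SSL error in the paste. A persistent connection failure after that point would then point away from Postgres itself and toward the networking layer.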

5 Replies

dev
MODERATOR

2 months ago

This may be related to the ongoing incident in #🚨|incidents.


texmeijin
PROOP

2 months ago

Umm, OK, BUT our service's staging environment is working correctly right now; only production is failing. Is this incident related to volume size? Staging has little data, but production has over 5 GB.


dev
MODERATOR

2 months ago

From what I can tell, the incident is widespread: it affects deployments, private networking, and public networking.

It's possible your production environment is having issues because it's the one handling traffic. The staging environment isn't handling traffic, so there isn't an opportunity for it to fail (possibly from connection timeouts or the like).

Of course, this is just a hunch.


dev
MODERATOR

2 months ago

The team is rolling out a fix; the incident should hopefully be cleared up soon.


whitetown
PRO

2 months ago

Still happening.

We are unable to connect to the database via SSH. We get this message:
"The database container is starting up or transitioning. Please wait a moment and try again."
