
Data Integrity Incident - Production Catalog Data Loss

hailesaigon

FREEOP

25 days ago

Project ID: e65b286a-33ab-4969-a730-4881238cfa7d

Environment ID: a751aa2a-f213-4152-a732-19603e6f0c3b

Postgres Service ID: d29ea3f2-e0c2-4f5f-8803-36108b20f77c

Incident:

Production catalog tables (products, categories, brands, articles) became empty on 2026-05-31. The investigation window is 04:03-05:37 UTC, with the most suspicious metric anomaly around 04:05:30-04:07:00 UTC.

Metric anomalies:

- Disk usage baseline: 0.125919232 GB
- 04:05:30 UTC: 0.124905472 GB
- 04:06:00 UTC: 0.122212352 GB
- 04:06:30 UTC: 0.12416 GB
- 04:07:00 UTC onward: stabilized around 0.124329984 GB
- CPU and memory spiked during the same 04:05:30-04:06:30 UTC window.
- No visible Postgres deployment/redeploy in our dashboard during this window.
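To put the disk readings above in perspective, the following minimal Python sketch converts each sample into a drop below baseline. It assumes the dashboard reports decimal gigabytes (1 GB = 1000 MB); the variable and function names are illustrative only.

```python
# Disk usage samples from the post, in GB, keyed by UTC time.
# Assumption: the dashboard uses decimal units (1 GB = 1000 MB).
BASELINE_GB = 0.125919232
samples_gb = {
    "04:05:30": 0.124905472,
    "04:06:00": 0.122212352,
    "04:06:30": 0.12416,
    "04:07:00": 0.124329984,
}


def drop_mb(baseline_gb, value_gb):
    """Return how far value_gb sits below baseline_gb, in MB."""
    return (baseline_gb - value_gb) * 1000


# Drop below baseline for every sample, rounded to 3 decimals.
drops = {t: round(drop_mb(BASELINE_GB, v), 3) for t, v in samples_gb.items()}

# Timestamp of the deepest dip.
largest_time = max(drops, key=drops.get)

for t, d in drops.items():
    print(f"{t} UTC: {d} MB below baseline")
print(f"Largest drop at {largest_time} UTC")
```

The deepest dip is roughly 3.7 MB at 04:06:00 UTC, and usage settles about 1.6 MB below baseline, so the anomaly is small in absolute terms even though it coincides with the CPU and memory spike.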

Database evidence:

- products/categories/brands/articles currently have n_tup_ins=0 and n_tup_del=0.
- promo_strips has n_tup_ins=6.
- strapi_core_store_settings has n_tup_ins=24, n_tup_upd=183, n_tup_del=24.
- strapi_database_schema has n_tup_ins=5, n_tup_del=5.
- admin_permissions/admin_permissions_role_lnk show large insert/delete churn around 04:05 UTC.
- SQL statement logging was disabled at the time, so we cannot verify SQL-level DELETE/TRUNCATE/DROP operations from our side.
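One way to reason about the counters above is sketched below in Python. It relies on the assumption that these values come from `pg_stat_user_tables`, where the counters are cumulative since the table was created or statistics were last reset, and where TRUNCATE is not recorded in `n_tup_del`. The function name, its arguments, and the returned labels are hypothetical, and the result is a hint for further investigation, not proof of what happened.

```python
def interpret_table_stats(n_tup_ins, n_tup_del, row_count, had_rows_before):
    """Suggest which kind of operation is consistent with the counters.

    Assumptions (not confirmed by the thread):
    - counters come from pg_stat_user_tables and are cumulative since
      table creation or the last statistics reset;
    - TRUNCATE does not increment n_tup_del;
    - dropping and recreating a table starts its counters from zero.
    """
    if row_count > 0:
        return "not_empty"
    if n_tup_del > 0:
        # Row-level DELETE statements leave a trace in n_tup_del.
        return "row_deletes_recorded"
    if had_rows_before and n_tup_ins == 0:
        # Rows once existed, yet no inserts are on record: the counters
        # look fresh, as after DROP/CREATE or a statistics reset.
        return "recreated_or_stats_reset"
    if n_tup_ins > 0:
        # Inserts on record, no deletes, table empty now.
        return "possible_truncate"
    return "inconclusive"


# The catalog tables from the post: empty now, populated before.
print(interpret_table_stats(0, 0, row_count=0, had_rows_before=True))
```

Under these assumptions, the catalog tables fall into the "recreated or statistics reset" case, which would explain why no row deletes appear in the table stats.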

Request:

Please check Railway internal infrastructure/service/volume logs for:

1. Volume reset/recreate/restore/replacement
2. Snapshot restore or rollback
3. Disk/volume events
4. Postgres crash/restart not visible in deployment history
5. Infrastructure maintenance or patching
6. Any database-level reset/drop/truncate event available internally

Important:

HTTP 200 responses are not proof that data still existed, because Strapi returns 200 for empty collections. We need infrastructure/volume/service audit logs to explain why the catalog tables are now empty while table stats do not show catalog row deletes.

Solved

3 Replies

Railway

BOT

25 days ago

Your Postgres service ran continuously during the 04:00-06:00 UTC window on 2026-05-31 with no crashes, restarts, or redeployments - the last Postgres deployment was on 2026-05-17. The volume is attached, in READY state, and was never reset or replaced. The Postgres logs show normal checkpoint activity throughout, with no crash recovery or "database system was shut down" messages. Notably, the checkpoint at 04:09 UTC wrote significantly more data (497 buffers, 4250 kB, 422 sync files) than surrounding checkpoints (6-22 buffers), indicating substantial SQL write operations were executed against the database during your anomaly window. By 04:24 UTC, Postgres logs show queries failing because several Strapi relations (admin_api_tokens, admin_api_token_permissions, hbk_importer_import_jobs) did not exist, which is consistent with tables being dropped and recreated by an application-level operation. We do not have SQL-level audit logs on our side either, as databases are unmanaged at the DB layer, but the infrastructure evidence shows no Railway-initiated volume or service events, and the data changes originated from SQL operations executed by a connected client.

Status changed to Awaiting User Response Railway • 25 days ago

hailesaigon

FREEOP

25 days ago

Can you please check Railway-side internal logs for our Postgres service during 2026-05-31 04:04-04:09 UTC?

We see Postgres checkpoint evidence of significant write activity:

- 03:54 UTC checkpoint was nearly idle: wrote 1 buffer, distance 4 kB.
- 04:04 UTC checkpoint wrote 160 buffers, distance 1367 kB.
- 04:09 UTC checkpoint wrote 497 buffers, sync files=422, distance 4250 kB.

This aligns with our metrics anomaly around 04:05:30-04:07:00 UTC, where disk usage dropped and CPU/memory spiked.
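The scale of the write burst can be made explicit by comparing the quoted checkpoints with the idle one at 03:54 UTC. This is a small Python sketch using only the figures from this thread; the data layout and the function name are illustrative.

```python
# (time UTC, buffers written, WAL distance in kB), as quoted in this thread.
checkpoints = [
    ("03:54", 1, 4),
    ("04:04", 160, 1367),
    ("04:09", 497, 4250),
]


def ratios_vs_idle(rows):
    """Compare each later checkpoint with the first (idle) one."""
    _, idle_buffers, idle_kb = rows[0]
    return {
        t: (buffers / idle_buffers, kb / idle_kb)
        for t, buffers, kb in rows[1:]
    }


for t, (buf_x, dist_x) in ratios_vs_idle(checkpoints).items():
    print(f"{t} UTC: {buf_x:.0f}x buffers, {dist_x:.0f}x WAL distance vs 03:54")
```

Both later checkpoints are two to three orders of magnitude heavier than the idle one, which supports treating 04:04-04:09 UTC as the window where the writes happened.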

SQL statement logging was disabled, so we cannot identify the SQL statements from our side. Please check whether there were any internal volume/service events, snapshot restore/rollback, volume reset/recreate/replacement, Postgres restart/crash, infrastructure maintenance, or database-level reset/drop/truncate activity during this window.

Status changed to Awaiting Railway Response Railway • 25 days ago

Railway

BOT

25 days ago

We have re-confirmed: no volume events, snapshot restores, rollbacks, Postgres restarts, crashes, or infrastructure maintenance occurred on your Postgres service during the 04:04-04:09 UTC window on 2026-05-31. The volume remained in READY state throughout, and the service has been running continuously since its last deployment on 2026-05-17. The write activity captured by those checkpoints originated from SQL operations executed by a connected client, not from any Railway infrastructure operation.

Status changed to Awaiting User Response Railway • 25 days ago

Railway

BOT

18 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • 18 days ago
