PITR restore drill fails: cannot reach source pgBackRest catalog
dti13
PRO · OP

3 days ago

I am trying to verify a non-destructive Point-in-Time Recovery restore for a production Railway Postgres service. This is not an outage, but it is important for recovery readiness.

Environment: production

Source service: Postgres

Volume: production Postgres volume

Postgres image: ghcr.io/railwayapp-templates/postgres-ssl:17.7

Goal:

Create a separate restored Postgres service using PITR, without touching the live production Postgres service, without changing DATABASE_URL, and without restoring over the existing production volume.

We attempted the Railway GraphQL mutation volumeInstancePITRRestore twice:

  1. Target timestamp: 2026-06-20T11:36:42.649Z

    Proposed new service: issue-143-pitr-drill-20260620-1136

    Trace ID: 2263536005594059044

  2. Target timestamp: 2026-06-20T11:12:47.188Z

    Proposed new service: issue-143-pitr-drill-20260620-1112

    Trace ID: 5861890342496370075

Both attempts returned:

"Couldn't reach the source service's pgBackRest catalog. This is usually transient (network or storage hiccup) — try again in a moment. If it persists, check that the source service is healthy."

Afterward, no restore drill service was created, and production stayed healthy.
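For anyone repeating the drill: the two service names above appear to be the target timestamp truncated to the minute, in UTC. Below is a minimal Python sketch of that naming convention. The helper name `drill_service_name` and the default prefix are illustrative assumptions, not part of Railway's API.

```python
from datetime import datetime, timezone


def drill_service_name(target_iso: str, prefix: str = "issue-143-pitr-drill") -> str:
    """Build a drill service name from an ISO-8601 PITR target timestamp.

    Hypothetical helper: it mirrors the naming used in this thread
    (prefix + YYYYMMDD-HHMM in UTC) and calls no Railway API.
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    ts = datetime.fromisoformat(target_iso.replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y%m%d-%H%M}"


if __name__ == "__main__":
    print(drill_service_name("2026-06-20T11:36:42.649Z"))
    print(drill_service_name("2026-06-20T11:12:47.188Z"))
```

Deriving the name from the target timestamp keeps each attempt traceable to the recovery point it was meant to verify.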

Can someone advise:

  • whether this means PITR is not correctly enabled/usable for this Postgres service;
  • why Railway cannot reach the source service's pgBackRest catalog;
  • whether WAL/archive/catalog storage may need attention;
  • what I should do to make a non-destructive PITR restore to a separate service work;
  • whether there is a safe dashboard-supported way to run this restore drill without touching the live source service?

Please note: I do not want to restore over the existing production volume or change the live service connection.

Awaiting User Response

1 Replies

Status changed to Awaiting Railway Response Railway 3 days ago


I dug into your Postgres service (the drill's source). The reason volumeInstancePITRRestore returns "couldn't reach the source service's pgBackRest catalog" is that the source isn't actually running pgBackRest or WAL archiving, so there's no stanza or catalog for a restore to read from. Its logs over the last day show only normal checkpoints, with no archive-push or other WAL-archiving activity at all.
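One way to cross-check this from the database side, assuming you can run SQL against the service, is to read `SHOW archive_mode;`, `SHOW archive_command;`, and the `pg_stat_archiver` view, then interpret the values. The Python sketch below only does the interpretation. The `ArchiverSnapshot` type and `diagnose` function are illustrative names, and the rules are assumptions about what a pgBackRest-backed setup would look like, not Railway's actual health check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArchiverSnapshot:
    """Values copied by hand from the database (hypothetical container)."""
    archive_mode: str                 # output of: SHOW archive_mode;
    archive_command: str              # output of: SHOW archive_command;
    archived_count: int               # pg_stat_archiver.archived_count
    failed_count: int                 # pg_stat_archiver.failed_count
    last_archived_wal: Optional[str]  # pg_stat_archiver.last_archived_wal


def diagnose(snap: ArchiverSnapshot) -> List[str]:
    """Return reasons WAL archiving looks unusable for PITR (empty if none)."""
    problems = []
    if snap.archive_mode not in ("on", "always"):
        problems.append("archive_mode is not enabled")
    # Assumption: a pgBackRest setup archives WAL via `pgbackrest ... archive-push`.
    cmd = snap.archive_command.lower()
    if "pgbackrest" not in cmd or "archive-push" not in cmd:
        problems.append("archive_command does not call pgbackrest archive-push")
    if snap.archived_count == 0 or snap.last_archived_wal is None:
        problems.append("no WAL segment has been archived yet")
    if snap.failed_count > 0:
        problems.append(f"{snap.failed_count} archive attempts have failed")
    return problems


if __name__ == "__main__":
    # Sample values for a server with archiving switched off.
    off = ArchiverSnapshot("off", "(disabled)", 0, 0, None)
    for reason in diagnose(off):
        print("-", reason)
```

An empty result only indicates that WAL is being archived. It does not show that a base backup has completed, which would still need to be confirmed from the backup catalog itself.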

PITR has to be enabled on the service first: that's what provisions pgBackRest, creates the stanza, takes a base backup, and begins continuously archiving WAL. Until that's running and a base backup has completed, a PITR restore has no catalog to restore from, which is exactly the error you're hitting.

So the next step is to confirm that PITR / continuous backups is enabled on this Postgres service (Settings → Backups). If it's off, enable it and let the first base backup complete; the drill will work after that. If you believe it's already enabled, reply and I'll escalate to find out why the stanza/archiving isn't running on your service. Your live production service and data are untouched by any of this.

Angelo


Status changed to Awaiting User Response Railway about 17 hours ago

