16 days ago
Severity: High (data integrity risk)
Impact: Business ledger (stock_adjustments) intermittently not durable after successful transaction; app had to implement compensating reinsertion.
Summary
We are seeing a transactional durability anomaly when writing to a Railway-hosted PostgreSQL database through Railway proxy hosts (*.proxy.rlwy.net).
In our backend, an INSERT INTO stock_adjustments ... RETURNING succeeds, and the inserted row is visible inside the same transaction (tx_ledger_visible_count=1). Trigger-based write tracing inside the same transaction also confirms insert visibility (tx_trace_visible_count=1).
However, immediately after commit, querying by returned stock_adjustment_id returns no row.
At the same time, writes to another table in the same flow (warehouse_stocks) persist normally.
This causes inconsistent state: stock updated, ledger row missing.
Environment / topology
- App runtime: Deno backend
- DB host used by company DB:
tramway.proxy.rlwy.net:38478(PostgreSQL) - Security DB host:
crossover.proxy.rlwy.net:22357 - DB name:
railway - User:
postgres - Connection mode tested:
- pooled and non-pooled behavior in app (issue persisted in write path before compensating fallback)
- Transaction diagnostics captured from app:
write_node.host == read_node.host == "10.250.11.183/32"- same
backend_pidin write/read session_replication_role = originsearch_path = "$user", public- unqualified/public
stock_adjustmentsresolve to same OID
Repro behavior (application-level)
- Begin tx
- Update/upsert
warehouse_stocks(persists) - Insert into
stock_adjustmentswithRETURNING stock_adjustment_id(e.g.,1378) - Still inside tx:
- row exists by that ID (
tx_ledger_visible_count=1) - trace trigger row exists in
stock_adjustments_write_trace(tx_trace_visible_count=1)
- row exists by that ID (
- Commit
- Outside tx, query
stock_adjustments where stock_adjustment_id = <returned_id>returns 0 rows
Concrete evidence snippets
- Error emitted by app guard:
"IDs faltantes en stock_adjustments: 1378""tx_ledger_visible_count=1""tx_trace_visible_count=1"- tx diagnostics resolve to
public.stock_adjustments, no temp table
- Database checks:
stock_adjustmentsmissing IDs like1373,1374,1377,1378, etc.- sequence advanced (holes present), suggesting insert path executed but final durability missing
- Manual control test (psql direct):
BEGIN; INSERT ...; COMMIT;onstock_adjustmentspersisted correctly (e.g., id1372)- corresponding trace row exists (
TRIGGER_INSERT)
Why this is critical
This is not just a stale-read issue: we validate in the same backend process and connection context, and later the row is absent. It breaks transactional guarantees for a financial/inventory ledger table.
Temporary mitigation implemented on our side
- Added hard post-commit ledger verification.
- Added compensating reinsertion (
ledger_repaired) if post-commit row missing. - This avoids user-facing failure but is a workaround, not a root fix.
Request to Railway
Please investigate proxy/connection-layer behavior for this Postgres service and confirm if there are known issues that can cause:
- committed row disappearance on specific write paths,
- inconsistent commit durability through proxy endpoint,
- tx-level visibility followed by non-durability.
Also advise best-practice endpoint/config for strict transactional durability (direct host vs proxy, pooling mode, pgbouncer mode, etc.) for write-critical workloads.
If helpful, we can provide:
- exact timestamps (UTC),
- returned IDs,
- txids captured by app,
- SQL traces from our trigger tables (
stock_adjustments_write_trace,warehouse_stocks_write_trace).
1 Replies
16 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 16 days ago
Status changed to Awaiting Railway Response Railway • 11 days ago
Status changed to Awaiting User Response Railway • 11 days ago
10 days ago
I would treat this as a critical incident, but I would not assume PostgreSQL acknowledged a commit and then lost one row until a smaller repro proves it.
A row that is visible inside a transaction and absent after commit, while another table from the same logical flow persists, usually means one of these is happening:
- the ledger insert is inside a transaction/savepoint that later rolls back, while the stock update is actually outside that same DB transaction;
- a later trigger/job/code path deletes or moves the ledger row;
- the post-commit read is not querying the same physical table/partition/schema path you think it is;
- pooled/proxy connection reuse is hiding which connection/endpoint is used after commit.
Sequence gaps do not prove durability loss. PostgreSQL sequences are non-transactional, so an insert that later rolls back still leaves ID holes.
I would run this in order:
- Temporarily bypass the Railway public TCP proxy for write-critical paths if the app and DB are in the same Railway project. Use the private/internal Postgres endpoint for the app, then keep the proxy only for external admin access. If the anomaly disappears, you have a strong proxy/pooling signal.
- Create a tiny durability probe table with no triggers, no FKs, no RLS, no partitions, and run the exact same Deno driver/transaction helper through the same connection URL:
create table if not exists durability_probe (
id bigserial primary key,
marker text not null,
created_at timestamptz default now()
);
Then loop:
begin;
insert into durability_probe(marker) values ($1) returning id, txid_current(), pg_backend_pid();
commit;
select *, txid_current(), pg_backend_pid(), pg_is_in_recovery(), inet_server_addr(), inet_server_port()
from durability_probe where id = $returned_id;
If this probe never fails, the issue is almost certainly in the stock_adjustments table path or app transaction structure, not generic Railway/Postgres durability.
- Prove the stock update and ledger insert are in the same transaction by logging the same values for both statements:
txid_current(),pg_backend_pid(),pg_current_wal_lsn(),current_schema(), andto_regclass('public.stock_adjustments')::oid. - Add AFTER DELETE and UPDATE audit triggers on
stock_adjustmentsduring the investigation. Your current insert trace only proves the row existed before transaction end; it does not prove no later statement deleted/updated/moved it. - Inspect table mechanics:
select tgname, tgenabled, pg_get_triggerdef(oid) from pg_trigger where tgrelid = 'public.stock_adjustments'::regclass and not tgisinternal;
select rulename, definition from pg_rules where schemaname = 'public' and tablename = 'stock_adjustments';
select inhparent::regclass, inhrelid::regclass from pg_inherits where inhparent = 'public.stock_adjustments'::regclass or inhrelid = 'public.stock_adjustments'::regclass;
- If your Deno transaction helper supports nested transactions/savepoints, audit every caught exception after the insert. A swallowed error after
INSERT ... RETURNINGcan roll back a savepoint/transaction while the surrounding business flow continues.
The best Railway support packet would be: a failing probe script under 50 lines, connection URL type used (public proxy vs private endpoint, redact credentials), txid/backend_pid/LSN before commit and after read, exact UTC timestamp, and whether the tiny durability_probe table reproduces. Without that, this looks more like app/table transaction behavior than a confirmed proxy durability bug.
Status changed to Awaiting Railway Response Railway • 10 days ago
Status changed to Awaiting User Response Railway • 10 days ago
3 days ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 3 days ago