autonoma-core: 7 consecutive build failures with "rpc error: code = NotFound" — site down (502) for ~1h

Question

Project ID: 74516efb-56c2-4b6c-b336-91ee8558fc75

Service ID: 384bf90c-6bbc-4286-8c1b-545690e9bde4

Service: autonoma-core

Hostname: autonoma-core-production.up.railway.app

Plan: Hobby

Hi, our autonoma-core service has been down (502 Bad Gateway) for about an hour. Seven consecutive build attempts have all failed with the same Metal builder error, under both Dockerfile and nixpacks configurations. Your dashboard's own Diagnose output says "The code and configuration are correct and are not the cause of this failure."

Identical error on every build:

Build Failed: build daemon returned an error <

failed to receive status:

rpc error: code = NotFound

desc = no such job -

>

Full build log from the most recent attempt (Redeploy of commit 7dd90d2 via dashboard, nixpacks builder, scheduled on builder-wkkdbn):

fetched snapshot sha256:d37e25aaa8... (111 kB)

unpacking archive

using build driver nixpacks-v1.41.0

setup: python312, libglib2.0-0, libpango-1.0-0, libpangoft2-1.0-0,

libcairo2, libgdk-pixbuf-2.0-0, libharfbuzz0b,

shared-mime-info, fonts-liberation

install: python -m venv --copies /opt/venv && . /opt/venv/bin/activate

&& pip install -r requirements.txt

start: uvicorn server:app --host 0.0.0.0 --port $PORT --loop asyncio

Saved output to: snapshot-target-unpack

Build Failed: build daemon returned an error <

failed to receive status:

rpc error: code = NotFound

desc = no such job ab64a1dd-830e-43d8-939b-67ccc14462a2-20260514143808

>
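In case it helps correlate the failures on your side, here is a minimal sketch for pulling the job ID out of each failed build log. The function name `extract_failed_job` is hypothetical, and reading the trailing 14 digits of the job ID as a `YYYYMMDDHHMMSS` timestamp is an assumption based only on the single log above.

```python
import re
from datetime import datetime

# Matches the job ID in the builder error. Treating the trailing 14 digits
# as a YYYYMMDDHHMMSS timestamp is an assumption drawn from one log sample.
JOB_RE = re.compile(
    r"no such job\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-(?P<stamp>\d{14})"
)


def extract_failed_job(log_text):
    """Return (job_uuid, parsed_timestamp) from a build log, or None if absent."""
    match = JOB_RE.search(log_text)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return match.group("uuid"), stamp


log = """Build Failed: build daemon returned an error <
failed to receive status:
rpc error: code = NotFound
desc = no such job ab64a1dd-830e-43d8-939b-67ccc14462a2-20260514143808
>"""

print(extract_failed_job(log))
```

Run against each of the seven logs, this would give one job ID per failed attempt. The dashboard summary form of the error ("no such job -") carries no ID, so the function returns None for it.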

The snapshot unpacked successfully and the build driver was assigned correctly, so the failure is happening below the build-config layer, in your job orchestration. The orchestrator appears to be losing track of jobs between assignment and the first status report: the worker can no longer find its own job ID. This affects both Dockerfile and nixpacks builds for this service, and Redeploys of previously green commits from history (which trigger fresh builds) fail with the same error.

Failed deployments today (most recent first):

- 79ba996 (empty retry, ~30 min ago)

- 1f70585 (empty retry)

- 0f2ace0 (empty retry)

- 3358bb0 (railway.toml: drop startCommand + pin DOCKERFILE)

- 9d6edde (switch nixpacks → Dockerfile)

- Redeploy of 7dd90d2 from history — also failed identically

- Redeploy of 0d18509 from history — also failed identically (build

log above)

status.railway.com currently shows "All systems operational", so this is not being detected by your monitoring.

Asks:

1. Promote one of the previously built images (the last fully green deployment was 18b12d7d-b7db-4e2b-89e6-4896d8d4e93e on commit 7dd90d2) to restore service ASAP, OR

2. Route this service's builds to a different builder pool.

Happy to provide more deployment IDs or logs. Thanks.