Build failing — no space left on device on builder production-builderv3-us-west1-g754

lanardisaac-max

PRO OP

a month ago

Hi Railway team,

Builds for my service are failing with a buildkit disk exhaustion error on builder production-builderv3-us-west1-g754. The error reads:

ResourceExhausted: failed to create lease: write /var/lib/buildkit/runc-overlayfs/containerdmeta.db: no space left on device

Service details:

Project: grateful-quietude (f8e53bb5-0927-4e59-8f9d-2a00e41d808b)
Service: ltigroup-sovereign (b2adc938-7b48-492a-9242-de226e58ae53)
Environment: production (9571d347-c0c4-4256-b201-f5e30a69f680)
Most recent failed deployment: 607dcff0-f5f6-4a00-978a-d3c5d52de5e0
Timestamp: 2026-05-20T18:44:58 UTC

Could you clear the buildkit cache on that builder, or route my builds to a different builder? This is blocking a production deploy that adds the TURNSTILE_SECRET env var, which our public contact form depends on.

Thanks,

Lanard Isaac

LTI Group LLC

$20 Bounty


Railway

BOT

a month ago

This is a transient issue with a specific builder node running out of disk space. Retriggering the deployment from the deployments tab or pushing a new commit will route your build to a different builder, which should resolve it.

Status changed to Awaiting User Response Railway • about 1 month ago


lanardisaac-max

PRO OP

a month ago

This did not solve the issue.

My issue is not a transient build node disk-space problem.

Actual issue:

Service: ltigroup-sovereign
Environment: production
Deployment completes successfully
Variables are visible in Railway UI
But the running container does not receive specific variables at runtime

Confirmed live runtime evidence from the diagnostic endpoint:

X-Diag-Env-Dt: absent
X-Diag-Env-Pmk: absent

At the same time, other variables in the same service/environment appear to be injected normally, so this is not a general application boot failure.

What has already been verified:

Fresh deployments completed successfully
Runtime diagnostic headers still show these vars as absent
This persists across redeploys
The issue is blocking diagnostic/auth troubleshooting because the app cannot read the required runtime env vars

This appears to be a selective control-plane/runtime environment variable propagation issue, not a build-node disk-space issue.

Please have a human review the service/environment variable injection path for this production service.

Status changed to Awaiting Railway Response Railway • about 1 month ago

Railway

BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway • about 1 month ago

richwardle

PRO

a month ago

to narrow this — what are the actual variable names that aren't propagating (the ones behind DT and Pmk)? a few patterns cause selective drops on the runtime injection path:

names containing shell-reserved chars (-, ., leading digit) get filtered when railway injects via the process env
"sealed" variables need a redeploy after sealing to take effect at runtime — if you sealed them AFTER setting and the variables-tab didn't trigger a fresh deploy, the runtime keeps the unsealed copy and the sealed value reads as absent
reference variables ${{Service.VAR}} that can't resolve at injection time (target service renamed, circular ref) fail silently
a service-level variable explicitly set to "" shadows the env-level value, so the runtime sees it as empty

if you can share the variable names + whether they were sealed at any point + scope (env-level or service-level), should pinpoint which path it is.
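The first pattern in the list above (names with characters outside the portable set) can be checked locally before looking anywhere else. This is a minimal sketch: the regex encodes the usual POSIX-style rule for portable names (letters, digits, underscore, no leading digit), and `classifyEnvNames` is a hypothetical helper. Whether Railway actually filters non-portable names is the claim made above, not something this snippet verifies.

```javascript
// Portable environment variable names: letters, digits and underscores,
// with no leading digit.
const PORTABLE_ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: tag each name as portable or not.
function classifyEnvNames(names) {
  return names.map((name) => ({ name, portable: PORTABLE_ENV_NAME.test(name) }));
}

// The last three names are made-up examples of the problem shapes.
console.log(classifyEnvNames(["DIAG_TOKEN", "PLATFORM_MASTER_KEY", "my-var", "1ST", "a.b"]));
```

Any name flagged as non-portable would be a candidate for renaming before digging into the other patterns.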

lanardisaac-max

PRO OP

a month ago

DIAG_TOKEN and PLATFORM_MASTER_KEY.

Details:

Service: ltigroup-sovereign

Environment: production

Scope: service-level variables on this service, not env-level reference variables

Names are exactly:

DIAG_TOKEN

PLATFORM_MASTER_KEY

Both are simple uppercase names with underscores only

No leading digit

No hyphen, dot, shell-reserved character, or template/reference syntax

They are not set via ${{Service.VAR}}

I am not intentionally shadowing them with empty env-level values

What has already been tested:

Fresh redeploys completed successfully

Variables are visible in the Railway Variables UI with non-empty values

Runtime diagnostic endpoint still reports:

X-Diag-Env-Dt: absent

X-Diag-Env-Pmk: absent

Other variables in the same service/environment appear to inject normally

I also deleted and re-added DIAG_TOKEN, then redeployed, and the runtime result remained the same.

I do not believe this is a naming-format issue based on your list above. Please have someone inspect why these two service-level variables are not reaching process.env for this service/environment at runtime.

richwardle

PRO

a month ago

ruling out sealing first — per railway's docs sealed variables do inject at runtime, they're just hidden from the UI. so the visible-in-UI + absent-at-runtime shape doesn't match sealing.

fastest diagnostic to split where the gap is — bypasses both your diag endpoint and your app code:

railway shell -s ltigroup-sovereign -e production
env | grep -E '^(DIAG_TOKEN|PLATFORM_MASTER_KEY)='

if both appear in env → the vars ARE reaching the container; the bug is in your app's read path (config library overriding process.env, a late-binding worker process that doesn't inherit env, namespace prefix on the diag endpoint, dotenv loading an empty file that shadows them).

if either is missing from env → it's railway-side at the injection layer; tag @brody with project + service ids + the variable names.

a couple of things worth checking while shelled in:

the service's Dockerfile / start command — anything that does env -i, explicit unset, or starts the app via something like su user -- ... without -p will strip the env after railway has injected
any doppler or external secret-manager integration syncing into railway — a failed push on those will blank the runtime value while the UI keeps showing the last-good value from the previous push
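To illustrate the first check above, here is a small Node sketch of how a wrapper that drops a variable before starting the app makes it vanish for the child process, even though the parent had it. `childSeesLength` is a hypothetical helper and the value is a placeholder; the destructuring stands in for an explicit `unset` in a start script (`env -i` would be the more aggressive variant that clears everything).

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: start a child Node process with the given environment
// and return the length of DIAG_TOKEN as that child sees it.
function childSeesLength(childEnv) {
  const probe = "process.stdout.write(String((process.env.DIAG_TOKEN || '').length))";
  const result = spawnSync(process.execPath, ["-e", probe], {
    env: childEnv,
    encoding: "utf8",
  });
  return Number(result.stdout);
}

// Parent environment with a placeholder value (not a real secret).
const parentEnv = { ...process.env, DIAG_TOKEN: "placeholder" };

// Stand-in for an explicit `unset DIAG_TOKEN` in a wrapper script.
const { DIAG_TOKEN: _dropped, ...strippedEnv } = parentEnv;

console.log("inherited:", childSeesLength(parentEnv));
console.log("after unset:", childSeesLength(strippedEnv));
```

If the start command chain contains anything with this effect, the platform can inject the variable correctly and the app will still read it as absent.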

lanardisaac-max

PRO OP

a month ago

Understood. I will run your exact shell test to determine whether DIAG_TOKEN and PLATFORM_MASTER_KEY are present inside the running production container.

Service: ltigroup-sovereign

Environment: production

I will report back only with:

the exact shell output for:

railway shell -s ltigroup-sovereign -e production

env | grep -E '^(DIAG_TOKEN|PLATFORM_MASTER_KEY)='

whether each variable is present or missing in the container
whether that confirms a Railway runtime injection issue or an application-side read-path issue

If either variable is missing from env inside the running container, that will directly confirm this is a Railway-side injection problem rather than an app-code issue.

lanardisaac-max

PRO OP

a month ago

Understood. I attempted the exact shell diagnostic, but the Railway CLI token available in the runtime is scoped to a different project and cannot access the target LTI project.

What happened:

The target service is: ltigroup-sovereign
The target environment is: production
The available Railway token is tied to a different project and returns Unauthorized / Service not found when trying to reach the LTI project
So the shell diagnostic could not be executed against the target container

Important point:

This does not contradict the prior live runtime evidence already captured from the deployed app:

X-Diag-Env-Dt: absent
X-Diag-Env-Pmk: absent

So at this point, the shell test is blocked by project access scope, not by any finding that the variables are present.

Please advise one of the following:

whether there is a Railway dashboard-native shell for this exact service/environment I can use directly in the UI, or
whether you can inspect runtime env injection for service ltigroup-sovereign / environment production from your side based on the existing evidence.

The issue remains:

specific service-level variables visible in the Railway UI are not appearing in the running production process for this service/environment.

lanardisaac-max

PRO OP

a month ago

Hello Railway Support,

I need escalation to a human/platform engineer for a confirmed runtime environment variable propagation issue affecting one specific service/environment.

Affected resources:

Workspace: lanardisaac-max's Projects

Project: grateful-quietude (f8e53bb5-0927-4e59-8f9d-2a00e41d808b)

Service: ltigroup-sovereign (b2adc938-7b48-492a-9242-de226e58ae53)

Environment: production (9571d347-c0c4-4256-b201-f5e30a69f680)

Affected variables:

DIAG_TOKEN

PLATFORM_MASTER_KEY

Issue summary:

These two variables are present in Railway’s stored configuration, but they are not present in the running production container’s process.env even after a successful fresh redeploy.

Confirmed evidence:

Stored config contains both variables

Using Railway CLI against the correct project/service/environment, the stored config resolves both variables with non-empty values:

DIAG_TOKEN=PRESENT(len=22)

PLATFORM_MASTER_KEY=PRESENT(len=44)

Fresh redeploy completed successfully

A fresh redeploy was triggered for the exact production service and completed successfully:

Deployment ID: f79182a8-f623-44a8-9012-35a7770fdc11

Status: SUCCESS

Running container still does not have the variables at runtime

The Node/Express app exposes a diagnostic endpoint that captures process.env.DIAG_TOKEN and process.env.PLATFORM_MASTER_KEY at module-load using the standard pattern:

(process.env.DIAG_TOKEN || "").trim()

(process.env.PLATFORM_MASTER_KEY || "").trim()

That endpoint emits headers reflecting runtime presence/absence. After the successful redeploy, repeated live probes still return:

X-Diag-Env-Dt: absent

X-Diag-Env-Pmk: absent

Example live response:

HTTP/1.1 404 Not Found

X-Diag-Env-Dt: absent

X-Diag-Env-Pmk: absent

This was rechecked repeatedly after the new deployment was confirmed live, and the result did not change.
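One property of the module-load pattern described above is worth making explicit: a value captured at load time is frozen, so anything that populates the environment afterwards is invisible to it. The sketch below is not the app's actual code; `presence` is a hypothetical helper and `env` is a plain object standing in for `process.env`.

```javascript
// Hypothetical helper mirroring the (value || "").trim() pattern.
function presence(value) {
  return (value || "").trim() ? "present" : "absent";
}

// Plain object standing in for process.env in this illustration.
const env = {};

// Captured once, the way a module-level constant would be.
const capturedAtLoad = presence(env.DIAG_TOKEN);

// Something sets the variable later (placeholder value, not a real secret).
env.DIAG_TOKEN = "placeholder";

// Read again at request time.
const readAtRequest = presence(env.DIAG_TOKEN);

console.log({ capturedAtLoad, readAtRequest });
```

If the two readings differ in a real app, the gap is in load ordering rather than in what the container received.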

What is already ruled out:

Not an application code issue for reading env vars

Not a stale pre-redeploy container

Not a CLI targeting issue

Not a general “all env vars missing” problem

Safer characterization:

Other variables on the same service appear to be available at runtime based on observed application behavior, but these two specific variables remain absent in the running container.

Requested investigation:

Please investigate why DIAG_TOKEN and PLATFORM_MASTER_KEY are not being injected into the runtime container for ltigroup-sovereign in production, despite:

being present in stored config,

having non-empty values,

and a successful fresh redeploy.

At this point, further application-code changes are not justified. The blocker is runtime propagation of these two specific variables.

If needed, I can provide:

the exact CLI commands used,

the deployment ID,

and the live diagnostic header outputs again.

Thank you,

Lanard Isaac

LTI Group LLC

richwardle

PRO

a month ago

the CLI variables check confirms the stored config has them, but it doesn't confirm runtime injection — that's still your diag endpoint's word against the stored config, and the diag endpoint is itself code that could be reading from somewhere other than process.env. before the escalation lands, one more test that bypasses both the shell-access block AND your diag endpoint:

add a one-liner to your Dockerfile CMD or entrypoint script:

echo "DIAG_TOKEN_LEN=${#DIAG_TOKEN} PMK_LEN=${#PLATFORM_MASTER_KEY}" >&2
exec <your normal start command>

(node equivalent before your main app import: console.error('DIAG_TOKEN_LEN', (process.env.DIAG_TOKEN||'').length))

redeploy → deployment logs will print both lengths. three possible outcomes:

both nonzero → vars ARE reaching the container; your diag endpoint or its import chain is reading from somewhere other than process.env (config library override, dotenv loading an empty file, late-binding worker process that doesn't inherit env)
both zero → confirmed railway-side at injection; this log line is the concrete evidence to attach to the support thread
one of each → that's the smoking gun for a per-name filter; worth flagging the pattern
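The three outcomes above can be folded into the startup log itself. This is a sketch under assumptions: `envLengths` and `classify` are hypothetical helper names, and the snippet is meant to run before the main app import. It logs lengths only, never the values.

```javascript
// Hypothetical startup check: report lengths only, never the values themselves.
function envLengths(env, names) {
  return Object.fromEntries(names.map((name) => [name, (env[name] || "").length]));
}

// Map the observed lengths onto the three outcomes discussed above.
function classify(lengths) {
  const values = Object.values(lengths);
  if (values.every((len) => len > 0)) return "all present: look at the app's read path";
  if (values.every((len) => len === 0)) return "all absent: evidence of an injection-side gap";
  return "mixed: possible per-name difference";
}

const lengths = envLengths(process.env, ["DIAG_TOKEN", "PLATFORM_MASTER_KEY"]);
console.error("env lengths at startup:", JSON.stringify(lengths), "->", classify(lengths));
```

Writing to stderr keeps the line in the deployment logs without touching any response the app serves.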

also one cheap dashboard check: open each of the two variables in the Variables panel and look for duplicates at workspace or environment scope. railway resolves service > env > workspace, so an empty service-level entry silently nulls out a non-empty inherited value. easy to miss because both look "set" in the UI.
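The shadowing case can be modelled in a few lines. This sketch assumes the precedence exactly as stated in the post above (service, then environment, then workspace) and has not been checked against Railway's documentation; `resolveVariable` and the scope objects are hypothetical.

```javascript
// Assumed precedence, per the post above: service > environment > workspace.
const SCOPES = ["service", "environment", "workspace"];

// Hypothetical resolver: the first scope that defines the name wins.
function resolveVariable(name, scopes) {
  for (const scope of SCOPES) {
    const vars = scopes[scope] || {};
    // A key that exists wins even if its value is "", which is the shadowing case.
    if (Object.prototype.hasOwnProperty.call(vars, name)) {
      return { scope, value: vars[name] };
    }
  }
  return { scope: null, value: undefined };
}

// Placeholder data: an empty service-level entry over a non-empty inherited one.
const example = {
  service: { DIAG_TOKEN: "" },
  environment: { DIAG_TOKEN: "placeholder" },
};
console.log(resolveVariable("DIAG_TOKEN", example));
```

Under this model the variable is "set" at both scopes, yet the resolved value is empty, which is why both entries can look fine in a UI.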

lanardisaac-max

PRO OP

a month ago

Thanks. This is the first helpful narrowing step.

Understood on the distinction:

railway run proves stored config
it does not prove what the already-running container received at startup

Before making any code or Dockerfile changes, I am going to check for duplicate definitions of:

DIAG_TOKEN
PLATFORM_MASTER_KEY

across:

service scope
environment scope
workspace/shared scope

especially for any blank higher-priority value shadowing a non-empty lower-priority value.

If no duplicate/blank override exists, then I will consider the temporary startup-length log you suggested as the next controlled diagnostic step.
