NestJS deployment error without build logs
citizencage
HOBBYOP

2 months ago

The image build fails immediately, and no build logs are generated for me to review and debug.

The only error message I see during build:

process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled: context canceled

Attempts to resolve build error:

Changes made:

  1. .npmrc - Skips optional dependencies

  2. package.json - Replaces heavy native modules with empty stubs

  3. .node-version - Pins Node 20.11.0 for prebuilt binaries

"pnpm": {
  "overrides": {
    "mongodb-client-encryption": "npm:empty-npm-package@1.0.0",
    "kerberos": "npm:empty-npm-package@1.0.0",
    "@mongodb-js/zstd": "npm:empty-npm-package@1.0.0",
    "@aws-sdk/credential-providers": "npm:empty-npm-package@1.0.0",
    "snappy": "npm:empty-npm-package@1.0.0"
  }
}



citizencage
HOBBYOP

2 months ago

Specific issues and mitigation steps below:

# Railway Build Issues - Summary for Collaboration

## Original Problem

Railway deployment fails during pnpm install --frozen-lockfile --prefer-offline with exit code 137 (OOM - Out of Memory killed).

- Error message: process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled

- The build was working prior to commit 6e83128 (sg distance buckets)

- Local builds work fine with pnpm run build

- Railway Hobby Plan: 8 GB RAM, 8 vCPU (runtime), but build memory is limited separately

## Root Cause Analysis

Exit code 137 means the process was terminated with SIGKILL (128 + 9), which during a container build usually indicates that the Linux kernel's OOM killer stepped in. This appears to happen during compilation of native modules (kerberos, mongodb-client-encryption, snappy, etc.), which are optional dependencies of mongoose/mongodb.
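As a quick illustration of how exit code 137 maps to a signal, here is a minimal sketch that decodes a shell exit status. `describe_exit` is a hypothetical helper name, not part of Railway or pnpm; it relies only on the shell convention that a status above 128 means 128 plus the signal number.

```shell
# describe_exit is a hypothetical helper, for illustration only.
# Shell convention: a status above 128 means the process died from signal (status - 128).
describe_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l <number> prints the signal name for that number (9 -> KILL)
    echo "exit $code = 128 + $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit $code (not a signal exit)"
  fi
}

describe_exit 137
```

This prints `exit 137 = 128 + 9 (SIGKILL)`. Note that SIGKILL is what the OOM killer sends, but other things can send it too, so 137 on its own is a strong hint rather than proof of memory exhaustion.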

The commit 6e83128 only added a new npm script and ~2200 lines of TypeScript code - no new dependencies. The OOM is likely due to:

1. Railway infrastructure changes (reduced build memory)

2. pnpm lockfile regeneration pulling different versions

3. Build cache invalidation forcing full recompilation

## Attempted Solutions

### 1. pnpm Overrides (Removed - caused issues)

Added overrides to replace heavy native modules with empty stubs:

```json
"pnpm": {
  "overrides": {
    "mongodb-client-encryption": "npm:empty-npm-package@1.0.0",
    "kerberos": "npm:empty-npm-package@1.0.0",
    "@mongodb-js/zstd": "npm:empty-npm-package@1.0.0",
    "@aws-sdk/credential-providers": "npm:empty-npm-package@1.0.0",
    "snappy": "npm:empty-npm-package@1.0.0"
  }
}
```

Result: Caused ajv module resolution errors in Railway container.

### 2. .npmrc Configuration (Current)

```
optional=false
shamefully-hoist=true
```

- optional=false: Skip optional native dependencies

- shamefully-hoist=true: Flatten node_modules to fix module resolution

### 3. .node-version

Pinned to 20.11.0 to use prebuilt binaries instead of compiling from source.

### 4. railway.toml

```toml
[build]
buildCommand = "pnpm install --frozen-lockfile && pnpm run build"

[build.env]
NODE_OPTIONS = "--max-old-space-size=4096"
```

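Result: NODE_OPTIONS doesn't appear to affect Railway's build memory limit. That is expected, since --max-old-space-size only caps the V8 heap of a Node.js process and does not change how much memory the build container is given.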

### 5. Dockerfile (Current)

Multi-stage build to reduce image size:

```dockerfile
# Build stage: install all dependencies and compile the app
FROM node:20.11.0-alpine AS builder
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@latest --activate
# Copy manifests first so the install layer stays cached until they change
COPY package.json pnpm-lock.yaml .npmrc ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build

# Production stage: production dependencies plus the compiled output only
FROM node:20.11.0-alpine AS production
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@latest --activate
COPY package.json pnpm-lock.yaml .npmrc ./
RUN pnpm install --frozen-lockfile --prod
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/src/main.js"]
```

### 6. Removed mongodb-mcp-server

This unused dependency pulled in 190 packages including heavy native modules.

Result: Reduced dependency count significantly.

## Current State

After all changes:

- Local build works: ✅

- Railway build: Still experiencing issues (ajv module resolution errors)

## Error After Dockerfile Approach

```
Error: Cannot find module 'ajv/dist/compile/codegen'
Require stack:
- ajv-formats/dist/limit.js
- @angular-devkit/core (used by @nestjs/cli)
```

This is a pnpm hoisting issue where ajv-formats can't find ajv due to pnpm's strict module isolation. Added shamefully-hoist=true to fix.

## Current Configuration Files

### package.json (relevant parts)

- NestJS 11.x

- Mongoose 8.x

- No mongodb-mcp-server (removed)

- No pnpm overrides (removed)

### Dependencies that may cause native compilation

- bcrypt (has native bindings)

- mongoose → mongodb (optional native deps)

## Questions to Explore

1. Is there a way to increase Railway's build memory without upgrading to Pro ($20/mo)?

2. Would switching from pnpm to npm for the build reduce memory usage?

3. Could we pre-build the dist folder locally and push it (skip build on Railway)?

4. Are there alternative hosting platforms with more generous build resources?

5. Could we use a GitHub Action to build a Docker image and push to Railway's container registry?

## Relevant Links

- Railway Hobby Plan limits: https://railway.app/pricing

- pnpm shamefully-hoist: https://pnpm.io/npmrc#shamefully-hoist

- Node.js OOM in Docker: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes

## Timeline

1. Commit 6e83128 - sg distance buckets (no dep changes)

2. Railway deploy fails with exit code 137

3. Added pnpm overrides → caused ajv errors

4. Added Dockerfile → still ajv errors

5. Removed mongodb-mcp-server (-190 packages)

6. Added shamefully-hoist=true → local build works

7. Awaiting next Railway deploy test


citizencage
HOBBYOP

2 months ago

Figured it out - solution below:

Railway OOM Build Failure - Resolution Summary

Problem

Railway deployment failed with exit code 137 (OOM killed) during pnpm install --frozen-lockfile.

Error message:

process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled

Root Cause

Not a Railway infrastructure issue. The failure was caused by a regenerated pnpm-lock.yaml that resolved to newer package versions with heavier dependencies.

Key Finding

When debugging, we regenerated the lockfile multiple times (deleted pnpm-lock.yaml and ran pnpm install). This pulled newer versions of packages like:

  • @nestjs/jwt 11.0.0 → 11.0.2

  • @nestjs/common 11.1.3 → 11.1.11

  • mongoose 8.16.1 → 8.21.0

  • Many transitive dependency updates

These newer versions, combined with their transitive dependencies, increased memory usage during installation beyond Railway's build memory limit.
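One way to see this kind of drift is to compare the versions recorded in two copies of the lockfile. The sketch below is a rough helper under stated assumptions: `resolved_versions` is a hypothetical name, and the pattern assumes the lockfile lists packages as `name@version:` keys (as in lockfileVersion 9.0), so it may not match other lockfile formats.

```shell
# resolved_versions is a hypothetical helper, for illustration only.
# Prints the versions of one package recorded in a pnpm lockfile.
# Assumes keys shaped like "  mongoose@8.16.1:" or "  '@nestjs/jwt@11.0.0':".
# The package name is used as a regex, so "." in a name matches any character.
resolved_versions() {
  pkg=$1
  file=$2
  grep -E "^  '?${pkg}@[0-9]" "$file" \
    | sed -E "s|^  '?${pkg}@([^':(]+).*|\1|" \
    | sort -u
}

# Example (the commit and paths are placeholders):
#   git show <good-commit>:pnpm-lock.yaml > /tmp/good-lock.yaml
#   resolved_versions mongoose /tmp/good-lock.yaml
#   resolved_versions mongoose pnpm-lock.yaml
```

Running it against the known-good lockfile and the regenerated one, package by package, shows which resolutions moved without reading the whole diff.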

Debugging Process

  1. Initial hypothesis: Railway build memory too low

  2. Attempted fixes (did not resolve):

    • Added .npmrc with optional=false, shamefully-hoist=true, concurrency throttling

    • Added pnpm overrides to stub out native modules

    • Created multi-stage Dockerfile with pnpm fetch and --offline

    • Removed unused mongodb-mcp-server dependency

    • Added ajv as direct dependency to fix hoisting issues

  3. Breakthrough: Rolled back to last known working commit (f248685) and deployed successfully on Railway.

  4. Confirmation: Cherry-picked only the code changes onto the working base (keeping original lockfile) - build succeeded.

Solution

Preserve the original pnpm-lock.yaml rather than regenerating it.

When the lockfile was regenerated, it resolved to newer package versions that:

  • Had larger dependency trees

  • Required more memory during installation

  • Exceeded Railway's Hobby plan build memory limit
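For anyone in the same spot, restoring only the lockfile from a known-good commit can be sketched as below. `restore_lockfile` is a hypothetical helper name; the commit to pass in is whichever one last deployed successfully (f248685 in our case).

```shell
# restore_lockfile is a hypothetical helper, for illustration only.
# Restores pnpm-lock.yaml from a known-good commit and leaves every other file alone.
restore_lockfile() {
  good_ref=$1
  # "git checkout <ref> -- <path>" overwrites only that path in the working tree and index
  git checkout "$good_ref" -- pnpm-lock.yaml
}

# Example:
#   restore_lockfile f248685
#   pnpm install --frozen-lockfile   # verify the restored lockfile still installs
```

If package.json dependencies changed since that commit, the frozen install will refuse to proceed because the lockfile is out of sync. In that case a targeted update of the affected packages is safer than deleting the lockfile.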

Lessons Learned

  1. Never delete pnpm-lock.yaml to "fix" build issues - this often makes things worse by pulling newer, potentially heavier dependencies.

  2. Use git diff to check lockfile changes before deploying:

    git diff HEAD~1 -- pnpm-lock.yaml | wc -l

  3. For OOM during pnpm install, first verify the lockfile hasn't changed before blaming infrastructure.

  4. Rollback testing is essential - deploy the last known working commit to isolate whether it's code vs. environment.
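Lessons 2 and 3 can be turned into a small pre-deploy check. This is only a sketch: `lockfile_gate` is a hypothetical name, and it assumes you have saved a copy of the lockfile from the last commit that deployed successfully.

```shell
# lockfile_gate is a hypothetical helper, for illustration only.
# Compares the current lockfile with a copy from the last successful deploy.
# Returns 0 if they are identical, 1 otherwise.
lockfile_gate() {
  good=$1
  current=$2
  if cmp -s "$good" "$current"; then
    echo "lockfile unchanged"
    return 0
  fi
  # diff marks lines only in the old file with "<" and lines only in the new file with ">"
  changed=$(diff "$good" "$current" | grep -c '^[<>]')
  echo "lockfile differs: $changed changed lines"
  return 1
}

# Example (the commit and paths are placeholders):
#   git show <last-good-commit>:pnpm-lock.yaml > /tmp/good-lock.yaml
#   lockfile_gate /tmp/good-lock.yaml pnpm-lock.yaml
```

Comparing against the last deployed commit rather than HEAD~1 catches lockfile changes that slipped in several commits back.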

Environment

  • Platform: Railway Hobby Plan (8 GB RAM runtime, limited build memory)

  • Package Manager: pnpm 10.x

  • Node.js: 20.x

  • Framework: NestJS 11.x with MongoDB/Mongoose

Files Affected

No code changes required. The fix was reverting to the original pnpm-lock.yaml that was present in the last successful deployment.


Status changed to Solved by brody about 2 months ago

