2 months ago
The image build fails immediately and produces no build logs for me to review or debug.
The only error message I see during build:
process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled: context canceled
Attempts to resolve build error:
Changes made:
.npmrc - Skips optional dependencies
package.json - Replaces heavy native modules with empty stubs
.node-version - Pins Node 20.11.0 for prebuilt binaries
"pnpm": {
  "overrides": {
    "mongodb-client-encryption": "npm:empty-npm-package@1.0.0",
    "kerberos": "npm:empty-npm-package@1.0.0",
    "@mongodb-js/zstd": "npm:empty-npm-package@1.0.0",
    "@aws-sdk/credential-providers": "npm:empty-npm-package@1.0.0",
    "snappy": "npm:empty-npm-package@1.0.0"
  }
}
3 Replies
2 months ago
Specific issues and mitigation steps below:
# Railway Build Issues - Summary for Collaboration
## Original Problem
Railway deployment fails during pnpm install --frozen-lockfile --prefer-offline with exit code 137 (OOM - Out of Memory killed).
- Error message: process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled
- The build was working prior to commit 6e83128 (sg distance buckets)
- Local builds work fine with pnpm run build
- Railway Hobby Plan: 8 GB RAM, 8 vCPU (runtime), but build memory is limited separately
## Root Cause Analysis
Exit code 137 is 128 + 9, meaning the process was killed with SIGKILL. In a memory-limited build container this usually means the Linux kernel's OOM killer terminated it. The kill appears to happen while compiling native modules (kerberos, mongodb-client-encryption, snappy, etc.), which are optional dependencies of mongoose/mongodb.
The commit 6e83128 only added a new npm script and ~2200 lines of TypeScript code - no new dependencies. The OOM is likely due to:
1. Railway infrastructure changes (reduced build memory)
2. pnpm lockfile regeneration pulling different versions
3. Build cache invalidation forcing full recompilation
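For reference, the exit-code reading above can be sketched in a few lines of Python. This is only an illustration: `describe_exit_code` and the small `SIGNAL_NAMES` table are hypothetical helpers, not part of Railway or pnpm. Note that SIGKILL by itself does not prove an OOM kill, since anything that sends `kill -9` produces the same code.

```python
# Signal numbers below are the standard Linux values.
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code.

    Shells report "killed by signal N" as 128 + N, so 137 means signal 9.
    """
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, "signal " + str(signum))
        return "killed by " + name
    return "failed with exit code " + str(code)
```

Calling `describe_exit_code(137)` returns `killed by SIGKILL`, which matches the OOM-killer reading when the container is close to its memory limit.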
## Attempted Solutions
### 1. pnpm Overrides (Removed - caused issues)
Added overrides to replace heavy native modules with empty stubs:
```json
"pnpm": {
"overrides": {
"mongodb-client-encryption": "npm:empty-npm-package@1.0.0",
"kerberos": "npm:empty-npm-package@1.0.0",
"@mongodb-js/zstd": "npm:empty-npm-package@1.0.0",
"@aws-sdk/credential-providers": "npm:empty-npm-package@1.0.0",
"snappy": "npm:empty-npm-package@1.0.0"
}
}
```
Result: Caused ajv module resolution errors in Railway container.
### 2. .npmrc Configuration (Current)
```
optional=false
shamefully-hoist=true
```
- optional=false: Skip optional native dependencies
- shamefully-hoist=true: Flatten node_modules to fix module resolution
### 3. .node-version
Pinned to 20.11.0 to use prebuilt binaries instead of compiling from source.
### 4. railway.toml
```toml
[build]
buildCommand = "pnpm install --frozen-lockfile && pnpm run build"
[build.env]
NODE_OPTIONS = "--max-old-space-size=4096"
```
Result: No effect. --max-old-space-size only caps the V8 heap of each Node.js process; it does not change the memory limit Railway applies to the build container.
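To see what limit the build container actually enforces, one option is to read the cgroup memory files from inside the build. The sketch below assumes a Linux container that exposes the standard cgroup v1 or v2 paths; whether Railway's build environment does so is an assumption, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard Linux cgroup locations. Whether a given build container
# exposes them is an assumption.
CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports "max" (no limit)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def container_memory_limit() -> Optional[int]:
    """Return the first readable cgroup memory limit, or None if none is found."""
    for path in CGROUP_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None
```

Printing `container_memory_limit()` from a temporary build step (for example, dividing by `2**20` to get MiB) would show whether the limit is well below the 8 GB runtime figure.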
### 5. Dockerfile (Current)
Multi-stage build to reduce image size:
```dockerfile
FROM node:20.11.0-alpine AS builder
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@latest --activate
COPY package.json pnpm-lock.yaml .npmrc ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
FROM node:20.11.0-alpine AS production
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@latest --activate
COPY package.json pnpm-lock.yaml .npmrc ./
RUN pnpm install --frozen-lockfile --prod
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/src/main.js"]
```
### 6. Removed mongodb-mcp-server
This unused dependency pulled in 190 packages including heavy native modules.
Result: Reduced dependency count significantly.
## Current State
After all changes:
- Local build: Works
- Railway build: Still experiencing issues (ajv module resolution errors)
## Error After Dockerfile Approach
```
Error: Cannot find module 'ajv/dist/compile/codegen'
Require stack:
- ajv-formats/dist/limit.js
- @angular-devkit/core (used by @nestjs/cli)
```
This is a pnpm hoisting issue where ajv-formats can't find ajv due to pnpm's strict module isolation. Added shamefully-hoist=true to fix.
## Current Configuration Files
### package.json (relevant parts)
- NestJS 11.x
- Mongoose 8.x
- No mongodb-mcp-server (removed)
- No pnpm overrides (removed)
### Dependencies that may cause native compilation
- bcrypt (has native bindings)
- mongoose → mongodb (optional native deps)
## Questions to Explore
1. Is there a way to increase Railway's build memory without upgrading to Pro ($20/mo)?
2. Would switching from pnpm to npm for the build reduce memory usage?
3. Could we pre-build the dist folder locally and push it (skip build on Railway)?
4. Are there alternative hosting platforms with more generous build resources?
5. Could we use a GitHub Action to build a Docker image and push to Railway's container registry?
## Relevant Links
- Railway Hobby Plan limits: https://railway.app/pricing
- pnpm shamefully-hoist: https://pnpm.io/npmrc#shamefully-hoist
- Node.js OOM in Docker: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
## Timeline
1. Commit 6e83128 - sg distance buckets (no dep changes)
2. Railway deploy fails with exit code 137
3. Added pnpm overrides → caused ajv errors
4. Added Dockerfile → still ajv errors
5. Removed mongodb-mcp-server (-190 packages)
6. Added shamefully-hoist=true → local build works
7. Awaiting next Railway deploy test
2 months ago
Figured it out - solution below:
Railway OOM Build Failure - Resolution Summary
Problem
Railway deployment failed with exit code 137 (OOM killed) during pnpm install --frozen-lockfile.
Error message:
process "pnpm install --frozen-lockfile --prefer-offline" did not complete successfully: exit code: 137: context canceled
Root Cause
Not a Railway infrastructure issue. The failure was caused by a regenerated pnpm-lock.yaml that resolved to newer package versions with heavier dependencies.
Key Finding
When debugging, we regenerated the lockfile multiple times (deleted pnpm-lock.yaml and ran pnpm install). This pulled newer versions of packages like:
- @nestjs/jwt: 11.0.0 → 11.0.2
- @nestjs/common: 11.1.3 → 11.1.11
- mongoose: 8.16.1 → 8.21.0
- Many transitive dependency updates
These newer versions, combined with their transitive dependencies, increased memory usage during installation beyond Railway's build memory limit.
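A quick way to spot this kind of drift is to compare the package keys of two lockfiles. The following is a rough sketch, not a full YAML parser: it assumes the lockfile v9 layout, where keys look like `mongoose@8.16.1:` or `'@nestjs/common@11.1.3':` at two-space indentation, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches top-level package keys such as "  mongoose@8.16.1:" or
# "  '@nestjs/common@11.1.3':" (assumed lockfile v9 layout).
PACKAGE_KEY = re.compile(
    r"^  '?((?:@[^/\s']+/)?[^@\s'/:]+)@([^\s'():]+)", re.MULTILINE
)


def package_versions(lockfile_text: str) -> dict:
    """Map each package name to the set of versions present in the lockfile."""
    versions = defaultdict(set)
    for name, version in PACKAGE_KEY.findall(lockfile_text):
        versions[name].add(version)
    return versions


def version_changes(old_text: str, new_text: str) -> dict:
    """Return {name: (old_versions, new_versions)} for packages whose versions differ."""
    old, new = package_versions(old_text), package_versions(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before = sorted(old.get(name, set()))
        after = sorted(new.get(name, set()))
        if before != after:
            changes[name] = (before, after)
    return changes
```

The old text can come from `git show <commit>:pnpm-lock.yaml` for the last known working commit and the new text from the working tree; a long result suggests the lockfile was regenerated rather than minimally updated.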
Debugging Process
1. Initial hypothesis: Railway build memory too low
2. Attempted fixes (did not resolve):
   - Added .npmrc with optional=false, shamefully-hoist=true, concurrency throttling
   - Added pnpm overrides to stub out native modules
   - Created multi-stage Dockerfile with pnpm fetch / --offline
   - Removed unused mongodb-mcp-server dependency
   - Added ajv as direct dependency to fix hoisting issues
3. Breakthrough: Rolled back to last known working commit (f248685) and deployed successfully on Railway.
4. Confirmation: Cherry-picked only the code changes onto the working base (keeping original lockfile) - build succeeded.
Solution
Preserve the original pnpm-lock.yaml rather than regenerating it.
When the lockfile was regenerated, it resolved to newer package versions that:
- Had larger dependency trees
- Required more memory during installation
- Exceeded Railway's Hobby plan build memory limit
Lessons Learned
1. Never delete pnpm-lock.yaml to "fix" build issues - this often makes things worse by pulling newer, potentially heavier dependencies.
2. Use git diff to check lockfile changes before deploying: git diff HEAD~1 -- pnpm-lock.yaml | wc -l
3. For OOM during pnpm install, first verify the lockfile hasn't changed before blaming infrastructure.
4. Rollback testing is essential - deploy the last known working commit to isolate whether it's code vs. environment.
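The lockfile check can also be scripted as a pre-deploy guard. This is a minimal sketch that assumes it runs from the repository root with git available; `lockfile_changed` is a hypothetical helper, and the `run` parameter exists only so the git call can be replaced in tests.

```python
import subprocess


def lockfile_changed(base_ref: str, path: str = "pnpm-lock.yaml", run=subprocess.run) -> bool:
    """Return True if `path` in the working tree differs from its state at `base_ref`."""
    # `git diff --quiet` exits with 0 when there is no difference and 1 when there is one.
    result = run(["git", "diff", "--quiet", base_ref, "--", path])
    if result.returncode not in (0, 1):
        raise RuntimeError("git diff failed with exit code " + str(result.returncode))
    return result.returncode == 1
```

For example, `lockfile_changed("f248685")` compares the current lockfile against the last known working commit mentioned above; a `True` result is a cue to review the lockfile diff before deploying.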
Environment
Platform: Railway Hobby Plan (8 GB RAM runtime, limited build memory)
Package Manager: pnpm 10.x
Node.js: 20.x
Framework: NestJS 11.x with MongoDB/Mongoose
Files Affected
No code changes required. The fix was reverting to the original pnpm-lock.yaml that was present in the last successful deployment.
Status changed to Solved brody • about 2 months ago