20 days ago
Hi Railway Support,
We are experiencing recurring crashes/restarts of our Next.js service.
Our application is built with Next.js 14 and runs on Railway as a Node.js service. The service crashes or becomes unavailable almost every day. The latest incident happened on 20 May, and at that moment memory usage was only around 3 GB, which is well below the memory limit available to the service.
I am attaching screenshots of the recent errors, deployment logs, and memory metrics.
Service details:
- Framework: Next.js 14
- Start command: next start
- Next.js config includes: output: 'standalone'
Symptoms:
- The service sometimes returns 502 responses.
- The service appears to restart or become unavailable even when memory usage is only around 3 GB.
- We do not see memory usage reaching the service limit before the crash.
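For reference, a minimal sketch of how memory could be sampled from inside the Node.js process, so that the in-process numbers can be compared with the platform's memory metrics around a restart. It relies only on Node's built-in `process.memoryUsage()`. The names `summarizeMemory` and `startMemoryLogging` and the 30-second interval are assumptions made for illustration, not Railway or Next.js APIs.

```typescript
// Hypothetical helper names; nothing here is a Railway or Next.js API.
interface MemorySample {
  rssMb: number;
  heapUsedMb: number;
  externalMb: number;
}

// Convert bytes to whole mebibytes.
const toMb = (bytes: number): number => Math.round(bytes / (1024 * 1024));

// Reduce a process.memoryUsage() result to the fields worth logging.
function summarizeMemory(usage: {
  rss: number;
  heapUsed: number;
  external: number;
}): MemorySample {
  return {
    rssMb: toMb(usage.rss),
    heapUsedMb: toMb(usage.heapUsed),
    externalMb: toMb(usage.external),
  };
}

// Log one JSON line per interval (30 s is an assumed default).
// unref() keeps the timer from holding the process open on shutdown.
function startMemoryLogging(intervalMs: number = 30_000): NodeJS.Timeout {
  const timer = setInterval(() => {
    const sample = summarizeMemory(process.memoryUsage());
    console.log(
      JSON.stringify({ event: "memory", at: new Date().toISOString(), ...sample })
    );
  }, intervalMs);
  timer.unref();
  return timer;
}
```

If the last sample logged before a restart still shows RSS far below the limit, that would support the observation that memory limit enforcement is not the trigger. Where `startMemoryLogging()` gets called (for example from a server entry point) is left open here.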
Could you please check the platform-side logs for this deployment/service and confirm what caused the restart or unavailability?
Specifically, can you check whether this was caused by:
- OOM killer or memory limit enforcement
- healthcheck failure
- container eviction or host-level issue
- proxy timeout
- deployment/runtime restart
- process receiving SIGTERM/SIGKILL
- any Railway infrastructure issue at that time
Could this warning cause unstable runtime behavior or restarts on Railway, or is it unrelated?
Please let us know what happened at the platform level during the incident and what you recommend changing on our side.
Thanks.
19 days ago
This issue is becoming critical for our production environment. The service instability is happening almost daily, and the behavior strongly suggests a platform/runtime-level problem rather than an application-level memory issue.
Additional observations from our side:
- Memory usage was only around 3 GB during the incident, well below the available limit.
- The application suddenly became unavailable and started returning 502 errors.
- The logs show the process receiving SIGTERM unexpectedly during next start.
- We are not seeing evidence of memory exhaustion before the restart.
- Response time spikes reached 30 seconds before the failure.
- Request error rate suddenly increased to over 22%.
- Redeploys sometimes do not start correctly after commits from GitHub.
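As a rough sketch of how the SIGTERM observation could be made more precise on the application side, the snippet below logs uptime and resident memory at the moment a termination signal arrives. It uses only Node's built-in `process` API. The names `describeSignal` and `installSignalLogging` are hypothetical, and the handler exits right after logging, which is an assumption that may not suit a server needing a graceful shutdown. SIGKILL cannot be caught by any handler, so a kill of that kind would show up only as an absence of this log line.

```typescript
// Hypothetical diagnostic helpers; the names are illustrative only.
type ShutdownSignal = "SIGTERM" | "SIGINT";

// Build a single JSON log line describing the state at signal time.
function describeSignal(
  signal: ShutdownSignal,
  uptimeSeconds: number,
  rssBytes: number
): string {
  return JSON.stringify({
    event: "signal-received",
    signal,
    uptimeSeconds: Math.round(uptimeSeconds),
    rssMb: Math.round(rssBytes / (1024 * 1024)),
  });
}

// Register one-shot handlers. Adding a listener replaces Node's default
// terminate-on-signal behavior, so the handler must exit explicitly.
function installSignalLogging(
  exit: (code: number) => void = process.exit
): void {
  const signals: ShutdownSignal[] = ["SIGTERM", "SIGINT"];
  for (const signal of signals) {
    process.once(signal, () => {
      console.error(
        describeSignal(signal, process.uptime(), process.memoryUsage().rss)
      );
      exit(0); // assumption: exit immediately, skipping any graceful drain
    });
  }
}
```

A logged line with a short uptime and low `rssMb` would point away from memory pressure and toward an external stop request, which is the part only the platform logs can explain.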
At this point, we need Railway to confirm whether this was caused by:
- host/node instability
- container eviction
- internal platform restart
- healthcheck termination
- proxy/network timeout
- infrastructure balancing issue
- runtime orchestration problem
We would also like clarification on whether Railway is automatically restarting or recycling containers due to platform-side conditions even when memory usage remains stable.
This level of instability is severely impacting our production operations, deployments, and customer experience. If this continues without a proper resolution or mitigation plan, we will unfortunately need to evaluate alternative infrastructure providers.
Additionally, given the recurring downtime and deployment failures, we request service credits or compensation for the disruption caused.
Please escalate this case to the infrastructure/platform engineering team and provide a detailed root-cause analysis.
Thank you.