9 days ago
Service: @schoolstackbudget/api-server
Environment: production
Region: europe-west4-drams3a (EU West)
Project ID: ac396ee9-0404-4498-866e-6e9228964aa6
Issue: My API service is experiencing persistent health check failures, preventing deployments from succeeding. The application itself is healthy and starts correctly, but Railway's health check requests are not reaching the container.
Symptoms:
Application starts successfully, runs migrations, and listens on port 8080
No application errors or missing variables
Health check requests fail to route to the container during deployment
The same commit succeeded once (May 19, 18:18 UTC) but has failed 7+ times since, with zero code changes between deployments
Failure rate: 7 out of 8 deployments of the same commit
Failed Deployment IDs:
813bffce-d1e3-4e71-9267-c2077034cd6a (May 20, 16:55 UTC)
27946796-62fa-457f-95be-2c502ca7ca57 (May 20, 16:39 UTC)
b264fef9-5e17-4a90-9df6-a95698b20674 (May 20, 17:03 UTC)
2f895ae1-173f-4fe0-bf90-89c7c3f7025c (May 20, 17:07 UTC)
Successful Deployment ID (for comparison):
4d74813d-0fdf-42ee-b060-27a2426699b5 (May 19, 18:18 UTC)
Diagnosis: Railway's deployment diagnostics indicate: "The application starts correctly and listens on port 8080, but healthcheck requests are not reaching the container. This has happened on 7 out of 8 deployments of the same commit with no code or config changes between them."
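For context, a deploy health check polls an endpoint until it answers HTTP 200 or a deadline passes. The sketch below illustrates that general idea. It is an assumption, not Railway's actual implementation: `waitForHealthy` and its timeout parameters are hypothetical, and it requires Node 18+ for the global `fetch` and `AbortSignal.timeout`. A request that never reaches the container shows up here as repeated timeouts or connection errors, not as a non-200 status.

```javascript
// Hypothetical health-check poller (illustrative only, not Railway's logic).
// Requires Node 18+ for global fetch and AbortSignal.timeout.

/**
 * Polls `url` until it returns HTTP 200 or `timeoutMs` elapses.
 * Resolves to { ok, attempts, lastStatus, lastError }.
 */
async function waitForHealthy(url, { timeoutMs = 30000, intervalMs = 1000, requestTimeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  let lastStatus = null;
  let lastError = null;

  while (Date.now() < deadline) {
    attempts += 1;
    try {
      // Each request gets its own timeout so one hung connection
      // cannot consume the whole polling budget.
      const res = await fetch(url, { signal: AbortSignal.timeout(requestTimeoutMs) });
      lastStatus = res.status;
      await res.arrayBuffer(); // drain the body so the socket can be reused
      if (res.status === 200) {
        return { ok: true, attempts, lastStatus, lastError: null };
      }
    } catch (err) {
      // Timeout, connection refused, DNS failure, etc.
      const timedOut = err.name === 'TimeoutError' || err.name === 'AbortError';
      lastError = timedOut ? 'timeout' : (err.cause?.code ?? err.message);
    }
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return { ok: false, attempts, lastStatus, lastError };
}
```

For example, running `waitForHealthy('http://localhost:8080/health', { timeoutMs: 60000 })` against a local build of the same commit should succeed within a few attempts if the app is healthy. That helps separate an application problem from a routing problem.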
Impact: Site is currently down. This appears to be a persistent routing issue in the eu-west4 region that redeploying does not reliably resolve.
Request: Please investigate the routing/load balancer configuration for europe-west4-drams3a and determine why health check requests are failing to reach healthy containers.
9 Replies
Status changed to Awaiting Railway Response Railway • 9 days ago
9 days ago
Apologies for this canned message but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact that it has had on you.
It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.
You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY
If you are still having other issues that might be related to the incident you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c
Feel free to respond if your question has not been addressed.
Status changed to Awaiting User Response Railway • 9 days ago
9 days ago
I NEED URGENT HELP! The only thing left to do is delete everything and start over fresh with GitHub. Please help!!!
Status changed to Awaiting Railway Response Railway • 9 days ago
9 days ago
Please do not delete your project - that would cause permanent data loss and will not help here. Your application is starting correctly and listening on port 8080 with no errors.
As a temporary workaround, remove the health check path from your service settings (Settings > Deploy > Healthcheck Path, clear the field and save), then redeploy. This will allow your deployment to go live without waiting for a health check response. You can find more details on health check configuration here: https://docs.railway.com/deployments/healthchecks
Once your service is back online, we'd recommend re-adding the health check and investigating why your container isn't responding to health check requests during deployment.
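When you re-add the health check, the endpoint works best if it is cheap and has no dependencies, so a slow database cannot make the check itself time out. Below is a minimal sketch using Node's built-in `http` module. It assumes the `/health` path used elsewhere in this thread. `createHealthServer` and the JSON body are hypothetical. An Express app could be passed in as the fallback handler, since Express apps are plain `(req, res)` handlers.

```javascript
const http = require('node:http');

// Hypothetical helper: answers GET /health directly, without touching
// Postgres or other dependencies, and hands every other request to
// `appHandler` (e.g. an Express app) if one is provided.
function createHealthServer(appHandler) {
  return http.createServer((req, res) => {
    // Parse only the pathname so "/health?x=1" still matches.
    const path = new URL(req.url, 'http://localhost').pathname;
    if (req.method === 'GET' && path === '/health') {
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ status: 'ok', uptimeSeconds: Math.round(process.uptime()) }));
      return;
    }
    if (appHandler) {
      appHandler(req, res);
      return;
    }
    res.writeHead(404, { 'content-type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
  });
}
```

Usage would look like `createHealthServer(app).listen(Number(process.env.PORT) || 8080)`, with Settings > Deploy > Healthcheck Path set back to `/health`.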
Status changed to Awaiting User Response Railway • 9 days ago
Status changed to Solved mykal • 9 days ago
9 days ago
The site has been down all day, and once again the health checks aren't working. I need help: everything was working before your issues today, and now my site is down.
Status changed to Awaiting Railway Response Railway • 9 days ago
9 days ago
Hey, sorry your site has been down all day and for the duplicate canned responses earlier.
Did you get a chance to try the workaround Mykal suggested? Removing the health check path temporarily (Settings > Deploy > Healthcheck Path, clear the field and save) and then redeploying should get your service back online while we sort out the underlying routing issue in EU West. Your app is healthy; it's the health check routing that's failing.
Status changed to Awaiting User Response Railway • 9 days ago
8 days ago
Appreciate your help. The site is still giving me a 502 error. I'm going to try shifting the frontend to Netlify and using Railway for the API/Postgres, because I don't know what else to do.
Status changed to Awaiting Railway Response Railway • 8 days ago
8 days ago
We are still experiencing a production outage on a Railway API service.
Service/domain:
- schoolstackbudget.up.railway.app
- Public requests to /health and /api/ready return Railway’s “Application failed to respond” page / 502.
- Example Request IDs:
  1k51WwPMS7apm5EayCLmYg
  1xav573zT02Q8h8GyCLmYg
The container deploy logs show the app booted successfully and is listening:
[preflight] PREFLIGHT_SKIP=1 set — skipping ledger/schema gate
[migrate] Schema up to date
[startup] WARN: R2 boot probe SKIPPED via SKIP_R2_BOOT_PROBE=1
[seed] SKIP_PREVIEW_SEED=true — skipping preview-data seed
Server listening on [::] (dual-stack):8080
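The final log line says the server binds to `[::]` in dual-stack mode. The sketch below shows what that binding looks like in Node, assuming the app uses a plain `http`/`net` server. Listening on `'::'` accepts IPv6 clients and, because Node leaves `ipv6Only` off by default, IPv4 clients as well. The fallback to `0.0.0.0` covers hosts where IPv6 is disabled. `listenOnce` and `listenDualStack` are hypothetical helper names, not part of the service's code.

```javascript
// Hypothetical helper: listen once on (port, host), resolving with the bound
// address or rejecting with the listen error, and cleaning up both listeners.
function listenOnce(server, port, host) {
  return new Promise((resolve, reject) => {
    const onError = (err) => {
      server.removeListener('listening', onListening);
      reject(err);
    };
    const onListening = () => {
      server.removeListener('error', onError);
      resolve(server.address());
    };
    server.once('error', onError);
    server.once('listening', onListening);
    server.listen({ port, host }); // ipv6Only defaults to false => dual-stack on '::'
  });
}

// Prefer '::' (IPv6 plus IPv4-mapped); fall back to all IPv4 interfaces
// when the host rejects IPv6 sockets.
async function listenDualStack(server, port) {
  try {
    return await listenOnce(server, port, '::');
  } catch (err) {
    if (err.code !== 'EAFNOSUPPORT' && err.code !== 'EADDRNOTAVAIL') throw err;
    return listenOnce(server, port, '0.0.0.0');
  }
}
```

Either bind should be reachable by IPv4 traffic. Checking `server.address()` at boot and logging it, as this app already does, is a cheap way to confirm which host and port the process actually ended up on.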
Environment:
- PORT=8080
- Dockerfile-based deploy
- Postgres is a separate Railway service in the same project
- API connects to Postgres successfully during boot/migrations
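Because public traffic is routed to the port the process binds, one cheap safeguard is to derive that port from `PORT` in a single place and reject malformed values at boot instead of silently listening elsewhere. This is a minimal sketch that assumes the 8080 default used in this environment. `resolvePort` is a hypothetical helper.

```javascript
// Hypothetical helper: turn the raw PORT value into a valid TCP port,
// falling back to `fallback` when PORT is unset or empty, and throwing
// on anything that is not a whole number in 1..65535.
function resolvePort(raw, fallback = 8080) {
  if (raw === undefined || raw === '') return fallback;
  const trimmed = String(raw).trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`PORT must be an integer, got ${JSON.stringify(raw)}`);
  }
  const port = Number(trimmed);
  if (port < 1 || port > 65535) {
    throw new RangeError(`PORT out of range: ${port}`);
  }
  return port;
}
```

Typical use is `server.listen(resolvePort(process.env.PORT), '::')`. Failing fast here turns a port mismatch into a visible boot error instead of a 502 at the edge.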
Issue:
The container appears healthy and listening, but Railway public networking is not forwarding requests to it. This looks like a routing/service mesh issue rather than an application boot failure.
Can you investigate the public routing for this service/domain and confirm whether the service mesh registration is stuck or misconfigured?
8 days ago
We need Railway support to investigate public routing for a new service.
Project/service:
Schoolstack_Budget
Generated domain:
schoolstackbudget-production.up.railway.app
We created a brand-new service and temporarily replaced the app with a minimal Node HTTP smoke server:
node -e "require('http').createServer((req,res)=>{console.log('[smoke-hit]',req.method,req.url);res.writeHead(200,{'content-type':'application/json'});res.end(JSON.stringify({ok:true,path:req.url,port:process.env.PORT||8080}))}).listen(process.env.PORT||8080,'0.0.0.0',()=>console.log('[smoke] listening on 0.0.0.0:'+(process.env.PORT||8080)))"
Deploy logs show:
[smoke] listening on 0.0.0.0:8080
But public requests still return Railway 502 / “Application failed to respond.” HTTP logs show requests at the edge, but they time out after 15s:
GET /health → 502, 15s
GET /api/ready → 502, 15s
GET / → 502, 15s
This proves the issue is not our Express app, database, migrations, R2, CORS, Netlify, or Postgres. Public routing is not reaching the listening process.
Please investigate target-port/public networking/service mesh routing for this service.
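The edge timings reported in this thread (each request ending in a 502 after about 15s) can be reproduced from any client with a small probe that records status and elapsed time per path. This is a sketch that assumes Node 18+ (global `fetch`, `AbortSignal.timeout`, `performance`). `probePaths` is a hypothetical name, and the 20s default timeout is chosen only to exceed the 15s seen in the edge logs.

```javascript
// Hypothetical probe: request each path once, sequentially, and record
// status code (or error) plus elapsed milliseconds.
async function probePaths(baseUrl, paths, timeoutMs = 20000) {
  const results = [];
  for (const path of paths) {
    const started = performance.now();
    try {
      const res = await fetch(new URL(path, baseUrl), { signal: AbortSignal.timeout(timeoutMs) });
      await res.arrayBuffer(); // read the full body so timing covers the response
      results.push({ path, status: res.status, ms: Math.round(performance.now() - started), error: null });
    } catch (err) {
      // Client-side timeout vs. connection-level failure (refused, reset, DNS).
      const timedOut = err.name === 'TimeoutError' || err.name === 'AbortError';
      const error = timedOut ? 'timeout' : (err.cause?.code ?? err.message);
      results.push({ path, status: null, ms: Math.round(performance.now() - started), error });
    }
  }
  return results;
}
```

For example, `probePaths('https://schoolstackbudget-production.up.railway.app', ['/health', '/api/ready', '/']).then(console.table)`. Consistent 502s at roughly 15000 ms would match the edge logs, while fast 200s would indicate that requests are now reaching the process.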