22 days ago
22 days ago
OK FREN
22 days ago
The deployment is not failing at build time. The build completes successfully, but the new release crashes during the health check, so the platform marks it unhealthy and falls back to the previous stable v1.4 deployment.
The likely issue is that the Custom Build/Start Command is not being applied correctly, or the new app is not starting with the expected entrypoint.
Please verify that the deployment is using the latest GitHub files:
api_app.py
vector_loader.py
Also please confirm the service is starting from api_app.py, binding to the correct host/port, and that the health-check endpoint returns HTTP 200.
This is critical because every new deployment builds successfully but then fails the health check and rolls back to the old v1.4.
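One way to check the health-check point by hand is to request the endpoint directly and look at the status code. The sketch below is a minimal probe using only the Python standard library. The function name `probe_health` is illustrative and not part of the project, and it assumes the endpoint answers a plain GET.

```python
import urllib.error
import urllib.request
from typing import Tuple


def probe_health(url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Send one GET to a health endpoint and return (healthy, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening on that host/port, or the request timed out.
        return False, f"unreachable: {exc}"
    # Only an exact 200 counts as healthy here.
    return status == 200, f"HTTP {status}"
```

Running it from inside the container, for example `probe_health("http://127.0.0.1:8000/health")` with the port replaced by whatever the service is expected to listen on, helps separate "the app does not answer at all" from "the app answers, but not with 200".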
22 days ago
Thank you for your reply, @imsushant2005. We have triple-checked all the points you mentioned:
- api_app.py and vector_loader.py are both present and correct on the main branch (I just verified the raw files).
- Procfile is: web: uvicorn api_app:app --host 0.0.0.0 --port $PORT
- Healthcheck path in railway.toml is /health
- The app starts successfully (logs show "Application startup complete" and Uvicorn running on 0.0.0.0:8000)
However, the healthcheck still fails with "1/1 replicas never became healthy!", even after increasing the timeout to 1200 seconds.
The new classical version never goes live; Railway always falls back to the old v1.4 (with ML).
There is also a stuck Custom Build Command in the Railway UI (pip install -r requirements.txt && python -c "from sentence_transformers...") that I cannot remove, even though the new code doesn't use ML.
Any idea why the container starts but the healthcheck never passes?
Thank you.
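One detail in the points above may be worth ruling out: the Procfile passes --port $PORT, yet the log shows Uvicorn on port 8000, which is also Uvicorn's default port. If the platform injects a different PORT value and probes that port (an assumption about how the healthcheck picks its target, not something confirmed in this thread), the app would start cleanly but never answer the check. The sketch below compares the two. `resolve_port` and `port_mismatch` are hypothetical helper names, and the 8000 fallback is only an assumed default for local runs.

```python
from typing import Mapping, Optional


def resolve_port(env: Mapping[str, str], default: int = 8000) -> int:
    """Return the port the app should bind to, read from PORT if it is set."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return default  # assumed fallback for local runs
    # int() raises ValueError if PORT is not numeric,
    # for example an unexpanded literal "$PORT".
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def port_mismatch(env: Mapping[str, str], logged_port: int) -> Optional[str]:
    """Return a description if the logged port differs from PORT, else None."""
    expected = resolve_port(env)
    if expected == logged_port:
        return None
    return f"app logged port {logged_port}, but PORT resolves to {expected}"
```

Calling `port_mismatch(os.environ, 8000)` inside the running container (after `import os`) would show whether the port in the startup log matches the injected variable.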
21 days ago
Persistent deployment failure with 4 yellow triangles in a row – new classical version never becomes live despite following all suggestions
Dear Railway Support Team,
I am writing about a persistent and severe deployment issue that has been ongoing for several days on my service.
Project/Service details:
- GitHub repo: marunigno-ship-it/QERRA-v2-api (main branch)
- Service: QERRA-v2-api-production
- Public domain: qerra-v2-api-production.up.railway.app
Problem:
Every new deployment of my clean classical api_app.py (1.4-classical with real SEMEV vectors) results in:
- Successful build
- Multiple yellow triangles (currently 4 in a row)
- Healthcheck failure after long retries
- Automatic fallback to the old v1.4 version with ML layer
Live /health always returns:
{"status":"ok","version":"1.4","ml":{"enabled":false,"model_name":"paraphrase-MiniLM-L3-v2","loaded":false,"error":null}}
What has been tried (following your team’s suggestions):
- Correct api_app.py, vector_loader.py, Procfile, requirements.txt
- Updated railway.toml with explicit startCommand using $PORT and timeout 1200s
- Removed startCommand as per latest diagnosis (letting Railpack auto-detect)
- Multiple redeploys
- Increased healthcheck timeout
The latest diagnosis from your system said the startCommand with $PORT was causing the crash, so I removed it. Still the same problem: 4 yellow triangles and fallback to the old version.
This is blocking the entire project. I am on a paid plan and this level of repeated failure is not acceptable.
Could you please investigate from the backend:
- Why the new deployment never passes healthcheck even after removing startCommand
- Why the public domain is not routing to the latest green deployment
- Whether there is a stale Custom Build Command or routing cache
I can provide screenshots, deployment IDs, full logs, or any other information needed.
Thank you for your urgent assistance.
Best regards,
Marussa Metocharaki
QERRA-v2 Project
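To tell quickly which release the public domain is serving, the version field in the /health response quoted above can be compared against the expected one. The sketch below does that with the standard library only. The helper names are hypothetical, and it assumes the new build reports itself as "1.4-classical" in the same `version` field, which this thread does not confirm.

```python
import json
from typing import Optional


def deployed_version(body: str) -> Optional[str]:
    """Extract the "version" string from a /health JSON body, if present."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # the body was not valid JSON
    if not isinstance(payload, dict):
        return None
    version = payload.get("version")
    return version if isinstance(version, str) else None


def is_new_release_live(body: str, expected: str = "1.4-classical") -> bool:
    """True only if the reported version equals the expected one exactly."""
    # "1.4-classical" is an assumed value for the new build's version field.
    return deployed_version(body) == expected
```

With the response quoted above, `deployed_version` yields "1.4", so `is_new_release_live` returns False, which is consistent with the old deployment still serving traffic.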