Visiting domain returns "Application failed to respond"; replicas are healthy

Question

We've hit this several times now: our domain returns the "Application failed to respond" page for several minutes at a time. Clicking "Restart" in the Railway dashboard resolves it immediately, every time.

The replicas themselves are healthy during the outage:

- QStash/internal traffic to /process-webhook-comment/* keeps flowing across all 4 replicas the entire time (~5 req/s, p99 < 1s)
- No app-side errors, no OOM, no crash, no SIGTERM
- Only public-edge traffic to our domain is affected

Most recent occurrence (2026-04-30 UTC):

- 13:31:25 GET / → 200 1ms (last successful public request)
- 13:31:25 → 13:40:26 no successful public responses (~9 min)
- 13:39:26 GET / received but never responded to (the event loop appears fine: internal /process-webhook-comment/* requests in the same window resolved in <1s on the same replicas)
- 13:40:26 GET / → 200 0ms (after manual restart)
- Request ID from the error page: jKHCszjVTHyj6bEJGbGh5g
  
This pattern (public edge dead, private/internal traffic fine, restart instantly fixes it) suggests the issue is in the edge proxy / routing layer rather than our app. Can you take a look?
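
For reference, the "~9 min" figure in the timeline can be reproduced from the public-request entries with a small script. This is only a sketch: the `longest_public_gap` helper and the `public_log` structure are names made up for illustration, not part of our app or of any Railway tooling, and it assumes the entries are in chronological order within a single day.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (timestamp, HTTP status), with None when no response was ever sent.
# Hypothetical structure for illustration only.
Entry = Tuple[str, Optional[int]]


def longest_public_gap(entries: List[Entry]) -> timedelta:
    """Return the longest interval between consecutive 2xx responses.

    Assumes entries are chronological and all fall on the same day.
    """
    ok_times = [
        datetime.strptime(ts, "%H:%M:%S")
        for ts, status in entries
        if status is not None and 200 <= status < 300
    ]
    gaps = [later - earlier for earlier, later in zip(ok_times, ok_times[1:])]
    return max(gaps, default=timedelta(0))


# Public-edge entries copied from the timeline above (2026-04-30 UTC).
public_log: List[Entry] = [
    ("13:31:25", 200),   # last successful public request
    ("13:39:26", None),  # received but never responded to
    ("13:40:26", 200),   # first success after the manual restart
]

gap = longest_public_gap(public_log)
print(gap)  # 0:09:01
```

The gap between the last successful public response and the first one after the restart is 9 minutes 1 second, which matches the window described above.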