15 days ago
Hi Railway Support,
We're experiencing a critical production outage that has been ongoing for over an hour, and our users are actively affected.
The issue: Requests are not reaching our API. We're seeing the following error in the logs:
> Pop visit count exceeded max threshold on cache-fra-etou8220130-FRA: scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220054-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-eddf8230177-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220083-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-eddf8230177-FRA
This appears to be a Railway edge/proxy layer issue — traffic is not being forwarded to our service. Earlier today we also experienced an SSL-related problem, which resolved itself, but now this new issue has surfaced.
Impact:
- Production environment fully down
- All end users unable to use the application
- Ongoing for 60+ minutes
Could you please investigate this as a priority? Any status updates or ETAs would be greatly appreciated — we'd like to communicate transparently with our users.
Thank you for your urgency on this.
7 Replies
15 days ago
I get errors from Fastly. Here are some headers:
"Error-Reason": "loop detected",
"Fastly-Ff": "scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220054-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-eddf8230177-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220083-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-eddf8230177-FRA",
"Fastly-Host": "cache-fra-etou8220130-FRA",
I use Upstash Workers as a background worker queue, which calls my server endpoints.
14 days ago
Another error found in the logs:
Same machine same service in the most recent hop on cache-fra-etou8220105-FRA: scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220054-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220105-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-eddf8230082-FRA, scJ4iSV4UkyHDI2juJNzrcUzYk/oO1PMCyYg4bz5Utg!FRA!cache-fra-etou8220105-FRA
14 days ago
The 'Error-Reason: loop detected' header reveals this is an application configuration issue, not a Railway platform problem. Your Next.js app is forcing HTTPS redirects that conflict with how Railway's edge proxy handles requests from Upstash Workers. Railway's proxy always sets X-Forwarded-Proto: https for internal forwarding (the connection is already encrypted), but your application middleware is checking for HTTPS and creating a redirect loop. You'll need to update your Next.js middleware to trust the X-Forwarded-Proto header instead of forcing redirects. This is a common issue when external workers like Upstash call Railway services.
Status changed to Awaiting User Response Railway • 14 days ago
14 days ago
Hi,
Thank you for the quick response. I appreciate the direction, but after investigating further I don't believe this is an application-side HTTPS redirect issue — here's why:
1. I'm not forcing HTTPS redirects in Next.js middleware. I did make some minor CSP and redirect adjustments today after the incident, but nothing that could fix this behavior.
2. The issue is intermittent, not consistent. Most QStash callbacks succeed — only some fail, and the probability increases after longer-running workflow steps. A middleware redirect loop would fail 100% of the time.
3. This setup worked flawlessly for ~9 months. The failures started today, coinciding with the introduction of your Fastly layer.
Root cause (our analysis):
Our request chain is: QStash → Cloudflare (proxied) → Railway/Fastly → App
We're running Cloudflare in Full SSL mode as our DNS/CDN proxy. With the new Fastly layer added on Railway's side, we now have two CDN proxy layers back-to-back. The Fastly-Ff response header confirms the request is bouncing through 4+ Frankfurt PoPs before Fastly kills it with a loop detection error. This looks like a Cloudflare/Fastly routing conflict, not an application redirect.
The Error-Reason: loop detected is Fastly's CDN-level loop detection, not an HTTP redirect loop originating from our app.
Our fix (in progress):
We're setting up a DNS-only subdomain workflows.retuszuj.pl in Cloudflare (proxy disabled), pointing directly to Railway's CNAME. QStash callbacks will use this subdomain, making the chain: QStash → Railway/Fastly → App — a single CDN layer, which should eliminate the conflict entirely.
In the meantime, we've enabled a synchronous fallback to bypass Upstash, but this is not a sustainable solution.
Questions for the team:
1. Is there anything on Railway's Fastly configuration that could contribute to loop detection when receiving proxied POST requests from another CDN (Cloudflare)?
2. Are there recommended headers or routing configurations for machine-to-machine callbacks that help Fastly distinguish legitimate upstream-proxied requests from loops?
3. Are there any known issues with Fastly looping on POST requests with larger payloads coming through an upstream proxy?
4. Would it be possible to flag or exempt a specific subdomain/path pattern from Fastly's loop detection for trusted callback traffic?
Happy to provide full request headers or Fastly-Ff traces if that helps your investigation.
Thanks for your time.
14 days ago
Hi,
Quick update — the DNS-only subdomain fix did not resolve the issue. Fastly is still looping requests the same way as before, so removing Cloudflare from the chain didn't make a difference.
I also tested with a completely separate, non-production app — a standalone Hono API server, no Cloudflare, no Next.js, no middleware of any kind — and it has the exact same problem. QStash callbacks fail with the same Fastly loop detection error.
I want to be transparent about why this matters to us: Upstash Workflows (QStash) is a core part of our architecture. We rely on it heavily for async processing, so this isn't something we can work around long-term with a synchronous fallback.
We have two independent apps — different stacks, different domains, different configurations — both failing the same way with QStash callbacks since the Fastly layer was introduced.
Happy to provide logs, request traces, or anything else that would help.
Thanks
14 days ago
Hi,
Good news — after hours of debugging we identified and fixed the root cause on our end. Sharing the details in case it's useful for other Railway users hitting the same issue.
Root cause:
The Upstash Workflow SDK's recreateUserHeaders() function filters known CDN headers (Cloudflare, Vercel, Render) before forwarding them via QStash's Upstash-Forward-* mechanism — but it does not filter Fastly headers. As a result, headers like fastly-ff from Railway's edge proxy were being forwarded to each subsequent workflow step callback. Each time a callback passed through Railway/Fastly, a new entry was appended to the existing Fastly-Ff header. After ~4 workflow steps, Fastly's "pop visit count" threshold was exceeded and it returned 503.
This explains why the issue is step-count dependent (not random), and why it only appeared after Railway introduced the Fastly layer.
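For anyone who wants to check whether they are hitting the same accumulation, here is a rough TypeScript sketch that splits a Fastly-Ff value into its entries and counts repeats. The field names (service, pop, cache) are my own guess at what the three "!"-separated parts mean, based only on the values in the logs above. The helper names are made up for illustration and are not part of any Fastly or Upstash API.

```typescript
// One entry of a Fastly-Ff header value.
// Assumption: each entry looks like "<service>!<pop>!<cache>", as in the logs above.
interface FastlyHop {
  service: string;
  pop: string;
  cache: string;
}

// Hypothetical helper: split a Fastly-Ff header value into its entries.
function parseFastlyFf(header: string): FastlyHop[] {
  return header
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [service = "", pop = "", cache = ""] = entry.split("!");
      return { service, pop, cache };
    });
}

// Hypothetical helper: count how often each value of `key` occurs in the trace.
function countVisits(hops: FastlyHop[], key: keyof FastlyHop): Map<string, number> {
  const counts = new Map<string, number>();
  for (const hop of hops) {
    counts.set(hop[key], (counts.get(hop[key]) ?? 0) + 1);
  }
  return counts;
}
```

Logging the number of entries at the start of each workflow step should show whether the header keeps growing from one callback to the next, which is what we observed.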
Our fix (app-level workaround):
1. Strip Fastly/CDN headers in Next.js middleware before requests reach the Upstash SDK, preventing the SDK from picking them up and forwarding them via Upstash-Forward-*. Headers stripped: fastly-ff, cdn-loop, fastly-client, fastly-client-ip, fastly-ssl, fastly-temp-xff, via.
2. Strip CDN headers from workflow route responses as a belt-and-suspenders measure, removing fastly-ff, cdn-loop, and via from POST handler responses.
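A minimal sketch of the stripping step, in case it saves someone time. It uses only the standard Headers API, so it is not tied to Next.js. The function and constant names are illustrative, and the list is simply the set of headers named above. This is not our exact production code.

```typescript
// Headers named in the workaround above (lowercase; Headers is case-insensitive).
const CDN_HEADERS_TO_STRIP = [
  "fastly-ff",
  "cdn-loop",
  "fastly-client",
  "fastly-client-ip",
  "fastly-ssl",
  "fastly-temp-xff",
  "via",
];

// Hypothetical helper: return a copy of `incoming` without the CDN headers.
// The original Headers object is left untouched.
function stripCdnHeaders(incoming: Headers): Headers {
  const cleaned = new Headers(incoming);
  for (const name of CDN_HEADERS_TO_STRIP) {
    cleaned.delete(name);
  }
  return cleaned;
}
```

In Next.js middleware, the cleaned copy can be passed on with NextResponse.next({ request: { headers: stripCdnHeaders(request.headers) } }), so the route handler (and the SDK behind it) never sees the Fastly headers. In another framework such as Hono, the same function would be applied wherever the incoming request headers can be rewritten before the workflow handler runs.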
Upstream fix (for Railway to consider):
This is ultimately a bug in the Upstash Workflow SDK — recreateUserHeaders() should include Fastly headers in its exclusion list, which would protect all Railway users without requiring app-level workarounds. I'm planning to file an issue at https://github.com/upstash/workflow-ts/issues.
However, it would also be worth Railway considering whether Fastly can be configured to strip or reset the Fastly-Ff header on incoming requests from external sources (like QStash), rather than accumulating across hops. That would prevent this class of issue entirely for any app using webhook/callback-based workflows on Railway.
Hope this helps. Happy to share more details if useful.
14 days ago
thanks wfalowski for the workaround