24 days ago
4 SSE streams on nextjs-app-production-f045.up.railway.app silently dropped between 17:04:00–17:10:00 UTC on Feb 20, 2026. Zero errors in our
application logs or Sentry — every upstream call succeeded. We suspect the 60s proxy keep-alive timeout killed idle connections
during long AI tool call processing.
Could you pull edge proxy connection termination events for our production services (python-services and nextjs-app) during that
6-minute window? We want to confirm it was idle timeout vs residual effects from the Feb 19 networking incident.
thanks
5 Replies

Status changed to Awaiting User Response Railway • 23 days ago
23 days ago
Thanks Brody, really appreciate you looking into this!
Our main concern is understanding the root cause so we can figure out two things: whether there's anything we should fix on our side,
and how to build better observability and recovery for next time.
We've been investigating and can see 5 dropped SSE connections that day (10:09 UTC and 17:04-17:09 UTC), but from our application
logs and traces everything looks healthy. The silence gaps between SSE events were only 6-8 seconds, so it doesn't look like the
keep-alive timeout, which leaves us a bit in the dark about what actually caused the drops.
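For what it's worth, the 6-8 second figure comes from a check along these lines over the event timestamps in our traces (a minimal sketch; the example numbers are illustrative):

```python
def max_silence_gap(timestamps):
    """Longest gap in seconds between consecutive SSE events,
    given the events' timestamps (epoch seconds)."""
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)

# Events at t=0, 2, 10, 11 -> the longest silence is 8 seconds.
```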
If there's any way to see edge or proxy logs from that window, or any insight into what might have caused them, that would really
help us close the loop and build the right safeguards going forward.
Thanks again!
Status changed to Awaiting Railway Response Railway • 23 days ago
23 days ago
Happy to explain!
We recently rolled out support for Fastly, which means Fastly's proxy now sits in front of everyone's domain. We forgot to adjust some timeout settings to match what our own proxy allowed, so for SSE requests specifically, a sizeable time gap between events would cause the connection to be closed. We have since increased that gap to 15 minutes, matching the 15-minute timeout we set for an entire request.
Status changed to Awaiting User Response Railway • 23 days ago
23 days ago
Thanks Brody, that's really helpful context about the Fastly rollout!
We want to make sure we're not missing something on our side, so we dug into the traces pretty carefully. What we're seeing suggests the drops weren't caused by
idle gaps between SSE events:
- All 5 failed connections on Feb 20 died during the post-tool-call phase, where Claude was actively generating and streaming response tokens
- The longest silence gap across any of the 5 failures was ~8 seconds (during a tool call)
- But successful streams from the same user, same day, survived longer silence gaps (up to 9.9 seconds) without issue
So it looks like the connections were severed while data was actively flowing, not during idle periods. That's what makes us think it might have been the Fastly
cutover itself (existing connections being reset as the new proxy was inserted) rather than a timeout on the new proxy.
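On our side, the closest we can get to observing these terminations is catching the generator shutdown our framework triggers when the socket drops mid-stream. Roughly (a minimal sketch; `terminations` stands in for the structured log / Sentry breadcrumb we'd actually emit):

```python
import time

terminations = []  # placeholder for a structured log / Sentry breadcrumb

def token_stream(tokens):
    """Stream SSE frames; when the connection is severed mid-stream,
    the web framework closes the generator, raising GeneratorExit here."""
    sent = 0
    try:
        for tok in tokens:
            sent += 1
            yield f"data: {tok}\n\n"
    except GeneratorExit:
        terminations.append({"phase": "streaming",
                             "tokens_sent": sent,
                             "at": time.time()})
        raise  # let the framework finish closing the generator

# Simulate a proxy killing the connection after one token:
gen = token_stream(["a", "b", "c"])
first = next(gen)
gen.close()
```

This at least records how far into the post-tool-call streaming phase each connection got, even though it can't tell us *why* the proxy closed it.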
Is there any way we could see edge/proxy-level logs for our production services (nextjs-app and python-services on production-f045.up.railway.app) from that
window? Specifically around 10:09 UTC and 17:04-17:10 UTC on Feb 20. We have full application-level traces from Sentry but zero visibility into what happened at
the proxy layer — connection termination events, Fastly handoff logs, anything like that would help us close the loop and rule out anything on our end.
Thanks again for looking into this!
Status changed to Awaiting Railway Response Railway • 23 days ago
23 days ago
I assure you this was not caused by the cutover. The cutover was purely DNS-based: we left our proxies intact and did not perform any restarts of any kind on our infrastructure. We swapped our IPs for Fastly's IPs, so existing connections weren't disrupted in any way, while new connections connected to Fastly and experienced the SSE issue before we corrected the configuration.
This was not on your end; it was purely an oversight in our Fastly settings.
Status changed to Awaiting User Response Railway • 23 days ago
15 days ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 15 days ago