Stalled Requests
charlie
PRO · OP

2 months ago

Hi, I'm having an issue where Chrome is telling me that some of my client API requests are "stalled".

The API requests go to a FastAPI server hosted on Railway. I've tested this locally and don't experience the issue, so I've narrowed it down to a Railway connection issue.

Has anyone else had this? It means my p99 response times in the Railway dashboard are consistently >30 seconds, which is weird because the Chrome documentation indicates that a "stalled" request hasn't even left the browser (but this only happens when I point my application at my Railway API).
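Timing the same endpoint from a plain script, outside Chrome, is one way to check whether the delay is browser-specific. Below is a minimal sketch that uses only the standard library. The URL is a hypothetical placeholder, not a real address. The `fetch` parameter exists only so the timing logic can be tested without network access, and the percentile uses the nearest-rank definition, which may differ from how the Railway dashboard computes p99.

```python
import math
import time
import urllib.request
from typing import Callable, List

# Hypothetical placeholder: replace with your own Railway endpoint.
DEFAULT_URL = "https://your-app.up.railway.app/health"


def fetch_url(url: str, timeout: float = 60.0) -> int:
    """Perform one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the full response time is measured
        return resp.status


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def probe(url: str, n: int, fetch: Callable[[str], int] = fetch_url) -> List[float]:
    """Time n sequential requests and return their durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fetch(url)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    times = probe(DEFAULT_URL, n=50)
    print(f"p50={percentile(times, 50):.3f}s  p99={percentile(times, 99):.3f}s  max={max(times):.3f}s")
```

`urllib` speaks HTTP/1.1 and opens a fresh connection for each `urlopen` call, so it bypasses Chrome's HTTP/2 connection reuse entirely. If these numbers are also slow, the delay is probably not caused by the browser.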

$20 Bounty

5 Replies

ilyass012
FREE

2 months ago

the "stalled" label in chrome isn't always what it seems. chrome actually folds cors preflight delays into the stalled state, so it can be misleading. the best way to diagnose what's actually happening is to go to chrome://net-export, capture a log while reproducing the issue, then import it at netlog-viewer.appspot.com and look at the events for your request. you're likely to see either "socket_pool_stalled_max_sockets_per_group" or http/2 sessions with goaway frames showing no_error, which means the connection is dying but chrome thinks it's still alive and tries to reuse it. try the capture first and share what you see; that'll pinpoint the exact cause.
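If the capture is large, a quick script can show which of these events appear before you dig through netlog-viewer. This is a minimal sketch. It assumes the export's JSON layout: a `constants.logEventTypes` map from event name to number, plus an `events` list whose entries carry that number in `type`. Exact event names can differ between Chrome versions, so the sketch matches on the substrings `STALLED` and `GOAWAY` rather than hard-coding names. The file name in the usage comment is a placeholder.

```python
import json
import sys
from collections import Counter
from typing import Dict, Iterable


def count_matching_events(netlog: dict,
                          needles: Iterable[str] = ("STALLED", "GOAWAY")) -> Dict[str, int]:
    """Count events whose type name contains any of the given substrings."""
    # Events store their type as an integer; invert the name -> id table.
    id_to_name = {v: k for k, v in netlog["constants"]["logEventTypes"].items()}
    needles = tuple(needles)
    counts: Counter = Counter()
    for event in netlog.get("events", []):
        name = id_to_name.get(event.get("type"), "")
        if any(needle in name for needle in needles):
            counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    # Usage (placeholder file name): python scan_netlog.py chrome-net-export-log.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        log = json.load(fh)
    for name, n in sorted(count_matching_events(log).items()):
        print(f"{name}: {n}")
```

A zero count only means those names did not match. The per-request timeline in netlog-viewer is still the place to see where the time went.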


ilyass012
FREE

2 months ago

hey, did the net-export log show anything? curious to see what came up


charlie
PRO · OP

2 months ago

Yes, it was quite helpful. I was struggling to replicate the issue over the weekend; however, this morning it was happening consistently, so I was able to log the network requests.

After importing them into the netlog-viewer, I couldn't find any "socket_pool_stalled_max_sockets_per_group" events when filtering. However, I could find my request that was taking particularly long (~8 seconds). I took those logs (the HTTP2_SESSION and URL_REQUEST entries) and, with a bit of help from Gemini, found an indication that the individual requests to Railway could be taking a long time to resolve (i.e. my application code).

This is what Gemini suggested after I gave it the logs:

"Your application is suffering from 'Head-of-Line Blocking' (application-level) and extreme server-side latency on every request."
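"Application-level head-of-line blocking" in an async server usually means that one slow piece of work holds up the requests queued behind it. A common way this happens in an `async def` FastAPI handler is a blocking call, such as a synchronous database driver, `time.sleep`, or heavy CPU work, running directly on the event loop. Whether that applies to this app is an assumption. The sketch below uses only `asyncio` to show the effect, with `time.sleep` standing in for a hypothetical blocking call.

```python
import asyncio
import time


async def blocking_handler(delay: float) -> None:
    # time.sleep blocks the whole event loop: no other coroutine can run.
    time.sleep(delay)


async def non_blocking_handler(delay: float) -> None:
    # asyncio.sleep yields to the event loop, so other requests proceed.
    await asyncio.sleep(delay)


async def offloaded_handler(delay: float) -> None:
    # Running the blocking call in a worker thread keeps the loop free.
    await asyncio.to_thread(time.sleep, delay)


async def serve_concurrently(handler, n: int, delay: float) -> float:
    """Simulate n simultaneous requests; return wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for h in (blocking_handler, non_blocking_handler, offloaded_handler):
        elapsed = asyncio.run(serve_concurrently(h, n=5, delay=0.2))
        print(f"{h.__name__}: {elapsed:.2f}s")
```

With five simulated requests, the blocking version takes about five times as long as the other two, because each request waits for the one ahead of it. With a single local user this can go unnoticed, while concurrent traffic turns it into long tail latency. FastAPI runs plain `def` path operations in a threadpool, so this mainly affects `async def` handlers that call blocking code. `asyncio.to_thread` requires Python 3.9+.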

This still seems a bit mysterious to me because my local server works just fine. In the meantime, I'm logging the actual database connection times to see if they're the source of the long latency.
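One way to log database timings is to wrap each call in a small async context manager that reports only the calls above a threshold. This is a minimal sketch, assuming an async driver. `timed` and `query_user` are hypothetical names, and `asyncio.sleep` stands in for a real query.

```python
import asyncio
import logging
import time
from contextlib import asynccontextmanager
from typing import AsyncIterator, Callable, Optional

logger = logging.getLogger("db.timing")


@asynccontextmanager
async def timed(label: str, threshold_ms: float = 500.0,
                report: Optional[Callable[[str, float], None]] = None) -> AsyncIterator[None]:
    """Measure the wrapped block and report it if it takes >= threshold_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the query raises, so slow failures are reported too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms >= threshold_ms:
            if report is not None:
                report(label, elapsed_ms)
            else:
                logger.warning("slow db call %s: %.1f ms", label, elapsed_ms)


async def query_user(user_id: int) -> dict:
    # Hypothetical query: replace the sleep with your driver call.
    async with timed(f"query_user({user_id})", threshold_ms=50):
        await asyncio.sleep(0.1)
        return {"id": user_id}
```

Wrapping the pool-acquire step separately from the query itself can show whether time is spent waiting for a free connection rather than executing SQL.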


ilyass012
FREE

2 months ago

happy to hear from you, sounds like we're on the right track. the db connection logging will tell you a lot. what are you seeing so far in the logs?


charlie
PRO · OP

2 months ago

So there aren't any particularly long requests to my database that I can see (I've started logging the time of every request over N ms). The longest is ~5-6 seconds (out of several thousand requests, these were the highest, and there were only a few), which matches the timeline in my "Response Times" metrics dashboard for p99 users. However, the dashboard peaks at ~25-26 seconds, so something on Railway's side seems to be adding an extra 20 seconds to the response.

