a month ago
Hello
We had a traffic spike on January 8th around 20:00 UTC.
In our Node API service, we saw these errors in the logs:
"upstreamErrors": "[{\"deploymentInstanceID\":\"f55af9d8-2152-4036-aac3-bf051f52f287\",\"duration\":13121,\"error\":\"an unknown error occurred\"},{\"deploymentInstanceID\":\"6acb3dab-a3cc-4818-8bde-8fd77bdc9222\",\"duration\":5000,\"error\":\"connection dial timeout\"},{\"deploymentInstanceID\":\"d257e79c-a2c4-485d-b114-1ea1a1d50dbd\",\"duration\":5000,\"error\":\"connection dial timeout\"}]"
Could you help us understand what went wrong? We don't see any spike in CPU or RAM, but the service returned some HTTP 502 responses to clients.
Thanks
3 Replies
a month ago
Do you use third-party services? It looks like your service is making a request that times out...
a month ago
If you are using Node.js, a single long synchronous operation can block the event loop and cause this. However, that usually shows up as a CPU spike. Since your CPU was flat, it's more likely the application was waiting on something: a promise that never resolved, or a slow external API call.
Under normal load, a server accepts a connection in milliseconds. The 5000 ms "connection dial timeout" entries in your log suggest the server was running but not accepting new connections during that window.
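To tell a blocked event loop apart from an idle one that is waiting on I/O, you could record event loop delay alongside your CPU metrics. Here is a minimal sketch using Node's built-in `perf_hooks` module. The helper name `readEventLoopDelay`, the 20 ms resolution, and the 10 second reporting interval are my own choices, not anything from your setup:

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Samples event loop delay in the background; values are reported in nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Hypothetical helper: returns delay stats in milliseconds, then resets the
// histogram so each report covers only the most recent window.
function readEventLoopDelay() {
  const stats = {
    meanMs: histogram.mean / 1e6,
    p99Ms: histogram.percentile(99) / 1e6,
    maxMs: histogram.max / 1e6,
  };
  histogram.reset();
  return stats;
}

// Log every 10 seconds; unref() so this timer alone does not keep the process alive.
const reporter = setInterval(() => {
  console.log("event loop delay", readEventLoopDelay());
}, 10_000);
reporter.unref();
```

If the maximum delay jumps to hundreds of milliseconds or more during a spike, something synchronous is blocking the loop. If it stays low while requests still fail, the app was most likely waiting on a database or an external service.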
You should check your database metrics to see whether the database was the bottleneck, and whether active connections hit their limit during the spike. An external API you call could also be the cause.
I also recommend that you implement timeouts for database queries and external API calls to ensure you aren't awaiting a slow promise without a timeout.
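As a rough illustration of the timeout advice, here is a small sketch. `withTimeout` and `fetchWithTimeout` are hypothetical helper names, and the second one assumes Node 18+ where `fetch` and `AbortSignal.timeout` are built in:

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// For HTTP calls, passing an AbortSignal also cancels the request itself.
async function fetchWithTimeout(url, ms) {
  const res = await fetch(url, { signal: AbortSignal.timeout(ms) });
  if (!res.ok) throw new Error(`upstream responded with HTTP ${res.status}`);
  return res;
}
```

You would wrap a query like `await withTimeout(runQuery(), 2000, "db query")`, where `runQuery` stands in for whatever your database client exposes. Many database clients also have their own timeout options, which are preferable when available because they release the connection as well.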