8 days ago
Hello,
We are trying to investigate a huge jump in 4xx responses that started Nov 25th on our image service.
The attached chart shows the last 7 days. We went from roughly a 99% success rate on these requests down to about 17%.
Usage and load patterns aren't drastically different, and there were no code or config changes made during that time, which makes this all the more confusing.
I did see on the status page there was an issue on the 25th. Is there any chance this is related?
If not, is there anything you can think of that would have caused this?
Perhaps underlying hardware being swapped out for something drastically different (e.g. amd64 -> arm64 or similar)?
I've tried scaling up replicas (3 -> 6), since the error messages we are getting are timeouts, which suggests the server can't keep up with the load. I also tried switching to a different region to see if that would clear out anything stuck or stale. Neither made an appreciable difference.
We're grasping at straws here. Is there anything you can think of that may have caused this?
2 Replies
8 days ago
Hi,
I agree that this situation is unusual, but I can confirm that the incident on the 25th was unrelated to user workloads; it affected only our internal task scheduler and did not impact running services.
There have been no hardware changes or architecture swaps in any region; all machines in all regions run the same hardware and have been running it for several months.
From our side, there’s nothing that would explain the sudden spike in 4xx errors or timeouts. However, reviewing your logs, I’m seeing a high volume of errors related to connection issues and failures to spawn the headless browser. This points to an application-level problem rather than a platform or infrastructure issue.
I realize this may not be the answer you were hoping for, but based on what we’re seeing, the 499s are being caused by something within your application. Unfortunately, we’re not able to provide application-level debugging support.
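For anyone debugging a similar spike, one way to confirm which status codes dominate is to tally them from the request logs. 499 is a non-standard code, popularized by nginx, that conventionally means the client closed the connection before the server responded, which would be consistent with the timeouts described above. The following is only a minimal sketch: it assumes each log line is a JSON object with an integer `status` field, which is a hypothetical format and not a documented Railway log schema, and the function names are made up for illustration.

```python
import json
from collections import Counter


def tally_statuses(log_lines):
    """Count HTTP status codes in an iterable of JSON log lines.

    Assumes each line is a JSON object with an integer `status` field
    (a hypothetical format); lines that do not fit are skipped.
    """
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
            counts[int(record["status"])] += 1
        except (ValueError, KeyError, TypeError):
            # Not JSON, not an object, or no usable `status`: ignore the line.
            continue
    return counts


def success_rate(counts):
    """Fraction of requests with a 2xx or 3xx status (0.0 if there are none)."""
    total = sum(counts.values())
    ok = sum(n for status, n in counts.items() if 200 <= status < 400)
    return ok / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample lines, purely to show the call pattern.
    sample = [
        '{"status": 200}',
        '{"status": 499}',
        '{"status": 499}',
        "not json",
        '{"status": 404}',
    ]
    counts = tally_statuses(sample)
    print(counts.most_common())
    print(f"success rate: {success_rate(counts):.0%}")
```

If nearly all of the failures turn out to be 499s rather than a mix of 4xx codes, that points toward requests taking too long to complete (for example, while waiting on a headless browser to start) rather than toward malformed or unauthorized requests.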
If you have any other questions about the platform or need more context, let me know.
Best,
Brody
Status changed to Awaiting User Response Railway • 8 days ago
4 days ago
Hi Brody,
Thanks for the reply and the confirmation that nothing material has changed on your end.
Status changed to Awaiting Railway Response Railway • 4 days ago
Status changed to Solved brody • 4 days ago
