8 days ago
When posting data to the REST API of my service, I suddenly get a lot of "Connection aborted" responses. Only about 1 in 4 calls gets through successfully. This happened out of the blue; I had not made any changes to the service myself.
In the Network Flow logs I can see the following popping up again and again:
TCP_AOFAILURE in connection from service to postgres
TCP_OVERWINDOW in connection from outside to service
The logs of postgres and my service look as expected, nothing special shown there.
I think the Network Flow layer is new, right? Is it possible that somehow this plays a role in the different behaviour?
Pinned Solution
7 days ago
I solved this by no longer making a separate requests.post(...) call for each POST I send to my production service. Instead, I reuse the connection through a requests.Session() object. It looks like something changed on the Railway end, so the previous approach opened too many connections and some of them got rejected. I suspect the recently introduced DDoS Protection, but I cannot say for sure without further insights.
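For anyone hitting the same thing, here is a minimal sketch of the change, not my exact script. The URL and the post_records helper are made up for illustration, and the timeout value is an arbitrary assumption. The point is only that every POST goes through one shared session, so the underlying connection can be kept alive and reused instead of opening a new one per request.

```python
import requests

# Hypothetical endpoint, used only for illustration.
API_URL = "https://example.invalid/api/items"


def post_records(url, records, session=None, timeout=10):
    """POST each record through one shared session and return the status codes."""
    owns_session = session is None
    if owns_session:
        # One Session keeps a connection pool, so repeated requests to the
        # same host can reuse an open connection.
        session = requests.Session()
    try:
        statuses = []
        for record in records:
            response = session.post(url, json=record, timeout=timeout)
            statuses.append(response.status_code)
        return statuses
    finally:
        # Only close the session if this function created it.
        if owns_session:
            session.close()
```

Calling post_records(API_URL, my_records) replaces a loop of individual requests.post(API_URL, json=record) calls. Whether this avoids the resets in every setup I cannot say; it did in mine.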
3 Replies
Status changed to Awaiting Railway Response Railway • 8 days ago
7 days ago
I checked my database and found the following:
I still had 8 CPU and 8 GB RAM configured
Metrics show me that RAM usage is around 100MB, CPU usage is shown at 0.0 vCPU, with a little spike within the time I tried to use my API yesterday (still showing as 0.0 vCPU)
Connections are shown with 1 or 2 / 100 while I'm running my script, so it looks like I'm nowhere near the limit
My database size is around 73 MB, so I guess this is well within the limits
I enabled DB stats, perhaps these bring some insight
Still digging, perhaps something turns up...
7 days ago
A few more things:
Query performance is totally acceptable, with selects being around 2ms and inserts around 22ms mean
I ran some vacuuming, but the affected rows were in the double digits, so shouldn't have much of an effect...
While my script that uses the REST API (I use it to update the data) is failing with ConnectionResetError 10054, using the website works totally fine, with performance as fast as it always was
The reason I suspected Network Flow is that I did not change anything in my deployment, and my update script worked flawlessly when I last ran it a few weeks ago. Now it gets a connection reset on the first connection attempt, so something in either my environment or the Railway environment must have changed.
Status changed to Solved brody • 7 days ago