
"Connection aborted" when POSTing data to my service all of a sudden

monsdar

HOBBYOP

4 months ago

When posting data to the REST API of my service, I suddenly get a lot of "Connection aborted" responses. Only about 1 in 4 calls gets through successfully. This happened out of the blue, without me making any changes to the service.

In the Network Flow logs I can see the following popping up again and again:

* `TCP_AOFAILURE` in connection from service to postgres
* `TCP_OVERWINDOW` in connection from outside to service

The logs of postgres and my service look as expected, nothing special shown there.

I think the Network Flow layer is new, right? Is it possible that somehow this plays a role in the different behaviour?

Solved • $10 Bounty

Pinned Solution

monsdar

HOBBYOP

4 months ago

I solved this by no longer issuing a separate requests.post(...) call for each POST I send to my production service. Instead, I now reuse the connection through a requests.Session(). It looks like something changed on the Railway end, so that the previous approach opened too many connections and some of them were rejected. I suspect the recently introduced DDoS Protection, but I cannot say for sure without further insight.
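For anyone landing here with the same symptom, the change can be sketched roughly as below. This is a minimal illustration, not the original script: the `post_all` helper, its parameters, and the 10-second timeout are assumptions made up for the example. Only `requests.Session()` and `session.post(...)` come from the fix described above.

```python
import requests


def post_all(url, payloads, session=None, timeout=10):
    """POST each payload to `url` over one shared session (hypothetical helper).

    Returns the list of HTTP status codes, in order.
    """
    own_session = session is None
    if own_session:
        session = requests.Session()
    try:
        statuses = []
        for payload in payloads:
            # session.post can reuse a pooled keep-alive connection when the
            # server allows it, whereas a bare requests.post(...) sets up a
            # fresh connection for every call.
            response = session.post(url, json=payload, timeout=timeout)
            statuses.append(response.status_code)
        return statuses
    finally:
        # Only close the session if this function created it.
        if own_session:
            session.close()
```

Called as `post_all("https://<your-service>/items", rows)`, with the URL being a placeholder, all POSTs go through the same session instead of one connection per request.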

3 Replies

Status changed to Awaiting Railway Response Railway • 4 months ago

monsdar

HOBBYOP

4 months ago

I checked my database and found the following:

* I still had 8 CPU and 8 GB RAM configured
* Metrics show RAM usage of around 100 MB and CPU usage of 0.0 vCPU, with a small spike during the time I tried to use my API yesterday (still displayed as 0.0 vCPU)
* Connections are shown as 1 or 2 / 100 while I'm running my script, so it looks like I'm nowhere near the limit
* My database size is around 73 MB, so I guess this is well within the limits
* I enabled DB stats, perhaps these bring some insight
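The connection figure above came from the dashboard metrics. As a cross-check, roughly the same numbers can be read from Postgres itself. The sketch below assumes PostgreSQL 10 or newer (for the `backend_type` column) and any DB-API cursor, for example one from psycopg2. The `connection_usage` helper is a made-up name for illustration.

```python
def connection_usage(cursor):
    """Return (client_connections, max_connections) via a DB-API cursor.

    Hypothetical helper; assumes PostgreSQL 10+ for `backend_type`.
    """
    cursor.execute(
        # Count only client sessions, not background workers, and read the
        # configured limit in the same round trip.
        "SELECT count(*), current_setting('max_connections')::int "
        "FROM pg_stat_activity "
        "WHERE backend_type = 'client backend'"
    )
    client_connections, max_connections = cursor.fetchone()
    return client_connections, max_connections
```

The count includes the session running the query, so a result of 1 means nothing else is connected at that moment.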

Still digging, perhaps something turns up...

monsdar

HOBBYOP

4 months ago

A few more things:

* Query performance is totally acceptable, with selects at around 2 ms and inserts at around 22 ms mean
* I ran a vacuum, but the number of affected rows was in the double digits, so it shouldn't have much of an effect...
* My script that uses the REST API (I use it to update the data) fails with ConnectionResetError 10054, while using the website works totally fine and is as fast as it always was

The reason I suspected Network Flow is that I did not change anything in my deployment, and my update script ran flawlessly when I last used it a few weeks ago. Now it gets a Connection Reset on the first connection attempt. So something must have changed, either in my environment or within the Railway environment 😕
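To put a number on how often the resets happen, a small tally like the one below may help. It is only a sketch: `tally_outcomes` and the `send` callable are hypothetical names, and it assumes the script uses requests, which reports an aborted or reset connection as `requests.exceptions.ConnectionError`.

```python
from collections import Counter

import requests


def tally_outcomes(send, payloads):
    """Call send(payload) for each payload and count outcomes by kind.

    `send` is any callable returning a requests-style response
    (hypothetical; for example a lambda wrapping requests.post).
    """
    outcomes = Counter()
    for payload in payloads:
        try:
            response = send(payload)
        except requests.exceptions.ConnectionError:
            # Raised by requests when the connection is aborted or reset.
            outcomes["connection_error"] += 1
        else:
            outcomes[f"http_{response.status_code}"] += 1
    return outcomes
```

Comparing the tally before and after a change makes it easier to tell whether the failure rate actually moved.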

Status changed to Solved brody • 4 months ago
