I'm getting a lot of 502s with a 600-second timeout
alaanor
HOBBYOP

a year ago

I don't know if I'm at fault, Railway is, or it's a combination of both. I just know it's really hard to debug, and I haven't managed to reproduce the issue on my side. There's a good chance it's my fault, tbh.

What I do know, though:

  • In the Railway interface, under HTTP Logs, I see a lot of 502s at random. All of them hit the 600-second timeout, as if the app behind it were at full capacity and couldn't even answer the request.

  • CPU usage is at a normal level, in fact near 0.

  • Plenty of requests work and get their normal 200 status.

  • I have OpenTelemetry set up with a span for every request, so I can see whether there are errors and when an HTTP request happened. I have 0 errors there, and I don't have a single log entry for the requests that fail (I picked a few random endpoints that 502'ed and compared the timing in both logs).

The 502s happen quite often, and I would love to have an idea of where to start debugging. Any ideas are welcome. As I said above, it might not even be Railway's fault, but I think it would be wrong not to share it here.

project id: 8bccf693-4059-4ef3-9dd0-55493979fdb7
service id: b4116e2b-92a3-4506-b599-b79cce0efa9a

Solved

9 Replies

alaanor
HOBBYOP

a year ago

Semi-related thread: https://discord.com/channels/713503345364697088/1320448499996954695 I wanted to use the search bar to find which kinds of endpoints fail the most; maybe I would be able to find a pattern there.


a year ago

As of writing, the latest deployment for that service loads the HTTP logs just fine.


alaanor
HOBBYOP

a year ago

Yup! I did change something. I'm still unsure about the exact reason, but I disabled the Keep-Alive stuff in actix (a Rust library) and that seems to have solved it. It was probably something on my end.


alaanor
HOBBYOP

a year ago

I'm mildly annoyed that I don't know the exact cause, but it's solved for now :)


a year ago

FWIW, I don't think the HTTP logs would have told you what the issue was. They would have told you the path, though, if that could have helped?


alaanor
HOBBYOP

a year ago

Right, they wouldn't directly solve it, but if a particular kind of endpoint were failing consistently, it would have told me that something is wrong with whatever that endpoint does, if that makes sense. That's why I wanted to search.
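
For anyone landing here with the same problem, the per-endpoint tally described above could look roughly like this. It is only a sketch, assuming the HTTP logs have already been exported into records with a path and a status code. The `LogEntry` struct, its field names, and the sample paths are hypothetical and not Railway's export format.

```rust
use std::collections::HashMap;

/// One exported HTTP log record (hypothetical shape, not Railway's format).
struct LogEntry<'a> {
    path: &'a str,
    status: u16,
}

/// Count how many responses with `status` each path produced,
/// sorted by count (highest first), then by path for stable output.
fn count_by_path<'a>(entries: &[LogEntry<'a>], status: u16) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for entry in entries.iter().filter(|e| e.status == status) {
        *counts.entry(entry.path).or_insert(0) += 1;
    }
    let mut sorted: Vec<(&'a str, usize)> = counts.into_iter().collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    sorted
}

fn main() {
    // Made-up sample data for illustration only.
    let entries = [
        LogEntry { path: "/api/items", status: 502 },
        LogEntry { path: "/api/users", status: 200 },
        LogEntry { path: "/api/items", status: 502 },
        LogEntry { path: "/health", status: 502 },
        LogEntry { path: "/api/users", status: 200 },
    ];
    for (path, n) in count_by_path(&entries, 502) {
        println!("{n:>4}  {path}");
    }
}
```

If one path dominates the list, the handler behind it is the first place to look. If the 502s are spread evenly across paths, the cause is more likely at the connection level than in any single endpoint.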


alaanor
HOBBYOP

a year ago

But it's all solved now :salute_guy:


a year ago

Sounds good! If this happens again, please open a new thread.


a year ago

!s


Status changed to Solved brody about 1 year ago

