a year ago
I don't know whether I'm at fault, Railway is, or some combination of both. I just know it's really hard to debug, and I haven't managed to reproduce the issue on my side. There's a good chance it's my fault, tbh.
What I do know, though:
In the Railway interface, under HTTP Logs, I see a lot of 502s at random. All of them are 600-second timeouts, as if my app behind the proxy were at full capacity and couldn't even answer the request.
CPU usage is at a normal level, in fact near 0.
Plenty of requests work and get their normal 200 status.
I have OpenTelemetry set up with a span for every request, so I can see whether there are errors and when an HTTP request happened. I have 0 errors there, and I don't even have a single log entry for the requests that fail (I picked a few random endpoints that 502'ed and compared the timing in both logs).
The 502s happen quite often, and I would love to have an idea of where to start debugging. Any ideas are welcome. As I said above, it might not even be Railway's fault, but I think it would be wrong not to share it here.
project id: 8bccf693-4059-4ef3-9dd0-55493979fdb7
service id: b4116e2b-92a3-4506-b599-b79cce0efa9a
9 Replies
Semi-related thread: https://discord.com/channels/713503345364697088/1320448499996954695 I wanted to use the search bar to find which endpoints fail most often; maybe I'd be able to find a pattern there.
a year ago
As of writing, the latest deployment for that service loads the HTTP logs just fine.
Yup! I did change something. I'm still unsure about the exact reason, but I disabled keep-alive in actix (a Rust library) and that seems to have solved it. It was probably something on my end.
I'm mildly annoyed that I don't know the exact cause, but it's solved for now :)
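For anyone landing here later, this is roughly what disabling keep-alive can look like. It's a minimal sketch, assuming actix-web 4; the `/health` route, the `PORT` fallback, and the bind address are placeholders for illustration, not the actual service code from this thread.

```rust
use actix_web::{http::KeepAlive, web, App, HttpResponse, HttpServer};

// Placeholder handler, only here so the example serves something.
async fn health() -> HttpResponse {
    HttpResponse::Ok().body("ok")
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Assumption: the platform provides the port through a PORT env var;
    // fall back to 8080 when running locally.
    let port: u16 = std::env::var("PORT")
        .ok()
        .and_then(|p| p.parse().ok())
        .unwrap_or(8080);

    HttpServer::new(|| App::new().route("/health", web::get().to(health)))
        // Close each connection after its response instead of keeping it
        // open for reuse by the proxy in front of the app.
        .keep_alive(KeepAlive::Disabled)
        .bind(("0.0.0.0", port))?
        .run()
        .await
}
```

This trades connection reuse for predictability, so it may cost a bit of latency per request. It made the 502s go away here, but the root cause was never confirmed.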
a year ago
fwiw I don't think the HTTP logs would have told you what the issue was. They would have told you the path though, if that could have helped?
Right, they wouldn't have solved it directly, but if a particular kind of endpoint had been failing consistently, that would have told me something is wrong with whatever that endpoint does, if that makes sense. That's why I wanted to search.
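The kind of pattern search described above can be sketched offline once the HTTP log entries are exported. This is only an illustration: `HttpLogEntry` and its fields are a hypothetical, simplified shape, not Railway's actual log format.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified HTTP log entry (not Railway's real schema).
struct HttpLogEntry {
    path: String,
    status: u16,
}

/// Counts entries with the given status per path, most frequent first.
/// Ties are broken alphabetically by path so the order is deterministic.
fn count_failures_by_path(entries: &[HttpLogEntry], status: u16) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for entry in entries.iter().filter(|e| e.status == status) {
        *counts.entry(entry.path.as_str()).or_insert(0) += 1;
    }

    let mut sorted: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(path, count)| (path.to_string(), count))
        .collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    // Made-up sample data, for illustration only.
    let entries = vec![
        HttpLogEntry { path: "/api/users".to_string(), status: 502 },
        HttpLogEntry { path: "/api/orders".to_string(), status: 200 },
        HttpLogEntry { path: "/api/users".to_string(), status: 502 },
        HttpLogEntry { path: "/api/orders".to_string(), status: 502 },
    ];

    for (path, count) in count_failures_by_path(&entries, 502) {
        println!("{path}: {count}");
    }
}
```

If one path dominates the list, the problem is likely in that handler. If the 502s are spread evenly across paths, as seems to have been the case here, that points at the connection level rather than any single endpoint.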
a year ago
Sounds good! If this happens again, please open a new thread.
a year ago
!s
Status changed to Solved brody • about 1 year ago