Experiencing Intermittent Network Issues

tistaharahap
BIZCLASS

a year ago

Hi Railway, our deployments are experiencing intermittent network issues when connecting to other services that are also deployed within Railway. It doesn't matter whether we call those services through their public hostname or their private hostname. The same request sometimes returns a 200 and sometimes a 503. This is making our deployments highly unstable, and our monitoring has reported numerous downtimes because of it. I took screenshots of our resource usage, and it's very low. Here are our projects:

https://railway.app/project/56943544-81c5-486d-8741-d8cca3f88ed1
https://railway.app/project/2f705f5a-06d6-45ac-871e-f5b0a7690fa7
https://railway.app/project/2119123b-28c6-4ba9-97e5-75be1b52dcc1
https://railway.app/project/c9986c4d-72c9-46ad-9186-160d3e9c1d44

Could this be related to the new TCP proxy upgrade you did recently? Any help is appreciated, thank you.

6 Replies

a year ago

I too got random 503s calling other Railway services from within Railway earlier today.


a year ago

getting the team involved


a year ago

Hey there @Tista - the infra team is conducting an investigation, can you provide some timestamps for us to narrow the problem down?
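
One way to collect such timestamps is a small probe that polls an endpoint and records when a non-200 response comes back. This is a minimal sketch using only Python's standard library; the names `fetch_status` and `log_failures` and the polling parameters are hypothetical, not Railway tooling.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, etc.


def log_failures(url: str, attempts: int, interval: float = 1.0,
                 fetch=fetch_status) -> list[tuple[str, int]]:
    """Poll `url` and return (UTC timestamp, status) for every non-200 result."""
    failures = []
    for i in range(attempts):
        status = fetch(url)
        if status != 200:
            failures.append((datetime.now(timezone.utc).isoformat(), status))
        if i < attempts - 1:
            time.sleep(interval)
    return failures
```

Calling `log_failures("https://your-service.example/", attempts=60)` (placeholder URL) would return a list of UTC timestamps and status codes that can be pasted into a support thread.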


digineticai
HOBBY

a year ago

Hi, I have a frontend on Vercel, and my backend hosted on Railway is also throwing a 503. It just says "application failed to respond" and I do not see anything in the logs!


a year ago

Can you also provide additional information like timestamps and project-ids?


digineticai
HOBBY

a year ago

@angelo all my requests are 503 so the server is completely down.
Last successful request I see is from 10:38.
I do not have a health endpoint configured so I am not sure when it actually started giving the error after that.
Is it possible to dm the project id?
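
For reference, a health endpoint can be very small. The sketch below uses only Python's standard library; the `/health` path, port 8080, and the `make_server` helper are assumptions for illustration, not something taken from this thread or required by Railway.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Report a fixed "ok" payload; a real check might also test dependencies.
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep per-request logging quiet in this sketch


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build a server exposing GET /health on the given port (hypothetical helper)."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Running `make_server().serve_forever()` starts the endpoint; an external monitor polling it would then show roughly when responses stopped.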


tistaharahap
BIZCLASS

a year ago

Hey Angelo, thank you for replying. I'm on UTC+4. We've been experiencing this since 4:54 AM this morning (March 8) and it's still ongoing; we're still receiving random 503s. We would appreciate hearing the root cause once your investigation is done, thank you.
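
Since log timestamps are usually in UTC, a local time like the one above can be converted before comparing it against logs. A small sketch, assuming the year is 2024 (taken from the log timestamp quoted later in this thread); the helper name `to_utc_iso` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local: datetime, utc_offset_hours: float) -> str:
    """Interpret a naive local datetime at the given UTC offset and return it in UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc).isoformat()


# 4:54 AM on March 8 at UTC+4 (year assumed to be 2024)
print(to_utc_iso(datetime(2024, 3, 8, 4, 54), 4))  # prints 2024-03-08T00:54:00+00:00
```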


tistaharahap
BIZCLASS

a year ago

Hey Railway, this is also happening on your docs site.


a year ago

Yep, an update: we have found the source of the affected resources. Can you trigger a redeploy for your services? This will land your workload on a different resource.


tistaharahap
BIZCLASS

a year ago

All right, will redeploy now


a year ago

Checking in.


digineticai
HOBBY

a year ago

@angelo
Redeploy failed for me

Container failed to start

=========================

We failed to create a container for this image.


digineticai
HOBBY

a year ago

@angelo Second redeploy fixed it. Do we know what the issue was?


a year ago

Going to leave the final investigation to the Infra team as they address and fix the issue. Glad it's resolved for you for now.


tistaharahap
BIZCLASS

a year ago

We've restarted deployments in all of our projects, still monitoring for 503s


a year ago

have you restarted the deployments that your services are making requests to?


tistaharahap
BIZCLASS

a year ago

Yes we restarted each and every one of our services


Anonymous
TRIAL

a year ago

I've got a question related to this issue - when was the underlying issue introduced?
We experienced the same issues yesterday at 7 PM UTC, and after restarting the service that was unavailable, it worked fine for a couple of hours. Today we're experiencing similar issues, as described above, and again a redeployment worked for the time being.


a year ago

according to my logs, the first error appeared at 2024-03-07T07:50:01.818978082Z, aka March 7th, 7:50 AM UTC


a year ago

hey @Tista @andrzej | t2, the incident has now been resolved. You can read about the reasoning here: https://discord.com/channels/713503345364697088/846875565357006878/1215585864286081034

if you are still experiencing this issue please do another set of redeploys.


tistaharahap
BIZCLASS

a year ago

Yeah, I’ve read it, thanks for the quick turnaround. I can confirm we’re no longer getting 503s.


a year ago

happy to hear that!