Experiencing Intermittent Network Issues

2 years ago

I too got random 503s when calling other Railway services from within Railway earlier today.

2 years ago

getting the team involved

2 years ago

Hey there @Tista - the infra team is conducting an investigation, can you provide some timestamps for us to narrow the problem down?

digineticai

HOBBY

2 years ago

Hi, I have a frontend on Vercel, and my backend hosted on Railway is also throwing a 503. It just says "Application failed to respond" and I do not see anything in the logs!

2 years ago

Can you also provide additional information, like timestamps and project IDs?

digineticai

HOBBY

2 years ago

@angelo all my requests are 503 so the server is completely down.

Last successful request I see is from 10:38.

I do not have a health endpoint configured so I am not sure when it actually started giving the error after that.

Is it possible to dm the project id?
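
For anyone in the same spot without a health endpoint, here is a minimal sketch of one using only the Python standard library. The `/health` path, the JSON body, and the names `health_response`, `HealthHandler`, and `make_server` are assumptions for illustration, not anything Railway requires; most web frameworks offer an equivalent one-liner.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def health_response(path: str) -> tuple[int, bytes]:
    """Map a request path to (status code, JSON body). Hypothetical helper."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, json.dumps({"error": "not found"}).encode()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging so health probes do not flood the logs.
        pass


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) a server listening on all interfaces."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server(8080).serve_forever()` starts it. Polling the endpoint from outside then shows roughly when the service stopped answering, which is the timestamp that was missing here.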

tistaharahap

ENTERPRISE OP

2 years ago

Hey Angelo, thank you for replying. I'm on UTC+4. We've been experiencing this since 4:54 AM this morning (March 8) and it's still ongoing; we're still receiving random 503s. We'd appreciate hearing the root cause once your investigation is done, thank you.
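
Since the platform logs are in UTC, it may help to convert the local time above before comparing. A small sketch, assuming the year is 2024 (taken from the log timestamp quoted later in this thread) and a fixed UTC+4 offset:

```python
from datetime import datetime, timedelta, timezone

# Reporter's zone as stated in the post: UTC+4 (fixed offset, assumed no DST).
reporter_tz = timezone(timedelta(hours=4))

# 4:54 AM on March 8; the year 2024 is an assumption based on the thread's logs.
local_start = datetime(2024, 3, 8, 4, 54, tzinfo=reporter_tz)

# Convert to UTC so it can be lined up against platform log timestamps.
utc_start = local_start.astimezone(timezone.utc)
print(utc_start.isoformat())  # 2024-03-08T00:54:00+00:00
```

So the reported start is 00:54 UTC on March 8.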

tistaharahap

ENTERPRISE OP

2 years ago

Hey Railway, this is also happening on your docs site.

2 years ago

Yep, updating: we have found the source of the affected resources. Can you trigger a redeploy for your services? This will land your workload on a different resource.

tistaharahap

ENTERPRISE OP

2 years ago

All right, will redeploy now

2 years ago

Checking in.

digineticai

HOBBY

2 years ago

@angelo

Redeploy failed for me

Container failed to start

=========================

We failed to create a container for this image.

digineticai

HOBBY

2 years ago

@angelo Second redeploy fixed it. Do we know what the issue was?

2 years ago

Going to leave the final investigation to the Infra team as they address and fix the issue. Glad it's resolved for you for now.

tistaharahap

ENTERPRISE OP

2 years ago

We've restarted deployments in all of our projects and are still monitoring for 503s.
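
For the monitoring part, a rough sketch of a probe that records a UTC timestamp whenever a service answers 503, which is the kind of detail the team asked for earlier in the thread. The function names and the log line format are assumptions for illustration; `fetch` is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return exc.code


def probe_once(
    url: str,
    fetch: Callable[[str], int] = fetch_status,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return a timestamped log line if the service answered 503, else None."""
    if fetch(url) == 503:
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        return f"{stamp} 503 {url}"
    return None
```

Running `probe_once` in a loop with a short sleep against each service URL gives a list of exact failure times. Connection-level failures (`urllib.error.URLError`) are deliberately not caught here and would need separate handling.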

2 years ago

Have you restarted the deployments that your services are making requests to?

tistaharahap

ENTERPRISE OP

2 years ago

Yes, we restarted each and every one of our services.

Anonymous

TRIAL

2 years ago

I've got a question related to this issue: when was the underlying issue introduced?

We experienced the same issues yesterday at 7 PM UTC, and after restarting the service that was unavailable, it worked fine for a couple of hours. Today we're experiencing similar issues, as described above, and again, redeployment worked for the time being.

2 years ago

According to my logs, the first error appeared at 2024-03-07T07:50:01.818978082Z, i.e. March 7th, 7:50 AM UTC.
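
Side note for anyone scripting against these logs: the timestamp above carries nanosecond precision (nine fractional digits), while Python's `datetime` stores microseconds only. A minimal sketch of a parser that truncates to microseconds; the helper name `parse_log_timestamp` is hypothetical, and it assumes the timestamp always ends in `Z`:

```python
import re
from datetime import datetime, timezone

# Matches e.g. 2024-03-07T07:50:01.818978082Z (fractional part optional).
_TS_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a UTC log timestamp, truncating sub-microsecond digits."""
    match = _TS_PATTERN.fullmatch(ts)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {ts!r}")
    base, fraction = match.groups()
    # Keep at most six fractional digits, right-padding shorter fractions.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)
```

With the value from this post, `parse_log_timestamp("2024-03-07T07:50:01.818978082Z")` yields 07:50:01.818978 UTC on March 7, 2024.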

2 years ago

Hey @Tista @andrzej | t2, the incident has now been resolved. You can read about the reasoning here: https://discord.com/channels/713503345364697088/846875565357006878/1215585864286081034

If you are still experiencing this issue, please do another set of redeploys.

tistaharahap

ENTERPRISE OP

2 years ago

Yeah, I've read it, thanks for the quick turnaround. I can confirm we're no longer getting 503s.