Multi-region service outage

9 months ago

My APIs run on Node.js and everything was working like a charm, but today our BetterStack monitor (shorterloopstatus.com) is reporting an outage for some regions. The one thing I changed was adding replicas: Amsterdam, NL (Metal) and Virginia (Metal).

New York, USA: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
San Francisco, USA: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
Singapore: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
Sydney, Australia: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
Tokyo, Japan: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
Dallas, USA: Error opening https://api.shorterloop.com: curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received

Works:

Amsterdam, Netherlands    18 ms  21 ms  0 ms  106 ms  107 ms  112 ms  112 ms  147 B  1.3 KB/s   66.33.22.2
Frankfurt, Germany        15 ms  23 ms  0 ms  106 ms  106 ms  130 ms  130 ms  147 B  1.1 KB/s   66.33.22.3
London, United Kingdom    20 ms  29 ms  0 ms  127 ms  128 ms  143 ms  143 ms  147 B  1022 B/s   66.33.22.4

Please help to resolve this issue!!

Solved

10 Replies

9 months ago

Removed: US East (Virginia, USA) Metal.
That fixed all the issues; a PDF is attached for reference.
I'm sure of it: the new Railway replicas (Metal) aren't handling global traffic properly, or are not registered correctly in the load balancer/edge.
Each replica serves traffic in its own region, but other regions may be routed to a dead or non-listening replica, resulting in curl: (28) Operation timed out.
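
One way to check this kind of routing hypothesis is to have each replica report where it is running, so a monitor in each region can see which replica actually answered. The sketch below is only an illustration: the `/whoami` path and both function names are hypothetical, and the environment variable names are an assumption that should be confirmed against the platform documentation before relying on them.

```javascript
const http = require("node:http");

// ASSUMPTION: the platform exposes the replica's region and id under these
// environment variable names; confirm the exact names in the platform docs.
function replicaInfo(env = process.env) {
  return {
    region: env.RAILWAY_REPLICA_REGION || "unknown",
    replicaId: env.RAILWAY_REPLICA_ID || "unknown",
  };
}

// Hypothetical diagnostic server: GET /whoami returns the replica info as
// JSON, every other path returns 404.
function createWhoAmIServer(env = process.env) {
  return http.createServer((req, res) => {
    if (req.url === "/whoami") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(replicaInfo(env)));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Usage (not started automatically):
// createWhoAmIServer().listen(process.env.PORT || 3000);
```

If a region's monitor times out instead of getting a `/whoami` answer, the request probably never reached a listening replica; if it gets an answer from an unexpected region, the routing itself is worth a closer look.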

I kept only Amsterdam until everything works again, and removed Virginia and any other Metal replicas. Everything worked!

Can this be fixed, Railway?


Hey there,

Clarifying question: did the outage occur when you deployed the instances, or are you noticing that all global traffic is failing to be served?

Is there a staging environment we can test against for a reproducible example?


Status changed to Awaiting User Response Railway 9 months ago


angelo-railway

Hey there, Clarifying question: did the outage occur when you deployed the instances, or are you noticing that all global traffic is failing to be served? Is there a staging environment we can test against for a reproducible example?

9 months ago

It happened after adding the replica and deploying!


Status changed to Awaiting Railway Response Railway 9 months ago


Railway
BOT

9 months ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 9 months ago


9 months ago

Is this consistently reproducible with the US East region for you? And does it happen immediately on deploy?

(I've escalated to our network expert)


20k-ultra
EMPLOYEE

9 months ago

Hello, I tried reproducing the error by configuring an application with EU West and US East but did not see any errors routing to my application.

Can you try again? You can try to reproduce it in a development environment if you are concerned about any impact on your production environment.

In the past 24 hours our monitors have not detected any issues and this is the first report of such an error.

Let us know if the issue persists.


20k-ultra
EMPLOYEE

9 months ago

You can also perform your curl request with -vvv to include more information about which route is used.
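
Since the API itself is in Node.js, a rough equivalent of the verbose curl check can also be scripted. This is a minimal sketch, not a replacement for curl's output: `probe` and `summarize` are hypothetical names, and the 30000 ms default simply mirrors the monitor's timeout quoted earlier in the thread.

```javascript
const https = require("node:https");

// Pure helper: turn raw millisecond timestamps into a readable summary.
function summarize(marks) {
  const since = (t) => (t != null ? t - marks.startedAt : null);
  return {
    remoteAddress: marks.remoteAddress ?? null, // IP that accepted the connection
    statusCode: marks.statusCode ?? null,
    connectMs: since(marks.connectedAt), // DNS + TCP + TLS handshake
    responseMs: since(marks.responseAt), // until response headers arrived
    timedOut: Boolean(marks.timedOut),
  };
}

// Make one GET request and resolve with a summary, even on failure.
function probe(url, timeoutMs = 30000) {
  return new Promise((resolve) => {
    const marks = { startedAt: Date.now() };
    const done = () => resolve(summarize(marks)); // resolving twice is harmless
    const req = https.get(url, (res) => {
      marks.statusCode = res.statusCode;
      marks.responseAt = Date.now();
      res.resume(); // discard the body
      res.on("end", done);
    });
    req.on("socket", (socket) => {
      socket.once("secureConnect", () => {
        marks.connectedAt = Date.now();
        marks.remoteAddress = socket.remoteAddress;
      });
    });
    // Fires when the socket has been idle for timeoutMs.
    req.setTimeout(timeoutMs, () => {
      marks.timedOut = true;
      req.destroy();
    });
    req.on("error", done);
    req.on("close", done);
  });
}

// Usage (not run automatically):
// probe("https://api.shorterloop.com").then(console.log);
```

A result with `connectMs` set but `responseMs` null and `timedOut` true would suggest the edge accepted the connection and the application behind it never answered in time.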


20k-ultra
EMPLOYEE

9 months ago

I see some HTTP requests on the us-east4 application that has been removed.

https://railway.com/project/2394abc5-3936-4f8d-b3a3-920ee7f91835/service/3bd9b3e0-0ee6-448b-8625-15d421cf2e7a?environmentId=715f26ad-9a53-4177-90b6-808425724c64&id=179348be-9c76-4507-8378-c64b0a342024#http

I am investigating, but it looks like requests were sent to this application and took 5 minutes to complete, while your curl request gives up after 30 seconds.
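
To illustrate the mismatch between a 5-minute request and a 30-second client: if a handler waits on something slow, it can put its own deadline on that work and fail fast instead of holding the connection open. The following is a sketch under assumptions, not the application's actual code: `withTimeout`, `TimeoutError`, `handle`, and the 5000 ms default are all hypothetical, and `queryDb` stands in for whatever downstream call the service makes.

```javascript
// Marks failures caused by the deadline rather than by the work itself.
class TimeoutError extends Error {}

// Reject with TimeoutError if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying operation.
function withTimeout(work, ms, label = "operation") {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Hypothetical handler: `queryDb` is any function returning a promise.
async function handle(queryDb, ms = 5000) {
  try {
    const rows = await withTimeout(queryDb(), ms, "database query");
    return { status: 200, body: rows };
  } catch (err) {
    // 504 when the deadline hit, 500 for any other failure.
    const status = err instanceof TimeoutError ? 504 : 500;
    return { status, body: { error: err.message } };
  }
}
```

With a deadline shorter than the monitor's 30 seconds, a slow dependency shows up as a quick 504 with a message rather than as a timeout with 0 bytes received, which makes the failing hop easier to identify.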


20k-ultra
EMPLOYEE

9 months ago

When someone makes a request to https://api.shorterloop.com/, does that service make another HTTP request to something else? Does it make a request to another service on Railway?


8 months ago

Yes, a MySQL database!


Status changed to Awaiting Railway Response Railway 8 months ago


20k-ultra

You can also perform your curl request with -vvv to include more information about which route is used.

8 months ago

Could you get the data Mig has requested here?


Status changed to Awaiting User Response Railway 8 months ago


Status changed to Solved parmstar 7 months ago

