Railway seems not to shut down previous deployments even after a new build & deployment has occurred
dalechyn
PRO · OP

a year ago

We have experienced a very weird issue after pushing a new commit to our service that uses Twitter's Filtered Streams.
This might sound out of context, but Twitter allows only one consumer to listen to a filtered stream at a time.

After the deployment, we noticed the service was down starting around 04:00 GMT+2 and for roughly 14 hours after.
I have tried to fix the issue and thought it might be code related, although I did not change any integration logic.

And just now, like 5 minutes ago, I have pushed some commits to another service, our website's backend.
It had an issue that made the deployment fail. I fixed it.
However, I see logs both from the previous deployment and from the current deployment.

Solved

59 Replies

dalechyn
PRO · OP

a year ago

One important note is that we have the "Always" restart policy set
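For reference, the restart policy can also be pinned in Railway's config-as-code; a minimal sketch of a `railway.json`, assuming the field names from Railway's config schema (verify against the current docs):

```json
{
  "deploy": {
    "restartPolicyType": "ALWAYS",
    "restartPolicyMaxRetries": 10
  }
}
```

The restart policy governs how a crashed container is restarted within one deployment; it does not control how an old deployment is replaced by a new one.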


dalechyn
PRO · OP

a year ago

And when the new deployment kicks in, it seems not to shut down the previous one immediately – it feels like it hangs on for a random amount of time


dalechyn
PRO · OP

a year ago

[image attachment]


dalechyn
PRO · OP

a year ago

here's the twitter issue i mentioned at the start

[image attachment]


dalechyn
PRO · OP

a year ago

although the deployment was like several hours prior to the start of that long red-blue candle dance, it seems to me that the previous failing deployment decided to boot up at 4 am


dalechyn
PRO · OP

a year ago

or was never off


dalechyn
PRO · OP

a year ago

service 0b9228e6-0528-4f63-bc97-157a8909cc2b


brody
EMPLOYEE

a year ago

project id, service id, and environment please


dalechyn
PRO · OP

a year ago

project id 3e6a2b9c-4e34-41f8-979a-b83b27f3198d


dalechyn
PRO · OP

a year ago

environment mainnet


dalechyn
PRO · OP

a year ago

it's for this one


dalechyn
PRO · OP

a year ago

for this one,
project id 3e6a2b9c-4e34-41f8-979a-b83b27f3198d
service 0e389c2b-a588-46d2-a58e-b4b76fce4613
environment testnet


brody
EMPLOYEE

a year ago

im sorry but the issue is not clear, you opened this thread and reported issues with two services?


dalechyn
PRO · OP

a year ago

since it has occurred twice with different services and different environments (exactly where i pushed new code and caused new deployments to roll out), it made me think something is wrong with railway


dalechyn
PRO · OP

a year ago

yes, you can check any


brody
EMPLOYEE

a year ago

same issue for both services?


dalechyn
PRO · OP

a year ago

most likely – the previous deployment kept running alongside the new (current) one for too long before stopping


dalechyn
PRO · OP

a year ago

for this case this was literally hours


dalechyn
PRO · OP

a year ago

for this case this was 7 minutes


dalechyn
PRO · OP

a year ago

the twitter service literally worked without a flaw for the last two weeks, and no twitter integration code was changed


dalechyn
PRO · OP

a year ago

i pushed a new commit to tune the ai prompt and here we are


brody
EMPLOYEE

a year ago

you are giving a lot of unorganized information all at once here


dalechyn
PRO · OP

a year ago

ok, guide me, what do you need exactly to investigate the issue further?


brody
EMPLOYEE

a year ago

lets focus on one service at a time, what service would you like me to look into first?


dalechyn
PRO · OP

a year ago

let's look at this one


brody
EMPLOYEE

a year ago

okay, can you provide a full UTC timestamp of when you made a new deployment, and the old deployment didnt get killed?


drmarshall
PRO

a year ago

Confirming that old services are not being removed properly. It seems like new deployments are not picking up network properly


brody
EMPLOYEE

a year ago

DrMarshall,

Please open your own thread.


dalechyn
PRO · OP

a year ago

On Dec 16 at 21:56 UTC I pushed the commit


dalechyn
PRO · OP

a year ago

1734386160


dalechyn
PRO · OP

a year ago

On Dec 17 at 02:00 UTC our service started failing – this is quite common when our twitter service is under load and has to fight rate limits, but there was no load.
the specific details I provided above about twitter are important, since the twitter api allows only one stream consumer at a time.
it was throwing a "Too Many Connections" error, flagging that someone else was consuming the stream – supposedly another replica
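For context, Twitter's v2 filtered stream allows a single connection per app, and a second consumer is rejected with HTTP 429 "Too Many Connections" – which is exactly what an overlapping old deployment would cause. A minimal consumer sketch, assuming a bearer token and the standard v2 endpoint (the reconnect policy here is illustrative, not Twitter's official guidance):

```python
import http.client
import json
import time

STREAM_HOST = "api.twitter.com"
STREAM_PATH = "/2/tweets/search/stream"

def next_backoff(current: float, cap: float = 300.0) -> float:
    """Exponential backoff for 429 responses: double the wait, capped."""
    return min(current * 2, cap)

def consume_stream(bearer_token: str) -> None:
    """Consume the filtered stream; on HTTP 429 another consumer (e.g. an
    old deployment that was never shut down) holds the single allowed
    connection, so back off and retry instead of hammering the endpoint."""
    backoff = 1.0
    while True:
        conn = http.client.HTTPSConnection(STREAM_HOST)
        conn.request(
            "GET", STREAM_PATH,
            headers={"Authorization": f"Bearer {bearer_token}"},
        )
        resp = conn.getresponse()
        if resp.status == 429:  # Too Many Connections
            conn.close()
            time.sleep(backoff)
            backoff = next_backoff(backoff)
            continue
        backoff = 1.0
        for raw in resp:  # tweets arrive as newline-delimited JSON
            line = raw.strip()
            if line:
                print(json.loads(line))
```

With this shape, a lingering replica doesn't crash the new one outright – the new consumer just keeps backing off until the stale connection is finally released.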


brody
EMPLOYEE

a year ago

i can see very spotty metrics during that time, indicating it was crash looping


brody
EMPLOYEE

a year ago

but i also see only a single deployment running during that time, at least for the given service and environment id


dalechyn
PRO · OP

a year ago

yes – I architected it that way on purpose, so it occasionally fails and restarts to ride out rate limits


dalechyn
PRO · OP

a year ago

well then I'd suggest checking the second case


brody
EMPLOYEE

a year ago

will do


dalechyn
PRO · OP

a year ago

with these details


brody
EMPLOYEE

a year ago

same timestamp?


dalechyn
PRO · OP

a year ago

Dec 17 8:48PM - pushed first commit to fix the issue
Dec 17 8:52PM - pushed the "last fix" 🙂

the previous container kept running for at least another 4 minutes (if 8:55PM is the actual time of the "last fix" deployment container start) (i'm in gmt+2)

[image attachment]


dalechyn
PRO · OP

a year ago

so the previous deployment kept running for 4 minutes until stopping, although a new one appeared


dalechyn
PRO · OP

a year ago

why did it take so long to stop? – we also have "Always" as the restart policy here


brody
EMPLOYEE

a year ago

utc timestamps please


dalechyn
PRO · OP

a year ago

1734468925 - new container started


dalechyn
PRO · OP

a year ago

1734469175 - old container stopped
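These unix timestamps convert to the human-readable UTC times given later in the thread; a quick sketch:

```python
from datetime import datetime, timezone

def to_utc(ts: int) -> str:
    """Render a unix timestamp as a human-readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%b %d %H:%M:%S UTC")

print(to_utc(1734468925))  # Dec 17 20:55:25 UTC – new container started
print(to_utc(1734469175))  # Dec 17 20:59:35 UTC – old container stopped
```

The 250-second gap between the two is the roughly 4-minute overlap being described.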


dalechyn
PRO · OP

a year ago

it just looks as if there's a race condition between restarting a container and removing it


brody
EMPLOYEE

a year ago

sorry that i have to say this, im not a robot, please give me human readable timestamps


dalechyn
PRO · OP

a year ago

okay, please clarify which kind of timestamps you're talking about – UNIX ones?


dalechyn
PRO · OP

a year ago

no worries i'm not a robot too beep-bop


brody
EMPLOYEE

a year ago

human readable, like the ones in the screenshot logs, but UTC please


dalechyn
PRO · OP

a year ago

those are UTC, i subtracted the local time difference, which is two hours, as you can see in the screenshot


dalechyn
PRO · OP

a year ago

8:55:25 PM - started new container


dalechyn
PRO · OP

a year ago

8:59:35 PM - stopped previous container


brody
EMPLOYEE

a year ago

what service is this, and in what environment?


dalechyn
PRO · OP

a year ago

this one


brody
EMPLOYEE

a year ago

okay i think i see the issue here


brody
EMPLOYEE

a year ago

you had [this deploy]() working and online

and then you pushed bad code multiple times; the new pushes never passed their health checks, so the working deployment was never taken offline.

this is by design, so the system is working properly here.
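The behavior described here can be made explicit with a health check in config-as-code; a sketch of a `railway.json`, assuming the field names from Railway's config schema (verify against the current docs):

```json
{
  "deploy": {
    "healthcheckPath": "/health",
    "healthcheckTimeout": 300
  }
}
```

With a health check configured, a new deployment only replaces the old one after it responds successfully on that path, which is why a broken push leaves the previous deployment running.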


dalechyn
PRO · OP

a year ago

hmm gotchu ty


brody
EMPLOYEE

a year ago

and yeah for the first service, I see the metrics for one deployment end, and the metrics for a new deployment start, but they do not overlap


brody
EMPLOYEE

a year ago

!s


Status changed to Solved by brody about 1 year ago

