Redis deployments temporarily crash our web server due to ETIMEDOUT
Anonymous
PRO · OP

5 months ago

Whenever we trigger a Redis deployment, our Express web server becomes unresponsive to page requests, sometimes for up to 2-3 minutes (requests just hang and then fail).

Looking at the Redis deploy logs for the 'old' and 'new' deployments, the old deployment stopping and the new deployment starting appear to happen at exactly the same time (see the timestamps in the first and second screenshots, respectively).

Even so, our web server stops responding to page requests from that point onward, sometimes for up to 3 minutes.

  • I can see that roughly 2 minutes after the Redis deployment we start getting ETIMEDOUT errors from socket.io-redis. These crash the server, which is then restarted automatically (see screenshot).

Outside of this Redis redeployment window, Socket.io works fine for us: it does eventually connect correctly and behaves normally (we make sure we connect over IPv6, in line with Railway's docs).

What could be causing this temporary outage and the temporary ETIMEDOUT errors during Redis deployments, and how might we fix it? Thank you!

Links to the deployments the screenshots above came from:

Solved

3 Replies

The ETIMEDOUT errors and temporary outages you're experiencing during Redis deployments are likely related to how the service transition occurs between the old and new deployments. When a new deployment occurs, the previous version is usually stopped and removed, which can lead to some downtime if the transition isn't handled smoothly.

To address this, you might want to explore the deployment lifecycle settings available in Railway.

Specifically, you can configure an overlap time where the old deployment is kept active for a short period after the new one is live. This can help ensure zero downtime by allowing connections to gracefully transition to the new deployment without interruption. You can adjust these settings in the service configuration section of your Railway project.

Additionally, ensure that your Express server and Socket.io are configured to handle potential reconnections gracefully. Since Socket.io is used for real-time communications, any momentary disconnection might cause these timeout errors. You might also want to verify that your environment is set up to use the private network for internal communication, as this can reduce latency and connection issues during such transitions.
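As a rough illustration of "handling reconnections gracefully", here is a minimal TypeScript sketch. It assumes an ioredis-style client, whose `retryStrategy(times)` option expects the number of milliseconds to wait before the next reconnection attempt; the names `retryDelay` and `attachRedisErrorHandler` and the delay values are hypothetical. The error handler is relevant to the crashes: in Node.js, calling `emit('error', ...)` on an `EventEmitter` that has no `'error'` listener throws, and if nothing catches it the process exits. It may be worth checking that every Redis client used by the adapter has such a listener.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical backoff values; tune them to how long a Redis redeploy takes.
const BASE_DELAY_MS = 200;
const MAX_DELAY_MS = 5_000;

type ErrorWithCode = Error & { code?: string };

/**
 * Capped exponential backoff; `attempt` is 1 for the first retry.
 * Shaped like an ioredis `retryStrategy(times)` callback (an assumption).
 */
function retryDelay(attempt: number): number {
  const exponential = BASE_DELAY_MS * 2 ** (attempt - 1);
  return Math.min(exponential, MAX_DELAY_MS);
}

/**
 * Log transient errors (ETIMEDOUT, ECONNREFUSED, ...) instead of letting
 * an unhandled 'error' event throw and take the process down.
 */
function attachRedisErrorHandler(
  client: EventEmitter,
  name: string,
  log: (msg: string) => void = console.warn,
): void {
  client.on("error", (err: ErrorWithCode) => {
    log(`[redis:${name}] ${err.code ?? "ERROR"}: ${err.message}`);
  });
}

// Demo: a plain EventEmitter stands in for a real Redis client.
const fakeClient = new EventEmitter();
const messages: string[] = [];
attachRedisErrorHandler(fakeClient, "pub", (m) => messages.push(m));

const timeout: ErrorWithCode = Object.assign(new Error("connect ETIMEDOUT"), {
  code: "ETIMEDOUT",
});
fakeClient.emit("error", timeout); // logged, not thrown

console.log(messages[0]);
console.log([1, 2, 3, 10].map(retryDelay));
```

Capping the delay keeps retries reasonably frequent while Redis is restarting, without hammering it in a tight loop.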

For more detailed guidance, you can refer to Railway's documentation on deployment teardown, which covers these configuration options. If you're using the private network, ensure your configurations align with the private networking guide.
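For the private-network point, the following TypeScript sketch turns a Redis URL that points at the private hostname into connection options that force IPv6, matching the IPv6 requirement mentioned in the question. It assumes ioredis-style options, where `family: 6` selects IPv6 and `connectTimeout` bounds each connection attempt. The interface, the function name `privateNetworkOptions`, the example URL and password, and the 10-second timeout are all hypothetical.

```typescript
// Hypothetical options shape, modelled loosely on ioredis constructor options.
interface RedisConnectionOptions {
  host: string;
  port: number;
  username?: string;
  password?: string;
  family: 4 | 6;
  connectTimeout: number;
}

function privateNetworkOptions(redisUrl: string): RedisConnectionOptions {
  const url = new URL(redisUrl);
  if (url.protocol !== "redis:") {
    throw new Error(`Expected a redis:// URL, got ${url.protocol}`);
  }
  return {
    host: url.hostname,
    // Fall back to the standard Redis port if the URL omits one.
    port: url.port ? Number(url.port) : 6379,
    username: url.username ? decodeURIComponent(url.username) : undefined,
    password: url.password ? decodeURIComponent(url.password) : undefined,
    // The thread connects over IPv6 on the private network, per Railway's docs.
    family: 6,
    // Hypothetical: give up on a single connection attempt after 10 s.
    connectTimeout: 10_000,
  };
}

const opts = privateNetworkOptions(
  "redis://default:s3cret@redis.railway.internal:6379",
);
console.log(opts.host, opts.port, opts.family);
```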


Status changed to Awaiting User Response Railway 5 months ago


Anonymous
PRO · OP

5 months ago

Thanks for your help on this!

You wrote: "You might also want to verify that your environment is set up to use the private network for internal communication"

  • Our web app uses the host redis.railway.internal to connect to Redis – is this what you mean?

Could I ask what the deployment teardown docs mean by "Once the new deployment is active, the previous deployment remains active for a configurable amount of time", read together with what you wrote above: "Specifically, you can configure an overlap time where the old deployment is kept active for a short period after the new one is live"? In particular:

  • During the overlap window, which deployment is actually the one accepting the requests behind redis.railway.internal – the new or the old deployment?

  • It's hard to tell from the Railway docs, since a deployment being Active appears to mean the same thing as it being live and receiving traffic.


Status changed to Awaiting Railway Response Railway 5 months ago


Noted!

1. Yes.
2. Active means that it's alive but not accepting new connections.
3. Noted; I have forwarded this feedback to the engineer.

Regardless, let me know if it helps.


Status changed to Awaiting User Response Railway 5 months ago


Railway
BOT

4 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 4 months ago

