Clarify restart semantics when restart is set to "always"
rewbs
PRO OP

7 months ago

Hi!

I understand from other threads that you restrict consecutive restarts to avoid restart spinning. However, the semantics of this are unclear to me.

Is that 10 restarts that fail to come up, or 10 restarts within a time window? If the latter, what is the time window? Are any further restart attempts made after that, e.g. with a backoff strategy?

I'll add that my reading of the current documentation at https://docs.railway.com/guides/restart-policy is that for paid plans, "always" is not limited to 10 restarts.

Solved

4 Replies

Railway
BOT

7 months ago



Always means that we'll keep on attempting restart with an exponential backoff.


Status changed to Awaiting User Response Railway 7 months ago


rewbs
PRO OP

7 months ago

Thanks. I'd love a little more detail on this. For example, how long does the app need to be up before the backoff resets? Do you always restart within the same replica or do you (sometimes) attempt full replica replacements? Do you emit information anywhere about when restarts were attempted or do you rely on us collecting that application side? The reason for the last question is that from our side it did not look like an exponential backoff, but rather a number of near-immediate restarts followed by a multi-hour stopped state (that ended when we restarted manually).


Status changed to Awaiting Railway Response Railway 7 months ago


Absolutely.

With this policy, Railway attempts to restart your service every time it stops, regardless of the reason. There isn't a strict time window for restarts. Instead, an exponential backoff strategy is used, meaning the delay between restart attempts grows after each consecutive failure to avoid a rapid restart loop.

To reset the backoff, the service needs to remain up and running for a period of time without crashing. If a service fails multiple times in quick succession, it might appear as if it stops, but this is typically a result of the backoff strategy coming into play to minimize resource wastage and potential further issues.
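To make the behaviour described above concrete, here is a minimal sketch of exponential backoff with a reset after a stable period. It is an illustration only, not Railway's implementation: the class name `RestartBackoff` and every number in it (base delay, growth factor, cap, and the uptime needed for a reset) are assumptions chosen for the example, since the actual values are not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class RestartBackoff:
    """Hypothetical restart backoff; all defaults are illustrative assumptions."""

    base_delay: float = 1.0     # seconds to wait before the first restart
    factor: float = 2.0         # delay multiplier per consecutive failure
    max_delay: float = 300.0    # upper bound on the wait between attempts
    stable_after: float = 60.0  # uptime (seconds) that resets the backoff
    failures: int = 0           # consecutive failures seen so far

    def next_delay(self, uptime: float) -> float:
        """Return how long to wait before restarting a process that ran for `uptime` seconds."""
        # A run that stayed up long enough counts as healthy and resets the counter.
        if uptime >= self.stable_after:
            self.failures = 0
        delay = min(self.base_delay * self.factor ** self.failures, self.max_delay)
        self.failures += 1
        return delay


if __name__ == "__main__":
    backoff = RestartBackoff()
    # Three quick crashes in a row, then a crash after two minutes of uptime.
    for uptime in (0.5, 0.5, 0.5, 120.0):
        print(f"ran {uptime:>6.1f}s, waiting {backoff.next_delay(uptime):.1f}s")
```

With a scheme like this, a service that crashes immediately many times in a row ends up waiting the capped delay between attempts, which from the outside can look like a stopped service.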

Regarding replica replacements, restarts within a single replica are generally attempted first. However, if issues persist, a full replica replacement might be considered, especially if additional troubleshooting indicates it's necessary.

As for logging, information about restarts isn't automatically logged within Railway's platform, so it's advisable to implement logging within your application to track restart attempts and any associated errors.
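As a rough sketch of the application-side logging suggested above, the service can emit one structured line each time the process starts, so restarts show up in the deploy logs with timestamps. The function names and the record fields are hypothetical, and `RAILWAY_REPLICA_ID` is assumed to be provided by the platform; the code falls back to `"unknown"` if it is not set.

```python
import json
import os
import sys
import time


def startup_record(env=None, now=None):
    """Build a dict describing this process start (hypothetical field names)."""
    env = os.environ if env is None else env
    return {
        "event": "process_start",
        "ts": time.time() if now is None else now,  # Unix timestamp of the start
        "pid": os.getpid(),
        # Assumed platform-provided variable; "unknown" if absent.
        "replica": env.get("RAILWAY_REPLICA_ID", "unknown"),
    }


def log_startup(stream=None):
    """Write the startup record as one JSON line and return it."""
    record = startup_record()
    print(json.dumps(record), file=stream or sys.stdout, flush=True)
    return record


if __name__ == "__main__":
    log_startup()  # call once, as early as possible in the entrypoint
```

Counting `process_start` lines and comparing their timestamps gives a view of how often restarts happened and how far apart they were, which is the information needed to check whether the spacing looks like a backoff.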


Status changed to Awaiting User Response Railway 7 months ago


Railway
BOT

6 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 6 months ago

