Question About Safe Downscaling of Worker Replicas
vishnu-mouli-102408
PRO OP

a month ago

Hi,

I have a question about scaling down workers.

I am running a worker service on Railway. Each replica runs a long-lived process that pulls jobs from Redis and processes them. Some of these jobs take a long time to finish.

I am using the GraphQL API to autoscale the replicas based on load. Scaling up works fine. But when scaling down, Railway seems to randomly stop some replicas. Sometimes it stops a replica that is still processing a job, and that job gets interrupted.

I wanted to ask:

  • Is there any graceful shutdown or draining when replicas are scaled down?

  • Is there a way to let a replica finish its current job before it is stopped?

  • Can we control which replica gets terminated, or add some delay before shutdown?

  • What is the recommended way to run long-running background workers with autoscaling on Railway?

Right now it looks like the replica is stopped immediately, which is risky for job processing. I just want to check if there is a safer way to handle this.

Thanks for your help!

Solved

2 Replies

brody

a month ago

Railway sends a SIGTERM signal when scaling down replicas. You can tune the draining time to a value long enough to let all jobs complete via the Draining Time setting in Service Settings → Deploy, or by setting the RAILWAY_DEPLOYMENT_DRAINING_SECONDS environment variable.

At the same time, your worker should stop accepting new jobs as soon as it receives SIGTERM, so that the remaining replicas pick up incoming jobs while the terminating worker finishes the job it is already processing. Once that job is done, the worker can exit.
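As a rough illustration of that pattern, here is a minimal Python sketch. It assumes the worker is a single-threaded loop and that the signal handler is installed from the main thread. The names `run_worker`, `fetch_job`, and `process_job` are hypothetical and are not part of any Railway or Redis API. `fetch_job` stands in for whatever pulls from Redis, and it should block only for a short, bounded time so the shutdown flag gets re-checked regularly.

```python
import signal
import threading
from typing import Callable, Optional

# Set once SIGTERM arrives; the loop checks it between jobs.
shutdown_requested = threading.Event()


def _handle_sigterm(signum, frame):
    # Only flip a flag here; the job in progress is left to finish.
    shutdown_requested.set()


def run_worker(
    fetch_job: Callable[[], Optional[str]],
    process_job: Callable[[str], None],
) -> int:
    """Process jobs until SIGTERM is received; return how many completed.

    fetch_job (hypothetical) should return None when no job arrived
    within its timeout, so the shutdown flag is checked again.
    """
    signal.signal(signal.SIGTERM, _handle_sigterm)
    completed = 0
    while not shutdown_requested.is_set():
        job = fetch_job()
        if job is None:
            continue
        process_job(job)  # runs to completion even if SIGTERM arrives now
        completed += 1
    return completed  # returning lets the process exit on its own
```

With this shape, the draining time only needs to be longer than your longest single job plus the `fetch_job` timeout, since the worker takes no new work after the signal.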


Status changed to Awaiting User Response Railway 26 days ago


Railway
BOT

19 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 19 days ago


vishnu-mouli-102408
PRO OP

5 days ago

Hi, I'm seeing this now: it keeps loading indefinitely. Is there anything I can do?
It says it is scaling to 2 workers, but 8 workers are still running, even though the settings show 2 workers.




Status changed to Awaiting Railway Response Railway 5 days ago


Status changed to Solved vishnu-mouli-102408 5 days ago

