
Question About Safe Downscaling of Worker Replicas

vishnu-mouli-102408

PRO • OP

4 months ago

Hi,

I have a question about scaling down workers.

I am running a worker service on Railway. Each replica runs a long process that pulls jobs from Redis and processes them. Some of these jobs take a long time to finish.

I am using the GraphQL API to autoscale the replicas based on load. Scaling up works fine. But when scaling down, Railway seems to randomly stop some replicas. Sometimes it stops a replica that is still processing a job, and that job gets interrupted.

I wanted to ask:

* Is there any graceful shutdown or draining when replicas are scaled down?
* Is there a way to let a replica finish its current job before it is stopped?
* Can we control which replica gets terminated, or add some delay before shutdown?
* What is the recommended way to run long-running background workers with autoscaling on Railway?

Right now it looks like the replica is stopped immediately, which is risky for job processing. I just want to check if there is a safer way to handle this.

Thanks for your help!

Solved


brody

EMPLOYEE

4 months ago

Railway sends a SIGTERM signal when scaling down replicas. You can set the draining time long enough for all jobs to complete, either with the Draining Time setting in Service Settings → Deploy or with the `RAILWAY_DEPLOYMENT_DRAINING_SECONDS` environment variable. At the same time, your worker should stop accepting new jobs once it receives SIGTERM, so the remaining replicas pick up incoming jobs while the old worker finishes its ongoing jobs. Once the old deployment has finished its jobs, it can exit.

Status changed to Awaiting User Response Railway • 4 months ago

Railway

BOT

4 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • 4 months ago
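The pattern brody describes (treat SIGTERM as "stop pulling new jobs", finish the job in hand, then exit) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Railway's or Redis's official recipe: `queue.Queue` stands in for the Redis job source, the names `drain_seconds`, `drain_window_covers`, `install_sigterm_handler`, `process`, `run_worker`, `DEFAULT_DRAIN_SECONDS` and `MAX_JOB_SECONDS` are hypothetical, and it assumes the draining window is the grace period between SIGTERM and the replica being forcibly stopped.

```python
import os
import queue
import signal
import threading
import time

# Hypothetical values for illustration; tune them to your own workload.
DEFAULT_DRAIN_SECONDS = 30.0
MAX_JOB_SECONDS = 600.0


def drain_seconds() -> float:
    """Read the configured draining window (assumed: seconds from SIGTERM to forced stop)."""
    raw = os.environ.get("RAILWAY_DEPLOYMENT_DRAINING_SECONDS", "")
    try:
        return float(raw)
    except ValueError:
        return DEFAULT_DRAIN_SECONDS


def drain_window_covers(max_job_seconds: float = MAX_JOB_SECONDS) -> bool:
    """True if the draining window is long enough for the longest expected job."""
    return drain_seconds() >= max_job_seconds


def install_sigterm_handler(stop: threading.Event) -> None:
    """Turn SIGTERM into a 'stop pulling jobs' flag instead of an instant exit."""
    def handler(signum, frame):
        stop.set()

    # Python only allows installing signal handlers from the main thread.
    signal.signal(signal.SIGTERM, handler)


def process(job: str) -> str:
    """Placeholder for the real long-running job (hypothetical)."""
    time.sleep(0.05)
    return job.upper()


def run_worker(jobs: "queue.Queue[str]", stop: threading.Event) -> list:
    """Pull and process jobs until SIGTERM; never abandon a job in progress."""
    done = []
    while not stop.is_set():
        try:
            # A short timeout keeps re-checking the stop flag; with Redis this
            # would be a blocking pop (e.g. BLPOP) with a small timeout.
            job = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        # The stop flag is only checked between jobs, so a job that has
        # already started runs to completion even if SIGTERM arrives mid-way.
        done.append(process(job))
    return done
```

In a real worker you would call `install_sigterm_handler` once at startup in the main thread, optionally log a warning when `drain_window_covers()` is false, run `run_worker` against your Redis-backed source, and let the process exit normally when it returns. If the start command wraps the worker in a shell, check that SIGTERM actually reaches the Python process. A job that outlives the draining window can still be killed; Redis's reliable-queue pattern (moving each job to a per-worker processing list with `LMOVE`/`BLMOVE` and re-queueing leftovers on restart) is one way to avoid losing it.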

vishnu-mouli-102408

PRO • OP

4 months ago

Hi, I'm seeing this and it keeps loading indefinitely. Is there something I can do?

It says it is scaling to 2 workers, but 8 workers are still running, even though the settings show 2 workers.

Attachments

image.png

Status changed to Awaiting Railway Response Railway • 4 months ago

Status changed to Solved vishnu-mouli-102408 • 4 months ago
