Issue with 2 of my Celery workers

andremaytorena

PRO OP

3 months ago

Two of my Celery workers connected to my Redis stopped receiving tasks. I had to redeploy them to get them receiving tasks again. Could this be related to the outage that just happened? It hasn't happened in months.

$20 Bounty

4 Replies

mlenarciak

PRO

3 months ago

I think so. It was specifically the Redis container that got a SIGTERM, and that crashed our CRM. Other containers and projects that were not using Redis seemed to carry on without issue (a simple Node webhook receiver service just kept humming along). The exception was some of our apps that crashed when Redis went down and then ran out of restart budget because Redis was not back up in time. **Note to self: have unlimited retries on a prod service (!).** Try explaining this to a customer that runs a contact center:

- `2026-03-25 10:53:30Z`: Railway Redis received `SIGTERM` and shut down
- `2026-03-25 10:53:35Z`: `GBCRM` crashed on a Redis socket disconnect
- `2026-03-25 10:53:42Z`: `GBCRM Worker` crashed on the same failure mode
- `2026-03-25 11:04:46Z`: Redis returned
- `2026-03-25 11:47:36Z`: user-facing web service was healthy again after manual redeploy and rebuild
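The timeline above shows the services crashing within seconds of the Redis disconnect and Redis staying down for about 11 minutes, which is longer than a small restart budget would cover. One way to ride out a gap like that is to retry the connection inside the process with capped exponential backoff instead of exiting. The following is only a minimal sketch under stated assumptions: `connect_with_backoff`, its parameters, and the `connect` callable are hypothetical names for illustration, not part of Celery, redis-py, or Railway.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,  # None means retry forever
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off between failures.

    `connect` is a hypothetical zero-argument callable that opens the
    connection (for example, one that creates a client and pings it).
    `retry_on` lists the exception types treated as "server unavailable";
    pass your client library's own connection error class here if it does
    not derive from OSError.
    """
    attempt = 0
    delay = base_delay
    while True:
        try:
            return connect()
        except retry_on:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise  # budget exhausted: let the caller or platform decide
            # Jitter keeps many workers from reconnecting in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
            # Double the delay for next time, capped at max_delay.
            delay = min(max_delay, delay * 2)
```

With `max_attempts=None` the process never gives up, which matches the "unlimited retries" note above. Celery itself documents `broker_connection_retry` and `broker_connection_max_retries` settings for broker reconnects; whether tuning them would have prevented this particular incident is not something this thread establishes.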

andremaytorena

PRO OP

3 months ago

Is it possible to send a notification when this happens? Or is there a fix for it?

andremaytorena

PRO OP

3 months ago

Because it's not ideal not knowing about this.
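Nothing in this thread says whether the platform can send such a notification, so the following is only a do-it-yourself sketch of the idea: a small watcher that pings Redis and calls a notifier once after several consecutive failures, then once more on recovery. `redis_responds`, `OutageWatcher`, and the `notify` callable are hypothetical names; the check assumes an unauthenticated or password-protected Redis reachable over plain TCP (no TLS).

```python
import socket
from typing import Callable


def redis_responds(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the server answers an inline PING over plain TCP."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:  # refused, reset, unreachable, or timed out
        return False
    # "+PONG" is the normal reply; "-NOAUTH" still proves the server is up.
    return reply.startswith((b"+PONG", b"-NOAUTH"))


class OutageWatcher:
    """Alert once after N consecutive failed checks, and once on recovery."""

    def __init__(
        self,
        check: Callable[[], bool],
        notify: Callable[[str], None],
        failures_before_alert: int = 3,
    ) -> None:
        self.check = check
        self.notify = notify
        self.failures_before_alert = failures_before_alert
        self.failures = 0
        self.alerted = False

    def tick(self) -> bool:
        """Run one check and send at most one notification; return health."""
        ok = self.check()
        if ok:
            if self.alerted:
                self.notify("redis recovered")
            self.failures = 0
            self.alerted = False
        else:
            self.failures += 1
            if self.failures >= self.failures_before_alert and not self.alerted:
                self.notify("redis unreachable")
                self.alerted = True
        return ok
```

In use, something like `OutageWatcher(lambda: redis_responds(host, port), notify)` would be ticked from a loop with a sleep between calls, ideally in a service that does not itself depend on Redis. What `notify` does (email, chat webhook, pager) is left open here.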

andremaytorena

PRO OP

3 months ago

Hi, is there any update on this?
