2 months ago
Two of my Celery workers connected to my Redis stopped receiving tasks, and I had to redeploy them to get them receiving tasks again. Could this be related to the outage that just happened? It hasn't happened in months.
4 Replies
2 months ago
I think so. It was specifically the Redis container getting a SIGTERM that crashed our CRM. Other containers and projects that were not using Redis seemed to carry on without issue (a simple Node webhook receiver service just kept humming along), except where some of our apps that crashed when Redis went down ran out of restart budget because the service was not back up in time. **Note to self: have unlimited retries on a prod service (!)** Try explaining this to a customer that runs a contact center:
- `2026-03-25 10:53:30Z`: Railway Redis received `SIGTERM` and shut down
- `2026-03-25 10:53:35Z`: `GBCRM` crashed on a Redis socket disconnect
- `2026-03-25 10:53:42Z`: `GBCRM Worker` crashed on the same failure mode
- `2026-03-25 11:04:46Z`: Redis returned
- `2026-03-25 11:47:36Z`: user-facing web service was healthy again after manual redeploy and rebuild
2 months ago
Hi, is there any update on this?