Understanding an unexpected Redis restart

ibadus

PRO, OP

6 months ago

Hello everyone,

I am posting here to understand the logs (screen below) and what happened.

Context:

Service "Rate Limit" is a Redis 8.2.1 instance (Railway's base Redis database). It is used by an API to rate-limit API calls. After these logs, the CPU and RAM usage of the API service skyrocketed (10x normal usage) and the API service stopped producing logs (there are none until I restarted it manually). My first guess is that the Redis instance restarted and dropped its connection to the server, leaving the server unresponsive (it didn't crash). The Rate Limit Redis instance has neither auto-update nor backups enabled.

I'm not sure I understand what happened. If anyone could help me figure it out, I would appreciate it.

Thanks in advance

Logs in text:

1:M 06 Jan 2026 06:43:15.044 * 1 changes in 60 seconds. Saving...

1:M 06 Jan 2026 06:43:15.045 * Background saving started by pid 12729

12729:C 06 Jan 2026 06:43:15.052 * DB saved on disk

12729:C 06 Jan 2026 06:43:15.053 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB

1:M 06 Jan 2026 06:43:15.146 * Background saving terminated with success

Attachments

CleanShot%2...

$10 Bounty

4 Replies

ibadus

PRO, OP

6 months ago

More context: the screenshot clearly shows that there are no logs between the Rate Limit Redis logs and my manual restart of the service.

Attachments

CleanShot%2...

ilyassbreth

FREE

6 months ago

those redis logs just show a normal background save (an RDB snapshot to disk), and it completed successfully in about 100ms. redis didn't restart (you'd see "Server initialized" or "Ready to accept connections" if it did)
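If you want to check a longer log export yourself, here's a rough TypeScript sketch. `parseRedisLogLine` and `findRestarts` are hypothetical helpers (not part of Redis or Railway) that assume the default Redis log line format and flag the two startup messages mentioned above. The role letter also explains the `12729:C` lines: `C` marks a forked child process (here, the one doing the RDB save), not a new server.

```typescript
// Hypothetical helpers for scanning a Redis log export.
// Assumes the default Redis log format:
//   <pid>:<role> <dd Mon yyyy hh:mm:ss.mmm> <level> <message>
// role:  M = primary, S = replica, C = child process (e.g. an RDB save), X = sentinel
// level: . = debug, - = verbose, * = notice, # = warning

interface RedisLogLine {
  pid: number;
  role: string;
  timestamp: string;
  message: string;
}

// `\s*` after the colon tolerates copy-paste glitches such as "12729: C".
const LINE_RE =
  /^(\d+):\s*([MSCX])\s+(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+[.\-*#]\s+(.*)$/;

function parseRedisLogLine(line: string): RedisLogLine | null {
  const m = LINE_RE.exec(line.trim());
  if (m === null) return null;
  return { pid: Number(m[1]), role: m[2], timestamp: m[3], message: m[4] };
}

// Startup messages a (re)started Redis server prints, matched case-insensitively.
const RESTART_MARKERS = ["server initialized", "ready to accept connections"];

function isRestartMessage(message: string): boolean {
  const lower = message.toLowerCase();
  return RESTART_MARKERS.some((marker) => lower.includes(marker));
}

// Returns every parsed line that indicates the server started up.
function findRestarts(lines: string[]): RedisLogLine[] {
  return lines
    .map((line) => parseRedisLogLine(line))
    .filter((parsed): parsed is RedisLogLine => parsed !== null)
    .filter((parsed) => isRestartMessage(parsed.message));
}
```

Run over the five log lines from the first post, `findRestarts` returns an empty list: every line is either the primary (`1:M`) or the save child (`12729:C`) reporting the snapshot.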

the real issue is your api service: notice the 3 hour gap in the logs, then "starting container" for blitz-api. something made your api unresponsive

to figure out what actually happened, check:

railway metrics for both services during that time window (cpu/memory graphs)
does your api have connection retry logic for redis?
any error logs from your api right before 07:43?

could be a connection leak, a timeout issue, or your rate limiting code not handling the brief redis save well, but i'd need more logs/metrics to say for sure what made the api hang
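One cheap safeguard against an API hanging on its rate limiter is to put a deadline on every rate-limit call and fail open. Below is a minimal TypeScript sketch assuming a hypothetical `RateStore` wrapper around the Redis client. `allowRequest`, `withTimeout`, the `ratelimit:` key prefix and the 100-request / 200 ms defaults are all made up for illustration, and nothing in the posted logs shows that this would have prevented this particular incident.

```typescript
// Sketch: keep a slow or disconnected Redis from hanging the request path.
// `RateStore` is a hypothetical stand-in for the real Redis client; the key
// format, limit and timeout are illustrative values, not recommendations.

interface RateStore {
  // Increment the counter for `key` in the current window and return it
  // (in Redis this would typically be INCR plus an EXPIRE on the key).
  incrWindow(key: string): Promise<number>;
}

class TimeoutError extends Error {}

// Rejects with TimeoutError if `p` has not settled within `ms` milliseconds.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the timer alive once `p` wins the race
  }
}

// Returns true if the request may proceed. Fails open: when the store errors
// or does not answer in time, the request is allowed and the failure logged.
async function allowRequest(
  store: RateStore,
  clientId: string,
  limit = 100,
  timeoutMs = 200,
): Promise<boolean> {
  try {
    const count = await withTimeout(store.incrWindow(`ratelimit:${clientId}`), timeoutMs);
    return count <= limit;
  } catch (err) {
    console.error("rate limit check failed, allowing request:", err);
    return true;
  }
}
```

The timeout only stops the request from waiting; it does not cancel a command already sent to Redis. Failing open trades strict limiting for availability. Returning `false` in the `catch` (failing closed) is the stricter alternative.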

redis itself looks fine from what you posted

ibadus

PRO, OP

6 months ago

Thanks for your response, here is the additional information you asked for. I create the connection to the Rate Limit Redis service on startup in the API service, with autoReconnect set to true (the default), using Bun's Redis client (https://bun.com/docs/runtime/redis), like this: const client = new RedisClient("redis://username:password@localhost:6379"); (in practice the URL comes from the Railway variable REDIS_URL).

I also looked at our OpenTelemetry (OTel) data and logs in Axiom, and there is the same gap there (nothing in between), just like in Railway.

I might be wrong, but shouldn't a save operation leave active connections unaffected?

API service metrics (the spike started around 7:43; I can't pin down the exact time):

Rate Limit Redis metrics:

Redis config:

Logs timeline: no errors beforehand, everything was normal (200 statuses):

Logs before the DB save (everything looked normal, status 200):

ibadus

PRO, OP

6 months ago

Bump: I'm still trying to figure out what happened.
