2 months ago
Hello everyone,
I am posting here to understand the logs (screenshot below) and what happened.
Context:
The "Rate Limit" service is a Redis:8.2.1 instance (base database from Railway). It is used by an API to rate-limit API calls. After these logs, the CPU and RAM usage of the API service skyrocketed (10x normal usage) and the API service stopped emitting logs (there are no logs until I restarted it manually). My first guess is that the Redis instance restarted and killed the connection with the server, making the server unresponsive (it didn't crash). The Rate Limit Redis instance has neither auto-update nor backups enabled.
I'm not sure I really understand what happened; if anyone could help me work it out, I would appreciate it.
Thanks in advance
Logs in text:
1:M 06 Jan 2026 06:43:15.044 * 1 changes in 60 seconds. Saving...
1:M 06 Jan 2026 06:43:15.045 * Background saving started by pid 12729
12729:C 06 Jan 2026 06:43:15.052 * DB saved on disk
12729:C 06 Jan 2026 06:43:15.053 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 06 Jan 2026 06:43:15.146 * Background saving terminated with success
4 Replies
2 months ago
More context: the screenshot clearly shows that there are no logs between the Rate Limit Redis logs and my manual service restart.
2 months ago
those redis logs just show a normal background save, it completed successfully in about 100ms. redis didn't restart (you'd see "server initialized" or "ready to accept connections" if it did)
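To double-check that figure, here is a small sketch that subtracts the first and last timestamps of the log lines quoted in the original post. The helper name `parseRedisLogTime` is made up for illustration, and it treats the timestamps as UTC, which makes no difference when only the gap between two lines matters.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: extract the "DD Mon YYYY HH:MM:SS.mmm" timestamp
// from a Redis log line and return it as epoch milliseconds (read as UTC).
function parseRedisLogTime(line: string): number {
  const match = line.match(/(\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})/);
  if (!match) throw new Error(`no timestamp found in: ${line}`);
  const [, day, mon, year, hh, mm, ss, ms] = match;
  const monthIndex = MONTHS.indexOf(mon ?? "");
  if (monthIndex < 0) throw new Error(`unknown month in: ${line}`);
  return Date.UTC(Number(year), monthIndex, Number(day),
                  Number(hh), Number(mm), Number(ss), Number(ms));
}

// First and last lines of the save, copied from the logs in the original post.
const saveStarted = "1:M 06 Jan 2026 06:43:15.044 * 1 changes in 60 seconds. Saving...";
const saveFinished = "1:M 06 Jan 2026 06:43:15.146 * Background saving terminated with success";

const saveDurationMs = parseRedisLogTime(saveFinished) - parseRedisLogTime(saveStarted);
console.log(`background save took ${saveDurationMs} ms`); // 102 ms for these two lines
```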
the real issue is your api service, notice the 3 hour gap in logs then "starting container" for blitz-api. something made your api unresponsive
to figure out what actually happened, check:
- railway metrics for both services during that time window (cpu/memory graphs)
- does your api have connection retry logic for redis?
- any error logs from your api right before 07:43?
could be a connection leak, timeout issue, or your rate limiting code not handling the brief redis save operation well. but need more logs/metrics to say for sure what made the api hang
redis itself looks fine from what you posted
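if the hang does turn out to be a redis call that never comes back, one way to keep it from freezing requests is to put a timeout around the rate limit check and fail open. this is only a sketch under assumptions: `withTimeout`, `isAllowed` and the `RateLimitCheck` signature are made-up names, not part of any redis client, and whether letting requests through on failure is acceptable depends on your api.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Hypothetical signature: resolves to true when the caller is under its limit.
// In a real service this would wrap whatever Redis commands the limiter uses.
type RateLimitCheck = (key: string) => Promise<boolean>;

// Fail open: if the check is slow or throws, log it and let the request through,
// so a stalled Redis call cannot pile up pending requests in the API.
async function isAllowed(
  check: RateLimitCheck,
  key: string,
  timeoutMs = 200, // assumed budget, tune for your latency
): Promise<boolean> {
  try {
    return await withTimeout(check(key), timeoutMs);
  } catch (error) {
    console.error("rate limit check failed, allowing request:", error);
    return true;
  }
}
```

the error log line also gives you something to search for next time, instead of a silent gap. if you'd rather reject traffic when redis is unavailable, return false in the catch branch instead.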
2 months ago
Thanks for your response, here is more information as you asked. I create my connection to the Rate Limit Redis service on startup in the API service, with autoReconnect set to true (the default), using Bun's Redis client (https://bun.com/docs/runtime/redis), like: const client = new RedisClient("redis://username:password@localhost:6379"); (the actual URL comes from the Railway variable REDIS_URL)
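One thing that could help narrow this down next time is logging every connect and close on that client, so a dropped connection leaves a trace even when requests go quiet. A minimal sketch, assuming the client exposes `onconnect` and `onclose` handler properties (my reading of the Bun docs linked above, worth verifying against the current version); the `ConnectionHooks` interface and `logConnectionEvents` function are hypothetical names for illustration:

```typescript
// Assumed shape of the handler properties on the Redis client.
// This is a stand-in type for the sketch, not the client's real typings.
interface ConnectionHooks {
  onconnect: (() => void) | null;
  onclose: ((error?: Error) => void) | null;
}

// Attach handlers that write a timestamped line on every connect and close.
function logConnectionEvents(
  client: ConnectionHooks,
  log: (message: string) => void = console.log,
): void {
  client.onconnect = () => {
    log(`[redis] connected at ${new Date().toISOString()}`);
  };
  client.onclose = (error) => {
    log(`[redis] closed at ${new Date().toISOString()}: ${error?.message ?? "no error given"}`);
  };
}
```

If the real client's handler properties match this shape, it would be called once at startup, right after the client is created.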
I looked at our OTel data and logs in Axiom and there is also a gap (nothing in between), like in Railway.
I might be wrong, but a save operation should not impact active connections?
API service metrics (the spike started around 7:43; I can't get the exact time):
Rate Limit Redis metrics:
Redis config:
Logs timeline, no errors before, everything was normal (200 statuses):
Logs before the db save (everything looked normal - status 200):
2 months ago
Bump, I'm still looking to figure out what happened
