Redis Connection Drops After ~24 Hours
ashleyredman
PRO · OP

21 days ago

I have made various changes, each time waiting 24 hours and returning, while trying to diagnose this with the Railway agent. After several days, the agent suggested opening a support ticket because it thinks the cause comes down to internal infrastructure. All the information is below:

Project ID: d2c887eb-291d-49ea-8c98-4f5622ee2288

Environment: production (0171b9b2-f47d-49c3-bc15-9051abf2fa9a)

Region: europe-west4-drams3a

Services Affected:

  • API: api.mod-sales.com (ID: 2f080c9e-645d-4871-9fb9-cbca7efd66ae)
  • Redis Cache: Cache (ID: ebee8451-4d7d-4c93-83ca-c3027cbf9134, redis:8.2.5)

Issue Description:

The Redis connection from the API becomes unreachable after approximately 24 hours of uptime, causing ECONNREFUSED errors on all cache operations. Manual restart of the Redis service temporarily resolves the issue, but it recurs after another ~24 hours.

Key Evidence:

  1. Issue reproduces with multiple client libraries: Both Bun's native redis client AND ioredis exhibit the same behavior, ruling out client-side bugs.
  2. Redis itself is healthy:
    • No errors in Redis logs
    • Keys/values visible in Railway UI (confirming other connections work)
    • Memory usage stable
    • RDB saves complete successfully
  3. Pattern is consistent: Connection drops reliably after ~24 hours, suggesting an idle connection timeout on Railway's internal networking.

Attempted Fixes (All Failed):

  • IPv6 binding on Redis (--bind 0.0.0.0 ::)
  • Increased maxclients to 100,000
  • Switched client libraries (Bun → ioredis)
  • Adjusted TCP keepAlive intervals (5000ms and 30000ms)
  • Added connection ready checks and offline queue handling
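TCP keepalive works at the socket level. An application-level PING with a deadline is a separate check that does not appear in the list above. The following is a minimal sketch of such a check, not a confirmed fix. It assumes only that the client exposes a `ping()` method returning a promise (ioredis does); the `Pingable` interface and the `isAlive` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal shape of a Redis client; ioredis instances satisfy it.
interface Pingable {
  ping(): Promise<string>;
}

// Resolves true if PING answers within timeoutMs, false if it errors or hangs.
async function isAlive(client: Pingable, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      client.ping().then(() => true, () => false),
      deadline,
    ]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Stand-ins for a healthy connection and a silently dead one.
const healthy: Pingable = { ping: async () => "PONG" };
const hung: Pingable = { ping: () => new Promise<string>(() => {}) };

isAlive(healthy, 50).then((ok) => console.log("healthy:", ok));
isAlive(hung, 50).then((ok) => console.log("hung:", ok));
```

Running a check like this on an interval, and forcing a reconnect when it returns false, would show whether the connection is hanging silently or failing outright.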

Current Configuration:

Redis start command:

redis-server --requirepass $REDIS_PASSWORD --save 60 1 --dir $RAILWAY_VOLUME_MOUNT_PATH --bind 0.0.0.0 ::

API ioredis config:

new Redis(env.REDIS_URL, {
    retryStrategy(times) { return Math.min(times * 100, 3000); },
    maxRetriesPerRequest: 3,
    keepAlive: 30000
})
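For reference, the `retryStrategy` in this config produces a linear backoff capped at 3 seconds. The sketch below only works through that arithmetic; `retryDelay` and `totalBackoffMs` are hypothetical helper names, not part of ioredis.

```typescript
// Same rule as the retryStrategy above: 100 ms per attempt, capped at 3000 ms.
function retryDelay(times: number): number {
  return Math.min(times * 100, 3000);
}

// Total time spent waiting across the first `attempts` reconnect attempts.
function totalBackoffMs(attempts: number): number {
  let total = 0;
  for (let t = 1; t <= attempts; t++) {
    total += retryDelay(t);
  }
  return total;
}

// Delays for attempts 1, 2, 3, 30 and 31: 100, 200, 300, 3000, 3000
console.log([1, 2, 3, 30, 31].map(retryDelay));
// Waiting time across the first three attempts: 600
console.log(totalBackoffMs(3));
```

The cap is reached at the 30th attempt, so from then on the client retries every 3 seconds for as long as the connection keeps failing.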

Hypothesis:

Railway's internal networking layer appears to have an idle connection timeout (~24 hours) that silently drops TCP connections between services. The client libraries don't detect this and continue using the dead connection, resulting in ECONNREFUSED errors.
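One way to test this hypothesis is to log the `code` of every connection error, since different codes usually point to different failure modes. ECONNREFUSED is normally reported when a new connection attempt is rejected, whereas a silently dropped established connection more often surfaces as ETIMEDOUT, ECONNRESET, or EPIPE. The sketch below is illustrative only: `classifySocketError` and its labels are hypothetical, and the mapping is a rule of thumb rather than a definitive diagnosis.

```typescript
// Hypothetical labels for the most common Node.js socket error codes.
type Diagnosis =
  | "refused-at-connect"      // nothing accepted the new connection
  | "no-response"             // packets went unanswered
  | "established-conn-broken" // an existing connection was torn down
  | "dns-failure"             // hostname could not be resolved
  | "unknown";

function classifySocketError(code: string | undefined): Diagnosis {
  switch (code) {
    case "ECONNREFUSED":
      return "refused-at-connect";
    case "ETIMEDOUT":
      return "no-response";
    case "ECONNRESET":
    case "EPIPE":
      return "established-conn-broken";
    case "ENOTFOUND":
    case "EAI_AGAIN":
      return "dns-failure";
    default:
      return "unknown";
  }
}

// Assumed usage: call this from the client's "error" event handler
// with the error's `code` property.
console.log(classifySocketError("ECONNREFUSED"));
console.log(classifySocketError("ECONNRESET"));
```

If every logged error is classified as refused-at-connect, that would suggest the client has already noticed the old socket is gone and that its reconnect attempts are the ones failing.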

Request:

Please investigate whether there's an idle connection timeout policy on internal service-to-service networking in the europe-west4-drams3a region, and if so, whether it can be increased or disabled.

TL;DR

I have a Hono API which sets and gets data from a Redis instance (the built-in template, not a community one). After around 24 hours the API can no longer connect, although Redis seems to be in good health. The only solution is a manual restart of the Redis deployment, after which the Hono API can automatically make connections again. Any support on this would be appreciated. I also have a Postgres database, and this issue does not occur with it. The API checks Redis first and, if no data exists there, checks Postgres, so Redis is pretty much a cache layer.
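The cache-layer pattern described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual API code: the `Cache` and `Store` interfaces, the `readThrough` function, and the in-memory stand-ins are all hypothetical. The point it shows is that a cache error can be treated as a miss, so requests keep working from Postgres while Redis is unreachable.

```typescript
// Hypothetical minimal interfaces standing in for the Redis and Postgres clients.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}
interface Store {
  load(key: string): Promise<string | null>;
}

// Cache-aside read: try the cache first, fall back to the database
// on a miss or on a cache error.
async function readThrough(cache: Cache, store: Store, key: string): Promise<string | null> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return hit;
  } catch {
    // Cache unreachable (for example ECONNREFUSED): treat it as a miss.
  }
  const value = await store.load(key);
  if (value !== null) {
    try {
      // Best-effort repopulation; a failing cache must not break the read.
      await cache.set(key, value);
    } catch {
      // Ignore: the value was still served from the database.
    }
  }
  return value;
}

// Stand-ins: a cache that always fails and a database backed by a Map.
const brokenCache: Cache = {
  get: async () => { throw new Error("connect ECONNREFUSED"); },
  set: async () => { throw new Error("connect ECONNREFUSED"); },
};
const rows = new Map([["user:1", "alice"]]);
const store: Store = { load: async (key) => rows.get(key) ?? null };

readThrough(brokenCache, store, "user:1").then((v) => console.log(v)); // logs "alice"
```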

Thank you.

Solved · $20 Bounty

3 Replies

ashleyredman
PRO · OP

21 days ago

Screenshot of the setup attached


Railway
BOT

21 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open by Railway 21 days ago


ashleyredman
PRO · OP

21 days ago

I'd rather this not be opened up to the community, please. That's why I created it privately.


ashleyredman
PRO · OP

21 days ago

Can this please be turned back to private? I may have posted sensitive information.


Status changed to Solved by ashleyredman 15 days ago

