Server stops responding randomly
ibadus
PROOP

2 months ago

Hello,

I posted a ticket a few weeks ago and never got a response.

We had no issues until yesterday; now our production server keeps becoming unresponsive for no apparent reason (cf. old ticket, link below). We decided to upgrade our plan to PRO to see if it would fix anything, and it just made everything worse. Our production server has gone down 4 times since yesterday evening (CET).

This situation cannot continue for us, we'd really appreciate your help.

Old ticket:

https://station.railway.com/questions/understanding-unwanted-redis-restart-814f1f2a

To recap the issue: it seems my server gets disconnected from the Redis instance (on Railway) when the Redis instance runs a backup.

Let us know if there's anything we can share to help resolve this ASAP.

$10 Bounty

7 Replies

ibadus
PROOP

2 months ago

Logs attached, showing that the error most likely comes from Redis.


2 months ago

Do you have a link to a deployment where this happened?

I checked some hosts your deploys were on and there was nothing abnormal about them with all other user workloads operating normally. There is no infrastructure-level evidence suggesting this is an issue on our end as far as I can tell.

If your deployment is "non-responsive" without any logs etc. and requests are 502-ing, it's highly likely your app has crashed or is stuck in some infinite loop. If it crashes without exiting with a proper error code (e.g. 1), we don't automatically restart it (because we'd have no way to know whether it's crashed or running).
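One way to avoid the "stuck but not crashed" state described above is to exit with a non-zero code once the Redis client has given up for good, so the platform can restart the service. This is only a minimal sketch: `exitOnPermanentClose` is a hypothetical helper, and it assumes the client emits an `end` event when it will make no further reconnection attempts (ioredis clients do).

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper, not part of any library.
// `client` is anything that emits "end" once it has stopped reconnecting.
// `exit` is injectable so the behavior can be tested without killing the process.
function exitOnPermanentClose(
  client: EventEmitter,
  name: string,
  exit: (code: number) => void = (code) => process.exit(code),
): void {
  client.once("end", () => {
    console.error(`[Redis:${name}] Connection ended permanently, exiting`);
    // A non-zero exit code signals a crash instead of a silent hang.
    exit(1);
  });
}
```

Whether exiting is preferable to degrading gracefully depends on how much of the app can work without Redis.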


Status changed to Awaiting User Response Railway about 2 months ago


ibadus
PROOP

2 months ago

Thanks for confirming. We identified that the issue was our Redis client not having a reconnection strategy configured.

That said, it was not trivial to figure out, as it is not documented, that keepalive connections should be expected to fail at any time on Railway.

A transient socket close (likely from an idle timeout or a network hiccup) killed our Redis connections permanently, leaving the app unresponsive without crashing.

We've implemented reconnection logic and TCP keepalives.

Hopefully this is resolved now.

Thanks for the quick answer!
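For anyone finding this later, TCP keepalives in ioredis are a connection option. The sketch below uses standard ioredis option names, but the values are assumptions for illustration and are not the ones used in this thread.

```typescript
// Options object that could be passed as the second argument to `new IORedis(url, options)`.
// All values are illustrative assumptions.
const connectionOptions = {
  // Enable TCP keepalive with an initial delay of 10 s before the first probe.
  keepAlive: 10000,
  // Give up on a single connection attempt after 10 s.
  connectTimeout: 10000,
  // Do not fail queued commands after a fixed number of retries.
  maxRetriesPerRequest: null,
};
```

Keepalives help detect dead sockets sooner; they do not replace a reconnection strategy.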


Status changed to Awaiting Railway Response Railway about 2 months ago


Status changed to Solved ibadus about 2 months ago


ibadus
PROOP

a month ago

Hey, coming back as the retry logic didn't fix the issue. We still have the problem even with a reconnection strategy on the Redis connection.
Below is the TypeScript code we use to create the Redis connection (using Private Networking from Railway):

import IORedis from "ioredis";
import { env } from "@/lib/env";

const createRedisClient = (url: string, name: string) => {
  const client = new IORedis(url, {
    maxRetriesPerRequest: null,
    enableReadyCheck: true,
    retryStrategy: (times) => {
      if (times > 10) {
        console.error(`[Redis:${name}] Max reconnection attempts reached`);
        return null;
      }
      const delay = Math.min(times * 100, 3000);
      console.warn(`[Redis:${name}] Reconnecting in ${delay}ms (attempt ${times})`);
      return delay;
    },
    reconnectOnError: (err) => {
      const targetErrors = ["READONLY", "ECONNRESET", "ETIMEDOUT", "ECONNREFUSED"];
      return targetErrors.some((e) => err.message.includes(e));
    },
  });

  client.on("error", (err) => console.error(`[Redis:${name}] Error:`, err.message));
  client.on("reconnecting", (ms: number) => console.warn(`[Redis:${name}] Reconnecting in ${ms}ms`));
  client.on("ready", () => console.log(`[Redis:${name}] Connected`));

  return client;
};

export const redisRateLimiter = createRedisClient(env.RATE_LIMIT_REDIS_URL, "rate-limiter");
export const api_key_redis = createRedisClient(env.API_KEYS_REDIS_URL, "api-keys");
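A note on `reconnectOnError` in the code above: as far as I understand the ioredis docs, that callback is invoked for error replies sent by the Redis server (such as `READONLY`), so socket-level codes like `ECONNRESET` or `ETIMEDOUT` probably never reach it and are handled by `retryStrategy` instead. A minimal sketch of a narrower predicate, where `shouldReconnectOnReply` is a hypothetical name:

```typescript
// Hypothetical predicate intended for the `reconnectOnError` option.
// It only matches the READONLY reply a replica returns on writes.
function shouldReconnectOnReply(err: Error): boolean {
  return err.message.includes("READONLY");
}
```

This does not change behavior for socket errors; it only makes the intent of the option clearer.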

Status changed to Awaiting Railway Response Railway about 2 months ago


ibadus
PROOP

a month ago

Attached the logs. It's also very strange that our application receives no requests, yet Railway observability shows 5XX/4XX errors on the dashboard (even though the server receives nothing).


ibadus
PROOP

a month ago

The whole app (blitz-api) was down from 25 Jan at ~10:50 PM to 26 Jan at ~9:03 (manual restart).


ibadus
PROOP

a month ago

Here are some tests run on localhost, killing the Redis instance to verify the retry logic:

Started development server: http://localhost:3000
[Redis:subscriber] Connected
[Redis:api-keys] Connected
[Redis:rate-limiter] Connected
[2026-01-26T09:50:00.023Z] info: --> POST /v2/enrichment/email 200 231ms
[Redis:rate-limiter] Reconnecting in 100ms (attempt 1)
[Redis:rate-limiter] Reconnecting in 100ms
[2026-01-26T09:50:37.071Z] info: <-- POST /v2/enrichment/email
[Redis:rate-limiter] Reconnecting in 200ms (attempt 2)
[Redis:rate-limiter] Reconnecting in 200ms
[Redis:rate-limiter] Reconnecting in 300ms (attempt 3)
[Redis:rate-limiter] Reconnecting in 300ms
[Redis:rate-limiter] Reconnecting in 400ms (attempt 4)
[Redis:rate-limiter] Reconnecting in 400ms
[Redis:rate-limiter] Reconnecting in 500ms (attempt 5)
[Redis:rate-limiter] Reconnecting in 500ms
[Redis:rate-limiter] Reconnecting in 600ms (attempt 6)
[Redis:rate-limiter] Reconnecting in 600ms
[Redis:rate-limiter] Reconnecting in 700ms (attempt 7)
[Redis:rate-limiter] Reconnecting in 700ms
[Redis:rate-limiter] Reconnecting in 800ms (attempt 8)
[Redis:rate-limiter] Reconnecting in 800ms
[Redis:rate-limiter] Reconnecting in 900ms (attempt 9)
[Redis:rate-limiter] Reconnecting in 900ms
[2026-01-26T09:50:46.692Z] info: <-- POST /v2/enrichment/email
[Redis:rate-limiter] Reconnecting in 1000ms (attempt 10)
[Redis:rate-limiter] Reconnecting in 1000ms
[Redis:rate-limiter] Max reconnection attempts reached
[2026-01-26T09:50:48.451Z] error: Connection is closed.
[2026-01-26T09:50:48.451Z] error: Connection is closed.
[2026-01-26T09:50:48.469Z] info: --> POST /v2/enrichment/email 500 11s
[2026-01-26T09:50:48.469Z] info: --> POST /v2/enrichment/email 500 2s
[2026-01-26T09:50:53.956Z] info: <-- POST /v2/enrichment/email
[2026-01-26T09:50:53.989Z] error: Connection is closed.
[2026-01-26T09:50:53.993Z] info: --> POST /v2/enrichment/email 500 36ms
[2026-01-26T09:51:00.020Z] info: <-- POST /v2/enrichment/email
[2026-01-26T09:51:00.044Z] error: Connection is closed.
[2026-01-26T09:51:00.046Z] info: --> POST /v2/enrichment/email 500 25ms
[2026-01-26T09:53:04.075Z] info: <-- POST /v2/enrichment/email
[2026-01-26T09:53:04.112Z] error: Connection is closed.
[2026-01-26T09:53:04.115Z] info: --> POST /v2/enrichment/email 500 39ms
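The log above matches the documented ioredis behavior: once `retryStrategy` returns a non-number (here `null` after attempt 10), the client stops reconnecting and every later command fails with "Connection is closed." until `connect()` is called manually or the process restarts. A minimal sketch of a strategy that keeps retrying with the same capped backoff, where `retryForever` is a hypothetical name:

```typescript
// Same backoff as in the code posted earlier (100 ms per attempt, capped at 3 s),
// but it always returns a number, so the client never gives up.
function retryForever(times: number): number {
  const delay = Math.min(times * 100, 3000);
  console.warn(`[Redis] Reconnecting in ${delay}ms (attempt ${times})`);
  return delay;
}
```

It would be passed as `retryStrategy: retryForever`. Retrying forever trades the permanent "closed" state for commands that queue or fail while Redis is unreachable, so a request-level timeout is probably still needed.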

Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway about 1 month ago

