Redis private networking latency in project myapp

uvelirtuta

HOBBYOP

2 months ago

Evidence this is a networking/Redis-path problem, not my code:

/health endpoint (does not touch Redis) responds in ~500ms consistently:

   $ time curl -s -o /dev/null -w "Total: %{time_total}s\n" https://api.myapp.com/health
   Total: 0.475s
   Total: 0.488s
   Total: 0.715s

/nonexistent (a 404 that passes through my rate-limit middleware which calls Redis INCR) takes 11-15 seconds:

   $ time curl -s -o /dev/null -w "Total: %{time_total}s\n" https://api.myapp.com/nonexistent
   Total: 11.355s
   Total: 15.348s
   Total: 15.747s
   Total: 15.563s

The only difference between these two paths is one Redis call.
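To confirm from inside the process that the Redis call accounts for the time, rather than something else on the request path, the call can be wrapped in a timer. This is a minimal sketch that assumes Node 16+, where `performance.now()` is a global. `timeAsync` and its log format are hypothetical names and are not part of ioredis or Express:

```typescript
// Hypothetical helper: awaits fn() and logs how long it took, even if it throws.
async function timeAsync<T>(
  label: string,
  fn: () => Promise<T>,
  log: (line: string) => void = console.log,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Runs on success and on rejection, so a slow failing call is still logged.
    log(`${label} took ${(performance.now() - start).toFixed(1)}ms`);
  }
}

// Example inside the rate-limit middleware (assumes an ioredis client `redis`
// and a request-derived `key`, both hypothetical here):
// const count = await timeAsync("redis.incr", () => redis.incr(key));
```

If the logged figure is close to the 11-15 s end-to-end time, the delay sits inside the Redis round trip and not elsewhere in the middleware chain.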

Redis itself appears healthy in the Redis service logs:
- Ready to accept connections tcp
- Memory < 1.2 MB with 7 keys loaded
- CPU ~0%
- No OOM, no MISCONF, no slow-log entries
- I restarted/redeployed the Redis service — no change.

Backend request timing middleware log excerpt (every line is a single request):

   [req] POST /refresh 200 12315ms
   [req] POST /refresh 200 13054ms
   [req] GET /active 200 14699ms
   [req] POST /refresh 200 14866ms
   [req] GET /active 200 14785ms
   [req] GET /e0a40dca-... 200 14733ms
   [req] POST /refresh 200 10645ms
   [req] POST /refresh 200 11484ms
   [req] POST /refresh 200 11032ms
   [req] GET /nonexistent 404 14239ms
   [req] GET /.env 404 10949ms

Even 404 responses for random nonexistent URLs take 10-14 seconds because they still pass through the rate-limit middleware.
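A timing middleware that produces lines in the same shape as the excerpt above might look like the sketch below. It is written against minimal request/response shapes instead of importing Express types, so `requestTimer`, `ReqLike`, and `ResLike` are hypothetical names. It relies on Node's `http.ServerResponse` emitting `"finish"` once the response has been handed off:

```typescript
// Minimal shapes of the Express/Node objects this middleware touches.
interface ReqLike {
  method?: string;
  originalUrl?: string;
  url?: string;
}
interface ResLike {
  statusCode: number;
  on(event: "finish", listener: () => void): unknown;
}

// Hypothetical middleware factory: logs "[req] METHOD path status Nms" per request.
function requestTimer(log: (line: string) => void = console.log) {
  return (req: ReqLike, res: ResLike, next: () => void): void => {
    const start = performance.now();
    // "finish" fires after the response has been handed to the OS, so the
    // duration covers every later middleware (including the rate limiter).
    res.on("finish", () => {
      const ms = Math.round(performance.now() - start);
      log(`[req] ${req.method} ${req.originalUrl ?? req.url} ${res.statusCode} ${ms}ms`);
    });
    next();
  };
}

// Register it before the rate limiter (hypothetical `app` and `rateLimit` names):
// app.use(requestTimer());
// app.use(rateLimit);
```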

I ruled out:
- Postgres slowness (pg_stat_activity shows no stuck queries, tables are tiny — 6 rows in refresh_tokens)
- Query inefficiency (dropped LATERAL joins, added indexes — no change)
- Cold starts (Hobby plan; the slowness persists across many consecutive requests, not just the first)
- Sentry overhead (removed entirely — no change)
- Concurrent request stampede (added dedup on /auth/refresh — verified 1 call, still slow)
- Rate-limit misconfiguration (simple INCR + EXPIRE per request, tested on a tiny dataset)

The app has only 2 test users. Redis has 7 keys. There is zero load.
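For reference, a per-request INCR + EXPIRE fixed window usually looks something like the sketch below. This is an assumption about the shape of such a middleware, not the poster's actual code. `CounterClient`, `hitFixedWindow`, and the `rl:` key prefix are hypothetical. The interface covers only the two ioredis methods used (`incr` and `expire` both resolve to integers), and a `nowMs` parameter is added so the window can be tested deterministically:

```typescript
// The two ioredis methods the limiter needs; both resolve to integers.
interface CounterClient {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

// Hypothetical fixed-window check: INCR the window's counter, set its TTL on first use.
async function hitFixedWindow(
  client: CounterClient,
  id: string,
  limit: number,
  windowSec: number,
  nowMs: number = Date.now(),
): Promise<{ allowed: boolean; count: number }> {
  const windowIndex = Math.floor(nowMs / 1000 / windowSec);
  const key = `rl:${id}:${windowIndex}`;
  const count = await client.incr(key); // one round trip per request
  if (count === 1) {
    // First hit in this window: a second round trip sets the TTL so the key expires.
    await client.expire(key, windowSec);
  }
  return { allowed: count <= limit, count };
}
```

Each request costs one Redis round trip (two on the first hit of a window), which on an idle instance typically adds milliseconds, not seconds. Sustained 10-15 s per request therefore points at the connection or network path rather than at this logic.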

What I'd like help with:

- Is there a known issue with redis.railway.internal or private networking in my region?
- Can you confirm traffic between my backend and Redis services is routing correctly?
- Are there any diagnostics on your side showing latency or packet drops on my services' private network?

Update: Confirmed slowness is NOT specific to private networking. I switched my backend from redis.railway.internal:6547 to the public proxy URL trolley.proxy.rlwy.net:558754 and latency is identical (~11-15s per request). OPTIONS preflight requests (which short-circuit before my rate-limit middleware) complete in ~500ms. GET/POST requests (which do one Redis INCR via ioredis) take 11-15 seconds. Either my backend's connection to Redis has an internal issue (ioredis auto-reconnect storm?) or there's something weirder going on with traffic from this specific backend service.
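One way to test the reconnect-storm hypothesis is to log the client's connection lifecycle. ioredis emits `connect`, `ready`, `error`, `close`, `reconnecting`, and `end` events. The sketch below attaches a counting logger to any emitter with an `on` method. `watchConnection`, `EventSourceLike`, and the `[redis]` log prefix are hypothetical names:

```typescript
// Minimal shape shared by Node's EventEmitter and the ioredis client.
interface EventSourceLike {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

// Connection lifecycle events documented by ioredis.
const CONNECTION_EVENTS = ["connect", "ready", "error", "close", "reconnecting", "end"];

// Hypothetical helper: logs each lifecycle event with a running count per event name.
function watchConnection(
  client: EventSourceLike,
  log: (line: string) => void = console.log,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const name of CONNECTION_EVENTS) {
    client.on(name, (...args: unknown[]) => {
      const n = (counts.get(name) ?? 0) + 1;
      counts.set(name, n);
      // "error" carries an Error; "reconnecting" carries the delay (ms) before the next attempt.
      const first = args[0];
      const detail = first instanceof Error ? first.message : first === undefined ? "" : String(first);
      log(`[redis] ${name} #${n}${detail ? " " + detail : ""}`);
    });
  }
  return counts;
}

// Usage with the module-level ioredis client (assumed to be named `redis`):
// const counts = watchConnection(redis);
```

If `close` and `reconnecting` keep climbing while requests are slow, commands are probably queuing behind reconnects. A single `connect`/`ready` pair at startup and nothing afterwards would argue against that theory.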

Attachments

Screenshot%...

$10 Bounty

5 Replies

Status changed to Open Railway • 2 months ago

bilalnawaz072

TRIAL

2 months ago

Seems like the latency issue is due to the Redis connection. You might be creating a new Redis connection for every single API request.
1. A TCP handshake plus TLS negotiation on every request can easily take a few seconds under load.
2. If you're using a framework like Next.js or a serverless-style backend, you need to initialize Redis globally. The fix: globalize your Redis client instance so it is reused across requests.

uvelirtuta

HOBBYOP

2 months ago

Thanks — I checked and my Redis client is already a single top-level instance via export const redis = new Redis(REDIS_URL) in ioredis. Not created per-request. The issue persists even with a confirmed global client, so I don't think connection initialization is the cause. Would you mind checking network path diagnostics on the backend service's private networking?

uvelirtuta

HOBBYOP

2 months ago

Update: confirmed my Redis client is a single global new Redis(REDIS_URL) at module top level, not created per-request. The "Redis connected" line only appears once at container startup. I've also confirmed the latency is consistent across every subsequent request, not just the first one (which rules out TCP/TLS handshake cost).
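The first-request versus later-request comparison can be made explicit by timing several sequential calls and comparing the first against the median of the rest. This is a minimal sketch: `profileCalls` is a hypothetical helper, and with ioredis the probe could be `() => redis.ping()`:

```typescript
// Hypothetical helper: times n sequential calls and separates the first from the rest.
async function profileCalls(
  fn: () => Promise<unknown>,
  n: number,
): Promise<{ firstMs: number; medianRestMs: number }> {
  if (n < 2) throw new Error("need at least two calls to compare");
  const durations: number[] = [];
  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await fn(); // sequential on purpose: no overlap between calls
    durations.push(performance.now() - start);
  }
  const rest = durations.slice(1).sort((a, b) => a - b);
  const mid = Math.floor(rest.length / 2);
  const medianRestMs = rest.length % 2 === 1 ? rest[mid] : (rest[mid - 1] + rest[mid]) / 2;
  return { firstMs: durations[0], medianRestMs };
}

// Usage with an ioredis client (assumed): console.log(await profileCalls(() => redis.ping(), 10));
```

A large `firstMs` with a small `medianRestMs` points at connection setup. Both landing in the 10-15 s range, as reported here, means the cost is paid on every command.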

Switching between redis.railway.internal and the public *.proxy.rlwy.net URL makes no difference — both paths give 10-15s per Redis call.

For now I've worked around by moving rate limiting to in-memory, which makes my backend fast. Still keen on a root-cause investigation on the networking side since I have other legitimate Redis usage.
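An in-memory fixed-window limiter of this kind might look like the minimal sketch below. The poster's actual implementation isn't shown, so `InMemoryRateLimiter` and its parameters are hypothetical. Because the counters live in one process, limits apply per replica and reset on every deploy, which is the usual trade-off versus a shared Redis counter:

```typescript
// Hypothetical in-process fixed-window limiter: one counter per key per window.
class InMemoryRateLimiter {
  private readonly hits = new Map<string, { count: number; resetAt: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records a hit for `key` and returns true if it is within the limit.
  hit(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (entry === undefined || now >= entry.resetAt) {
      // No entry yet, or the previous window has expired: start a new window.
      this.hits.set(key, { count: 1, resetAt: now + this.windowMs });
      return this.limit >= 1;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }

  // Drops expired windows so the map does not grow without bound.
  sweep(now: number = Date.now()): void {
    this.hits.forEach((entry, key) => {
      if (now >= entry.resetAt) this.hits.delete(key);
    });
  }
}
```

In an Express middleware it would be called as `limiter.hit(req.ip ?? "unknown")`, responding with 429 when it returns false, with `sweep()` run on an interval.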

bilalnawaz072

TRIAL

2 months ago

I'm not officially from the Railway team; I'm a developer suggesting a solution based on my own experience. Your network flow logs should look like the attached screenshot, with 0ms latency.

So based on my experience, the likely issue is how you're using Redis rather than Railway itself. I use the snippet below for Redis. You need to export a global Redis instance instead of initializing a new one.

   import Redis from "ioredis";

   // 1. Define a function to create the instance
   const redisClientSingleton = () => {
     return new Redis(process.env.REDIS_URL as string);
   };

   // 2. Extend the global object type
   declare global {
     var redis: undefined | ReturnType<typeof redisClientSingleton>;
   }

   // 3. Use the existing global instance or create a new one
   export const redis = globalThis.redis ?? redisClientSingleton();

   // 4. In development, save the instance to the global object
   if (process.env.NODE_ENV !== 'production') {
     globalThis.redis = redis;
   }

Attachments

image.png

uvelirtuta

HOBBYOP

2 months ago

Thanks. My backend is a plain Node/Express app (not Next.js / serverless). I have a single export const redis = new Redis(REDIS_URL) at module scope. The "Redis connected" log fires once at container startup, not per request, confirming the client is reused. Latency is sustained across ALL requests after startup, not just the first, so it's not a connection-establishment cost.

I've worked around the issue by moving rate limiting to in-memory. Hoping someone from the Railway team can look at the networking path.