Internal network latency causing database and Redis connection timeouts

ranyu696

PROOP

5 months ago

## Issue Summary
I'm experiencing severe internal network latency between my microservices and PostgreSQL/Redis instances, causing frequent timeouts and poor performance despite all services being deployed in the same Railway project.

## Environment
- **Region**: All services deployed in US region
- **Architecture**: 20+ Go microservices using private networking
- **Database**: PostgreSQL (shared by all services)
- **Cache**: Redis (shared by all services)

## Symptoms

### 1. Database Performance Issues
- Simple COUNT queries taking 10+ seconds:
  ```sql
  SELECT count(*) FROM "games" WHERE status = 'published'
  [10041.353ms] [rows:1]
  ```
- `gorm.Open()` connection taking 10+ seconds
- Queries that should be instant (with proper indexes) are extremely slow

### 2. Redis Timeout Issues
- Frequent i/o timeout errors even with 30-second timeout configuration:
  ```
  read tcp 10.235.217.216:33404->10.187.111.239:6379: i/o timeout
  ```
- Simple Redis SET/GET operations timing out
- Timeouts occur even for small cache entries (<1KB)

### 3. Connection Pool Challenges
- Had to reduce connection pools significantly:
  - PostgreSQL: MaxOpenConns from 25 → 2 per service (20 services = 40 total)
  - Redis: PoolSize from 10 → 3 per service (20 services = 60 total)
- Even with reduced pools, still experiencing timeouts
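
The per-service totals above can be checked against the server's connection limit. The following is a minimal sketch, not a recommendation: it assumes PostgreSQL's stock default of `max_connections = 100` (the actual limit on this instance is not stated in the thread), and `perServiceBudget`, `totalConns`, and the `reserved` value are hypothetical names and numbers chosen for illustration.

```go
package main

import "fmt"

// perServiceBudget returns how many connections each service may open so that
// the whole fleet stays within maxConns after setting aside `reserved` slots
// (for example for migrations or admin sessions). Hypothetical helper.
func perServiceBudget(maxConns, reserved, services int) int {
	if services <= 0 || maxConns <= reserved {
		return 0
	}
	return (maxConns - reserved) / services
}

// totalConns is the worst-case number of open connections across the fleet.
func totalConns(perService, services int) int {
	return perService * services
}

func main() {
	const services = 20
	const maxConns = 100 // assumption: PostgreSQL default max_connections
	const reserved = 10  // assumption: slots kept free for admin use

	fmt.Println("original pools:", totalConns(25, services))
	fmt.Println("reduced pools:", totalConns(2, services))
	fmt.Println("budget per service:", perServiceBudget(maxConns, reserved, services))
}
```

Under these assumptions, the original setting of 25 per service could request up to 500 connections, while the reduced setting of 2 per service stays at 40. Whether the limit was ever a factor here depends on the instance's real `max_connections`, which would need to be checked with `SHOW max_connections;`.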

## Configuration Details

### Current Connection Settings

```go
// PostgreSQL
MaxOpenConns:    2
MaxIdleConns:    1
ConnMaxLifetime: 3 * time.Minute

// Redis
DialTimeout:  30 * time.Second
ReadTimeout:  30 * time.Second
WriteTimeout: 30 * time.Second
PoolTimeout:  30 * time.Second
PoolSize:     3
MinIdleConns: 1
```

## Questions

1. Are all services (apps, PostgreSQL, Redis) guaranteed to be in the same datacenter/availability zone?
2. Is there known latency in Railway's private network between services?
3. What are the recommended connection pool sizes for this architecture?
4. Should I consider using public endpoints instead of private networking for better performance?

## Expected Behavior

Internal private network connections should have low latency (<10ms) since all resources are in the same Railway project and region.

## Actual Behavior

Network operations taking 10-30+ seconds, suggesting high inter-service latency or network congestion.

## Impact

- Services unable to handle production traffic
- User requests timing out
- Poor user experience

Solved

3 Replies

Railway

BOT

5 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!

itsrems

EMPLOYEE

5 months ago

Hey there.

Let me start with a friendly observation: your support ticket reads very, very LLM-written. While there's nothing wrong with using an LLM for some help on summing up an issue, the length and format simply make it harder for us to read, and therefore to help you (block format, repeated sentences, weird categorization). We'd appreciate it if you kept your answers to the point and uniform next time, as it will make our job easier.

Could you point me to your services in order for me to take a look?

Here are the answers to your questions:

1. If all services are deployed in the same region then yes, they should land in the same datacenter (we sometimes divert stateless workflows to other datacenters).
2. We are aware of some spike issues with our private networking, and our team is working on fixing these this quarter.
3. We're unable to make recommendations for your architecture. I suggest opening a public thread - our community will be happy to help!
4. There is no world where we would recommend using public networking. You'll see increased base latency, and you will lose the isolation given by private networking, exposing your service to the internet.

I believe you're running into our known spike issues with private networking. Our team is picking this up asap and we're hoping to resolve it within the next few weeks.

Best,
Nico

Status changed to Awaiting User Response Railway • 5 months ago

ranyu696

PROOP

5 months ago

I don't speak English, so I have to ask AI for help. The problem I'm facing now is that queries over the database's private address are very slow, but they are normal over the public address. I can't tell whether the problem is on my side or the platform's. The production environment I deployed also uses the database's private network, and it works normally.

Status changed to Awaiting Railway Response Railway • 5 months ago

Status changed to Solved ranyu696 • 5 months ago