a month ago
## Issue Summary
I'm experiencing severe internal network latency between my microservices and PostgreSQL/Redis instances, causing frequent timeouts and poor performance despite all services being deployed in the same Railway project.
## Environment
- **Region**: All services deployed in US region
- **Architecture**: 20+ Go microservices using private networking
- **Database**: PostgreSQL (shared by all services)
- **Cache**: Redis (shared by all services)
## Symptoms
### 1. Database Performance Issues
- Simple COUNT queries taking 10+ seconds:
```sql
SELECT count(*) FROM "games" WHERE status = 'published'
[10041.353ms] [rows:1]
```
- `gorm.Open()` connection taking 10+ seconds
- Queries that should be instant (with proper indexes) are extremely slow
### 2. Redis Timeout Issues
- Frequent i/o timeout errors even with 30-second timeout configuration: `read tcp 10.235.217.216:33404->10.187.111.239:6379: i/o timeout`
- Simple Redis SET/GET operations timing out
- Timeouts occur even for small cache entries (<1KB)
### 3. Connection Pool Challenges
- Had to reduce connection pools significantly:
  - PostgreSQL: MaxOpenConns from 25 → 2 per service (20 services = 40 total)
  - Redis: PoolSize from 10 → 3 per service (20 services = 60 total)
- Even with reduced pools, still experiencing timeouts
## Configuration Details
### Current Connection Settings
// PostgreSQL
MaxOpenConns: 2
MaxIdleConns: 1
ConnMaxLifetime: 3 * time.Minute
// Redis
DialTimeout: 30 * time.Second
ReadTimeout: 30 * time.Second
WriteTimeout: 30 * time.Second
PoolTimeout: 30 * time.Second
PoolSize: 3
MinIdleConns: 1
## Questions
1. Are all services (apps, PostgreSQL, Redis) guaranteed to be in the same datacenter/availability zone?
2. Is there known latency in Railway's private network between services?
3. What are the recommended connection pool sizes for this architecture?
4. Should I consider using public endpoints instead of private networking for better performance?
## Expected Behavior
Internal private network connections should have low latency (<10ms) since all resources are in the same Railway project and region.
## Actual Behavior
Network operations taking 10-30+ seconds, suggesting high inter-service latency or network congestion.
## Impact
- Services unable to handle production traffic
- User requests timing out
- Poor user experience
3 Replies
a month ago
Hey there! We've found the following might help you get unblocked faster:
🧵 PostgreSQL Database Connection Failure - "Connection terminated unexpectedly"
🧵 ENOTFOUND and ETIMEDOUT Errors Despite IPv6 Configuration
If you find the answer from one of these, please let us know by solving the thread!
a month ago
Hey there.
Let me start with a friendly observation: your support ticket reads very, very LLM-written. While there's nothing wrong with using an LLM for some help summing up an issue, the length and format simply make it harder for us to read, and therefore to help you (block format, repeated sentences, weird categorization). We'd appreciate it if you kept your answers to the point and uniform next time, as it will make our job easier.
Could you point me to your services in order for me to take a look?
Here are the answers to your questions:
1. If all services are deployed in the same region, then yes, they should land in the same datacenter (we sometimes divert stateless workflows to other datacenters).
2. We are aware of some spike issues with our private networking, and our team is working on fixing these this quarter.
3. We're unable to make recommendations for your architecture. I suggest opening a public thread - our community will be happy to help!
4. There is no world where we would recommend using the public networking. You'll see increased base latency, and you'll lose the isolation given by private networking, exposing your service to the internet.
I believe you're running into our known spike issues with private networking. Our team is picking this up asap and we're hoping to resolve it within the next few weeks.
Best,
Nico
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
I don't speak English, so I have to ask AI for help. The problem I'm facing now is that queries over the database's private address are very slow, but they are normal over the public address. I can't tell whether the problem is on my side or on the platform's. My deployed production environment also uses the database's private network, and it works normally.
Status changed to Awaiting Railway Response Railway • 30 days ago
Status changed to Solved ranyu696 • 30 days ago