Server Performance Issue on Pro Plan – Project B Not Responding / Login Failing
chaudharyali14
PRO · OP

16 days ago

Hi Railway Support Team,

I hope this message finds you well. I'm reaching out regarding a critical performance issue I'm experiencing with one of my projects on Railway.

Account Details:

- Plan: Pro

- Affected Project: Project B

- Working Project: Project A

Issue Description:

My Project B is experiencing severe performance degradation: the server is running extremely slowly and is not allowing users to log in. What makes this particularly confusing is that Project A, which runs the exact same codebase, is functioning without any issues.

Steps I've Already Taken:

- Reviewed and compared the full codebase of both projects — no differences found

- Confirmed the code logic is correct and working (as evidenced by Project A)

- The issue appears to be infrastructure or environment-specific, not code-related

Expected Behavior:

Project B should perform identically to Project A since they share the same code.

Actual Behavior:

- Server response is extremely slow

- Login endpoint is failing / not responding

- Users are unable to authenticate

Could you please investigate whether there is a resource allocation issue, region-specific problem, or any infrastructure anomaly affecting Project B specifically?

$30 Bounty

5 Replies

16 days ago

Hello,

We would be happy to look into this, but we are going to need a more in-depth explanation of what's going on, as opposed to the high-level issues you mentioned above. We would also appreciate reproducible steps.


Status changed to Awaiting User Response Railway 16 days ago


chaudharyali14
PRO · OP

16 days ago

Thank you for getting back to me. Below is a detailed technical breakdown with exact reproducible behavior.

---

Environment

┌──────────────────┬─────────────────────────────────────────────────────────┐

│ Component │ Details │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Plan │ Pro │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Affected Service │ Project B — NestJS 11 Backend │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Working Service │ Project A — Identical codebase │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Database │ PostgreSQL (Railway managed, postgres.railway.internal) │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Cache │ Redis (Railway managed, redis.railway.internal) │

├──────────────────┼─────────────────────────────────────────────────────────┤

│ Frontend │ Next.js on Vercel │

└──────────────────┴─────────────────────────────────────────────────────────┘

---

Exact Symptoms

- The server does not slow down — it completely stops accepting connections

- All endpoints become unreachable (including GET /health)

- The only fix is a manual redeploy — after which it works again for ~1–1.5 hours, then dies again

- Project A with identical code runs without any issues

---

Critical Observation — Locally vs Deployed

This is the most important clue:

┌──────────────────────────────┬────────────────────────────────────┬───────────────────────────────┬──────────────────────────────────────────────┐
│ Environment                  │ Database                           │ Redis                         │ Result                                       │
├──────────────────────────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────────────────────────────────┤
│ Local machine                │ Railway PostgreSQL (same instance) │ Railway Redis (same instance) │ ✅ Works perfectly, no drops                 │
├──────────────────────────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────────────────────────────────┤
│ Railway deployed (Project B) │ Railway PostgreSQL (same instance) │ Railway Redis (same instance) │ ❌ Dies after ~1–1.5 hours, requires redeploy │
├──────────────────────────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────────────────────────────────┤
│ Railway deployed (Project A) │ Railway PostgreSQL                 │ Railway Redis                 │ ✅ Works perfectly                           │
└──────────────────────────────┴────────────────────────────────────┴───────────────────────────────┴──────────────────────────────────────────────┘

The database and Redis are the same instances in all cases. The only variable is whether the NestJS process is running locally or deployed inside Railway Project B.

This rules out any issue with my code or database configuration.

---

Failure Pattern (Reproducible)

Step 1: Deploy Project B on Railway

Step 2: Server starts — everything works fine

Step 3: After approximately 1 to 1.5 hours — server completely stops responding

Step 4: Railway health checks fail

Step 5: Manual redeploy required to restore service

Step 6: Repeat from Step 2

This cycle repeats every single deployment, consistently after ~1–1.5 hours.
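
To pin down the ~1–1.5 hour figure more precisely, the failure pattern above could be timed by an external probe. The following is only a minimal sketch: `probeUntilFailure`, its parameters, and the polling interval are hypothetical names chosen for illustration, and the health URL in the usage comment is a placeholder, not a value from this thread.

```typescript
// Hypothetical probe: all names and intervals here are illustrative assumptions.
type Check = () => Promise<boolean>;

interface ProbeResult {
  failedAtMs: number | null; // ms since probe start when the first failure was seen
  checks: number;            // number of checks performed
}

async function probeUntilFailure(
  check: Check,
  intervalMs: number,
  maxChecks: number,
  now: () => number = Date.now,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<ProbeResult> {
  const start = now();
  for (let i = 1; i <= maxChecks; i++) {
    let ok = false;
    try {
      ok = await check();
    } catch {
      ok = false; // a refused connection or timeout counts as a failure
    }
    if (!ok) {
      return { failedAtMs: now() - start, checks: i };
    }
    await sleep(intervalMs);
  }
  return { failedAtMs: null, checks: maxChecks };
}

// Usage idea (placeholder URL, run from a machine outside Railway):
//   probeUntilFailure(
//     async () => (await fetch("https://<project-b-domain>/health")).ok,
//     30000, // poll every 30 seconds
//     400,   // roughly 3.3 hours of polling
//   ).then((r) => console.log(r));
```

Starting the probe right after a redeploy and comparing `failedAtMs` across several deployments would show whether the drop really happens at a fixed uptime or only within a rough window.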

---

What I Have Already Ruled Out

- ❌ Not a code issue: same code works in Project A and locally

- ❌ Not a database issue: same DB works fine locally and in Project A

- ❌ Not a Redis issue: same Redis works fine locally and in Project A

- ❌ Not a CORS or authentication issue: requests never reach the server after the drop

---


Status changed to Awaiting Railway Response Railway 16 days ago


16 days ago

Going to open this up to the community, as we aren't able to help with application-level or configuration issues.


Status changed to Awaiting User Response Railway 16 days ago


Railway
BOT

16 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 16 days ago



chaudharyali14
PRO · OP

16 days ago

I appreciate the response, but I must respectfully push back on the classification of this as an "application-level" issue. The evidence clearly points to a Railway infrastructure problem.

Here is why this is NOT an application issue:

┌─────────────────────┬──────┬─────────────────┬────────────────────┬────────────────────────────┐
│ Test                │ Code │ Database        │ Redis              │ Result                     │
├─────────────────────┼──────┼─────────────────┼────────────────────┼────────────────────────────┤
│ Local machine       │ Same │ Same Railway DB │ Same Railway Redis │ ✅ Runs for days, no drops  │
├─────────────────────┼──────┼─────────────────┼────────────────────┼────────────────────────────┤
│ Railway (Project A) │ Same │ Same            │ Same               │ ✅ Runs perfectly           │
├─────────────────────┼──────┼─────────────────┼────────────────────┼────────────────────────────┤
│ Railway (Project B) │ Same │ Same            │ Same               │ ❌ Dies every 1–1.5 hours   │
└─────────────────────┴──────┴─────────────────┴────────────────────┴────────────────────────────┘

The code, database, and Redis are identical across all three environments.

The only variable is Railway's Project B container/instance itself.

If this were an application or configuration issue, it would fail in all three environments equally. It does not.

Specific questions that only Railway can answer:

1. Are there any container OOM kills, CPU throttles, or process signals sent to Project B after ~1 hour?

2. Is there a difference in private networking stability between Project A and Project B instances?

3. Are there any Railway-side infrastructure logs for Project B showing what happens at the ~1–1.5 hour mark?
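
As a partial cross-check for question 1, the process itself could record its memory trend before it goes silent. This is a hedged sketch, not a diagnosis: `MemorySample`, `rssGrowthPerHour`, and `hoursUntilLimit` are hypothetical helpers, and the memory limit passed in is whatever limit applies to the service, which is not stated in this thread. A host-level OOM kill arrives as SIGKILL, which a process cannot catch, so this can only hint at the cause; confirming it still requires Railway-side data.

```typescript
// Hypothetical helpers: the sample shape and function names are assumptions.
interface MemorySample {
  uptimeSec: number; // e.g. from process.uptime()
  rssBytes: number;  // e.g. from process.memoryUsage().rss
}

// Least-squares slope of RSS over uptime, expressed in bytes per hour.
function rssGrowthPerHour(samples: MemorySample[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = samples.reduce((sum, s) => sum + s.uptimeSec, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.rssBytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.uptimeSec - meanX) * (s.rssBytes - meanY);
    den += (s.uptimeSec - meanX) * (s.uptimeSec - meanX);
  }
  return den === 0 ? 0 : (num / den) * 3600;
}

// Hours until RSS would reach limitBytes at the current trend; null if RSS is not growing.
function hoursUntilLimit(samples: MemorySample[], limitBytes: number): number | null {
  const growth = rssGrowthPerHour(samples);
  if (samples.length === 0 || growth <= 0) return null;
  const last = samples[samples.length - 1];
  return Math.max(0, (limitBytes - last.rssBytes) / growth);
}

// Wiring idea inside the NestJS bootstrap (Node.js APIs, shown as comments):
//   const samples: MemorySample[] = [];
//   setInterval(() => {
//     samples.push({ uptimeSec: process.uptime(), rssBytes: process.memoryUsage().rss });
//     console.log("rss growth bytes/hour:", rssGrowthPerHour(samples));
//   }, 60000);
//   process.on("SIGTERM", () => console.log("SIGTERM received at uptime", process.uptime()));
```

If the logged growth is flat in Project B right up to the drop, memory exhaustion becomes less likely and the networking questions (2 and 3) carry more weight.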

These are not questions I can answer from my application logs. Only Railway's infrastructure team has access to this data.

I am a Pro plan customer and this service has been unusable in production. I would greatly appreciate escalation to your infrastructure team rather than the community forum, as this requires server-side log access that community members do not have.


chaudharyali14
PRO · OP

16 days ago

502 Bad Gateway Railway/Infrastructure
