4 months ago
I have an application attached to this project. I've tried increasing the number of instances to see if the response time improves, but it's still slow: I'm getting an average response time of 3000ms, and I want it to be around 500ms. What else can I do to make my service respond faster? Note that the database and cache are all within my Railway network and all communication is kept internal to reduce latency.
12 Replies
4 months ago
I would make sure that all connections between services (e.g., from your backend or whatever service to your cache/db) are using the railway.internal URL instead of the public URL.
4 months ago
All is using internal railway connection
4 months ago
Unless your code has a severe resource leak or lengthy execution time, I would double-check that all URLs used to communicate between services end with railway.internal.
Those are pretty much the only reasons a request would take several seconds to process.
4 months ago
When I tried locally, tunneling for public access, the response time was between 120-200ms.
4 months ago
Still waiting for railway support to look into this
4 months ago
Am I still going to get a response to this from the Railway team?
4 months ago
Hey, so the fact that you're getting 120-200ms locally but 3000ms on Railway even though you're using railway.internal means there's definitely something specific going on. Since you already confirmed the internal connections are set up right, here's what I think is happening.
Most likely it's a database thing:
N+1 queries - this is super common and matches your symptoms exactly. Basically, your app might be doing hundreds of tiny database calls instead of one proper query. It works fine locally with ten test records but absolutely dies when there's real production data.
Missing indexes - if Railway has way more data than your local DB, and you don't have indexes on the columns you're searching/filtering by, queries that took 5ms locally will take 2000ms in production.
Connection pool issues - maybe you're running out of database connections and requests are just sitting there waiting.
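To make the N+1 point concrete, here's a minimal sketch using an in-memory SQLite database and made-up authors/books tables (not your schema, just an illustration): the naive loop issues one query per author, while a single JOIN fetches the same data in one round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
for i in range(100):
    conn.execute("INSERT INTO authors (id, name) VALUES (?, ?)", (i, f"author{i}"))
    conn.execute("INSERT INTO books (author_id, title) VALUES (?, ?)", (i, f"book{i}"))

# N+1 pattern: 1 query for the authors, then 1 query per author = 101 queries
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?", (author_id,)).fetchall()
    queries += 1
print("N+1 query count:", queries)  # 101

# Fixed: one JOIN fetches everything in a single query
rows = conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
).fetchall()
print("JOIN query count: 1, rows:", len(rows))  # 100 rows from one query
```

Locally with 10 records the N+1 version looks fine; with real data, each extra round trip adds network latency and the total blows up.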
How to actually figure out what's wrong:
You need to add some logging to see where the time is going. Just log before and after your database calls, and log how many queries you're making per request. Then check the Railway logs and you'll see immediately whether it's spending 2500ms on database work or somewhere else.
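A minimal sketch of that kind of logging, using a small context manager (the labels and the "fetch orders" step are just illustrative placeholders, not from your app):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print how long the wrapped block took, so slow steps show up in the logs
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{label}: {elapsed_ms:.1f}ms")

# Usage inside a request handler:
with timed("fetch orders"):
    orders = list(range(1000))  # stand-in for your actual DB call
with timed("serialize"):
    payload = [{"id": o} for o in orders]
```

Wrap each major step of the slow endpoint and the log output will show which one is eating the time.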
Things to check:
Connect directly to your Railway database and run a query; see whether it's fast or slow.
Check that all your services are in the same Railway region (this matters).
Look for calls to external APIs that might be timing out.
Check the Railway metrics to see whether CPU or memory is spiking.
Honestly, from what I've seen this is usually either N+1 queries (making 50 queries when you should be making 2) or missing database indexes. The huge difference between local and Railway performance screams "data volume problem" to me.
What stack are you using, by the way? Node.js, Python, Rails? And what database? If you can add some timing logs and see where the 3000ms is actually going, that would tell us exactly what's wrong. Right now we're kind of guessing, but the symptoms really point to database query optimization issues.
4 months ago
Using Python, and I've also added silk to monitor queries to the database.
4 months ago
Oh nice, silk is perfect for this. So what does silk show you? How many queries is it making per request, and how long are they taking?
If silk is showing 50+ queries per request, that's definitely your N+1 problem right there. It should be way less than that for most endpoints.
Also check in silk:
Are there any queries taking 500ms+ each? Those badly need indexes.
What's the total DB time vs the total request time? If DB time is 2800ms out of 3000ms, then we know it's 100% database related.
For Python/Django specifically, the common issues are:
Not using select_related() or prefetch_related() for foreign keys - this causes N+1 queries every time.
Missing db_index=True on model fields that you filter or order by.
If you're using Django REST Framework, nested serializers without prefetching can absolutely murder performance.
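As an illustration, here's what that looks like in the Django ORM. This is a fragment, not runnable on its own, and Author/Book are hypothetical models, not from this thread:

```python
# Hypothetical models, for illustration only
class Book(models.Model):
    author = models.ForeignKey("Author", on_delete=models.CASCADE)
    title = models.CharField(max_length=200, db_index=True)  # indexed because we filter on it

# N+1: one query for the books, then one extra query per book to load .author
for book in Book.objects.all():
    print(book.author.name)

# Fixed: select_related() joins the author into the same query
for book in Book.objects.select_related("author"):
    print(book.author.name)

# For reverse or many-to-many relations, use prefetch_related() instead
authors = Author.objects.prefetch_related("book_set")
```

The same applies inside DRF serializers: a nested serializer touching `book.author` triggers the N+1 unless the view's queryset uses select_related/prefetch_related.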
Can you share what silk is showing? Like:
how many queries per slow request
what the slowest queries are
total DB time vs total response time
Once we see that, it'll be super obvious what's wrong. Also, if silk shows a bunch of similar queries happening over and over, that's the smoking gun for N+1.
Also, quick question - did you run migrations on Railway? Sometimes people forget to add indexes in production that they have locally.
4 months ago
OK, so I found your problem. Look at this:
13736ms total time but only 3720ms on queries
That means roughly 10,000ms (10 SECONDS!) is being spent OUTSIDE the database. The database is actually fine - 9 queries in 3720ms is not great, but it's not causing your main issue.
So the real problem isn't the database at all. Something else in your app is eating up 10 seconds per request.
Things to check:
Are you calling any external APIs? Payment gateways, email services, third-party APIs? Those could be timing out or just slow.
Are you doing any heavy file processing? Image resizing, PDF generation, CSV processing?
Are you using Celery or background tasks? Maybe something that should be async is running synchronously.
Check whether there are any time.sleep() calls accidentally left in the code, lol (I've seen this before).
Are you doing any complex calculations or data processing in the view/serializer?
To find it, add some timing prints in your view to see where the 10 seconds is going:

```python
import time

start = time.time()
# ... after each major operation:
print(f"after X: {time.time() - start:.2f}s")
```

Just sprinkle those throughout your view and check the Railway logs; you'll see immediately which operation is taking forever.
Also check:
Django middleware - maybe you have some middleware that's doing something expensive.
Serializers - if you're using DRF, nested serializers can do crazy stuff sometimes.
Signals - Django signals can add hidden processing time.
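One cheap way to rule middleware in or out is a tiny timing middleware placed first in the stack: everything below it (other middleware, signals, the view) is included in the measurement. A sketch in Django's plain-callable middleware style, runnable here with a fake view standing in for get_response; in a real project you'd add the class path to MIDDLEWARE:

```python
import time

class TimingMiddleware:
    """Logs total time spent in everything below it in the middleware stack."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{request}: {elapsed_ms:.1f}ms")
        return response

# Simulate a slow "view" to show what the log line looks like
def slow_view(request):
    time.sleep(0.05)  # stand-in for the mystery 10 seconds
    return "response"

mw = TimingMiddleware(slow_view)
result = mw("/api/orders")
```

If this logs ~10s but the per-step prints inside the view only add up to a fraction of that, the time is being lost in other middleware, serializers, or signals rather than in the view body.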
The database stuff could be optimized (those 400ms queries with joins=1 could probably be faster), but that's not your main issue. You're losing 10 whole seconds somewhere in Python code or external API calls.
What does this endpoint actually do? What operations is it performing?

