2 years ago
Since around 3 pm (Paris time), our users have been experiencing major slowdowns on our app. All requests to our API service stay in "pending" status for a while, and the page takes several minutes to load fully.
This is happening on a day when we have a lot of new users, and existing users are starting to use the app intensively (they are back at school).
My API service runs a Directus app with PRESSURE_LIMITER_ENABLED set, but in our case we do not get any 503 errors like the ones the pressure limiter throws when overloaded: https://docs.directus.io/self-hosted/config-options.html#pressure-based-rate-limiter
The metrics don't seem to be saturated (see attachment)
I have no unusual errors in my logs
My other service Grafana works like a charm querying the same Postgres DB
I have been using the v2 runtime on all my services since last night
Our project is hosted in Amsterdam, mostly for French users
We are subscribed to the Pro plan
It looks like there is a bottleneck or throttle somewhere on the network that I cannot access. So, after checking everything I could, I need your help.
Thanks in advance!!
Video of the user experience https://youtu.be/e8wVv_bSaXM
Project Id: 65aff0db-6586-4be0-8420-b2e67ae4378d
84 Replies
2 years ago
hello, do you have any idea on how many RPS you may be seeing?
2 years ago
perfect!
2 years ago
backend is the directus service right?
2 years ago
perfect
You can get them here https://api.hiphiphip.app/status
2 years ago
the current RPS that is being reported would be lower than our RPS limit, so you aren't running into any kind of platform limitations at the moment.
keep an eye on this RPS number when / if you see issues again and feel free to ping me with that info
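For anyone wanting to track that RPS number themselves, here is a minimal sketch, assuming you can extract request arrival times (as Unix timestamps in seconds) from your own access logs. The function names are hypothetical, and this is not how the platform computes its own figure.

```python
from collections import Counter


def peak_rps(timestamps: list[float]) -> int:
    """Return the highest request count seen in any one-second bucket."""
    if not timestamps:
        return 0
    # Group requests by the whole second in which they arrived.
    buckets = Counter(int(ts) for ts in timestamps)
    return max(buckets.values())


def average_rps(timestamps: list[float]) -> float:
    """Return requests divided by the time span they cover, in seconds."""
    if len(timestamps) < 2:
        return float(len(timestamps))
    span = max(timestamps) - min(timestamps)
    # Guard against all requests sharing the same timestamp.
    return len(timestamps) / span if span > 0 else float(len(timestamps))
```

Comparing the peak against the average is useful here, since a short burst can saturate an app even when the average looks low.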
2 years ago
at this time, id have to say this is an application level issue
2 years ago
maybe you could try something like increase the postgres pool count?
But Directus has no query limit by default, they shouldn't be in pending mode, right?
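One way to see why requests can sit in "pending" without any error: if database access goes through a fixed-size connection pool, extra requests simply wait their turn. The sketch below is a simplified model, assuming every request arrives at once and every query takes the same time; the function name and numbers are illustrative, not Directus defaults.

```python
def completion_times(n_requests: int, pool_size: int, query_seconds: float) -> list[float]:
    """Model n simultaneous requests sharing a pool of `pool_size` connections.

    Requests are served in batches of `pool_size`; each batch takes
    `query_seconds`, so later requests wait for earlier batches to finish.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return [(i // pool_size + 1) * query_seconds for i in range(n_requests)]


# 100 simultaneous requests, 10 connections, 0.5 s per query:
# the last request completes after 10 batches.
slowest = completion_times(100, 10, 0.5)[-1]
```

Under these assumptions, raising the pool size or adding replicas (each with its own pool) shortens the wait, as long as Postgres itself can absorb the extra connections.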
2 years ago
im sure there are more factors at play here, can you help me to understand your infra more?
Can you give me the RPS limit so I know whether it is at the app level or not, please?
2 years ago
i dont know if i can give out the current values for that, sorry, but you are currently well under the limit
2 years ago
i do, but id like to understand how it all works together
2 years ago
for example, im now seeing the rps for the api, but you said requests to directus are pending, not the api ?
Request to api.hiphiphip.app are pending
2 years ago
the api calls directus via the private network?
bo.hiphiphip.app is a Directus instance with the admin enabled.
api.hiphiphip.app is the same Directus (cloned) without the admin panel enabled
www.hiphiphip.app calls api.hiphiphip.app (never the bo directly)
the 2 directus services, bo and api access Postgres and Redis through the private network only
there is also a Grafana service using Postgres and Redis (via the private network) and a last service backing up Postgres to AWS at 5am
2 years ago
are you absolutely positive you are doing all the communication that you can over the private network?
2 years ago
haha yeah that can happen
As you can see, the only 3 services that still have egress are Frontend (green), api (yellow) and grafana (red, which had metrics published on our blog until today)
2 years ago
gotcha, thank you for the rundown
2 years ago
is there anywhere i could go to see these pending requests?
2 years ago
thanks!
2 years ago
not seeing anything that would indicate an issue on our side of things, perhaps you could give the api more replicas?
2 years ago
start with 3
id: brody.the.savior@expensive-railway.app
pwd: DoYouMakeEgressDiscounts?123
2 years ago
we do not lol, but kudos for trying 😆
Thank you, my brain is totally out of order right now
2 years ago
can you go ahead and add 3 replicas to the api?
2 years ago
if one of your api services is not able to handle your volume of traffic, 3 might be able to
2 years ago
you would want that off, yes
2 years ago
off on everything, the new proxy is far superior
2 years ago
at the same time go ahead and add those 3 replicas
2 years ago
2 is close enough to 3 haha
2 years ago
can you disable the legacy proxy on your other services too please
hmm, Railway seems to be buggy: it doesn't offer to deploy when I disable the legacy proxy
2 years ago
thats normal, that change is not a part of the staged changes
is there a way to display the equivalent of https://api.hiphiphip.app/status but for each replica?
2 years ago
nope, you'd only ever see the page for one replica since incoming requests are round robin
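To illustrate the round-robin behaviour described above, here is a minimal sketch. The replica names are hypothetical and the real proxy's balancing may differ in detail; the point is only that each incoming request lands on whichever replica is next in the rotation, so a single /status call reflects one replica.

```python
from itertools import cycle


def assign_round_robin(request_ids: list[str], replicas: list[str]) -> dict[str, str]:
    """Map each request to a replica by rotating through the replica list."""
    if not replicas:
        raise ValueError("at least one replica is required")
    ring = cycle(replicas)
    return {request_id: next(ring) for request_id in request_ids}


# Four consecutive requests spread over three replicas:
# the fourth wraps around to the first replica again.
assignments = assign_round_robin(["r1", "r2", "r3", "r4"], ["api-0", "api-1", "api-2"])
```

Since the caller does not choose the replica, per-replica figures would need to come from each instance's own logs or metrics rather than from the shared public URL.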
2 years ago
okay cool so it seems directus was just a little stressed out is all
2 years ago
if you gain more userbase, you can always add another replica!
2 years ago
happy it was an easy fix!
2 years ago
nah dont worry about it, it took me until now to suggest it too lol
2 years ago
happy to help! i wish you all the best with your service and its growth!
Just a question: is there an autoscaling feature in the pipeline? Depending on the time of day, it could save resources and money.
2 years ago
we do not have any immediate plans for auto h-scaling

