4 months ago
Hey guys, I'm trying to solve my software problem without needing a gigantic refactor.
Basically, I make requests to n8n (hosted on Railway) through a Cloudflare Worker proxy, but these requests take very long to finish and the payloads are sometimes gigantic.
So, instead of creating a database trigger to pre-fetch all the data as soon as a new user is created and then refresh it via cron (which would require a big database refactor), I chose to just intercept requests at the proxy and save the responses in Cloudflare Workers KV as a cache. Whenever there's a cache hit, I load from KV instead of making the request again. But this blows up whenever I request something really huge for my server to handle, so I'm very lost right now and want to find a solution for this.
16 Replies
4 months ago
i think if i understand this correctly, you could implement a small change at the worker/cache layer to solve this. KV has a 25MB per-value limit, so you could store responses under 20MB (to be safe), and for anything bigger, proxy through without caching, or use the Cache API (which can store objects up to 512MB)
a simpler approach is streaming instead of buffering in KV: avoid reading the request or response body unless you need to, and forward the requests directly to n8n (using fetch)
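something like this, roughly. this is just a sketch: `env.CACHE` is an assumed KV binding name, and the 20MB cap and 15-minute TTL are made-up numbers, not anything from your setup

```javascript
// Pure helper: only cache responses whose size is known and under the cap.
// KV's per-value limit is 25MB, so stay under 20MB to be safe.
const KV_SAFE_LIMIT = 20 * 1024 * 1024;

function shouldCacheInKV(contentLength, limit = KV_SAFE_LIMIT) {
  const size = Number(contentLength);
  return Number.isFinite(size) && size > 0 && size <= limit;
}

// Hypothetical Worker using the helper (env.CACHE is an assumed KV binding):
const worker = {
  async fetch(request, env) {
    const cacheKey = new URL(request.url).pathname;
    const cached = await env.CACHE.get(cacheKey, "stream");
    if (cached) return new Response(cached);

    // Forward to n8n without reading the body in the Worker (streaming).
    const upstream = await fetch(request);
    const length = upstream.headers.get("content-length");

    if (shouldCacheInKV(length)) {
      // Tee the stream: one branch goes to the client, one into KV.
      const [toClient, toKV] = upstream.body.tee();
      await env.CACHE.put(cacheKey, toKV, { expirationTtl: 900 });
      return new Response(toClient, upstream);
    }
    // Too big (or unknown size): just pass it through uncached.
    return upstream;
  },
};
```

the key point is the fallthrough branch at the end: anything over the cap never touches KV, it's just streamed straight through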
4 months ago
Hmmm, what do you mean by streaming in KV?
4 months ago
take a look at this https://developers.cloudflare.com/workers/runtime-apis/streams/
It explains it all pretty well, but you’d use ReadableStream, TransformStream, and pipeTo() to avoid buffering
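to get a feel for those APIs, here's a tiny self-contained sketch (plain Node 18+, nothing Workers-specific) that pushes chunks through a TransformStream one at a time instead of buffering the whole payload:

```javascript
// Minimal ReadableStream -> TransformStream -> WritableStream pipeline.
// Each chunk is transformed and written as it arrives; nothing is buffered
// beyond the chunk currently in flight.
async function streamThrough(chunks, transformFn) {
  const source = new ReadableStream({
    start(controller) {
      for (const c of chunks) controller.enqueue(c);
      controller.close();
    },
  });

  const transform = new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(transformFn(chunk));
    },
  });

  const out = [];
  await source.pipeThrough(transform).pipeTo(
    new WritableStream({
      write(chunk) {
        out.push(chunk);
      },
    })
  );
  return out;
}
```

in a Worker you'd pipe `response.body` instead of a hand-made source, but the mechanics are the same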
4 months ago
They're very, very big JSONs, like 12 pages of 10,000 lines. Do you think this can handle it?
4 months ago
what’s the typical size like?
4 months ago
like 1gb+?
4 months ago
Each JSON file is about 500KB
4 months ago
I think you’d be fine but you might have to break it into chunks if necessary
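chunking could be as simple as splitting the serialized JSON into fixed-size pieces and storing each under its own key (e.g. `key:0`, `key:1`, ... — the key scheme here is just an illustration, not a KV convention):

```javascript
// Split a large string (e.g. a serialized JSON response) into fixed-size
// pieces so each one fits under KV's per-value limit.
function chunkString(str, chunkSize) {
  const chunks = [];
  for (let i = 0; i < str.length; i += chunkSize) {
    chunks.push(str.slice(i, i + chunkSize));
  }
  return chunks;
}

// Reassembly is just concatenation in order; store the chunk count
// somewhere (e.g. under a hypothetical `${key}:meta` entry) so you
// know how many pieces to read back.
function joinChunks(chunks) {
  return chunks.join("");
}
```

since each piece is an opaque slice of the same string, joining them back gives you byte-identical JSON, so the data itself isn't affected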
4 months ago
The problem is the initial load as well. Basically, I fetch data from the TTK shop API, but they designed the API around very atomic info, so if I want to calculate anything aggregate, for example GMV, I need to fetch ALL orders from a period and add them up. This always takes too long.
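one thing that helps with memory (though not with total fetch time) is summing page by page instead of collecting all orders first. the pagination shape below (`orders`, `amount`, `nextCursor`) is made up, since I don't know the actual TTK API:

```javascript
// Incrementally sum GMV across pages of orders instead of holding the
// whole dataset in memory. `fetchPage` is a stand-in for whatever the
// shop API's pagination actually looks like (hypothetical shape:
// { orders: [{ amount }], nextCursor }).
async function totalGMV(fetchPage) {
  let total = 0;
  let cursor = null;
  do {
    const page = await fetchPage(cursor);
    for (const order of page.orders) total += order.amount;
    cursor = page.nextCursor;
  } while (cursor);
  return total;
}
```

each page can be discarded as soon as its amounts are added, so the Worker only ever holds one page at a time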
4 months ago
So I want to reduce the delay the user feels when viewing this kind of data, by caching if possible or convenient
4 months ago
do you need to load everything all the time?
4 months ago
Yes, but I don't need it in real time
4 months ago
5 minutes interval or 15 is fine
4 months ago
if you’re trying to solve the time-to-user impact, you can probably implement a cache key and re-queue it when it becomes stale (i.e. every 10 or 15 minutes or so), so the user experience doesn’t feel that bad; it’ll just load the new data then. as for your other problem, I’d definitely look into streaming + chunking your JSON
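the staleness check is the core of it. here's a sketch of that pattern (the 15-minute TTL matches your tolerance above; `store` is any get/put key-value wrapper — in a Worker it'd be KV, with the refresh kicked off via ctx.waitUntil instead of awaited inline):

```javascript
// Data older than this is considered stale (matches the ~15 min tolerance).
const STALE_AFTER_MS = 15 * 60 * 1000;

function isStale(cachedAtMs, nowMs, ttlMs = STALE_AFTER_MS) {
  return nowMs - cachedAtMs >= ttlMs;
}

// Stale-while-revalidate sketch: always answer from cache when possible,
// and refresh when the entry is stale. Entries look like { cachedAt, value }.
async function getWithRevalidate(store, key, fetchFresh, nowMs) {
  const entry = await store.get(key);
  if (entry && !isStale(entry.cachedAt, nowMs)) return entry.value;

  const fresh = await fetchFresh();
  await store.put(key, { cachedAt: nowMs, value: fresh });
  // Serve the (stale) cached value now; the fresh one lands next request.
  return entry ? entry.value : fresh;
}
```

the user-facing effect is that nobody ever waits on the slow upstream fetch except the very first request; stale requests get the old data instantly while the refresh happens behind the scenes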
4 months ago
even if I need the whole data to calculate, would chunking still be better? doesn't this strategy give incorrect data?
4 months ago
it depends on the problem you’re trying to solve. if it’s more the data aspect, then focus on chunking and there’s no need to implement cache keys. but providing the user with x data and then showing them the delta is usually pretty fine (unless it’s a large difference)