15 days ago
Hey guys, I'm trying to solve a software problem without needing a gigantic refactor.
Basically, I make requests to n8n (hosted on Railway) through a Cloudflare Worker proxy, but these requests take very long to finish and the payloads are sometimes gigantic.
So, instead of creating a database trigger that pre-fetches all the data as soon as a new user is created and then refreshes it on a cron (that would mean a lot of database refactoring), I chose to intercept requests at the proxy and save the responses in Cloudflare Workers KV to use as a cache. Whenever there's a cache hit, the Worker loads from KV instead of making the request again. But this blows up whenever I request something really huge for my server to handle, so I'm pretty lost right now and looking for a solution.
15 days ago
i think if i understand this correctly, you could implement a small change at the worker/cache layer that solves this. KV has a 25MB per-value limit, so you can store responses under ~20MB (to be safe) and, for anything bigger, proxy through without caching, or use the Cache API instead (which can store objects up to 512MB)
a simpler approach is streaming instead of buffering in KV: avoid reading request.body or the response unless you need to, and forward the requests directly to n8n (using fetch)
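something like this at the worker layer (rough sketch only; the `CACHE_KV` binding name, cache-key choice, and TTL are assumptions, not from your setup):

```javascript
// Sketch: cache n8n responses in Workers KV, but only when the origin
// reports a size that fits comfortably under KV's 25 MB per-value limit.
// Anything bigger (or of unknown size) is proxied straight through.
const MAX_KV_BYTES = 20 * 1024 * 1024; // ~20 MB safety margin under the 25 MB cap

function isCacheable(contentLength, limit = MAX_KV_BYTES) {
  // Only cache when Content-Length is present, positive, and within budget.
  return Number.isFinite(contentLength) && contentLength > 0 && contentLength <= limit;
}

// In a real Worker this object would be `export default`.
const worker = {
  async fetch(request, env, ctx) {
    const cacheKey = new URL(request.url).pathname; // assumed key scheme

    // Serve from KV on a hit, streaming the stored value out.
    const cached = await env.CACHE_KV.get(cacheKey, { type: "stream" });
    if (cached) return new Response(cached);

    const response = await fetch(request);
    const length = Number(response.headers.get("Content-Length"));

    if (isCacheable(length)) {
      // Tee the body: one branch goes to the client, one into KV,
      // so the Worker never holds the whole payload in memory at once.
      const [toClient, toStore] = response.body.tee();
      ctx.waitUntil(env.CACHE_KV.put(cacheKey, toStore, { expirationTtl: 600 }));
      return new Response(toClient, response);
    }

    // Too big or size unknown: pass through without caching.
    return response;
  },
};
```

the `tee()` + `ctx.waitUntil` pattern lets the client response and the KV write share one origin fetch without buffering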
15 days ago
take a look at this https://developers.cloudflare.com/workers/runtime-apis/streams/
It explains it all pretty well, but you'd use ReadableStream, TransformStream, and pipeTo() to avoid buffering
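for example, a pass-through TransformStream that watches the bytes go by without ever buffering the whole body (just an illustrative sketch, using the standard web streams API from that page):

```javascript
// Sketch: a TransformStream that forwards chunks unchanged while counting
// bytes — the same pattern you'd use in a Worker to observe or lightly
// transform a huge response without loading it all into memory.
function byteCounter(counter) {
  return new TransformStream({
    transform(chunk, controller) {
      counter.bytes += chunk.length; // observe the chunk...
      controller.enqueue(chunk);     // ...and forward it untouched
    },
  });
}

// In a Worker you'd wire it up roughly like:
//   const counter = { bytes: 0 };
//   const body = response.body.pipeThrough(byteCounter(counter));
//   return new Response(body, response);
```

the key point is that each chunk flows through and is released, so memory use stays flat no matter how big the JSON is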
They're very, very, veeery big JSONs, like 12 pages of 10000 lines each. Do you think this can handle that?
15 days ago
what’s the typical size like?
15 days ago
like 1gb+?
15 days ago
I think you'd be fine, but you might have to break it into chunks
The problem will also be the initial load. Basically, I fetch data from the ttk shop api, but they designed the api around very atomic info, so if I want to calculate anything aggregate, for example GMV, I need to fetch ALL orders from a period and add them up. This always takes too long.
So I want to reduce the time impact the user feels before seeing this kind of data, by caching if possible or convenient
15 days ago
do you need to load everything all the time?
15 days ago
if you're trying to solve the time-to-user impact, you can probably implement a cache key and re-queue it when it becomes stale (ie every 10 or 15 minutes or so), so the user experience doesn't feel that bad — it'll just load the new data then. but as for your other problem, I'd definitely look into streaming + chunking your json
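roughly this shape — serve whatever's cached immediately, refresh in the background once it's stale (a sketch; `CACHE_KV` and `refreshFromOrigin` are illustrative names, not your real bindings):

```javascript
// Sketch of "serve stale, revalidate in background": cache entries carry a
// storedAt timestamp; past the TTL we still answer with the old data but
// kick off a refresh so the next request sees fresh numbers.
const STALE_AFTER_MS = 10 * 60 * 1000; // ~10 minutes, per the suggestion above

function isStale(entry, now = Date.now(), maxAgeMs = STALE_AFTER_MS) {
  return !entry || now - entry.storedAt > maxAgeMs;
}

async function getWithRevalidate(env, ctx, key, refreshFromOrigin) {
  const entry = await env.CACHE_KV.get(key, { type: "json" });

  if (entry && !isStale(entry)) return entry.data; // fresh hit: instant

  if (entry) {
    // Stale hit: answer immediately with old data, refresh after responding.
    ctx.waitUntil(refreshFromOrigin(key));
    return entry.data;
  }

  // Cold miss: only the very first user pays the full fetch cost.
  return refreshFromOrigin(key);
}
```

so the slow ttk-api aggregation only ever runs in the background (or on the very first load), never while a user is staring at a spinner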
even if I need the whole dataset to do the calculation, would chunking still be better? doesn't this strategy produce incorrect data?
14 days ago
it depends on the problem you're trying to solve. if it's more the data aspect, then focus on chunking and there's no need to implement cache keys. but showing the user x data and then showing them the delta is usually pretty fine (unless it's a large difference)
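worth noting: for additive metrics like GMV, chunking can't make the number wrong — summing page by page gives exactly the same total as summing everything at once, because addition is associative. only staleness of the cached data causes drift. a tiny sketch (field name `amount` and the paging are made up for illustration):

```javascript
// Sketch: aggregating GMV chunk by chunk equals aggregating all orders at
// once, since (a + b) + (c + d) === a + b + c + d.
function gmvOfChunk(orders) {
  return orders.reduce((sum, order) => sum + order.amount, 0);
}

function gmvChunked(chunks) {
  // Keep a running total across pages instead of holding all orders in memory.
  return chunks.reduce((total, chunk) => total + gmvOfChunk(chunk), 0);
}
```

so you can stream orders page by page from the ttk api, fold each page into a running total, and throw the page away — same answer, flat memory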