tansandoteth
PRO
3 months ago
I was just testing out LLMs on Railway to see how they fare. I've deployed the Ollama template and tried it on both the Hobby and Pro plans, but a basic response seems pretty slow. Testing with llama3:latest, I noticed memory is only hitting 10 GB.
Here's the command I'm running:
curl http://ollama.railway.internal:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3:latest","stream":false,"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Why is the sky blue?"}]}'
Is this an LLM issue, or something to do with VRAM bandwidth? Any thoughts appreciated.
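For anyone debugging the same thing, a rough sketch against the same internal host: Ollama's /api/ps endpoint lists the models currently loaded in memory, and a size_vram of 0 in its response would suggest the model is running CPU-only, which would explain the slowness; curl's -w flag can time the request itself.

# List loaded models; a size_vram of 0 means the model is on CPU, not GPU
curl http://ollama.railway.internal:11434/api/ps

# Time the same chat request; -w prints total wall-clock seconds
curl -s -o /dev/null -w '%{time_total}s\n' \
  http://ollama.railway.internal:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3:latest","stream":false,"messages":[{"role":"user","content":"Why is the sky blue?"}]}'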
3 months ago
Yep, that’s very likely it
3 months ago
!s
Status changed to Solved by adam • 3 months ago