tansandoteth
PRO
a month ago
I was just testing out LLMs on Railway to see how they fare. I deployed the Ollama template and tried it on both the Hobby and Pro plans, but responses seem pretty slow even for a basic prompt. I'm testing with llama3:latest and noticed memory is only hitting 10 GB.
Here's the command I'm running:
curl http://ollama.railway.internal:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3:latest","stream":false,"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Why is the sky blue?"}]}'
Is this an LLM issue, or something to do with VRAM bandwidth? Any thoughts appreciated.
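For anyone debugging the same thing, one way to narrow it down (a quick sketch, assuming the template exposes Ollama's standard HTTP API on a reasonably recent version): the /api/ps endpoint reports where each loaded model is placed, including a size_vram field. If size_vram is 0, the model is running entirely on CPU, which alone would explain slow generation.

# List running models; size_vram == 0 means CPU-only inference
curl http://ollama.railway.internal:11434/api/ps

As far as I know Railway plans don't include GPUs, so a size_vram of 0 here would be expected rather than a misconfiguration.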
a month ago
Yep, that’s very likely it
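To put a number on "slow" (a sketch, assuming jq is available where you run this): with "stream":false, the /api/chat response includes eval_count (tokens generated) and eval_duration (nanoseconds), so tokens per second is eval_count / eval_duration × 1e9.

# Measure generation speed in tokens/second from the response metadata
curl -s http://ollama.railway.internal:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3:latest","stream":false,"messages":[{"role":"user","content":"Why is the sky blue?"}]}' \
  | jq '.eval_count / .eval_duration * 1e9'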
a month ago
!s
Status changed to Solved
adam • 27 days ago