7 months ago
I'm trying to host an Ollama server with a TinyLlama model that's about 670 MB. I'm on the Hobby plan with 8 GB of RAM and 8 vCPU per service. I set up the service using a Dockerfile that pulls the model weights and runs `ollama serve`. When I make API requests, they hang without a response.
I've also tried SSHing into the service and making the curl request from inside the container, but it hangs there too without any errors. Is the issue the RAM or CPU available on the Hobby plan, or does Railway prevent hosting local instances?
I tried this template service, but it does not pull specific models by default: https://railway.com/deploy/T9CQ5w
Lastly, the estimated usage shows 209 CPUs, with an estimated bill of over $70 per month. It doesn't make sense to pay that much for a service that isn't even working.
I'd appreciate any help resolving this. My Dockerfile is:
```docker
# Use the Ollama base image
FROM ollama/ollama

# Install curl
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Preload the model: start ollama serve in the background, wait for it
# to come up, then pull the model
RUN ollama serve & \
    until curl -s http://localhost:11434 > /dev/null; do \
        echo "Waiting for Ollama server..."; \
        sleep 1; \
    done && \
    ollama pull tinyllama && \
    pkill ollama  # Optional: kill the server after the pull

# Expose Ollama's default port
EXPOSE 11434

# Ollama runs automatically on container start (the image's entrypoint
# is the ollama binary)
CMD ["serve"]
```
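One thing worth ruling out when a containerized Ollama doesn't answer external requests is the bind address: `ollama serve` listens on 127.0.0.1:11434 by default, and reads the `OLLAMA_HOST` environment variable to change that. A platform proxy can only reach the service if it listens on all interfaces. This wouldn't explain a curl from *inside* the container hanging (that may instead be slow CPU-only inference while the model loads), so whether it is the actual cause here is an assumption, but a sketch of the Dockerfile with that one change would be:

```docker
FROM ollama/ollama

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Build-time pull: the server started here listens on localhost, which is
# fine because curl and `ollama pull` run inside the same build step
RUN ollama serve & \
    until curl -s http://localhost:11434 > /dev/null; do sleep 1; done && \
    ollama pull tinyllama && \
    pkill ollama

# At runtime, listen on all interfaces so traffic from outside the
# container can reach the server (Ollama's default is 127.0.0.1)
ENV OLLAMA_HOST=0.0.0.0:11434

EXPOSE 11434
CMD ["serve"]
```

If binding was the issue, a request like `curl --max-time 60 https://<your-domain>/api/generate -d '{"model": "tinyllama", "prompt": "hello", "stream": false}'` should then return a response instead of hanging.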
11 Replies
7 months ago
Railway does provide a ton of templates, and also gives devs the flexibility to create their own services. But hosting an entire Ollama model on Railway does not seem feasible, technically or monetarily.
If you are keen on using Railway to host such servers, you can connect with the team and discuss your requirements there.
Attachments
https://railway.com/pricing#enterprise-calendar-embed
https://railway.com/pricing
7 months ago
Thanks clashing,
The Ollama template I linked above didn't come with any downloaded models, and I couldn't update it. What's the difference between a template's compute and a regular Railway service's compute?
There's also no clarity about whether it's a compute or RAM limitation. I'm not keen on hosting it; I'm just trying to clarify the requirements. Based on the estimate, API calls are currently cheaper than hosting on Railway.
7 months ago
You can go to the "Observability" section on the Railway dashboard and see CPU/memory usage. That's the most you can do to see what the service is consuming in terms of memory/CPU. Network egress, the total outgoing data from your server, is also a billing factor.
I hope that clarifies things.
7 months ago
The Observability tab doesn't say much about actual compute, since a lot of it is virtual hardware. In my attached picture, the estimated vCPU and RAM usage exceed the 8 vCPU/8 GB RAM limit of the Hobby plan, yet my observability chart never topped 2 GB of RAM consumption, and even then the service couldn't respond to any requests.
If I need to consider upgrading my plan, I want to understand the resource consumption better.
7 months ago
My guess is that the server was so overwhelmed that the observability chart could not be rendered; as you mentioned, you surpassed the maximum limits of what the Hobby plan provides, and once that is breached the deployed instances stop handling incoming requests.
For your current needs, it seems you might need a higher plan. You can connect with the team (as pointed out earlier in the replies) to get a better understanding of how the infrastructure could support your project, if you plan to deploy here for the long run.
I hope that helps.
7 months ago
I am sure your doubts are clear by now.
7 months ago
josesx506, can you please mark my initial post as the solution, as it helped you clear up your doubts?
7 months ago
The initial post wasn't a valid solution. I already knew templates existed and even linked a template that I tested. My doubts were about the Railway containers not handling the compute required for inference.
7 months ago
Thanks for letting me know, and for marking a post as the solution.
Status changed to Solved brody • 7 months ago
