7 months ago
I'm trying to host an Ollama server with a TinyLlama model that's about 670 MB. I'm on the Hobby plan with 8 GB of RAM and 8 vCPU per service. I set up the service using a Dockerfile that pulls the model weights and runs `ollama serve`. When I make API requests, they hang without a response.
I've also tried SSHing into the service and making the curl request from inside the container, but it hangs there too without any errors. Is the issue the RAM or CPU available on the Hobby plan, or does Railway prevent hosting local instances?
I tried this template service, but it does not pull specific models by default: https://railway.com/deploy/T9CQ5w
Lastly, the estimated usage shows 209 CPUs, with an estimated bill of over $70 per month. It doesn't make sense to pay that much for a service that isn't even working.
I'd appreciate any help resolving this. My Dockerfile is:
```docker
# Use the Ollama base image
FROM ollama/ollama

# Install curl
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Preload the model: start ollama serve in the background, wait for it
# to come up, then pull the model
RUN ollama serve & \
    until curl -s http://localhost:11434 > /dev/null; do \
        echo "Waiting for Ollama server..."; \
        sleep 1; \
    done && \
    ollama pull tinyllama && \
    pkill ollama  # Optional: kill the server after the pull

# Expose Ollama's default port
EXPOSE 11434

# Ollama runs automatically on container start (the image's entrypoint
# is the ollama binary)
CMD ["serve"]
```
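One thing worth ruling out when a containerized Ollama doesn't answer external requests is the bind address: `ollama serve` listens on 127.0.0.1:11434 by default, and reads the `OLLAMA_HOST` environment variable to change that. A platform proxy can only reach the service if it listens on all interfaces. This wouldn't explain a curl from *inside* the container hanging (that may instead be slow CPU-only inference while the model loads), so whether it is the actual cause here is an assumption, but a sketch of the Dockerfile with that one change would be:

```docker
FROM ollama/ollama

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Build-time pull: the server started here listens on localhost, which is
# fine because curl and `ollama pull` run inside the same build step
RUN ollama serve & \
    until curl -s http://localhost:11434 > /dev/null; do sleep 1; done && \
    ollama pull tinyllama && \
    pkill ollama

# At runtime, listen on all interfaces so traffic from outside the
# container can reach the server (Ollama's default is 127.0.0.1)
ENV OLLAMA_HOST=0.0.0.0:11434

EXPOSE 11434
CMD ["serve"]
```

If binding was the issue, a request like `curl --max-time 60 https://<your-domain>/api/generate -d '{"model": "tinyllama", "prompt": "hello", "stream": false}'` should then return a response instead of hanging.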
11 Replies
7 months ago
Railway does provide a ton of templates, and also gives devs the flexibility to create their own services. But hosting an entire Ollama model on Railway does not seem feasible, technically or monetarily.
If you are keen on using Railway to host such servers, you can connect with the team and discuss your requirements there.
Attachments
https://railway.com/pricing#enterprise-calendar-embed
https://railway.com/pricing
7 months ago
Thanks clashing,
The Ollama template I linked above didn't come with any downloaded models, and I couldn't update it. What's the difference between a template's compute and a regular Railway service's compute?
There's also no clarity about whether it's a compute or RAM limitation. I'm not keen on hosting it; I'm just trying to clarify the requirements. Based on the estimate, API calls are currently cheaper than hosting on Railway.
7 months ago
You can go to the "Observability" section on the Railway dashboard and see CPU/memory usage. That's the most you can do to see what the service is consuming in terms of memory/CPU. Network egress, the total outgoing data from your server, is also a billing factor.
I hope that clarifies things.
7 months ago
The Observability tab doesn't say much about actual compute, since a lot of it is virtual hardware. In my attached picture, the estimated vCPU and RAM usage exceed the 8 vCPU/8 GB RAM limit of the Hobby plan, yet my observability chart never topped 2 GB of RAM consumption, and even then the service couldn't respond to any requests.
If I need to consider upgrading my plan, I want to understand the resource consumption better.
7 months ago
My guess is that the server was so overwhelmed that the observability chart could not be rendered; as you mentioned, you surpassed the maximum limits of what the Hobby plan provides, and once that is breached the deployed instances stop handling incoming requests.
For your current needs, it seems you might need a higher plan. You can connect with the team (as pointed out earlier in the replies) to get a better understanding of how the infrastructure could support your project, if you plan to deploy here for the long run.
I hope that helps.
7 months ago
I am sure your doubts are clear by now.
7 months ago
josesx506, can you please mark my initial post as the solution, as it helped you clear up your doubts?
7 months ago
The initial post wasn't a valid solution. I already knew templates existed and even linked a template that I tested. My doubts were about the Railway containers not handling the compute required for inference.
7 months ago
Thanks for letting me know, and for marking a post as the solution.
Status changed to Solved brody • 7 months ago
