Issue: Ollama Service - Extreme Slowness & Memory Usage (Project ID: 2260e007-624d-43c0-ba6d-86056e544cb7)

minutesgrowthnow

FREE OP

7 months ago

Dear Railway Support Team,

I am experiencing a critical and highly unusual performance issue with my Ollama service deployed on Railway, which is severely impacting its usability. My service is on a Pro Plan providing 32 vCPU and 32 GB of RAM, and is currently deployed in the Asia Pacific (Singapore) region.

Service Details:

Project ID: 2260e007-624d-43c0-ba6d-86056e544cb7
Ollama Service ID: 2c0832c7-058b-4080-91d5-44ea02269435
Ollama Docker Image: ollama/ollama
Allocated Resources: 32 vCPU, 32 GB RAM (confirmed in service settings)

Problem Description:

The Ollama service exhibits extreme and anomalous slowness, even with very small language models and simple prompts, leading to client-side timeouts. Specifically, when attempting to generate a response from the gemma:2b model (approx. 1.5 GB in size) with a simple prompt like "Hello", the inference takes over 15 minutes before the client (n8n) times out.
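A minimal sketch for reproducing this measurement outside n8n, so the slowness can be split into model load time, prompt evaluation, and token generation. It assumes the service is reachable at the hypothetical `OLLAMA_URL` below and relies on the timing fields (`load_duration`, `prompt_eval_duration`, `eval_count`, `eval_duration`, all in nanoseconds) that Ollama returns from a non-streaming `/api/generate` call. The helper names are illustrative, not part of Ollama.

```python
import json
import time
import urllib.request

# Assumption: default Ollama port on localhost; replace with your service URL.
OLLAMA_URL = "http://localhost:11434"


def build_payload(model: str, prompt: str, max_tokens: int = 16) -> dict:
    """Non-streaming request body; num_predict caps the reply length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens},
    }


def summarize(response: dict) -> dict:
    """Convert Ollama's nanosecond timing fields into seconds and tokens/s."""
    ns = 1e9
    eval_s = response.get("eval_duration", 0) / ns
    tokens = response.get("eval_count", 0)
    return {
        "load_s": response.get("load_duration", 0) / ns,
        "prompt_eval_s": response.get("prompt_eval_duration", 0) / ns,
        "eval_s": eval_s,
        "tokens": tokens,
        "tokens_per_s": tokens / eval_s if eval_s > 0 else 0.0,
    }


def time_generate(model: str = "gemma:2b", prompt: str = "Hello",
                  timeout_s: float = 1200.0) -> dict:
    """Send one request and return timing stats plus wall-clock seconds."""
    body = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        result = json.load(resp)
    stats = summarize(result)
    stats["wall_s"] = time.monotonic() - start
    return stats

# Usage (requires a running Ollama instance):
# print(time_generate())
```

A large `load_s` with a reasonable `tokens_per_s` would point at model loading, while a very low `tokens_per_s` would point at the compute path itself.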

Key Observations & Diagnostic Findings:

Abnormal Memory Usage:
- During inference, the Ollama service's memory usage immediately spikes to and consistently stays at around 25-27 GB on the Railway metrics dashboard. This is highly disproportionate and unexpected for a 1.5 GB model like gemma:2b.
- This persistent, excessively high memory consumption suggests severe memory allocation issues, potential heavy swapping to disk, or an underlying memory leak within the containerized environment, drastically impacting performance.
CPU Maxed Out:
- Concurrently with the memory spike, the CPU usage immediately goes to and remains at 32 vCPU (maxed out) for the entire duration of the request (over 15 minutes), until the client connection aborts. This indicates a severe processing bottleneck, likely stemming from the memory issues.
Lack of Specific Ollama Crash Logs:
- Despite the extreme slowness and effective unresponsiveness, the Ollama logs on Railway do not show explicit "Out of Memory (OOM)", "killed", "mmap_failed", or other fatal error messages at the moment of the timeout.
- The logs only show [GIN] ... | 200 | 15m25s | ... POST "/api/generate" followed by msg="aborting completion request due to client closing the connection", confirming a client timeout rather than a server-side crash. (Log snippets: chrome_4oZV2yLSSu.png, image_20436f.jpg.)
Issue Persists Across Models:
- The issue is not isolated to gemma:2b. Larger models like mistral:7b also exhibited extreme slowness and mmap_failed errors in previous tests, and mistral-small3.2:latest (24B parameters) also caused severe bottlenecking. This suggests a systemic issue with how resources are allocated to, or used by, Ollama on the Railway infrastructure for CPU-only inference.
Region Change Ineffective:
- I have already changed the deployment region from US West (California), USA to Asia Pacific (Singapore), but the exact same performance and memory issues persist, which suggests the problem is not specific to one region's hardware.
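The observations above could be checked from inside the container with a short script. This is a sketch under stated assumptions: that the container exposes cgroup v2 files under `/sys/fs/cgroup`, and that comparing the CPU quota with the number of CPUs the process can see is relevant here. Nothing in this thread confirms either, and the function names are illustrative. The `anon` and `file` counters in `memory.stat` separate anonymous memory from page cache, which is where a memory-mapped model file would normally be counted.

```python
import os
from pathlib import Path
from typing import Dict, Optional

CGROUP = Path("/sys/fs/cgroup")  # assumption: cgroup v2 is mounted here


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cpu.max ('<quota> <period>' or 'max <period>') into a CPU count."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU quota configured
    return int(quota) / int(period)


def parse_limit(text: str) -> Optional[int]:
    """Parse a byte value such as memory.max; 'max' means unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse memory.stat lines of the form '<key> <bytes>'."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def read(name: str) -> Optional[str]:
    """Return the contents of a cgroup file, or None if it is unavailable."""
    try:
        return (CGROUP / name).read_text()
    except OSError:
        return None


def report() -> dict:
    """Collect visible CPU count, CPU quota, and memory figures."""
    cpu_max, mem_max = read("cpu.max"), read("memory.max")
    mem_cur, mem_stat = read("memory.current"), read("memory.stat")
    stat = parse_memory_stat(mem_stat) if mem_stat else {}
    return {
        "visible_cpus": os.cpu_count(),
        "cpu_quota": parse_cpu_max(cpu_max) if cpu_max else None,
        "memory_limit_bytes": parse_limit(mem_max) if mem_max else None,
        "memory_current_bytes": parse_limit(mem_cur) if mem_cur else None,
        "anon_bytes": stat.get("anon"),
        "file_bytes": stat.get("file"),
    }
```

If `visible_cpus` turned out to be much larger than `cpu_quota`, a runtime that sizes its thread pool from the visible count could oversubscribe the quota; in that case, setting Ollama's `num_thread` option to the quota would be one thing to try. Whether that applies to this service is not established here.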

Request for Assistance:

Given the highly anomalous and debilitating memory behavior for even a small model like gemma:2b, I kindly request your assistance in performing a deeper investigation. Could you please:

- Examine the underlying host server metrics and container resource allocation more thoroughly for my Ollama service?
- Investigate potential memory leaks or inefficiencies related to Ollama's operation within the Railway environment, especially concerning memory mapping (mmap) and swap usage?
- Advise on any specific configurations or known issues that could lead to gemma:2b consuming 25-27 GB of RAM and taking 15+ minutes for a "Hello" prompt on a 32 GB / 32 vCPU setup?

Please let me know if you require any further information or access to my project. I am available to provide details or run specific diagnostics.

Thank you for your time and assistance.

Sincerely,

[Your Name] [Your Contact Email/Phone (Optional)]

$10 Bounty

3 Replies

brody

EMPLOYEE

7 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody • 7 months ago

clashing

HOBBY

7 months ago

https://station.railway.com/questions/railway-issues-with-hosting-ollama-0c4e2da0

I have answered many questions in this similar thread about Ollama. Do check it out and see whether anything there is useful to you.

ianferreira

PRO

6 months ago

Thanks. OK, I at least got Ollama to run, but performance is abysmal. So, about those GPU nodes...

brody

EMPLOYEE

6 months ago

Cooper had previously linked you to templates that are approved to use a specific instruction set. Deploying those templates will work, but deploying your own stuff will result in poor performance.
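The reply above does not say which instruction set is meant. As a hedged way to see what a given container's CPU advertises, the sketch below parses the `flags` line of `/proc/cpuinfo` on x86 Linux and reports a handful of SIMD extensions. The list in `SIMD_FLAGS` is an assumption (extensions that CPU inference code commonly benefits from), not something stated in this thread, and the helper names are illustrative.

```python
from pathlib import Path
from typing import Dict, Set

# Assumption: these are the extensions of interest; the thread does not name one.
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "fma", "f16c", "avx512f")


def parse_flags(cpuinfo: str) -> Set[str]:
    """Return the CPU feature flags from the first 'flags' line (x86 Linux)."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def simd_support(cpuinfo: str) -> Dict[str, bool]:
    """Map each extension in SIMD_FLAGS to whether the CPU advertises it."""
    flags = parse_flags(cpuinfo)
    return {name: name in flags for name in SIMD_FLAGS}


def local_simd_support() -> Dict[str, bool]:
    """Check the machine this script runs on; requires /proc/cpuinfo."""
    return simd_support(Path("/proc/cpuinfo").read_text())
```

Running `local_simd_support()` inside the deployed container shows what the CPU advertises there; it does not by itself show which code path Ollama selects.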