
Template Request: Llama - Railway Central Station


sarahkb125 (Employee, OP) · 10 months ago

Llama is a popular open-source language model family, as are DeepSeek, Qwen, and gpt-oss. The distilled versions make it feasible to self-host these models on Railway.

Railway is well suited to deploying models at scale and accessing them via API, but it needs a template to make getting started easier.

This bounty will be paid out when:

* A high-quality [Llama 3.2](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model template is presented
* All potential template feedback has been incorporated
* All requirements are met and tested
* The template follows all template best practices where applicable

Template requirements:

* vLLM inference server with a specific model
* Request batching / caching through Redis
* Multiple model sizes are supported
* API layer: FastAPI with OpenAI-compatible endpoints
* Volume-backed storage for all databases using correct mount paths
* Service dependencies correctly configured with proper startup order and health checks
* Environment variables correctly configured for Railway domains, using private networking where applicable
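
To make the caching and private-networking requirements a bit more concrete, here is a minimal sketch of how the API layer might cache completions and locate the vLLM service. It is only an illustration under assumptions, not a reference implementation: `VLLM_HOST`, `VLLM_PORT`, `vllm_base_url`, `cache_key`, and `cached_completion` are hypothetical names, the port default of 8000 is an assumption, and `cache` is assumed to be any client exposing `get(key)` and `set(key, value, ex=...)`, which is the shape of the redis-py client.

```python
import hashlib
import json
import os


def vllm_base_url() -> str:
    """Build the internal URL of the vLLM service from environment variables.

    VLLM_HOST and VLLM_PORT are hypothetical variable names; in a template
    they would point at the vLLM service's private-network hostname.
    """
    host = os.environ.get("VLLM_HOST", "localhost")
    port = os.environ.get("VLLM_PORT", "8000")
    return f"http://{host}:{port}/v1"


def cache_key(payload: dict) -> str:
    """Derive a deterministic cache key from an OpenAI-style request body."""
    # Sorting keys makes the key independent of field order in the request.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "completion:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_completion(cache, payload: dict, generate, ttl_seconds: int = 300) -> dict:
    """Return a cached response if present, otherwise call `generate` and store it.

    `generate` is whatever callable forwards the request to vLLM.
    """
    key = cache_key(payload)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    response = generate(payload)
    cache.set(key, json.dumps(response), ex=ttl_seconds)
    return response
```

Caching whole responses like this only makes sense for requests that are expected to be repeatable (for example, temperature 0); sampled requests would usually bypass the cache.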

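A few resources to get you started:

* [Template Best Practices](https://docs.railway.com/guides/templates-best-practices)
* Learn more about [Railway Templates](https://docs.railway.com/reference/templates)
* [Railway Documentation](https://docs.railway.com/)
* [Llama 3.2](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) information
* [vLLM documentation](https://docs.vllm.ai/en/latest/)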