10 months ago
Llama is a popular family of open-weight language models, as are DeepSeek, Qwen, and gpt-oss. The distilled versions make it feasible to self-host these models on Railway.
Railway is well suited to deploying models at scale and serving them via API, but it needs a template to make getting started easier.
This bounty will be paid out when:
- A high-quality Llama 3.2 model template is presented
- All potential template feedback has been incorporated
- All requirements are met and tested
- The template follows all template best practices where applicable
Template requirements:
- vLLM inference server with a specific model
- Request batching and caching through Redis
- Support for multiple model sizes
- API layer: FastAPI with OpenAI-compatible endpoints
- Volume-backed storage for all databases, using correct mount paths
- Service dependencies correctly configured, with proper startup order and health checks
- Environment variables correctly configured for Railway domains, using private networking where applicable
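
To make two of the requirements above more concrete (caching through Redis and private networking), here is a minimal Python sketch. It is only an illustration under stated assumptions, not part of the bounty or of any submitted template. The names `cache_key` and `vllm_base_url`, the `llmcache:` prefix, the `VLLM_PRIVATE_DOMAIN` and `VLLM_PORT` variables, the `vllm.railway.internal` hostname, and the choice of key fields are all hypothetical. Port 8000 and the `/v1` path are assumed to match a default vLLM OpenAI-compatible server.

```python
import hashlib
import json
import os

# Hypothetical namespace for cache entries in Redis.
CACHE_PREFIX = "llmcache:"

# Assumed subset of OpenAI-style request fields that affect the completion.
KEY_FIELDS = ("model", "messages", "temperature", "max_tokens")


def cache_key(request: dict) -> str:
    """Derive a deterministic Redis key from an OpenAI-style chat request."""
    # Ignore fields that do not change the result (for example "stream").
    relevant = {name: request.get(name) for name in KEY_FIELDS}
    # Sorted keys and fixed separators give one canonical JSON string.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return CACHE_PREFIX + digest


def vllm_base_url(env=None) -> str:
    """Build the internal URL of the vLLM service from environment variables."""
    env = os.environ if env is None else env
    # Assumed variable names; a template would wire these to the vLLM service.
    host = env.get("VLLM_PRIVATE_DOMAIN", "vllm.railway.internal")
    port = env.get("VLLM_PORT", "8000")
    # Private networking traffic stays inside the project, so plain http is used.
    return f"http://{host}:{port}/v1"
```

In a setup like this, the API layer could look up `cache_key(request)` in Redis before forwarding the request to `vllm_base_url() + "/chat/completions"`. Caching is probably only worthwhile for deterministic requests, such as those with a temperature of 0.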
A few resources to get you started:
- Template Best Practices
- Learn more about Railway Templates
- Railway Documentation
- Llama 3.2 information
- vLLM documentation
3 Replies
10 months ago
Here is the completed Llama template, as requested!
Status changed to Solved by sarahkb125 • 10 months ago