6 months ago
I'm trying to build out an ETL worker setup alongside my Python FastAPI service on Railway, but I'm not clear on how this would scale there.
1) I have a Python FastAPI service deployed on Railway. I want to create an ETL job that processes large PDFs (~100 pages), each of which might take 2-5 minutes. I want a way to kick off these jobs and queue them.
2) It's unclear whether I need a separate codebase for these jobs or whether I should write them directly into my existing FastAPI setup.
Any guidance on this would be helpful. I'm currently trying to figure out if I can just run my workers from within the existing FastAPI service and scale up the instances as needed. Or is this not the right way to do it?
7 Replies
6 months ago
Hello,
While I cannot provide architecture support to help you build your application, I can recommend looking into Celery for task scheduling.
Best,
Brody
Status changed to Awaiting User Response Railway • 6 months ago
6 months ago
Let me refine my question. Can I deploy two Railway services from a single Python FastAPI repo:
1) one for the API server
2) one for a Redis-backed worker whose instance count I can scale
Can I do this from a single railway.json that defines two Railway services with two separate start commands?
Status changed to Awaiting Railway Response Railway • 6 months ago
6 months ago
You cannot use a single railway.json to define two services, but that doesn't matter much, as that approach is not required.
This would be considered a monorepo setup. Please see our docs on how you would deploy that.
https://docs.railway.com/tutorials/deploying-a-monorepo
Status changed to Awaiting User Response Railway • 6 months ago
6 months ago
I'm already using a monorepo setup, but this is more nuanced, which is why I'm confused. The way Python job queues seem to be built (e.g. Celery, Dramatiq, etc.) is a little different from clean /frontend and /backend folders.
They run from the same folder (e.g. /fastapi) but require different start commands depending on the service: the API (my FastAPI server), the workers (my actual ETL jobs), and a GUI for the jobs (Celery's Flower UI).
Commands (examples)
- [Queue GUI] flower -A worker.celery_app --port=5555
- [Server] uvicorn app.main:app --host 0.0.0.0 --port 8000
- [Worker] celery -A worker.celery_app worker -l info -Q etl -c <concurrency>
There's some nuance here that I don't understand how to map onto a Railway deployment. Is the practice here to deploy the same folder as three distinct services with three distinct start commands? That seems wonky/incorrect.
Status changed to Awaiting Railway Response Railway • 6 months ago
6 months ago
Gotcha, you can right-click your existing service on the project canvas and duplicate it, then change the start command in its service settings.
https://docs.railway.com/overview/the-basics#service-settings
https://docs.railway.com/reference/build-and-start-commands#start-command
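If you'd rather keep the start commands in code than in the dashboard, Railway's config-as-code can also set a start command per service, e.g. a separate config file for the worker service. This is a sketch; the file name and the concurrency value of 4 are illustrative, and note that a start command set in config-as-code generally takes precedence over the dashboard setting.

```json
{
  "$schema": "https://railway.com/railway.schema.json",
  "deploy": {
    "startCommand": "celery -A worker.celery_app worker -l info -Q etl -c 4"
  }
}
```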
Status changed to Awaiting User Response Railway • 6 months ago
6 months ago
Hey, I am trying to modify the deploy commands for my duplicated FastAPI services:
1. worker -- I want to change this deploy command
2. FastAPI server -- keep what's in railway.json
3. queue UI -- I want to change this deploy command
It's in my @proto-os project in my develop environment
Status changed to Awaiting Railway Response Railway • 6 months ago