Python FastAPI, Redis, Redis-Queue for RAG ETL Jobs
vihardesu
PROOP

5 months ago

I'm trying to build out an ETL worker setup for my Python FastAPI service on Railway. However, I'm not clear on how this would scale in Railway.

1) I have a Python FastAPI service deployed on Railway. I want to create an ETL job that processes large PDFs (~100 pages), each of which might take 2-5 minutes. I need a way to kick off these jobs and queue them.

2) It's unclear whether I need a separate codebase for these jobs or whether I should write them directly into my Python FastAPI setup.

Any guidance on this would be helpful. I'm currently trying to figure out whether I can just run my workers from within my existing Python FastAPI service and scale up the instances as needed, or whether that's not the right way to do this.
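For context, the shape of what I want looks like this in-process sketch (stdlib only; all names are placeholders). I understand a real deployment would push jobs onto Redis instead, so they survive restarts and can be picked up by separately scaled workers:

```python
import queue
import threading

# In-memory stand-ins; a real setup would use Redis (e.g. via RQ or Celery)
# so jobs survive restarts and separate worker services can consume them.
job_queue: queue.Queue = queue.Queue()
results: dict = {}

def worker() -> None:
    # Loop forever, pulling PDF paths off the queue and "processing" them.
    while True:
        path = job_queue.get()
        results[path] = f"processed {path}"  # stand-in for the 2-5 min ETL
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

job_queue.put("report.pdf")  # what the API endpoint would do per upload
job_queue.join()             # wait for the worker to drain the queue (demo only)
```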

$20 Bounty

7 Replies

Railway
BOT

5 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


brody
EMPLOYEE

5 months ago

Hello,

While I cannot provide architecture support to help you build your application, I can recommend looking into Celery for task scheduling.

Best,
Brody


Status changed to Awaiting User Response Railway 5 months ago


vihardesu
PROOP

5 months ago

Let me refine my question. Can I deploy two Railway services from a single Python FastAPI repo:

1) one for the API server
2) one for a Redis worker whose instance count I can scale

Can I do this from a single railway.json that defines two Railway services with two separate start commands?


Status changed to Awaiting Railway Response Railway 5 months ago


brody
EMPLOYEE

5 months ago

You cannot use a single railway.json to define two services, but that doesn't matter much, as that approach is not required.

This would be considered a monorepo setup. Please see our docs on how you would deploy that.

https://docs.railway.com/tutorials/deploying-a-monorepo


Status changed to Awaiting User Response Railway 5 months ago


vihardesu
PROOP

5 months ago

I'm already using a monorepo setup, but this is more nuanced, which is why I'm confused. The way Python job queues seem to be built (e.g. Celery, Dramatiq, etc.) is a little different from clean /frontend and /backend folders.

They all seem to run from the same folder (e.g. /fastapi) but require different start commands depending on the service: the API (my FastAPI server), the workers (my actual ETL jobs), and a GUI for the jobs (Celery's Flower GUI).

Commands (examples)

- [Queue GUI] flower -A worker.celery_app --port=5555
- [Server] uvicorn app.main:app --host 0.0.0.0 --port 8000
- [Worker] celery -A worker.celery_app worker -l info -Q etl -c <concurrency>

There's some nuance here that I don't understand how to deploy on Railway. Is the practice to deploy the same folder as three distinct services with three distinct start commands? That seems wonky/incorrect.


Status changed to Awaiting Railway Response Railway 5 months ago


brody
EMPLOYEE

5 months ago

Gotcha, you can right-click your existing service on the project canvas and duplicate it, then change the start command in its service settings.

https://docs.railway.com/overview/the-basics#service-settings
https://docs.railway.com/reference/build-and-start-commands#start-command


Status changed to Awaiting User Response Railway 5 months ago


vihardesu
PROOP

5 months ago

Hey, I am trying to modify the deploy commands for my duplicated FastAPI services:
1. worker -- I want to change this deploy command
2. fastapi server -- keep what's in railway.json
3. queue ui -- I want to change this deploy command

It's in my @proto-os project in my develop environment


Status changed to Awaiting Railway Response Railway 5 months ago

