Out of Memory Issue with Celery on Railway (Despite 32GB Pro Plan)
vayakakshay
PRO · OP

a month ago

Hello Team,

I’m facing an Out of Memory issue while running Celery with Django on Railway.
When I checked the service metrics, it shows that Celery is using around 3GB of memory, but my Pro plan allows up to 32GB.

Could you please provide some information on why this error occurs and how I can utilize more memory for my Celery workers?

Thank you,
Akshay

Solved · $10 Bounty

26 Replies

Railway
BOT

a month ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


vayakakshay
PRO · OP

a month ago

hello?


brody
EMPLOYEE

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open · brody · about 1 month ago


vayakakshay

hello?

Go to your celery service -> settings -> resource limits (shown below) and confirm it's at 32GB.

If it's at ~32 and it's still being OOMed, can you provide your logs?

Attachments


yeeet

Go to your celery service -> settings -> resource limits (shown below) and confirm it's at 32GB. If it's at ~32 and it's still being OOMed, can you provide your logs?

vayakakshay
PRO · OP

a month ago

No, 32 is already selected there.


we need to see your logs, can you provide them?


vayakakshay
PRO · OP

a month ago

Check file

Attachments


vayakakshay
PRO · OP

a month ago

Check memory usage

Attachments


vayakakshay
PRO · OP

a month ago

This is the configuration

Attachments


vayakakshay
PRO · OP

a month ago

?


vayakakshay
PRO · OP

a month ago

Also check these logs

Attachments


vayakakshay

Also check these logs

hey, you have 2 separate issues.

You have to fix the duplicate count within /app/keyword_repo/views.py and ensure that the duplicate_count variable is defined.

For your Celery worker, it's just failing due to memory, but one thing you can set is concurrency and max tasks per child to see if that resolves the actual memory issue. In your start command: celery -A your_app.celery worker --loglevel=info --concurrency=4 --max-tasks-per-child=50. Without directly seeing your source code, I can't tell if there's a small memory leak, but max-tasks-per-child will work around one by recycling worker processes. Once you've seen it doesn't crash, you can safely increase it to 100/150.
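If you'd rather keep those in config than on the CLI, here's a minimal sketch of the equivalent Django settings, assuming your Celery app loads them via app.config_from_object('django.conf:settings', namespace='CELERY'):

# settings.py (sketch): equivalents of the CLI flags above
CELERY_WORKER_CONCURRENCY = 4           # --concurrency=4
CELERY_WORKER_MAX_TASKS_PER_CHILD = 50  # --max-tasks-per-child=50: recycle each child after 50 tasks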


vayakakshay
PRO · OP

a month ago

I have already limited the concurrency but I'm still getting the issue,

and you can check the metrics.


vayakakshay
PRO · OP

a month ago

And regarding the duplicate count — you can ignore it. That issue was on my side, and I’ve already resolved it.


vayakakshay
PRO · OP

a month ago

I have limit: 32GB

and these are my Celery parameters:

CELERY_WORKER_CONCURRENCY = 7
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
CELERY_TASK_ACKS_LATE = True
CELERY_WORKER_DISABLE_RATE_LIMITS = False
# Aggressive memory management for long-running tasks
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 4194304  # 4 GB in KB
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
CELERY_WORKER_POOL = 'prefork'


vayakakshay
PRO · OP

a month ago

4 GB * 7 = 28GB


vayakakshay

I have limit: 32GB

and these are my Celery parameters:

CELERY_WORKER_CONCURRENCY = 7
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
CELERY_TASK_ACKS_LATE = True
CELERY_WORKER_DISABLE_RATE_LIMITS = False
# Aggressive memory management for long-running tasks
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 4194304  # 4 GB in KB
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
CELERY_WORKER_POOL = 'prefork'

Your Celery parameters are causing it. You have 4GB left, but what you're attempting to do is compress everything through zstd, and it's likely crashing because it needs more than 4GB of memory to compress everything. Your main Celery worker starts, loads your app and LangSmith into memory, then forks 7 child processes, and when it then tries to compress the data from all 7 workers at once, it crashes because it only has 4GB of memory left.

You have 2 paths to fix this: either reduce the worker concurrency to 6 and re-test to see if it stops crashing, or change it to the solo pool. If it's still crashing with 6, reduce it to 5. Alternatively, you can make code-level changes that let a concurrency of 7 work, by batching the LangSmith calls and adding timeouts.
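To make the budget concrete, here's a rough back-of-the-envelope sketch; the per-child figure comes from your CELERY_WORKER_MAX_MEMORY_PER_CHILD setting, and the idea that the leftover goes to the parent plus compression buffers is an assumption, not a measured value:

# rough memory budget (sketch, not measured)
plan_limit_gb = 32
per_child_gb = 4                           # CELERY_WORKER_MAX_MEMORY_PER_CHILD = 4194304 KB
concurrency = 7
children_gb = per_child_gb * concurrency   # 28 GB for the forked children
headroom_gb = plan_limit_gb - children_gb  # 4 GB left for the parent process + compression
# the same math with concurrency = 6 leaves 8 GB of headroom

And the solo-pool path is a one-line change (single process, no forked children, so tasks run serially):

CELERY_WORKER_POOL = 'solo'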


vayakakshay
PRO · OP

a month ago

Before that I was using less, but it still crashed.


vayakakshay
PRO · OP

a month ago

And one more thing: in the metrics, why is only 3GB consumed?


vayakakshay

And one more thing: in the metrics, why is only 3GB consumed?

The crash is probably happening on a millisecond timescale and probably isn't being picked up in the metrics. If it was using less, how many Celery workers did you use? In that case you're attempting to compress large data and it's still crashing, so your only option then is to rewrite and optimize your LangSmith usage.


vayakakshay
PRO · OP

a month ago

But if we decrease it, then how do we handle the 20+ users on the platform?


vayakakshay

But if we decrease it, then how do we handle the 20+ users on the platform?

You have to optimize your LangSmith usage if you want to handle more Celery workers; otherwise you'd just have to reduce it. Without seeing your source code, the metrics of your APIs, etc., I can't tell you how it'll handle anything.


vayakakshay
PRO · OP

a month ago

Okay, let me try that. But what's your thought on whether it's feasible to use Railway if we have more than 100 users? I have many agents that run in the background for every single user.


vayakakshay

Okay, let me try that. But what's your thought on whether it's feasible to use Railway if we have more than 100 users? I have many agents that run in the background for every single user.

Realistically you can scale by adding replicas and using routing, or if you just want raw scale, go Enterprise, but that has a minimum monthly spend.


vayakakshay
PRO · OP

a month ago

That's expensive.


vayakakshay
PRO · OP

a month ago

Can you suggest what kind of changes I can make to optimize LangSmith?


vayakakshay

Can you suggest what kind of changes I can make to optimize LangSmith?

Turn off artifacts, store large inputs/outputs in S3 instead of compressing them, and use something like orjson.
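For example, here's a minimal sketch of the S3-offload idea; the bucket name, key scheme, and size threshold are made up, and the hide_inputs/hide_outputs hook should be checked against the LangSmith SDK version you're on:

import uuid

import boto3
import orjson  # faster, lower-overhead JSON serializer than stdlib json

s3 = boto3.client("s3")
BUCKET = "my-traces-bucket"  # hypothetical bucket

def offload_to_s3(payload: dict) -> str:
    # Store the full payload in S3 and return a small reference instead.
    key = f"traces/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=orjson.dumps(payload))
    return f"s3://{BUCKET}/{key}"

def slim(data: dict) -> dict:
    # Replace oversized inputs/outputs with an S3 pointer so traces stay small.
    if len(orjson.dumps(data)) > 100_000:  # ~100 KB threshold, tune to taste
        return {"s3_ref": offload_to_s3(data)}
    return data

# e.g. wire it into the LangSmith client (check your SDK for the exact params):
# from langsmith import Client
# client = Client(hide_inputs=slim, hide_outputs=slim)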


Status changed to Solved · ray-chen · 23 days ago

