5 days ago
Hello Team,
I’m facing an Out of Memory issue while running Celery with Django on Railway.
When I checked the service metrics, it shows that Celery is using around 3GB of memory, but my Pro plan allows up to 32GB.
Could you please provide some information on why this error occurs and how I can utilize more memory for my Celery workers?
Thank you,
Akshay
26 Replies
5 days ago
Hey there! We've found the following might help you get unblocked faster:
If you find the answer from one of these, please let us know by solving the thread!
5 days ago
hello?
4 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open brody • 4 days ago
vayakakshay
hello?
4 days ago
Go to your celery service -> settings -> resource limits (shown below) and confirm it's at 32GB.
if it's at ~32GB and it's still being OOM-killed, can you provide your logs?
Attachments
monuit
Go to your celery service -> settings -> resource limits (shown below) and confirm it's at 32GB.
if it's at ~32GB and it's still being OOM-killed, can you provide your logs?
4 days ago
No, here it's already set to 32GB.
4 days ago
we need to see your logs, can you provide them?
4 days ago
?
vayakakshay
Also check these logs
4 days ago
hey, you have 2 separate issues.
you have to fix the duplicate count within /app/keyword_repo/views.py and ensure that the duplicate_count variable is defined.
your celery worker is failing due to memory, constantly. one thing you can set is concurrency and max tasks per child to see if it resolves the actual memory issue, so in your start command: celery -A your_app.celery worker --loglevel=info --concurrency=4 --max-tasks-per-child=50. without directly seeing your source code, i can't tell if there's a small memory leak, but max tasks per child will mitigate it by recycling worker processes. once you see it doesn't crash, you can safely increase it to 100/150.
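for reference, a minimal sketch of setting those same limits through Django settings instead of CLI flags, assuming the conventional Django/Celery wiring where the app reads CELERY_-prefixed settings ("your_app" is a placeholder, same as in the command above):

# your_app/celery.py (minimal sketch)
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "your_app.settings")

app = Celery("your_app")
# with namespace="CELERY", Django settings such as CELERY_WORKER_CONCURRENCY
# map to Celery's worker_concurrency, CELERY_WORKER_MAX_TASKS_PER_CHILD to
# worker_max_tasks_per_child, and so on
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# your_app/settings.py (equivalents of the CLI flags above)
CELERY_WORKER_CONCURRENCY = 4
CELERY_WORKER_MAX_TASKS_PER_CHILD = 50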
4 days ago
I have already limited the concurrency, but I'm still getting the issue
and you can check the metrics
4 days ago
And regarding the duplicate count — you can ignore it. That issue was on my side, and I’ve already resolved it.
4 days ago
I have limit: 32GB
and these are my celery parameters:
CELERY_WORKER_CONCURRENCY = 7
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
CELERY_TASK_ACKS_LATE = True
CELERY_WORKER_DISABLE_RATE_LIMITS = False
# Aggressive memory management for long-running tasks
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 4194304 # 4 GB in KB
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
CELERY_WORKER_POOL = 'prefork' # prefork pool forks child worker processes
4 days ago
4 GB * 7 = 28GB
vayakakshay
I have limit: 32GB
and these are my celery parameters:
CELERY_WORKER_CONCURRENCY = 7
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
CELERY_TASK_ACKS_LATE = True
CELERY_WORKER_DISABLE_RATE_LIMITS = False
# Aggressive memory management for long-running tasks
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 4194304 # 4 GB in KB
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
CELERY_WORKER_POOL = 'prefork' # prefork pool forks child worker processes
4 days ago
your celery parameters are causing it. you have 4GB left, but what you're attempting to do is compress everything through zstd, and it's likely crashing because it needs more than 4GB of memory to do that compression. your main celery worker starts, loads your app and LangSmith into memory, then forks 7 child processes, and when it then tries to compress the data from all 7 workers at once, it crashes because it only has ~4GB of memory left.
you have 2 paths to fix this: either reduce the worker concurrency to 6, re-test it, and see if it stops crashing, or change it to the solo pool. if it's still crashing with 6, then you just need to reduce it to 5. alternatively, you can make code-level changes to make 7-worker concurrency work, by batching the LangSmith calls and adding timeouts.
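if you go the solo-pool route, a minimal sketch (the solo pool runs tasks inline in the main worker process, so nothing gets forked and worker concurrency is effectively ignored):

# settings.py
CELERY_WORKER_POOL = 'solo'  # run tasks in the main worker process, no forking

# or equivalently on the start command:
celery -A your_app.celery worker --pool=solo --loglevel=info

with solo you'd scale throughput by running more replicas of the worker service instead of in-process concurrency.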
4 days ago
before that I was using less concurrency, but it still crashed
4 days ago
and one more thing: in the metrics, why does it show only 3GB consumed?
vayakakshay
and one more thing: in the metrics, why does it show only 3GB consumed?
4 days ago
the crash is probably happening in milliseconds, so it probably isn't being picked up by the metrics sampling. if it's using less, how many celery workers did you use? in that case, you're attempting to compress large data and it's still crashing, so your only option is to rewrite and optimize the LangSmith integration.
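one way to confirm where the spike happens is per-task peak-memory logging; a minimal sketch, assuming Linux where ru_maxrss is reported in KB (the module name and logger setup here are just illustrative):

# memory_logging.py (hypothetical helper, import it where your Celery app is defined)
import logging
import resource

from celery.signals import task_postrun

logger = logging.getLogger(__name__)

@task_postrun.connect
def log_peak_memory(task_id=None, task=None, **kwargs):
    # ru_maxrss is the peak resident set size of this worker process (KB on Linux)
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logger.info("task %s (%s) peak RSS: %.1f MB",
                getattr(task, "name", "?"), task_id, peak_kb / 1024)

the last values logged before a crash tell you which task blew past the limit, even if the dashboard never samples the spike.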
4 days ago
but if we decrease it, then how do we handle the 20+ users on the platform?
vayakakshay
but if we decrease it, then how do we handle the 20+ users on the platform?
4 days ago
you have to optimize LangSmith if you want to handle more celery workers; otherwise you'd just have to reduce it. without seeing your source code, the metrics of your APIs, etc., i can't tell you how it'll handle anything
4 days ago
okay, let me try that one. but what's your thought on whether it's feasible to use Railway if we have more than 100 users? I have many agents running in the background, each one for a single user
vayakakshay
okay, let me try that one. but what's your thought on whether it's feasible to use Railway if we have more than 100 users? I have many agents running in the background, each one for a single user
4 days ago
realistically you can scale by adding replicas and using routing, or if you just want to scale, go Enterprise, but that has a minimum monthly spend
3 days ago
that's expensive
3 days ago
Can you suggest what kind of changes I can make to optimize LangSmith?
vayakakshay
Can you suggest what kind of changes I can make to optimize LangSmith?
3 days ago
turn off artifacts, store large inputs/outputs in S3 instead of compressing them inline, and use a fast serializer like orjson.
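a minimal sketch of the S3 offload pattern (assuming boto3 and orjson are installed; the bucket name and helper function are hypothetical, not from your code):

# payload_offload.py (hypothetical helper)
import uuid

import boto3
import orjson

s3 = boto3.client("s3")
BUCKET = "my-large-payloads"  # placeholder bucket name

def offload_payload(payload: dict) -> str:
    # serialize with orjson (fast, no in-memory compression step)
    # and store the bytes in S3; return a lightweight reference
    # to pass around instead of the raw data
    key = f"payloads/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=orjson.dumps(payload))
    return f"s3://{BUCKET}/{key}"

your tasks then exchange small s3:// references instead of multi-GB blobs, which keeps per-child memory flat.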