4 months ago
I have a FastAPI app that runs requests in parallel. The app had been working fine but then suddenly started getting this error:
INFO - POST request to /run_app from Address(host='100.64.0.7', port=58342): Successfully received.
ERROR - A child process terminated abruptly, the process pool is not usable anymore
Traceback (most recent call last):
File "/app/src/api/main.py", line 350, in upload_problem
future = loop.run_in_executor(executor, process_request, simulation_request)
File "uvloop/loop.pyx", line 2747, in uvloop.loop.Loop.run_in_executor
File "/mise/installs/python/3.13.9/lib/python3.13/concurrent/futures/process.py", line 791, in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
I did a rebuild, which of course fixed the problem, and requests started processing again.
While the app was shutting down, the logs showed the following:
/mise/installs/python/3.13.9/lib/python3.13/multiprocessing/resource_tracker.py:324: UserWarning: resource_tracker: There appear to be 10 leaked semaphore objects to clean up at shutdown: {'/loky-130-tbtbnu5p', '/loky-247-0n67tgdf', '/loky-6-asizrpi6', '/loky-120-td_xhej4', '/loky-154-4pn23wdt', '/loky-6-sq6kxylg', '/loky-6-yhbufqtz', '/loky-211-lma7dd7l', '/loky-6-2sq1syj7', '/loky-6-j4h2014r'}
I do not know if this was the source of the problem.
Do you have any suggestions?
Thanks for your help.
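For reference, the executor setup in main.py follows essentially this pattern (a minimal, self-contained sketch; process_request and the payload here are simplified stand-ins for my actual code):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def process_request(simulation_request):
    # CPU-bound work runs in a child process so it does not
    # block the event loop.
    return sum(x * x for x in simulation_request)

async def handle_request(executor, simulation_request):
    loop = asyncio.get_running_loop()
    # This is the call that raises BrokenProcessPool once a child
    # dies: the pool is marked broken and rejects further submits.
    return await loop.run_in_executor(
        executor, process_request, simulation_request
    )

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        result = asyncio.run(handle_request(executor, [1, 2, 3]))
        print(result)  # 14
```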
Pinned Solution
4 months ago
I think I found the issue. I was using a custom start command with --workers 17 when I only have 8 vCPUs. In my FastAPI app I was calling app.state.executor = ProcessPoolExecutor() without specifying max_workers, which, I found out, was spawning a large number of child processes. I am now using gunicorn -k uvicorn.workers.UvicornWorker --workers 2 main:app -b 0.0.0.0:8080 --max-requests 10 --max-requests-jitter 3 --pid /tmp/gunicorn.pid as the start command. I am attaching the new deployment logs. Would this explain my original problem?
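Concretely, instead of the bare ProcessPoolExecutor() I now size the pool explicitly, along these lines (a simplified sketch; the halving heuristic is just my own choice):

```python
import os
from concurrent.futures import ProcessPoolExecutor

# A no-argument ProcessPoolExecutor() defaults to one child per CPU,
# and each gunicorn worker creates its own pool -- with --workers 17
# on 8 vCPUs that multiplies out to far too many child processes.
cpus = os.cpu_count() or 1
pool_size = max(1, cpus // 2)  # leave headroom for the web workers

executor = ProcessPoolExecutor(max_workers=pool_size)
```

In the app this is assigned to app.state.executor at startup as before; the key change is the explicit max_workers.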
Attachments
4 Replies
4 months ago
Hey there! We've found the following might help you get unblocked faster:
If you find the answer from one of these, please let us know by solving the thread!
Status changed to Awaiting User Response Railway • 4 months ago
Status changed to Awaiting Conductor Response Railway • 4 months ago
carrascocesar
[quoted the pinned solution above]
4 months ago
That would explain the issue. If it appears again, feel free to create another thread.
Status changed to Awaiting User Response Railway • 4 months ago
Status changed to Solved passos • 4 months ago