python ProcessPoolExecutor not working

vihardesu
PRO

8 months ago

Hi, I'm trying to use all 8 vCPUs on my instance to perform OCR on PDFs. My code works locally, but when I run it in my Python FastAPI app on Railway, it never goes past 1 vCPU and it reports CPU usage as 0%. How can I get multiprocessing to work?

  • I'm using Python, FastAPI, pytesseract, psutil, and ProcessPoolExecutor

            # Needs `import concurrent.futures` and `import logging` at module level.
            # Submit one task per page and map each future back to its page index.
            with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
                futures = {
                    executor.submit(process_page_wrapper, args): idx
                    for idx, args in enumerate(page_args)
                }
                # Collect results as they finish; completion order is not page order.
                for future in concurrent.futures.as_completed(futures):
                    idx = futures[future]
                    try:
                        result[idx] = future.result()
                        logging.info(f"Processed page {idx} successfully")
                    except Exception as e:
                        logging.error(f"Failed to process page {idx}: {e}")
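
Since the snippet passes `max_workers` in from elsewhere, it may help to log how many CPUs the process can actually see before building the pool. When `max_workers` is `None`, `ProcessPoolExecutor` derives the worker count from the CPU count Python reports, and inside a container that number may not match the vCPUs on the plan. The following is a minimal sketch, not the code from this thread; `usable_cpus` and `pick_max_workers` are hypothetical helper names.

```python
import os
from typing import Optional


def usable_cpus() -> int:
    """Best-effort count of CPUs this process is allowed to run on."""
    try:
        # Linux only: respects the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (for example macOS and Windows).
        return os.cpu_count() or 1


def pick_max_workers(requested: Optional[int] = None) -> int:
    """Use the requested worker count if given, else one worker per usable CPU."""
    if requested is not None:
        return max(1, requested)
    return usable_cpus()


# Log both numbers so a mismatch shows up in the deploy logs.
print(f"os.cpu_count()={os.cpu_count()}, usable_cpus()={usable_cpus()}")
```

Passing the result of `pick_max_workers(8)` as `max_workers` pins the pool to eight workers regardless of what the runtime reports.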

2 Replies

vihardesu
PRO

8 months ago

Update: I set max_workers to 8 explicitly and that seemed to help. However, the processing time per page is significantly worse on this instance than on my local machine (3-5x slower). A page that takes 3 seconds locally takes 15 seconds on this instance. Why would this be happening? I need to be able to process something like a 600-page PDF in under 5 minutes.
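
To put those numbers in perspective, here is a back-of-the-envelope estimate. It is only a sketch: it assumes pages are independent, that 8 workers scale perfectly, and that there is no startup or pickling overhead, all of which is optimistic. `estimated_wall_time` is a hypothetical helper, not part of any library.

```python
import math


def estimated_wall_time(pages: int, seconds_per_page: float, workers: int) -> float:
    """Ideal wall-clock seconds when pages are split evenly across workers."""
    # Each worker handles one page at a time, so the job runs in rounds.
    rounds = math.ceil(pages / workers)
    return rounds * seconds_per_page


for label, per_page in [("local, 3 s/page", 3.0), ("instance, 15 s/page", 15.0)]:
    total = estimated_wall_time(pages=600, seconds_per_page=per_page, workers=8)
    print(f"{label}: {total:.0f} s")
```

Under these assumptions, 600 pages at 3 s/page take 225 s, which fits the 5-minute budget, while 15 s/page takes 1125 s (close to 19 minutes). With 8 workers, the per-page time would need to be about 4 s or less to finish in 300 s.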


8 months ago

Hello,

Perhaps you could try our metal regions, which have faster CPUs.


Status changed to Awaiting User Response railway[bot] 8 months ago


python ProcessPoolExecutor not working - Railway Help Station