
Python ProcessPoolExecutor not working

vihardesu

PRO

8 months ago

Hi, I'm trying to use all 8 vCPUs on my instance to perform OCR on PDFs. My code works locally, but when I run it in my FastAPI Python service on Railway, it never uses more than 1 vCPU and the reported CPU usage is 0%. How can I get multiprocessing to work?

I'm using Python, FastAPI, pytesseract, psutil, and ProcessPoolExecutor:

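    import concurrent.futures
    import logging

    # page_args: one argument object per page; process_page_wrapper must be a
    # module-level function so it can be pickled and sent to worker processes.
    result = [None] * len(page_args)  # one slot per page, filled in page order

    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the index of the page it is processing.
        futures = {executor.submit(process_page_wrapper, args): idx for idx, args in enumerate(page_args)}
        for future in concurrent.futures.as_completed(futures):
            idx = futures[future]
            try:
                result[idx] = future.result()
                logging.info(f"Processed page {idx} successfully")
            except Exception as e:
                logging.error(f"Failed to process page {idx}: {e}")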
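A self-contained sketch of the same pattern, which could be run on the instance to see how many CPUs the process reports and whether the pool spreads work across them. `ocr_page_stub` is a hypothetical CPU-bound stand-in for `process_page_wrapper`, and sizing the pool with `available_cpus` is an assumption, not something the thread confirms: `os.sched_getaffinity` (Linux only) returns the CPUs the process may run on, but neither it nor `os.cpu_count()` necessarily reflects a container CPU quota. The `if __name__ == "__main__":` guard matters because with the spawn or forkserver start methods the worker processes re-import the main module.

```python
from __future__ import annotations

import concurrent.futures
import logging
import os


def available_cpus() -> int:
    """CPUs this process may run on (Linux affinity mask), else os.cpu_count()."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def ocr_page_stub(page_number: int) -> str:
    """Hypothetical stand-in for the real OCR call: burns CPU, returns a label."""
    total = 0
    for i in range(200_000):
        total += i * i
    return f"page-{page_number}"


def process_pages(page_numbers: list[int], max_workers: int | None = None) -> list[str | None]:
    """Run ocr_page_stub on every page in a process pool, keeping results in page order."""
    workers = max_workers or available_cpus()
    results: list[str | None] = [None] * len(page_numbers)
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        # Map each future back to the index of the page it is processing.
        futures = {executor.submit(ocr_page_stub, p): idx for idx, p in enumerate(page_numbers)}
        for future in concurrent.futures.as_completed(futures):
            idx = futures[future]
            try:
                results[idx] = future.result()
            except Exception:
                logging.exception("Failed to process page %d", idx)
    return results


if __name__ == "__main__":
    print(f"Using {available_cpus()} worker processes")
    print(process_pages(list(range(16)))[:3])
```

Comparing the printed worker count with the instance's advertised vCPUs is one way to check whether the default pool size is what limited parallelism before `max_workers` was set explicitly.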



vihardesu

PRO

8 months ago

Update: Explicitly setting max_workers to 8 seemed to help. However, each page is processed significantly more slowly on this instance than on my local machine (3-5x slower per page): a page that takes 3 seconds locally takes 15 seconds here. Why would this be happening? I need to be able to process a PDF of around 600 pages in under 5 minutes.
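The 5-minute target can be turned into a per-page budget. Assuming perfect scaling across 8 worker processes and ignoring pool and PDF-rendering overhead (both assumptions), 600 pages in 300 seconds allows at most 300 × 8 / 600 = 4 seconds per page. At 15 seconds per page the job takes about ceil(600 / 8) = 75 rounds × 15 s = 1,125 s (roughly 19 minutes), while the local 3 seconds per page would fit at about 225 s. The helper names below are illustrative only.

```python
import math


def required_seconds_per_page(pages: int, budget_s: float, workers: int) -> float:
    """Max average seconds per page to finish within budget_s, assuming perfect scaling."""
    return budget_s * workers / pages


def estimated_wall_time(pages: int, seconds_per_page: float, workers: int) -> float:
    """Wall-clock estimate: pages run in ceil(pages / workers) rounds of equal length."""
    return math.ceil(pages / workers) * seconds_per_page


if __name__ == "__main__":
    print(required_seconds_per_page(600, 300, 8))  # per-page budget in seconds
    print(estimated_wall_time(600, 15, 8))         # observed speed on the instance
    print(estimated_wall_time(600, 3, 8))          # local speed
```

Real runs will be somewhat slower than these estimates, so the per-page time needs some headroom below the computed budget.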

brody

EMPLOYEE

8 months ago

Hello,

Perhaps you could try our metal regions; they have faster CPUs.

Status changed to Awaiting User Response railway[bot] • 8 months ago