8 months ago
Hi, I'm trying to use all 8 vCPUs on my instance to perform OCR on PDFs. My code works locally, but when I run it on my FastAPI (Python) Railway instance, it never goes past 1 vCPU and CPU usage is reported as 0%. How can I get multiprocessing to work?
I'm using Python, FastAPI, pytesseract, psutil, and ProcessPoolExecutor.
import concurrent.futures
import logging

with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
    # Map each future back to the index of the page it processes.
    futures = {
        executor.submit(process_page_wrapper, args): idx
        for idx, args in enumerate(page_args)
    }
    # Collect results as pages finish, in completion order.
    for future in concurrent.futures.as_completed(futures):
        idx = futures[future]
        try:
            processed_page = future.result()
            result[idx] = processed_page
            logging.info(f"Processed page {idx} successfully")
        except Exception as e:
            logging.error(f"Failed to process page {idx}: {e}")
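Since the snippet above doesn't show how max_workers is computed, one thing worth checking is what CPU count the process actually sees inside the container. The sketch below is only an illustration: `available_cpus` and `pick_max_workers` are hypothetical helper names, not part of the original code. It uses `os.sched_getaffinity` where available (Linux) and falls back to `os.cpu_count()`. Note that neither call accounts for cgroup CPU quotas, so passing an explicit count may still be the most reliable option.

```python
import os


def available_cpus() -> int:
    """Best-effort count of CPUs this process is allowed to run on."""
    # On Linux, the affinity mask reflects any CPU pinning applied to the process.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity (e.g. macOS, Windows).
    return os.cpu_count() or 1


def pick_max_workers(requested=None) -> int:
    """Use an explicit worker count if one is given, else the detected CPU count."""
    if requested is not None:
        return max(1, requested)
    return available_cpus()


if __name__ == "__main__":
    print("CPUs visible to this process:", available_cpus())
    print("Workers with explicit request of 8:", pick_max_workers(8))
```

Logging the detected value at startup on the instance and comparing it with the local machine would show whether the pool was simply being created with a single worker.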
2 Replies
8 months ago
Update: I set max_workers to 8 explicitly, and that seemed to help. However, each page is processed much more slowly on this instance than on my local machine (3-5x slower per page): a page that takes 3 seconds locally takes 15 seconds here. Why would this be happening? I need to be able to process something like a 600-page PDF in under 5 minutes.
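To put the numbers above in perspective, here is a rough back-of-the-envelope estimate. It assumes an idealized model (pages split evenly across 8 workers, no startup or I/O overhead), and `estimated_wall_seconds` is a hypothetical helper written just for this calculation.

```python
import math


def estimated_wall_seconds(pages: int, seconds_per_page: float, workers: int) -> float:
    """Idealized wall time: pages are split evenly and workers never stall."""
    rounds = math.ceil(pages / workers)  # rounds of pages processed in parallel
    return rounds * seconds_per_page


budget = 5 * 60  # target: 300 seconds for the whole PDF

slow = estimated_wall_seconds(600, 15, 8)  # 15 s/page, as seen on the instance
fast = estimated_wall_seconds(600, 3, 8)   # 3 s/page, as seen locally

# Largest per-page time that still fits the budget with 8 workers.
max_seconds_per_page = budget / math.ceil(600 / 8)

print(f"15 s/page: {slow / 60:.2f} min, 3 s/page: {fast / 60:.2f} min")
print(f"Per-page limit for a 5-minute budget: {max_seconds_per_page} s")
```

Under these assumptions, 15 seconds per page works out to about 18.75 minutes for 600 pages, while 3 seconds per page gives 3.75 minutes. So with 8 workers, each page needs to average roughly 4 seconds or less to meet the 5-minute target.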
Status changed to Awaiting User Response railway[bot] • 8 months ago