Im working on a projet where I need to extract informations from resume in pdf format, the problem is when I use libraries like pdfminer ect sometimes the text extracted is not the good result because it gets lines overlapped with other lines from another box of text, thats why I thought of using layout parser first before extracting the text to extract text based on boxes of text
pytesseract.pytesseract.tesseract_cmd ="C/Users/faty/Downloads/tesseract-ocr-w64-setup-v5.1.0.20220510.exe"
poppler_path="C:/Users/faty/Downloads/Release-22.04.0-0/poppler-22.04.0/Library/bin"
model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
label_map={0: "Text", 1: "Title", 2: "List",
3:"Table",4:"Figure"})
layout_result = model.detect(img)
lp.draw_box(img, layout_result, box_width=5, box_alpha=0.2, show_element_type=True)
I Get this error : AttributeError: module layoutparser has no attribute Detectron2LayoutModel
Actually the attribute Detectron2LayoutModel can be accessed only inside models:
model = lp.models.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
label_map={0: "Text", 1: "Title", 2: "List",
3:"Table",4:"Figure"})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With