
Paddle OCR not able to extract single digits from image

I am doing an image OCR using Paddle OCR. Below is the sample code I am using:

from paddleocr import PaddleOCR
import os

image_file = "3_496.png"

ocr = PaddleOCR(use_gpu=True)
image_path = os.path.join(os.getcwd(), image_file)
result = ocr.ocr(image_path, cls=True)

if None not in result:
    print(f"OCR result for {image_file}:")
    for line in result[0]:
        print(line[1][0])

Below is the image file:

enter image description here

In this image, Paddle OCR extracts almost all of the items but fails to extract the 4 (PACK SIZE). I have several very similar images and have noticed that it fails to extract single digits. Is this a known limitation, or am I doing something wrong?

Below is the output after running the above code:

OCR result for 3_496.png:
PAGLNILE. PIGRWGIGHT 400g DATCHNUMHERE USLOYI 205257 31/07/2024 12.22. CUTIN NZ

S Andrew asked Oct 29 '25 02:10



1 Answer

Try a different model. In my case, switching to the English model solved this:

ocr = PaddleOCR(use_angle_cls=True, lang='en')

If the problem persists, try preprocessing the image, for example by thresholding it and inverting the colours:

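import cv2

# Load the image as grayscale (image_path is the path built in the question's code)
img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
# Binarize and invert: pixels above 127 become black, all others become white
_, img_thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)  # Invert colors
cv2.imwrite("processed_image.png", img_thresh)

# Run OCR on the processed image instead of the original
result = ocr.ocr("processed_image.png", cls=True)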
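
To judge whether a change such as the preprocessing above actually helps, it can be useful to print each recognized string together with its confidence and check whether any single-digit tokens survive. The following is a minimal sketch, assuming the PaddleOCR 2.x result layout used in the question (result[0] is a list of [box, (text, score)] entries, or None when nothing is detected). The helper names summarize_ocr and single_digit_hits, and the sample data, are hypothetical and only for illustration.

```python
def summarize_ocr(result):
    """Return (text, confidence) pairs from a PaddleOCR 2.x style result.

    Assumes result[0] holds the entries for the first image and may be
    None (or empty) when no text was detected.
    """
    page = result[0] if result else None
    if not page:
        return []
    return [(text, float(score)) for _box, (text, score) in page]


def single_digit_hits(result):
    """Return only the entries whose text is exactly one digit."""
    hits = []
    for text, score in summarize_ocr(result):
        stripped = text.strip()
        if len(stripped) == 1 and stripped.isdigit():
            hits.append((stripped, score))
    return hits


# Hypothetical sample in the same shape as ocr.ocr(...) output (not real data).
sample = [[
    [[[10, 10], [90, 10], [90, 30], [10, 30]], ("PACK SIZE", 0.97)],
    [[[100, 10], [110, 10], [110, 30], [100, 30]], ("4", 0.62)],
]]

for text, score in summarize_ocr(sample):
    print(f"{score:.2f}  {text}")
```

Calling single_digit_hits(result) on the output for the original image and again for processed_image.png shows whether the digit appears at all. If it is missing in both cases, the text box was probably never detected, which points at the detection settings instead of the recognition model.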

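Also check the values you are using for these parameters: det_db_thresh, det_db_box_thresh, and det_algorithm (see the PaddleOCR documentation). The first two control how confident the detector must be before it keeps a text region, so a small, isolated digit can be discarded at the detection stage before recognition ever sees it.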
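
As a rough sketch of how those detection parameters might be passed, the snippet below builds the keyword arguments separately so they are easy to vary between runs. It assumes the PaddleOCR 2.x constructor keywords, matching the style of the question's code; other versions may name them differently. The specific values (0.2, 0.3, 2.0) are arbitrary starting points for experimentation, not recommended settings, and build_ocr_kwargs and run_ocr are hypothetical helper names. det_db_unclip_ratio is not mentioned above; it is included on the assumption that enlarging detected boxes may help with very small text.

```python
def build_ocr_kwargs(pixel_thresh=0.2, box_thresh=0.3, unclip_ratio=2.0):
    """Build PaddleOCR constructor arguments with relaxed detection settings.

    The default values here are illustrative guesses, not tuned settings.
    """
    return {
        "use_angle_cls": True,
        "lang": "en",
        # Lower values keep fainter pixels in the detector's text map.
        "det_db_thresh": pixel_thresh,
        # Lower values keep text boxes with a lower average score.
        "det_db_box_thresh": box_thresh,
        # Larger values expand each detected box outwards.
        "det_db_unclip_ratio": unclip_ratio,
    }


def run_ocr(image_path, **overrides):
    """Run OCR on image_path with the relaxed settings (needs paddleocr installed)."""
    from paddleocr import PaddleOCR  # imported lazily so the helper above works without it

    ocr = PaddleOCR(**build_ocr_kwargs(**overrides))
    return ocr.ocr(image_path, cls=True)


# Example (requires paddleocr and the image file):
# result = run_ocr("3_496.png", box_thresh=0.4)
```

Lowering the thresholds tends to produce more false detections as well, so it is worth changing one value at a time and comparing the output.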

Zikofs answered Oct 30 '25 18:10



