Why does pytesseract fail to recognize digits in this simple image?

Question

I'm trying to use pytesseract to recognize the two digits in this image:

[image: https://i.sstatic.net/oAAXR.png]

I have tried --psm 6 through 10.
I have also tried -c tessedit_char_whitelist=0123456789.

None of the above returns 49. The closest I got was 4, without the 9.

Do you have any tips on how to make Tesseract recognize it?

Davide Fiocco · Accepted Answer

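Try --psm 13 --oem 3 (--oem 1 or 2 should also work). Page segmentation mode 13 treats the image as a single raw text line, bypassing hacks that are Tesseract-specific.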
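These flags are ordinary Tesseract command-line options that pytesseract forwards through its config string. As a minimal sketch (build_config is a hypothetical helper, not part of pytesseract), the string can be assembled with a range check, assuming Tesseract 4's numbering of page segmentation modes 0 to 13 and OCR engine modes 0 to 3:

```python
from typing import Optional

# Assumption: Tesseract 4 numbering, PSM 0-13 and OEM 0-3.
VALID_PSM = range(0, 14)
VALID_OEM = range(0, 4)


def build_config(psm: int, oem: int = 3, whitelist: Optional[str] = None) -> str:
    """Build a pytesseract config string (hypothetical helper, not a pytesseract API)."""
    if psm not in VALID_PSM:
        raise ValueError(f"--psm must be between 0 and 13, got {psm}")
    if oem not in VALID_OEM:
        raise ValueError(f"--oem must be between 0 and 3, got {oem}")
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        # -c sets a Tesseract config variable, here the allowed characters
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)
```

For example, build_config(13, 3, '0123456789') produces the same config string used in the code below.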

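import io

import pytesseract
import requests
from PIL import Image

# Download the image from the question and open it with Pillow
response = requests.get('https://i.sstatic.net/oAAXR.png')
response.raise_for_status()
image = Image.open(io.BytesIO(response.content))

# --psm 13: treat the image as a single raw text line
# --oem 3: default OCR engine mode
# tessedit_char_whitelist: restrict the output to digits
text = pytesseract.image_to_string(image, lang='eng',
                                   config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')

print(text)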

On my machine this prints 49, as expected.
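Since --psm 6 through 10 failed in the question while --psm 13 works here, it can help to compare modes side by side on the same image. A minimal sketch, assuming the OCR call is passed in as a function (sweep_psm, first_match and run_ocr are hypothetical names); with the image loaded as above, run_ocr could be lambda cfg: pytesseract.image_to_string(image, config=cfg):

```python
from typing import Callable, Dict, Iterable, Optional


def sweep_psm(
    run_ocr: Callable[[str], str],
    psm_values: Iterable[int] = range(6, 14),
    whitelist: str = "0123456789",
) -> Dict[int, str]:
    """Run OCR once per page segmentation mode and collect the stripped output."""
    results = {}
    for psm in psm_values:
        config = f"--psm {psm} --oem 3 -c tessedit_char_whitelist={whitelist}"
        # OCR output may carry a trailing newline or form feed, so strip it
        results[psm] = run_ocr(config).strip()
    return results


def first_match(results: Dict[int, str], expected: str) -> Optional[int]:
    """Return the lowest mode whose output equals the expected text, or None."""
    for psm in sorted(results):
        if results[psm] == expected:
            return psm
    return None
```

Printing the dictionary returned by sweep_psm shows at a glance which modes read the full number and which drop a digit.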

I get the same result by downloading the image locally and running:

tesseract oAAXR.png output --oem 3 --psm 13 -l eng
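The same command can be driven from Python without pytesseract. A minimal sketch using the standard library's subprocess module; tesseract_argv and run_tesseract are hypothetical helpers, and passing stdout as the output base makes tesseract print the text instead of writing output.txt:

```python
import subprocess
from typing import List


def tesseract_argv(image_path: str, outputbase: str = "stdout",
                   psm: int = 13, oem: int = 3, lang: str = "eng") -> List[str]:
    """Build the argument list for the tesseract CLI (hypothetical helper)."""
    return ["tesseract", image_path, outputbase,
            "--oem", str(oem), "--psm", str(psm), "-l", lang]


def run_tesseract(image_path: str) -> str:
    """Run tesseract on a local file and return the recognized text.

    Assumes the tesseract binary is on PATH; status messages go to stderr,
    so only the recognized text is read from stdout.
    """
    result = subprocess.run(tesseract_argv(image_path), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()
```

Building the argument list separately keeps the flags identical to the command above and avoids shell quoting issues.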

For reference, tesseract --version on my machine gives:

tesseract 4.0.0
 leptonica-1.77.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1
 Found AVX2
 Found AVX
 Found SSE
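Because results can differ between Tesseract versions, it is worth checking which engine pytesseract is actually calling; pytesseract exposes this through pytesseract.get_tesseract_version(). As a dependency-free alternative, a minimal sketch (parse_tesseract_version is a hypothetical helper) can pull the numeric version out of the tesseract --version text quoted above:

```python
import re
from typing import Tuple


def parse_tesseract_version(version_output: str) -> Tuple[int, ...]:
    """Extract the numeric version from `tesseract --version` output (hypothetical helper)."""
    # Matches e.g. "tesseract 4.0.0" or "tesseract v5.0.0-alpha..."; stops at the first non-digit part
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("no tesseract version found")
    return tuple(int(part) for part in match.group(1).split("."))
```

Returning a tuple allows simple comparisons such as parse_tesseract_version(text) >= (4,) before relying on the LSTM engine modes.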


Tags: python, ocr, tesseract, python-tesseract

Povilas

