
Why does pytesseract fail to recognize digits in this simple image?

I'm trying to use pytesseract to recognize a two-digit number in an image:

[image: the number 49]

  • I have tried --psm values from 6 through 10
  • I have tried -c tessedit_char_whitelist=0123456789

None of these returns 49. The closest I got was 4, without the 9.

Do you have any tips on how to make Tesseract recognize it?

Povilas asked Oct 15 '25 15:10



1 Answer

Try --psm 13 --oem 3. PSM 13 treats the image as a single raw text line. Setting --oem to 1 or 2 should also work.

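import io

import pytesseract
import requests
from PIL import Image

# Download the image from the question and fail early on an HTTP error.
response = requests.get('https://i.sstatic.net/oAAXR.png')
response.raise_for_status()
image = Image.open(io.BytesIO(response.content))

# PSM 13 reads the image as one raw line; the whitelist limits output to digits.
text = pytesseract.image_to_string(
    image, lang='eng',
    config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')

print(text)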

This yields 49 on my machine, as you expect.
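
Because the best page segmentation mode depends on the image, it can help to compare several modes side by side. The following is only a sketch and not part of the tested answer above: build_config and sweep_psm are hypothetical helper names, and the OCR function is passed in as a parameter (assumed to accept a config keyword, as pytesseract.image_to_string does).

```python
from typing import Callable, Dict, Iterable


def build_config(psm: int, oem: int = 3, whitelist: str = "0123456789") -> str:
    """Assemble a Tesseract config string (hypothetical helper)."""
    config = f"--psm {psm} --oem {oem}"
    if whitelist:
        config += f" -c tessedit_char_whitelist={whitelist}"
    return config


def sweep_psm(image, ocr: Callable[..., str],
              psms: Iterable[int] = (6, 7, 8, 10, 13)) -> Dict[int, str]:
    """Call ocr(image, config=...) once per PSM and collect the stripped text."""
    results = {}
    for psm in psms:
        results[psm] = ocr(image, config=build_config(psm)).strip()
    return results
```

With pytesseract installed, a call such as sweep_psm(Image.open('oAAXR.png'), pytesseract.image_to_string) would return a dict mapping each mode to its recognized text, which makes it easy to see which modes drop a digit.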

I get the same result by downloading the image locally and running

tesseract oAAXR.png output --oem 3 --psm 13 -l eng
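
The same command can be driven from Python with subprocess. This is a minimal sketch, assuming the tesseract binary is on PATH; tesseract_command and run_tesseract are hypothetical names. Since output is given as the output base, the CLI writes the recognized text to output.txt.

```python
import subprocess
from pathlib import Path
from typing import List


def tesseract_command(image_path: str, output_base: str = "output",
                      psm: int = 13, oem: int = 3, lang: str = "eng") -> List[str]:
    """Build the argument list for the CLI call shown above."""
    return ["tesseract", image_path, output_base,
            "--oem", str(oem), "--psm", str(psm), "-l", lang]


def run_tesseract(image_path: str, output_base: str = "output") -> str:
    """Run tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    return Path(output_base + ".txt").read_text().strip()
```

Passing the arguments as a list avoids shell quoting problems, and check=True raises an error if tesseract exits with a non-zero status.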

For reference, my tesseract --version gives:

tesseract 4.0.0
 leptonica-1.77.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1
 Found AVX2
 Found AVX
 Found SSE
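
Results may differ between Tesseract releases, so it can be worth checking the installed version before comparing output. The sketch below parses a version banner like the one above; parse_tesseract_version is a hypothetical helper, and the regular expression assumes the banner starts with "tesseract" followed by a dotted version number.

```python
import re
from typing import Tuple


def parse_tesseract_version(banner: str) -> Tuple[int, ...]:
    """Extract the dotted version after the word 'tesseract' as a tuple of ints."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("no tesseract version found in banner")
    return tuple(int(part) for part in match.group(1).split("."))
```

The returned tuple compares naturally, for example parse_tesseract_version(banner) >= (4, 0, 0). pytesseract also provides get_tesseract_version() for querying the installed binary directly.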

Davide Fiocco answered Oct 18 '25 09:10



