My goal is to detect the characters on images of this kind.

I need to improve the image so that Tesseract recognizes it better, probably through the steps below, and then use Tesseract to detect the characters:
from PIL import Image
import numpy as np
import cv2
import pytesseract

img = Image.open('grid.jpg')
# Convert the PIL image to a BGR numpy array for OpenCV
image = np.array(img.convert("RGB"))[:, :, ::-1].copy()
# Need to rotate the image here and fill the blanks
# Need to crop the image here
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Gaussian blur to reduce noise before thresholding
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Otsu's thresholding on the blurred image
ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Save the preprocessed image
cv2.imwrite("preprocessed.jpg", th3)
# Apply the OCR
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'
preprocessed = Image.open('preprocessed.jpg')
boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)
Here is the output image I get, which is still not ideal for OCR:
OCR problems:
Any other suggestions to improve the recognition are welcome.
Here's one idea for a way to proceed...
Convert to HSV, then start in each corner and progress towards the middle of the picture looking for the nearest pixel to each corner that is somewhat saturated and has a hue matching your blueish surrounding rectangle. That will give you the 4 points marked in red:

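In Python, that corner search could look roughly like the sketch below. The saturation cutoff and the bluish hue window (OpenCV hues run 0-179) are assumptions you would need to tune for your image:

import cv2
import numpy as np

img = cv2.imread('wordsearch.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, _ = cv2.split(hsv)
# "somewhat saturated" and bluish; both cutoffs are guesses to tune
mask = (s > 80) & (h > 90) & (h < 130)
ys, xs = np.nonzero(mask)
pts = np.column_stack([xs, ys]).astype(np.float32)
rows, cols = img.shape[:2]
corners = np.float32([[0, 0], [cols - 1, 0], [0, rows - 1], [cols - 1, rows - 1]])
# for each image corner, take the matching pixel nearest to it
nearest = [pts[np.argmin(((pts - c) ** 2).sum(axis=1))] for c in corners]
print(nearest)  # four points analogous to the red dots
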
Now use a perspective transform to shift each of those points to the corner to make the image rectilinear. I used ImageMagick but you should be able to see that I translate the top-left red dot at coordinates (210,51) into the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) gets shifted to (2064,0). The ImageMagick command in Terminal is:
convert wordsearch.jpg \
-distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg
That results in this:

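If you would rather stay in Python, the same warp can be done with OpenCV. This sketch just reuses the point pairs from the convert command above:

import cv2
import numpy as np

img = cv2.imread('wordsearch.jpg')
# same point pairs as the convert command: source corner -> destination corner
src = np.float32([[210, 51], [1754, 19], [238, 1137], [1776, 1107]])
dst = np.float32([[0, 0], [2064, 0], [0, 1161], [2064, 1161]])
M = cv2.getPerspectiveTransform(src, dst)
# output canvas just big enough to hold the mapped corners
result = cv2.warpPerspective(img, M, (2065, 1162))
cv2.imwrite('result.jpg', result)
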
The next issue is uneven lighting - namely the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove high frequencies (just a box-blur, or box-average is fine) so it now represents the slowly varying illumination. I then subtract the image from this so I am effectively removing background variations and leaving only high-frequency things - like your letters. I then normalize the result to make whites white and blacks black and threshold at 50%.
convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
-compose difference -composite -negate -normalize -threshold 50% final.jpg

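An OpenCV version of that flattening step might look like this. The blur size and threshold are assumptions to tune, and the output will not be byte-for-byte identical to the ImageMagick pipeline:

import cv2

gray = cv2.imread('result.jpg', cv2.IMREAD_GRAYSCALE)
# a big box blur keeps only the slowly varying illumination
background = cv2.blur(gray, (101, 101))
# subtracting leaves the high-frequency content, i.e. the letters
diff = cv2.absdiff(gray, background)
# negate and stretch so whites are white and blacks are black
flat = 255 - cv2.normalize(diff, None, 0, 255, cv2.NORM_MINMAX)
# threshold at 50%
_, final = cv2.threshold(flat, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite('final.jpg', final)
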
The result should be good for template matching if you know the font and letters or for OCR if you don't.
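For the template-matching route, a minimal sketch could be something like this, assuming you have a cropped image of one letter in the same font and size (the filename and the 0.8 score cutoff are just placeholders):

import cv2
import numpy as np

img = cv2.imread('final.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template_A.png', cv2.IMREAD_GRAYSCALE)
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res > 0.8)  # keep locations scoring above the cutoff
for x, y in zip(xs, ys):
    print('possible A at', (x, y))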