My goal is to detect the characters on images of this kind.

I need to improve the image so that Tesseract recognizes it better, probably through the steps below, and then use Tesseract to detect the characters:
from PIL import Image
import numpy as np
import cv2
import pytesseract

img = Image.open('grid.jpg')
# Convert the PIL image to a BGR numpy array for OpenCV
image = np.array(img.convert("RGB"))[:, :, ::-1].copy()
# Need to rotate the image here and fill the blanks
# Need to crop the image here
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Gaussian blur to reduce noise before thresholding
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Otsu's thresholding on the blurred image
ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Save the preprocessed image
cv2.imwrite("preprocessed.jpg", th3)
# Apply the OCR
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'
preprocessed = Image.open('preprocessed.jpg')
boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)
Here is the output image I get, which is still not ideal for OCR:
OCR problems:
Any other suggestions to improve the recognition are welcome.
Here's one idea for a way to proceed...
Convert to HSV, then start in each corner and progress towards the middle of the picture looking for the nearest pixel to each corner that is somewhat saturated and has a hue matching your blueish surrounding rectangle. That will give you the 4 points marked in red:

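In Python, that corner search could look roughly like the sketch below. The saturation cutoff and the bluish hue window (OpenCV hues run 0-179) are assumptions you would need to tune for your image:

import cv2
import numpy as np

img = cv2.imread('wordsearch.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, _ = cv2.split(hsv)
# "somewhat saturated" and bluish; both cutoffs are guesses to tune
mask = (s > 80) & (h > 90) & (h < 130)
ys, xs = np.nonzero(mask)
pts = np.column_stack([xs, ys]).astype(np.float32)
rows, cols = img.shape[:2]
corners = np.float32([[0, 0], [cols - 1, 0], [0, rows - 1], [cols - 1, rows - 1]])
# for each image corner, take the matching pixel nearest to it
nearest = [pts[np.argmin(((pts - c) ** 2).sum(axis=1))] for c in corners]
print(nearest)  # four points analogous to the red dots
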
Now use a perspective transform to shift each of those points to the corner to make the image rectilinear. I used ImageMagick but you should be able to see that I translate the top-left red dot at coordinates (210,51) into the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) gets shifted to (2064,0). The ImageMagick command in Terminal is:
convert wordsearch.jpg \
-distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg
That results in this:

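If you would rather stay in Python, the same warp can be done with OpenCV. This sketch just reuses the point pairs from the convert command above:

import cv2
import numpy as np

img = cv2.imread('wordsearch.jpg')
# same point pairs as the convert command: source corner -> destination corner
src = np.float32([[210, 51], [1754, 19], [238, 1137], [1776, 1107]])
dst = np.float32([[0, 0], [2064, 0], [0, 1161], [2064, 1161]])
M = cv2.getPerspectiveTransform(src, dst)
# output canvas just big enough to hold the mapped corners
result = cv2.warpPerspective(img, M, (2065, 1162))
cv2.imwrite('result.jpg', result)
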
The next issue is uneven lighting - namely the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove high frequencies (just a box-blur, or box-average is fine) so it now represents the slowly varying illumination. I then subtract the image from this so I am effectively removing background variations and leaving only high-frequency things - like your letters. I then normalize the result to make whites white and blacks black and threshold at 50%.
convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
-compose difference -composite -negate -normalize -threshold 50% final.jpg

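An OpenCV version of that flattening step might look like this. The blur size and threshold are assumptions to tune, and the output will not be byte-for-byte identical to the ImageMagick pipeline:

import cv2

gray = cv2.imread('result.jpg', cv2.IMREAD_GRAYSCALE)
# a big box blur keeps only the slowly varying illumination
background = cv2.blur(gray, (101, 101))
# subtracting leaves the high-frequency content, i.e. the letters
diff = cv2.absdiff(gray, background)
# negate and stretch so whites are white and blacks are black
flat = 255 - cv2.normalize(diff, None, 0, 255, cv2.NORM_MINMAX)
# threshold at 50%
_, final = cv2.threshold(flat, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite('final.jpg', final)
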
The result should be good for template matching if you know the font and letters or for OCR if you don't.
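For the template-matching route, a minimal sketch could be something like this, assuming you have a cropped image of one letter in the same font and size (the filename and the 0.8 score cutoff are just placeholders):

import cv2
import numpy as np

img = cv2.imread('final.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template_A.png', cv2.IMREAD_GRAYSCALE)
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res > 0.8)  # keep locations scoring above the cutoff
for x, y in zip(xs, ys):
    print('possible A at', (x, y))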