Improve contrast and quality of barely visible old text written with diluted ink using OpenCV

Following is an image of a page from old parish records. As you can see, the text is barely visible because the ink was diluted with a little too much water. Still, if you try hard enough, you can make out the letters. I would like to find a way to automatically fix such pages so that the text is more visible and readable.

[image: scan of the parish record page]

I have tried some basic effects manually in IrfanView. The best result came from edge detection, but it was still far from readable. Now I am trying OpenCV in Python, and with a binary threshold I am getting some results:

import cv2

img = cv2.imread('parish_page.png', cv2.IMREAD_GRAYSCALE)
# Pixels brighter than 240 become white (255); everything else becomes black
img = cv2.threshold(img, 240, 255, cv2.THRESH_BINARY)[1]
cv2.imwrite('processed.png', img)

[image: result of the binary threshold]

However, this seems to create a lot of noise, and it also destroyed the right border of the page. Is there a way to make the result cleaner, and perhaps even more readable?

I'll be glad for any tips, thanks in advance.

Sil asked Oct 14 '25 18:10


2 Answers

In ImageMagick, you could use local area thresholding (-lat). (OpenCV has something similar called adaptive thresholding.)

Input:

[image: input page]

convert img.png -negate -lat 20x20+2% -negate result.png


[image: result of local area thresholding]

Lower/raise the 2% to get more/less gain.

fmw42 answered Oct 17 '25 06:10


Here's a potential approach:

  • Perform adaptive histogram equalization (CLAHE)
  • Apply a sharpen filter using cv2.filter2D()
  • Threshold (Otsu's method in the code below; adaptive thresholding is an alternative)

First we apply CLAHE to boost local contrast.

Now we apply a sharpen kernel using cv2.filter2D(). You could try other filters.

[ 0 -1  0]
[-1  5 -1]
[ 0 -1  0]

Finally we threshold the sharpened image. The code below uses Otsu's threshold; cv2.adaptiveThreshold() is the local alternative.

Further steps could include morphological transformations to remove noise and filter the image further, but since the particles are so tiny, even a 3x3 kernel removes too much detail.

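import cv2
import numpy as np

# Load as grayscale (the original flag 0 is cv2.IMREAD_GRAYSCALE)
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)

# Contrast-limited adaptive histogram equalization with default settings
clahe = cv2.createCLAHE().apply(image)

# Sharpen; this 8-neighbour kernel is a stronger variant of the one shown above
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(clahe, -1, sharpen_kernel)

# Otsu's method picks a single global threshold automatically
thresh = cv2.threshold(sharpen, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Show and save each intermediate result
cv2.imshow('clahe', clahe)
cv2.imwrite('clahe.png', clahe)
cv2.imshow('sharpen', sharpen)
cv2.imwrite('sharpen.png', sharpen)
cv2.imshow('thresh', thresh)
cv2.imwrite('thresh.png', thresh)
cv2.waitKey()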

nathancy answered Oct 17 '25 07:10