I'm trying to extract digits from a Sudoku board. After detecting the board and its corners and transforming it, I'm left with a nicely aligned image of just the board. Now I'm trying to recognize the digits using Tess-Two, the Android implementation of Tesseract. I split the image into 9 parts with
currentCell = undistortedThreshed.submat(rect);
where rect is the rectangle that surrounds the image.
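For illustration, here is a minimal sketch of how such a rect could be computed. It assumes a square board image split into a 9 x 9 grid of equal cells, and it shrinks each cell by a few pixels so that most of the grid line falls outside the crop. The class name CellGrid, the method cellRect and the margin value are hypothetical and not part of my actual code; the four numbers map onto the x, y, width and height of an OpenCV Rect.

```java
import java.util.Arrays;

class CellGrid {
    /**
     * Returns {x, y, width, height} for the cell at (row, col) of a square
     * board that is boardSize pixels wide, shrunk by margin pixels per side.
     * Assumption: the board is a 9 x 9 grid of equally sized cells.
     */
    static int[] cellRect(int boardSize, int row, int col, int margin) {
        int cell = boardSize / 9;            // side length of one cell
        int side = cell - 2 * margin;        // side length after trimming
        if (side <= 0) {
            throw new IllegalArgumentException("margin too large for cell size");
        }
        int x = col * cell + margin;         // left edge of the trimmed cell
        int y = row * cell + margin;         // top edge of the trimmed cell
        return new int[] {x, y, side, side};
    }

    public static void main(String[] args) {
        // 450 px board, third row, fourth column, 4 px trimmed on each side
        System.out.println(Arrays.toString(cellRect(450, 2, 3, 4)));
        // prints [154, 104, 42, 42]
    }
}
```

Trimming a fixed margin only hides the grid lines if the board is well aligned after the transform, so the right margin would have to be found by experiment.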
Now to the digit recognition.
Some digits, like 4, are recognized perfectly. Others, mostly 6, 7 and 8, are recognized as 0 or not at all.
I want to help Tesseract as much as I can by cleaning up the currentCell image. At the moment it is an inverted-threshold image (I also tried without the inverted thresholding) that still contains white lines, the Sudoku grid lines. I want to get rid of those lines.
I've tried something like this (taken from here):
Imgproc.Canny(currentCell, currentCell, 80, 90);
Mat lines = new Mat();
int threshold = 50;
int minLineSize = 5;
int lineGap = 20;
Imgproc.HoughLinesP(currentCell, lines, 1, Math.PI / 180,
        threshold, minLineSize, lineGap);
for (int x = 0; x < lines.cols() && x < 1; x++) {
    double[] vec = lines.get(0, x);
    double x1 = vec[0], y1 = vec[1], x2 = vec[2], y2 = vec[3];
    Point start = new Point(x1, y1);
    Point end = new Point(x2, y2);
    Core.line(currentCell, start, end, new Scalar(255), 10);
}
But it doesn't draw anything. I tried changing the line's width and color, but still nothing. I also tried drawing the line on the full image and on the unthresholded image; nothing works.
Any suggestions?
EDIT
For some reason, it can't seem to find any lines.
I looked at the image after applying Canny to it, and the edges are there, but HoughLines doesn't detect any lines. I tried both HoughLines and HoughLinesP with different values, as shown in the OpenCV documentation, but nothing works.
Those are pretty obvious lines. What am I doing wrong?
Thanks!
I ended up doing something different.
I used findContours to get the biggest contour, which is the digit.
I got its bounding box using boundingRect.
I extracted that region using submat, and voilà: only the digit is left.
Unfortunately, it seems to make no difference at all. Tesseract still can't recognize the digits correctly. Sometimes it gives no result; sometimes, after dilating the digits, it recognizes the 6 as 0. But that's an issue for another question.
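The crop steps above can be sketched without OpenCV, which makes the idea easy to try in isolation. This is a stand-in and not the code I ran: instead of findContours and boundingRect it flood-fills a boolean grid, picks the largest 4-connected blob by pixel count (OpenCV would rank contours by area), and returns its bounding box. The names DigitBox and largestBlobBox are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class DigitBox {
    /**
     * Returns {x, y, width, height} of the largest 4-connected blob of
     * true pixels in img, or null if img has no true pixels.
     */
    static int[] largestBlobBox(boolean[][] img) {
        int h = img.length;
        if (h == 0) return null;
        int w = img[0].length;
        boolean[][] seen = new boolean[h][w];
        int bestSize = 0;
        int[] best = null;
        for (int sy = 0; sy < h; sy++) {
            for (int sx = 0; sx < w; sx++) {
                if (!img[sy][sx] || seen[sy][sx]) continue;
                // Breadth-first flood fill of one blob, tracking its extent
                int size = 0, minX = sx, maxX = sx, minY = sy, maxY = sy;
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {sx, sy});
                seen[sy][sx] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    int x = p[0], y = p[1];
                    size++;
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                    int[][] neighbours = {{x + 1, y}, {x - 1, y}, {x, y + 1}, {x, y - 1}};
                    for (int[] n : neighbours) {
                        int nx = n[0], ny = n[1];
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h
                                && img[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            queue.add(n);
                        }
                    }
                }
                // Keep the box of the blob with the most pixels
                if (size > bestSize) {
                    bestSize = size;
                    best = new int[] {minX, minY, maxX - minX + 1, maxY - minY + 1};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One stray pixel of noise plus a 5-pixel "digit"
        String[] rows = {"#.....", "..##..", "..#...", "..##..", "......"};
        boolean[][] img = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                img[y][x] = rows[y].charAt(x) == '#';
        System.out.println(Arrays.toString(largestBlobBox(img)));
        // prints [2, 1, 2, 3]
    }
}
```

The returned values correspond to the fields of the Rect that would be passed to submat. If a grid-line fragment in the cell is larger than the digit, it would be picked instead, so this relies on the digit being the biggest blob.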