According to the Tesseract v3.03 release notes, Tesseract now supports rendering PDF output with searchable text, but I don't know how to use this feature in my code.
I currently use tess-two in my Android app, so I wonder whether this feature can work on Android.
It would be great if you could give me an example that uses the Tesseract API to render a PDF; I will then try to port the missing functions to the tess-two library.
Thanks in advance.
P.S. I can see the pdfrenderer file, which seems to handle rendering PDF output, but I don't know how to use it with the base API.
Update: here is my attempt:
tesseract::TessResultRenderer* renderer = new tesseract::TessPDFRenderer(nat->api.GetDatapath());
__android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "data path = %s", nat->api.GetDatapath());
if (!nat->api.ProcessPages(c_file_name, NULL, 0, renderer)) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "process page failed");
    delete renderer;
    return;
}
FILE* fout = fopen(c_pdf_file_name, "wb");
if (fout == NULL) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot create output file %s\n", c_pdf_file_name);
    delete renderer;
    return;
}
const char* data;
int dataLength;
if (renderer->GetOutput(&data, &dataLength)) {
    fwrite(data, 1, dataLength, fout);
} else {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot get output file");
}
// Close the file on both paths so the handle is not leaked when GetOutput fails.
fclose(fout);
delete renderer;
My code fails at the ProcessPages method. After adding log statements (I have trouble debugging in the NDK), I found that the PDF renderer's BeginDocument always returns false in the TessBaseAPI::ProcessPages method of baseapi.cpp:
if (renderer && !renderer->BeginDocument(kUnknownTitle)) {
    success = false;
}
Am I missing something?
P.S. I use tess-two, which uses baseapi rather than capi.
Through the C API, called here from Java via JNA, it works as follows:
TessResultRenderer renderer = api.TessPDFRendererCreate(dataPath);
api.TessBaseAPIProcessPages1(handle, image, null, 0, renderer);
PointerByReference data = new PointerByReference();
IntByReference dataLength = new IntByReference();
api.TessResultRendererGetOutput(renderer, data, dataLength);
// dataLength is an IntByReference, so unwrap it to get the int length.
byte[] bytes = data.getValue().getByteArray(0, dataLength.getValue());
// Then write the byte array to a file with a .pdf extension.
If you have trouble following the code, check out the renderer example in this post.
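The last step above, writing the byte array to a file with a PDF extension, could look roughly like the sketch below. It uses only the Java standard library. The class name PdfBytesWriter and its methods are hypothetical helpers made up for illustration, not part of Tesseract or any wrapper, and the sample bytes in main merely stand in for the array obtained from the renderer. The header check relies on the fact that PDF files begin with "%PDF-".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of Tesseract): saves renderer output to disk.
final class PdfBytesWriter {
    // Every PDF file starts with this ASCII marker.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the buffer starts with the PDF header marker.
    static boolean looksLikePdf(byte[] bytes) {
        if (bytes == null || bytes.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(bytes, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Writes the bytes to target, refusing buffers that are clearly not a PDF.
    static Path write(byte[] bytes, Path target) throws IOException {
        if (!looksLikePdf(bytes)) {
            throw new IOException("Renderer output does not start with %PDF-");
        }
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes; in real use, pass the array read from the renderer.
        byte[] bytes = "%PDF-1.4\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        Path out = Files.createTempFile("ocr-output", ".pdf");
        write(bytes, out);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Checking the header before writing is optional, but it gives an early, readable error if the renderer returned an empty or truncated buffer instead of a document.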