
Using pytesseract to generate a PDF from an image

I am using the following code to generate a PDF from an image:

pdf = pytesseract.image_to_pdf_or_hocr(test_image, lang='dan', config='', nice=0, extension='pdf')

The type of the pdf variable is shown as bytes.

How do I publish or otherwise get hold of the generated PDF?

sayan_sen asked Nov 16 '25 16:11


2 Answers

I found the answer and am posting it here to close the thread.

 # pdf is already a bytes object, so it can be written as-is in binary mode
 with open("demofile.pdf", "wb") as f:
     f.write(pdf)

demofile.pdf is the resulting PDF. Because the path is relative, it is written to the current working directory, which here is the workspace.

sayan_sen answered Nov 19 '25 05:11


From the pytesseract page on PyPI:

Get a searchable PDF

pdf = pytesseract.image_to_pdf_or_hocr('test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf) # pdf type is bytes by default

lousycoder answered Nov 19 '25 05:11
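
Both answers come down to the same step: the bytes returned by image_to_pdf_or_hocr are written to a file opened in binary mode. A minimal sketch of that step as a reusable function follows. The name save_pdf_bytes is hypothetical and not part of pytesseract, and the header check assumes the output begins with the standard %PDF marker.

```python
from pathlib import Path


def save_pdf_bytes(data: bytes, path: str) -> Path:
    """Write PDF bytes to disk and return the path (hypothetical helper)."""
    # Assumption: a well-formed PDF starts with the "%PDF" marker,
    # so anything else is rejected before a file is created.
    if not data.startswith(b"%PDF"):
        raise ValueError("data does not look like a PDF")
    out = Path(path)
    out.write_bytes(data)  # binary write; no bytearray conversion needed
    return out


# Usage with pytesseract (requires Tesseract and the 'dan' language data):
# import pytesseract
# pdf = pytesseract.image_to_pdf_or_hocr("test.png", lang="dan", extension="pdf")
# save_pdf_bytes(pdf, "demofile.pdf")
```

The pytesseract call is left commented out so that the helper can be tried without Tesseract installed.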