
Is there a way to extract a picture from an Excel file using R so that it can be passed to Tesseract OCR?

Tags: r, excel, tesseract

I have multiple Excel files with pictures in one of the sheets. Is there a way to extract the image (or its file path) into R so that it can then be passed to Tesseract OCR?

Previously I used the openxlsx package's function loadWorkbook:

wb <- openxlsx::loadWorkbook("C:/Users/.../test_file.xlsx")

Printing wb gives:

A Workbook object.

Worksheets:
 Sheet 1: "Sheet1"



Images:
 Image 1: "C:/Users/..../AppData/Local/Temp/RtmpuUQZm7//file41e..._openxlsx_loadworkbook/xl/media/image1.png"
 Worksheet write order: 1

Is there any way to get this image path? wb is a Workbook object and typeof(wb) returns "S4", so it appears that I can't simply convert it to a character vector and pull out the path.

brenda e asked Nov 05 '25 05:11



1 Answer

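You can access the image path through the media field of your workbook object (wb$media). The workbook is a reference-class object, which is why typeof() reports "S4", but its fields can still be read with $.

Here's a reprex of plotting a PNG stored within an xlsx file:

library(png)
library(openxlsx)
library(grid)

# Load the workbook; openxlsx unpacks embedded images to a temporary folder
wb  <- openxlsx::loadWorkbook("~/img.xlsx")
# wb$media (equivalently wb@.xData$media) holds the extracted image paths
img <- png::readPNG(wb$media[1])
grid::grid.newpage()
grid::grid.raster(img)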

Created on 2020-03-04 by the reprex package (v0.3.0)
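Since the goal in the question is OCR rather than plotting, the same image paths can be handed to the tesseract package. The following is only a minimal sketch, assuming the openxlsx and tesseract packages are installed and that the embedded pictures are in a format Tesseract can read (such as PNG). The helper names workbook_image_paths and ocr_xlsx_images are hypothetical, introduced here for illustration, and are not part of either package.

```r
# Hypothetical helper: return the paths of the image files that
# openxlsx extracted when loading a workbook. Assumes `wb` exposes
# them through its `media` field, as in the reprex above.
workbook_image_paths <- function(wb) {
  paths <- unname(as.character(wb$media))
  # Keep only paths that still exist on disk (they live in a temp folder)
  paths[file.exists(paths)]
}

# Hypothetical helper: run OCR on every image embedded in one xlsx file.
# `language` is passed to tesseract::tesseract(); "eng" is an assumption.
ocr_xlsx_images <- function(xlsx_path, language = "eng") {
  wb     <- openxlsx::loadWorkbook(xlsx_path)
  engine <- tesseract::tesseract(language)
  # Returns a character vector of recognised text, named by image path
  vapply(workbook_image_paths(wb), tesseract::ocr, character(1),
         engine = engine)
}
```

For several workbooks, something like lapply(list.files("some_folder", pattern = "\\.xlsx$", full.names = TRUE), ocr_xlsx_images) should apply it to each file in turn, where "some_folder" is a placeholder for your own directory.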

Allan Cameron answered Nov 06 '25 17:11
