
Extracting equations and images from Word

Is there a programmatic way to extract equations (and possibly images) from an MS Word document? I've googled all over, but have yet to find anything that I can sink my teeth into and work from. If possible, I'd like to be able to do this with VB.NET or C#, but I can pick up enough of any language to hack out a DLL. Thanks!

EDIT: Right now I'm looking at extracting the equations from Word 2003, but if converting it to 2007/Open XML is required, that's fine.

AndrewBurton asked Dec 23 '22 13:12


2 Answers

What Word format are your documents in? If they are in Open XML (file extension .docx) you could use the Open XML SDK available from Microsoft to extract images and embedded content.

An Open XML file is nothing but a zip archive with a special structure. The SDK includes examples showing how to access parts of that archive. In fact, you could use any zip-capable library to extract the content from the document package.
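To illustrate the "any zip-capable library" route, here is a minimal Python sketch using only the standard library. It assumes the usual .docx layout, where images live under word/media/, embedded OLE objects under word/embeddings/, and equations written with the Word 2007+ equation editor appear as m:oMath elements in word/document.xml. Equations converted from Word 2003 are typically embedded objects with a preview image rather than m:oMath, so treat this as a starting point. The function name extract_docx_parts is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of Office Math Markup (OMML) elements in Open XML documents.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_docx_parts(docx_path):
    """Return (media, equations) from a .docx package.

    media: dict mapping part names under word/media/ or word/embeddings/
           to their raw bytes.
    equations: list of serialized <m:oMath> elements found in
               word/document.xml.
    """
    media = {}
    equations = []
    with zipfile.ZipFile(docx_path) as package:
        names = package.namelist()
        for name in names:
            # Skip directory entries; keep images and embedded objects.
            if name.startswith(("word/media/", "word/embeddings/")) and not name.endswith("/"):
                media[name] = package.read(name)
        if "word/document.xml" in names:
            root = ET.fromstring(package.read("word/document.xml"))
            for node in root.iter("{%s}oMath" % OMML_NS):
                equations.append(ET.tostring(node, encoding="unicode"))
    return media, equations
```

Calling extract_docx_parts("report.docx") (a hypothetical file name) returns the raw bytes of each image or embedded object, which you can write to disk, plus the equation markup as XML strings.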

If the documents still use the older binary format, things are a bit more complicated. I think the easiest way would be to convert the documents to the Open XML format first. There are several ways to do this:

  • Get the free, open-source b2xtranslator from SourceForge, which provides C# DLLs for file conversion.
  • Install Microsoft's Compatibility Pack and use the following command line for conversion:

    "C:\Program Files\Microsoft Office\Office12\wordconv.exe" -oice -nme input_file output_file

where input_file and output_file must be full path names.
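If you have many files to convert, the command can be scripted. The following Python sketch simply wraps the command line quoted above: it resolves both paths to full path names, as the converter requires, and passes the same -oice -nme switches. The function names are invented for this example, and the converter location is the one from the command above, so adjust it to your installation.

```python
import os
import subprocess

# Location taken from the command line quoted above; adjust as needed.
WORDCONV = r"C:\Program Files\Microsoft Office\Office12\wordconv.exe"


def build_wordconv_command(input_file, output_file, converter=WORDCONV):
    """Build the argument list, resolving both paths to full path names."""
    return [
        converter,
        "-oice",
        "-nme",
        os.path.abspath(input_file),
        os.path.abspath(output_file),
    ]


def convert_doc_to_docx(input_file, output_file):
    """Run the converter (Windows only).

    Raises subprocess.CalledProcessError if the process exits with a
    non-zero status.
    """
    subprocess.run(build_wordconv_command(input_file, output_file), check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.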

Dirk Vollmar answered Jan 08 '23 18:01


I don't know if any of this will help, but the object model in Word 2000/2003 has an InlineShapes collection as part of the Document object which represents embedded images and possibly similar objects like equations.

Some VBA code to copy the first item onto the clipboard, which might help you extract them:

ThisDocument.InlineShapes.Item(1).Select
Selection.Copy

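The same object model is accessible from .NET too, through the Word interop assemblies (Microsoft.Office.Interop.Word); see the MSDN documentation.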

xahtep answered Jan 08 '23 16:01


