I am searching for a JavaScript library that can read .doc and .docx files. I only need the text content; I am not interested in pictures, formulas, or other special structures in the MS Word file.
It would be great if the library worked with the JavaScript FileReader, as shown in the code below.
function readExcel(currfile) {
  var reader = new FileReader();
  reader.onload = (function (_file) {
      return function (e) {
          //here should the magic happen
      };
  })(currfile);
  reader.onabort = function (e) {
      alert('File read canceled');
  };
  reader.readAsBinaryString(currfile);
}
I searched the internet, but I could not find what I was looking for.
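In Node.js, start with the file system module (fs, which ships with Node.js and needs no separate install). Second, add a PDF reader package if you need PDFs. Install xlsx for reading .xls and .xlsx workbooks. node-stream-zip can be used to read .docx files, because a .docx file is a ZIP archive; the older binary .doc format is not a ZIP archive, so it needs a different tool.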
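To illustrate the ZIP-based approach for .docx: the body of the document is stored in the archive entry word/document.xml, with the visible text inside w:t elements and paragraphs delimited by w:p elements. The sketch below assumes that entry has already been read into a string (for example with node-stream-zip) and only shows the text extraction step. The function names are hypothetical, and the regex-based parsing is a simplification that ignores tables, footnotes, headers, and numeric character references.

```javascript
// Decode the five predefined XML entities.
// "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Hypothetical helper: takes the XML of word/document.xml as a string and
// returns the text, one line per non-empty paragraph.
function extractDocxText(xml) {
  return xml
    .split('</w:p>') // each chunk ends where a paragraph closes
    .map(function (chunk) {
      // Match <w:t> or <w:t ...attributes>, but not <w:tab/> or <w:tbl>.
      var runs = chunk.match(/<w:t(?:\s[^>]*)?>[\s\S]*?<\/w:t>/g) || [];
      return runs
        .map(function (run) {
          // Strip the surrounding tags first, then decode entities.
          return decodeXmlEntities(run.replace(/<[^>]+>/g, ''));
        })
        .join('');
    })
    .filter(function (line) {
      return line.length > 0; // drop empty paragraphs and the trailing chunk
    })
    .join('\n');
}
```

For anything beyond plain body text, a real XML parser or a dedicated library is likely a safer choice than regular expressions.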
From the document library, select Settings > Document Library Settings > General Settings > Advanced Settings > Browser-enabled Documents, then select the "Display as a Web page" option.
Just set your iframe's src attribute to the URL of a document viewer, with the URL of your file appended. The viewer will download your file from that URL and generate an HTML page from it, your iframe then displays that page, and voila!
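As a rough sketch of building such a src value: the helper name below is hypothetical, and the viewer endpoint shown is only one example (the Office Online embed URL); substitute the endpoint of whichever viewer you actually use. Since the viewer downloads the file itself, the document URL has to be reachable from the viewer's servers.

```javascript
// Hypothetical helper: build the iframe src for a hosted document viewer.
// viewerBase is an assumption; it must end with the query parameter that
// expects the document URL (for example "...?src=").
function buildViewerSrc(viewerBase, documentUrl) {
  // Percent-encode the document URL so its own "?" and "&" characters
  // are passed to the viewer as a single parameter value.
  return viewerBase + encodeURIComponent(documentUrl);
}

var src = buildViewerSrc(
  'https://view.officeapps.live.com/op/embed.aspx?src=',
  'https://example.com/files/report.docx?v=2'
);

// In a browser, the result would then be assigned to the iframe:
// document.getElementById('viewer').src = src;
```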
If you want to pre-process your DOCX files rather than waiting until runtime, you could convert them into HTML first by using a file conversion API such as Zamzar. You could use the API to programmatically convert from DOCX to HTML, save the output to your server, and then serve that HTML to your end users.
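You can use docxtemplater for this. Although it is normally used for templating, it can also just return the text of the document:
// content is the binary content of the .docx file (for example, e.target.result from the FileReader)
var zip = new JSZip(content); // passing data to the constructor is the JSZip 2.x form
var doc = new Docxtemplater().loadZip(zip);
var text = doc.getFullText();
console.log(text);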
See the documentation for installation information (I'm the maintainer of this project).
However, it only handles .docx, not .doc.