
How to iterate line by line through a .docx file using apache POI [closed]

My objective is to read a .docx file and display its text on the view (web page).

I am using Apache POI to read a .docx file in a Grails application. Please suggest a way to display the output on the view without losing blank spaces and line breaks.

My .docx document content:

This is a .docx document ...
this is second line
this is third line

Result on the Groovy console when I print the extracted text:

This is a .docx document ...
this is second line
this is third line

But when I pass the output to the view, it becomes:

This is a .docx document ... this is second line this is third line

My code is:

    import org.apache.poi.xwpf.usermodel.XWPFDocument
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor

    ...
            // Path to the .docx file to read
            File docFile = new File("E:\\Query.docx")
            // withInputStream closes the stream once the closure returns
            docFile.withInputStream { fis ->
                XWPFDocument doc = new XWPFDocument(fis)
                XWPFWordExtractor docExtractor = new XWPFWordExtractor(doc)
                // Prints the document text with its line breaks intact
                println docExtractor.getText()
            }
    ...

If someone can suggest a way to iterate through each line of the document, I can easily get my result. Please help, I am stuck.

asked Dec 08 '25 13:12 by vishu


1 Answer

HTML collapses runs of whitespace, including line breaks, into a single space when rendering. So a string like "Hello there\nLine 2\n" renders fine in the console as

Hello there
Line 2

but as HTML it will all show on the same line. You'll need to replace the newline characters with suitable HTML, e.g. <br /> tags, or wrap each line in paragraph/div tags.
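The replacement described above could be sketched roughly as follows. This is a minimal illustration in plain Java (which can also be called from Groovy), not part of Apache POI or Grails. The class name `HtmlLineBreaks` and its methods `escapeHtml`, `withBreaks` and `asParagraphs` are hypothetical helpers introduced only for this example. It assumes the input is the plain string returned by `docExtractor.getText()`, and it escapes HTML-significant characters first so that document text is not interpreted as markup.

```java
// Hypothetical helper class for illustration; not part of Apache POI or Grails.
final class HtmlLineBreaks {

    private HtmlLineBreaks() {}

    // Escape the characters that are significant in HTML so that text taken
    // from the document cannot be interpreted as markup.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Option 1: replace every line break (\r\n, \r or \n) with a <br /> tag.
    static String withBreaks(String text) {
        return escapeHtml(text).replaceAll("\\r\\n|\\r|\\n", "<br />");
    }

    // Option 2: wrap each line in its own <p> element.
    // String.split drops trailing empty strings, so a final newline
    // does not produce an extra empty paragraph.
    static String asParagraphs(String text) {
        StringBuilder out = new StringBuilder();
        for (String line : escapeHtml(text).split("\\r\\n|\\r|\\n")) {
            out.append("<p>").append(line).append("</p>");
        }
        return out.toString();
    }
}
```

For instance, `HtmlLineBreaks.withBreaks(docExtractor.getText())` would produce a string that can be passed to the view. Depending on the view layer's own escaping settings, the result may need to be marked as raw HTML so the inserted tags are not escaped a second time.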

answered Dec 10 '25 08:12 by Gagravarr


