
In some cases, converting XHTML to docx with docx4j loses the original style

During my tests, I found that converting the following HTML content to docx loses the original styling.

  1. <a href="http://www.google.com">Google</a>

    Result in docx: no underline.

  2. <p><span style="text-decoration: underline;">underline text</span></p>

    Result in docx: no underline.

  3. <p><span style="text-decoration: line-through;">delete text</span></p>

    Result in docx: no strikethrough.

  4. <p style="margin-left:30.0px;">indent text</p>

    Result in docx: no indent.

  5. <h1>header line</h1>

    Result in docx: plain text only.

  6. <p><span style="background-color: rgb(255,255,0);">background color</span></p>

    Result in docx: no background color.

  7. <hr/>

    Result in docx: empty.

  8. <table style="border-width:1px;">...

    Result in docx: no border.

  9. <span style="font-family: arial , helvetica , sans-serif;font-size: large;">...

    Result in docx: all font settings are lost.

Does anyone know how to deal with these issues, or know of a workaround?

simpletosimple asked Nov 27 '25 16:11



1 Answer

The comment in the XHTMLImporter source code notes that some of these features remain to be implemented.

Re your #4: I think indentation is supported, but perhaps not for the units you used.
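If the unit is indeed the problem, one possible workaround is to rewrite px lengths to pt before handing the XHTML to the importer. The sketch below is only an illustration: the class and method names (`PxToPtRewriter`, `pxToPt`) are made up for this example, it uses nothing from docx4j, and whether the importer honors `margin-left` in pt is an assumption that has to be checked against your docx4j version. The conversion factor itself comes from CSS, where 96px = 72pt.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
class PxToPtRewriter {
    // A number followed by "px"; the lookbehind stops a match from starting
    // in the middle of a longer number or identifier.
    private static final Pattern PX_LENGTH =
            Pattern.compile("(?<![\\w.])(\\d+(?:\\.\\d+)?)px\\b");

    // CSS defines 1in = 96px = 72pt, so 1px = 0.75pt.
    private static final BigDecimal PT_PER_PX = new BigDecimal("0.75");

    /** Rewrites every px length in a style attribute value to pt. */
    static String pxToPt(String style) {
        Matcher m = PX_LENGTH.matcher(style);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String pt = new BigDecimal(m.group(1))
                    .multiply(PT_PER_PX)
                    .stripTrailingZeros()
                    .toPlainString();
            m.appendReplacement(out, Matcher.quoteReplacement(pt + "pt"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: margin-left:22.5pt;
        System.out.println(pxToPt("margin-left:30.0px;"));
    }
}
```

The method is meant to be applied to the value of a `style` attribute only, not to the whole document, so that text content that happens to contain something like "30px" is left alone.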

Re your #8 (table borders): there is some support for these; search for other posts on the topic.

Implementing underline, strikethrough, and background-color ought to be straightforward.

If you'd like to do that, we're happy to accept a pull request.
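To give a rough idea of what such a mapping involves, here is a minimal sketch that turns the three CSS declarations from the question into the WordprocessingML run properties they correspond to (`w:strike`, `w:u`, and `w:shd`). It is not the XHTMLImporter code and does not use the docx4j API: the class `CssRunProps` and its methods are hypothetical, it emits the XML as plain strings, and it only understands `rgb(r,g,b)` colors. A real patch would set the equivalent properties on docx4j's run-property objects instead.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not docx4j code.
class CssRunProps {
    private static final Pattern RGB = Pattern.compile(
            "rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    /** Converts "rgb(255,255,0)" to "FFFF00"; returns null for other syntaxes. */
    static String rgbToHex(String css) {
        Matcher m = RGB.matcher(css.trim());
        if (!m.matches()) {
            return null;
        }
        StringBuilder hex = new StringBuilder();
        for (int i = 1; i <= 3; i++) {
            int channel = Math.min(255, Integer.parseInt(m.group(i)));
            hex.append(String.format(Locale.ROOT, "%02X", channel));
        }
        return hex.toString();
    }

    /** Builds a w:rPr element from a CSS style attribute value. */
    static String toRunProperties(String style) {
        boolean strike = false;
        boolean underline = false;
        String fill = null;
        for (String declaration : style.split(";")) {
            String[] parts = declaration.split(":", 2);
            if (parts.length != 2) {
                continue; // skip empty or malformed declarations
            }
            String name = parts[0].trim().toLowerCase(Locale.ROOT);
            String value = parts[1].trim().toLowerCase(Locale.ROOT);
            if (name.equals("text-decoration")) {
                strike |= value.contains("line-through");
                underline |= value.contains("underline");
            } else if (name.equals("background-color")) {
                fill = rgbToHex(value);
            }
        }
        // Children are emitted in schema order: strike, then u, then shd.
        StringBuilder rPr = new StringBuilder("<w:rPr>");
        if (strike) {
            rPr.append("<w:strike/>");
        }
        if (underline) {
            rPr.append("<w:u w:val=\"single\"/>");
        }
        if (fill != null) {
            rPr.append("<w:shd w:val=\"clear\" w:color=\"auto\" w:fill=\"")
               .append(fill).append("\"/>");
        }
        return rPr.append("</w:rPr>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toRunProperties("background-color: rgb(255,255,0);"));
    }
}
```

Shading (`w:shd`) is used here rather than `w:highlight` because highlight only accepts a fixed set of named colors, while shading takes an arbitrary hex fill.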

JasonPlutext answered Nov 30 '25 10:11
