During my tests, I found that when converting the following HTML content to docx, the original styling is lost.
1. <a href="http://www.google.com">Google</a>
   Result in docx: no underline.
2. <p><span style="text-decoration: underline;">underline text</span></p>
   Result in docx: no underline.
3. <p><span style="text-decoration: line-through;">delete text</span></p>
   Result in docx: no strikethrough line.
4. <p style="margin-left:30.0px;">indent text</p>
   Result in docx: no indent at all.
5. <h1>header line</h1>
   Result in docx: only plain text.
6. <p><span style="background-color: rgb(255,255,0);">background color</span></p>
   Result in docx: no background color.
7. <hr/>
   Result in docx: empty.
8. <table style="border-width:1px;">...
   Result in docx: no border.
9. <span style="font-family: arial , helvetica , sans-serif;font-size: large;">...
   Result in docx: all font settings are lost.
Does anyone know how to deal with these issues, or is there a workaround?
The comment in the XHTMLImporter source code notes that some of these things remain to be implemented.
Re your #4, I think indent is supported. Maybe just not for the units you have used?
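If units are the problem, one thing to try is converting the length yourself before import. The following is only a rough sketch of that arithmetic, not how XHTMLImporter handles it: the class and method names are hypothetical, and it assumes the usual CSS convention of 96 px per inch. Word stores indents in twips (twentieths of a point, 1440 per inch), so 1 px works out to 15 twips.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of docx4j): converts a CSS length to twips.
final class CssLengthToTwips {

    // Accepts values such as "30.0px", "12pt", "1in", "2.54cm", "10mm".
    private static final Pattern LENGTH = Pattern.compile(
            "^\\s*([0-9]*\\.?[0-9]+)\\s*(px|pt|in|cm|mm)\\s*$",
            Pattern.CASE_INSENSITIVE);

    static int toTwips(String cssLength) {
        Matcher m = LENGTH.matcher(cssLength);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported CSS length: " + cssLength);
        }
        double value = Double.parseDouble(m.group(1));
        String unit = m.group(2).toLowerCase(Locale.ROOT);

        double twipsPerUnit;
        switch (unit) {
            case "px": twipsPerUnit = 1440.0 / 96.0; break; // assumes 96 px per inch
            case "pt": twipsPerUnit = 20.0;          break; // 1 pt = 20 twips
            case "in": twipsPerUnit = 1440.0;        break;
            case "cm": twipsPerUnit = 1440.0 / 2.54; break;
            default:   twipsPerUnit = 1440.0 / 25.4; break; // mm
        }
        return (int) Math.round(value * twipsPerUnit);
    }

    public static void main(String[] args) {
        // The margin-left value from item 4 of the question.
        System.out.println(toTwips("30.0px")); // prints 450
    }
}
```

With a helper like this you could rewrite margin-left:30.0px as a point or inch value before import, and see whether the indent then survives.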
Re your #8 table borders, there is some support for these; Google for other posts.
Implementing underline, delete (strikethrough), and background-color all ought to be straightforward.
If you'd like to do that, we're happy to accept a pull request.
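To give an idea of what such an implementation has to produce, here is a minimal, self-contained sketch of the mapping from the CSS in items 2, 3 and 6 to WordprocessingML run properties. It is not the XHTMLImporter code: the class and method names are hypothetical, it only builds an XML string instead of docx4j objects, and it only understands rgb(r,g,b) colors. The target elements themselves (w:strike, w:u, and w:shd with a fill attribute) are standard WordprocessingML.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration (not part of docx4j): inline CSS -> <w:rPr> fragment.
final class CssToRunProperties {

    private static final Pattern RGB = Pattern.compile(
            "rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    static String toRPrXml(String style) {
        boolean strike = false;
        boolean underline = false;
        String fill = null;

        // Walk the "property: value" declarations of the style attribute.
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed declarations
            }
            String prop = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim().toLowerCase(Locale.ROOT);

            if (prop.equals("text-decoration")) {
                strike |= value.contains("line-through");
                underline |= value.contains("underline");
            } else if (prop.equals("background-color")) {
                Matcher m = RGB.matcher(value);
                if (m.matches()) {
                    fill = String.format("%02X%02X%02X",
                            channel(m.group(1)), channel(m.group(2)), channel(m.group(3)));
                }
            }
        }

        // Emit in a fixed order (strike, u, shd), which is the order the
        // schema expects these children inside w:rPr.
        StringBuilder xml = new StringBuilder("<w:rPr>");
        if (strike) {
            xml.append("<w:strike/>");
        }
        if (underline) {
            xml.append("<w:u w:val=\"single\"/>");
        }
        if (fill != null) {
            xml.append("<w:shd w:val=\"clear\" w:color=\"auto\" w:fill=\"")
               .append(fill).append("\"/>");
        }
        return xml.append("</w:rPr>").toString();
    }

    // Clamp a color channel to the 0..255 range.
    private static int channel(String digits) {
        return Math.min(255, Integer.parseInt(digits));
    }

    public static void main(String[] args) {
        System.out.println(toRPrXml("text-decoration: underline;"));
        System.out.println(toRPrXml("background-color: rgb(255,255,0);"));
    }
}
```

In a real patch the same decisions would set the corresponding properties on the run's RPr object rather than build a string, but the CSS-to-element mapping is the part that was missing.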