I'm having difficulties understanding the "run" object. As described, run object identifies the same style of text continuation. However, when I run a paragraph which all words are in the same style, "runs" still return me more than one line.
To make sure I didn't miss any possible style issues, I created a new word doc and typed as below
Hjkhkuhu joiuiuouoiuo iouiouououoi iouiououiuiuiui hhvvhgh hgjjhjhhh hjhjhjhjhjhj hjhjhj, jjkjkjk jkjkjkjkiuio uiouiouoo! jkjkjlkjlk
And I run below code:
from docx import Document
doc = Document('test.docx')
for p in doc.paragraphs:
for run in p.runs:
print(run.text)
And here is the result I got:
Hjkhkuhu
joiuiuouoiuo
iouiouououoi
iouiououiuiuiui
hhvvhgh
hgjjhjhhh
hjhjhjhjhjhj
hjhjhj
jjkjkjk
jkjkjkjkiuio
uiouiouoo ! jkjkjlkjlk
Why is this the case? Did I miss anything?
Having spent a few days tussling with docx runs now..
Paragraphs <w:p> contain one or more "style" runs <w:r> containing one or more texts <w:t>
But Docx runs are very easy to break, and when broken they can hide it very well.
Just having two texts the same format isn't necessarily enough to make it the same run. They don't automatically join, and changing format on text in a run then changing it back is enough to give you two separate but identically formatted runs.
(Greater experts than I dig more into runs/text here DOCX w:t (text) elements crossing multiple w:r (run) elements?)
This caused me a lot of problems with a 'tag substitution within runs' task, leading me to conclude that the only way to guarantee text is all in one run, is to enter it yourself (with unchanging format) in one fell swoop.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With