
How do "runs" work in python-docx?

Tags: python, docx

I'm having difficulty understanding the "run" object. As described, a run is a contiguous stretch of text that shares the same style. However, when I iterate over the runs of a paragraph in which every word has the same style, "runs" still returns more than one item.

To make sure I didn't miss any possible style issues, I created a new Word document and typed the text below:

Hjkhkuhu joiuiuouoiuo iouiouououoi iouiououiuiuiui hhvvhgh hgjjhjhhh hjhjhjhjhjhj hjhjhj, jjkjkjk jkjkjkjkiuio uiouiouoo! jkjkjlkjlk

Then I ran the code below:

from docx import Document

doc = Document('test.docx')

# Print the text of every run in every paragraph, one run per line
for p in doc.paragraphs:
    for run in p.runs:
        print(run.text)

And here is the result I got:

Hjkhkuhu

joiuiuouoiuo

iouiouououoi

iouiououiuiuiui

hhvvhgh

hgjjhjhhh

hjhjhjhjhjhj

hjhjhj

jjkjkjk

jkjkjkjkiuio

uiouiouoo ! jkjkjlkjlk

Why is this the case? Did I miss anything?

Ariel asked Oct 24 '25 03:10



1 Answer

I have spent a few days tussling with docx runs now.

A paragraph <w:p> contains one or more "style" runs <w:r>, each of which contains one or more text elements <w:t>.

But docx runs are very easy to split, and once split they hide it very well: the text looks exactly the same in Word.

Two pieces of text having the same format isn't necessarily enough to put them in the same run. Runs don't join automatically, and changing the format of some text in a run and then changing it back is enough to leave you with two separate but identically formatted runs.
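To make that structure concrete, here is a minimal sketch that uses only Python's standard library rather than python-docx. The XML is hand-written for illustration (it is not taken from the asker's file), and run_texts and run_format are hypothetical helper names. It shows three runs with identical formatting that a reader would see as one unbroken phrase:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written paragraph for illustration only: three bold runs in a row.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Hello </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">brave </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
</w:p>
"""


def run_texts(paragraph):
    """Return the text of each <w:r> in a <w:p>, one string per run."""
    return [
        "".join(t.text or "" for t in run.findall("w:t", NS))
        for run in paragraph.findall("w:r", NS)
    ]


def run_format(run):
    """Summarise a run's <w:rPr> as a tuple of (tag, attributes) pairs."""
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return ()
    return tuple((child.tag, tuple(sorted(child.attrib.items()))) for child in rpr)


paragraph = ET.fromstring(PARAGRAPH_XML)
texts = run_texts(paragraph)
formats = {run_format(run) for run in paragraph.findall("w:r", NS)}

print(texts)           # one entry per run
print("".join(texts))  # the paragraph as a reader sees it
print(len(formats))    # number of distinct formats among the runs
```

The three runs share a single format, yet the paragraph still reports three separate pieces of text, which is the same effect as in the question.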

(Greater experts than I dig deeper into runs and text in the question "DOCX w:t (text) elements crossing multiple w:r (run) elements?")

This caused me a lot of problems with a "tag substitution within runs" task, leading me to conclude that the only way to guarantee that text is all in one run is to enter it yourself (with unchanging format) in one fell swoop.
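When you cannot control how the text was typed, one possible workaround is to substitute on the paragraph's joined text instead of on individual runs. The sketch below is only an illustration under assumptions: replace_in_paragraph is a hypothetical helper, and it relies on nothing more than a runs list and a writable text attribute on each run, both of which python-docx paragraphs and runs provide. The demo uses SimpleNamespace stand-ins so that it runs without a .docx file.

```python
from types import SimpleNamespace


def replace_in_paragraph(paragraph, old, new):
    """Replace `old` with `new` even when `old` is split across several runs.

    Hypothetical helper: joins the run texts, substitutes, writes the result
    into the first run and empties the others. The whole paragraph therefore
    ends up with the first run's formatting.
    """
    runs = paragraph.runs
    if not runs:
        return False
    full_text = "".join(run.text for run in runs)
    if old not in full_text:
        return False
    runs[0].text = full_text.replace(old, new)
    for run in runs[1:]:
        run.text = ""
    return True


# Stand-in paragraph whose tag "{{name}}" is split across two runs.
demo = SimpleNamespace(
    runs=[SimpleNamespace(text=t) for t in ("Dear {{na", "me}}", ", hello")]
)
changed = replace_in_paragraph(demo, "{{name}}", "Ariel")
print(changed, [run.text for run in demo.runs])
```

With python-docx you would pass each p from doc.paragraphs instead of the stand-in. The trade-off is that any formatting differences between the original runs are lost, so this only suits paragraphs that are meant to be uniformly formatted. If you only need to read the text, p.text returns the whole paragraph regardless of how it is split into runs.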

Kingsley answered Oct 26 '25 16:10
