
Edit content in the header of a document with python-docx

I am trying to find and replace text in a text box in the header of a document. After searching for a while, it seems there is no way to access the content of the header or of "floating" text boxes via python-docx (I read issue here).

So it seems we have to find and replace directly in the document's XML. Do you know any way to do that?

Khanh Le asked Oct 31 '25 22:10


2 Answers

I found a way to solve this problem. For instance, I have a template.docx file, and I want to change the text in a text box in the header, as described above. I solved it by following these steps:

  1. Rename file template.docx to template.zip
  2. Unzip template.zip to template folder
  3. Find and replace text I want to change in one of the header<number>.xml files in /template/word/ folder.
  4. Zip all files in /template folder back to template.zip
  5. Rename template.zip back to template.docx

I used Python to automate these steps:

import os
import shutil
import zipfile

WORKING_DIR = os.getcwd()
TEMP_DOCX = os.path.join(WORKING_DIR, "template.docx")
TEMP_ZIP = os.path.join(WORKING_DIR, "template.zip")
TEMP_FOLDER = os.path.join(WORKING_DIR, "template")

# remove old zip file or folder template
if os.path.exists(TEMP_ZIP):
    os.remove(TEMP_ZIP)
if os.path.exists(TEMP_FOLDER):
    shutil.rmtree(TEMP_FOLDER)

# reformat template.docx's extension
os.rename(TEMP_DOCX, TEMP_ZIP)

# unzip file zip to specific folder
with zipfile.ZipFile(TEMP_ZIP, 'r') as z:
    z.extractall(TEMP_FOLDER)

# change header xml file
header_xml = os.path.join(TEMP_FOLDER, "word", "header1.xml")
with open(header_xml, 'r', encoding='utf-8') as f:
    xmlstring = f.read()
xmlstring = xmlstring.replace("#TXTB1", "Hello World!")
with open(header_xml, "wb") as f:
    f.write(xmlstring.encode("UTF-8"))

# zip temp folder to zip file
os.remove(TEMP_ZIP)
shutil.make_archive(TEMP_ZIP.replace(".zip", ""), 'zip', TEMP_FOLDER)

# rename zip file to docx
os.rename(TEMP_ZIP, TEMP_DOCX)
shutil.rmtree(TEMP_FOLDER)
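
The script above hard-codes header1.xml, but the placeholder may live in any of the header<number>.xml parts. A minimal sketch for checking which parts contain it, without extracting anything to disk, might look like this. The helper name find_header_parts is hypothetical (it is not part of python-docx or the standard library), and it assumes the placeholder appears as one contiguous string in the XML; Word sometimes splits typed text across several runs, in which case a plain substring search will not find it.

```python
import re
import zipfile

# Hypothetical helper, not part of python-docx: matches word/header1.xml,
# word/header2.xml, and so on.
HEADER_PART = re.compile(r"word/header\d+\.xml")


def find_header_parts(docx_file, placeholder):
    """Return the names of header parts whose XML contains placeholder.

    docx_file may be a path or a binary file-like object.
    Assumes the placeholder is stored as one contiguous string.
    """
    matches = []
    with zipfile.ZipFile(docx_file, "r") as z:
        for name in z.namelist():
            if HEADER_PART.fullmatch(name):
                # ZipFile.read() returns bytes, so decode before searching
                xml = z.read(name).decode("utf-8")
                if placeholder in xml:
                    matches.append(name)
    return matches
```

Calling find_header_parts("template.docx", "#TXTB1") would then give the list of part names to edit in step 3.
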
Khanh Le answered Nov 03 '25 01:11


Another way to do this, in memory:

import io
import zipfile


def docx_setup_header(doc_sio_1, new_title):
    """
    Returns a BytesIO having replaced #TITLE# placeholder in document header with new_title
    :param doc_sio_1:  A BytesIO instance (or other binary file-like object) of the docx document.
    :param new_title:  The new title to be inserted into header, replacing #TITLE# placeholder
    :return: A new BytesIO instance, with modified document.
    """

    HEADER_PATH = 'word/header1.xml'

    doc_sio_2 = io.BytesIO()
    with zipfile.ZipFile(doc_sio_1, 'r') as doc_zip_1, \
            zipfile.ZipFile(doc_sio_2, 'w') as doc_zip_2:
        # ZipFile.read() returns bytes in Python 3, so decode before replacing
        header = doc_zip_1.read(HEADER_PATH).decode('utf-8')
        header = header.replace("#TITLE#", new_title)

        for item in doc_zip_1.infolist():
            if item.filename == HEADER_PATH:
                # write the modified header part
                doc_zip_2.writestr(HEADER_PATH, header.encode('utf-8'), zipfile.ZIP_DEFLATED)
            else:
                # copy every other entry unchanged
                doc_zip_2.writestr(item, doc_zip_1.read(item.filename))

    return doc_sio_2

Note that Python's zipfile module cannot replace or delete items in an existing archive, so you need to create a brand-new zip file and copy the unchanged entries from the original.
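
One caveat with plain string replacement in XML: if the new text contains &, < or >, the header part is no longer well-formed and Word may refuse to open the file. Below is a rough sketch of the same copy-into-a-new-archive approach with the replacement text escaped first. The function name replace_in_part and its parameters are made up for illustration; escape comes from the standard library's xml.sax.saxutils.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_part(docx_bytes, part_name, placeholder, new_text):
    """Return new .docx bytes with placeholder replaced in one part.

    Hypothetical helper: part_name is an archive path such as
    'word/header1.xml'; new_text is plain text, escaped before insertion.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes), "r") as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part_name:
                xml = data.decode("utf-8")
                # escape() turns &, < and > into XML entities
                xml = xml.replace(placeholder, escape(new_text))
                data = xml.encode("utf-8")
            # passing the ZipInfo keeps the entry's name and compression type
            dst.writestr(item, data)
    return out.getvalue()
```

For example, replace_in_part(open("template.docx", "rb").read(), "word/header1.xml", "#TITLE#", "R&D report") returns bytes that can be written straight to a new .docx file.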

patb answered Nov 03 '25 02:11