
Delete text from a PDF using PyMuPDF

Tags: python, pymupdf

I need to remove the text "DRAFT" from a PDF document using Python. I can find the text box containing the text, but I can't find an example of how to edit the PDF text element using PyMuPDF.

In the example below, the draft object holds the coordinates of the DRAFT text element (search_for returns a list of rectangles, one per match).

import fitz

fname = r"original.pdf"
doc = fitz.open(fname)
page = doc.load_page(0)

draft = page.search_for("DRAFT")

# insert code here to delete the DRAFT text or replace it with an empty string

out_fname = r"final.pdf"
doc.save(out_fname)

Added 4/28/2022: I found a way to delete the text, but unfortunately it also deletes any overlapping text underneath the box around DRAFT. I really just want to delete the DRAFT letters without modifying the underlying layers.

# find DRAFT as quads and mark the first hit for redaction
rl = page.search_for("DRAFT", quads=True)
page.add_redact_annot(rl[0])

# removes everything inside the marked area, including overlapping text
page.apply_redactions()

user3005422 asked Nov 15 '25 09:11


1 Answer

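You can try replacing the string directly in each page's content stream. This only works if the text is stored there as a single literal byte string; text that is hex-encoded, uses a custom font encoding, or is split across several text-showing operators will not match.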

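import fitz

doc = fitz.open("xxxx")  # path to the input PDF

for page in doc:
    # a page can have several content streams, each addressed by its xref
    for xref in page.get_contents():
        # xref_stream() returns the decompressed stream as bytes
        stream = doc.xref_stream(xref).replace(b'The string to delete', b'')
        doc.update_stream(xref, stream)

doc.save("final.pdf")  # write the modified document to a new file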
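
The plain replace call removes the bytes wherever they occur in the stream and gives no sign of whether anything matched. A minimal sketch of a more cautious variant, assuming the text is stored as a literal string followed by the Tj operator, for example (DRAFT) Tj: it blanks only that operand and reports the number of hits. The names strip_literal_text and remove_text are hypothetical helpers introduced here, not part of PyMuPDF, and a hit count of 0 most likely means the text is encoded some other way.

```python
import re
from typing import Tuple


def strip_literal_text(stream: bytes, target: bytes) -> Tuple[bytes, int]:
    """Blank out `(target) Tj` operands; return the new stream and the hit count.

    Assumption: the text is a literal string shown with Tj. Hex strings,
    TJ arrays and custom font encodings are not handled.
    """
    pattern = re.compile(rb"\(" + re.escape(target) + rb"\)(\s*Tj)")
    # keep the operator, empty the string operand
    return pattern.subn(rb"()\1", stream)


def remove_text(src: str, dst: str, target: bytes) -> int:
    """Hypothetical wrapper: apply strip_literal_text to every content stream."""
    import fitz  # PyMuPDF, imported here so the helper above needs no dependency

    doc = fitz.open(src)
    total = 0
    for page in doc:
        for xref in page.get_contents():
            new_stream, hits = strip_literal_text(doc.xref_stream(xref), target)
            if hits:
                doc.update_stream(xref, new_stream)
                total += hits
    doc.save(dst)
    return total
```

Calling remove_text("original.pdf", "final.pdf", b"DRAFT") and checking the returned count shows whether the stream edit touched anything before you rely on the output file.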

xiaoxu answered Nov 16 '25 22:11