I'm writing a Python script to generate an RTF file containing text excerpts with a search term highlighted in bold. The script successfully generates the excerpts with the keyword, but the term isn’t bolded as intended.
Search Term: "Apple"
Expected Output in LibreOffice: Apple (bolded)
Actual Output in LibreOffice: "Apple0" (plain text with "0" appended)
Raw RTF Text: { Apple0} (viewed in a text editor)
I expect { Apple0} to be {\b Apple\b0}, where \b starts bold and \b0 ends it, per RTF syntax.
Here’s a simplified version of my Python code:
import re
TERM = "Apple"
RTF_HEADER = r"{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Calibri;}}\f0\fs22\par"
RTF_FOOTER = r"}"
BOLD_START = r"{\b "
BOLD_END = r"\b0}"
excerpt = "This is an Apple test."
term_pattern = re.compile(rf"\b{TERM}\b", re.IGNORECASE)
bolded_term = BOLD_START + TERM + BOLD_END # Intended: {\b Apple\b0}
excerpt_bolded = term_pattern.sub(bolded_term, excerpt)
with open("output.rtf", "w", encoding="utf-8") as f:
    f.write(RTF_HEADER + excerpt_bolded + RTF_FOOTER)
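Backslashes have special meaning in regular expressions. You use them with \b for a word boundary in re.compile, but the RTF pieces you pass to sub as the replacement also contain backslashes for the RTF control words. The replacement string is processed for backslash escapes as well, so each \b in it is turned into a backspace character instead of being written literally, which is why the raw output shows { Apple0}. You need to escape each of those backslashes with another backslash so that they lose their special meaning: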
BOLD_START = r"{\\b "
BOLD_END = r"\\b0}"
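To see the difference, here is a minimal sketch that runs the substitution three ways on the excerpt from the question: with the original replacement, with the doubled backslashes, and with a callable replacement. The callable variant and the helper name `bold` are not from the original code; they are one possible alternative, assuming you would rather keep the RTF constants unescaped and preserve the matched text's original case.

```python
import re

excerpt = "This is an Apple test."
pattern = re.compile(r"\bApple\b", re.IGNORECASE)

# Original replacement: sub() reads \b in the template as a backspace (\x08).
broken = pattern.sub(r"{\b Apple\b0}", excerpt)

# Doubled backslashes come out of template processing as single literal backslashes.
fixed = pattern.sub(r"{\\b Apple\\b0}", excerpt)


def bold(match):
    # Hypothetical helper: the return value of a callable replacement is
    # inserted as-is, so the RTF control words need no extra escaping.
    return r"{\b " + match.group(0) + r"\b0}"


also_fixed = pattern.sub(bold, excerpt)

print(repr(broken))   # 'This is an {\x08 Apple\x080} test.'
print(fixed)          # This is an {\b Apple\b0} test.
print(also_fixed)     # This is an {\b Apple\b0} test.
```

The callable form also keeps whatever casing was matched (for example "apple" stays "apple"), whereas inserting TERM always writes "Apple". If the search term can contain regex metacharacters, wrapping it in re.escape before building the pattern is a sensible precaution.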