I'm looking for a Python script that can extract the text from the first page of a Word document. I found functions that work on paragraphs, but not on pages, which is what I need.
The problem is that pages in the docx format are purely virtual. MS Word decides on its own where to put page breaks when it lays out the document, based on the text size and other parameters.
It's a little easier when the user has explicitly set page breaks, because those are stored in the document and can be found there.
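For the explicit page break case, here is a minimal sketch using only the standard library. It assumes the main text lives in `word/document.xml` inside the .docx archive and stops at the first `w:br` element with `w:type="page"`. It also stops at `w:lastRenderedPageBreak`, a hint Word may store the last time it laid out the file; that hint is not guaranteed to be present or up to date. The function names `first_page_text` and `first_page_text_from_xml` are made up for this example, and text boxes, headers and footers are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def first_page_text_from_xml(document_xml):
    """Return the text that precedes the first page-break marker.

    Hypothetical helper: if no marker is found, the whole text is returned.
    """
    root = ET.fromstring(document_xml)
    body = root.find(W + "body")
    if body is None:
        return ""

    paragraphs = []
    for para in body.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t":
                parts.append(node.text or "")
            elif node.tag == W + "tab":
                parts.append("\t")
            elif node.tag == W + "lastRenderedPageBreak" or (
                node.tag == W + "br" and node.get(W + "type") == "page"
            ):
                # First page ends here: keep what was collected so far.
                paragraphs.append("".join(parts))
                return "\n".join(paragraphs)
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)


def first_page_text(path):
    """Open a .docx file (a zip archive) and read its main document part."""
    with zipfile.ZipFile(path) as archive:
        return first_page_text_from_xml(archive.read("word/document.xml"))
```

Calling `first_page_text("report.docx")` (the file name is a placeholder) returns the paragraphs before the first break, joined with newlines. If the document has no explicit break and no rendered-break hint, you get the whole text back, so check the length before trusting the result.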
As a workaround, you can estimate the number of lines per page and trim the text yourself, but as far as I know, there's no "easy" method that does everything in one line of code.
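A rough sketch of that workaround, assuming you already have the paragraphs as a list of strings. The defaults `chars_per_line=90` and `lines_per_page=45` are guesses, not values taken from Word; the real numbers depend on font, margins, images and tables, so treat the result as an approximation only. `approx_first_page` is a made-up name.

```python
import textwrap


def approx_first_page(paragraphs, chars_per_line=90, lines_per_page=45):
    """Approximate the first page by wrapping text to a fixed line width.

    Both limits are assumptions that must be tuned for each document.
    """
    lines = []
    for text in paragraphs:
        # An empty paragraph still takes up one line on the page.
        lines.extend(textwrap.wrap(text, width=chars_per_line) or [""])
        if len(lines) >= lines_per_page:
            break
    return "\n".join(lines[:lines_per_page])
```

For example, `approx_first_page(["aaa bbb ccc", "ddd"], chars_per_line=7, lines_per_page=2)` keeps only the two wrapped lines of the first paragraph and drops the second one.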