
Extract text from the first page of a Word document using Python

Tags: python, ms-word

I'm looking for a Python script that can extract the text from the first page of a Word document. I found functions that work on paragraphs but not on pages, which is not what I need.

L Zh asked Sep 06 '25 06:09



1 Answer

The problem is that pages in the docx format are purely virtual. MS Word decides for itself where to put page breaks when it renders the document, based on the text size and other parameters.

It's a little easier when the user has explicitly set page breaks, since those are stored in the document and can be found as described there, for example.
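
To illustrate the explicit-page-break case, here is a minimal sketch that uses only the standard library. It relies on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`, where a manual page break is stored as a `w:br` element with `w:type="page"`. The function names and the `use_rendered_hints` flag are hypothetical. The sketch ignores other ways a page can end (section breaks, "page break before" paragraph settings, automatic pagination), so it returns the whole body text when no manual break exists, and text inside nested text boxes may be counted twice. The optional flag also stops at `w:lastRenderedPageBreak`, a hint Word may leave behind when it saves a file; it is not guaranteed to be present or up to date.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main WordprocessingML elements (w:p, w:t, w:br, ...).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _tag(name):
    """Return the fully qualified ElementTree tag for a w: element or attribute."""
    return f"{{{W_NS}}}{name}"


def first_page_text_from_xml(document_xml, use_rendered_hints=False):
    """Collect paragraph text up to the first page break found in document.xml."""
    root = ET.fromstring(document_xml)
    body = root.find(_tag("body"))
    if body is None:
        return ""

    paragraphs = []
    for para in body.iter(_tag("p")):
        parts = []
        page_ended = False
        for node in para.iter():
            if node.tag == _tag("t"):
                parts.append(node.text or "")
            elif node.tag == _tag("br") and node.get(_tag("type")) == "page":
                page_ended = True  # manual page break
            elif use_rendered_hints and node.tag == _tag("lastRenderedPageBreak"):
                page_ended = True  # pagination hint left by Word, may be stale
            if page_ended:
                break
        # Skip a paragraph that holds nothing but the break itself.
        if parts or not page_ended:
            paragraphs.append("".join(parts))
        if page_ended:
            break
    return "\n".join(paragraphs)


def first_page_text(path):
    """Read word/document.xml from a .docx file (path or file-like object)."""
    with zipfile.ZipFile(path) as archive:
        return first_page_text_from_xml(archive.read("word/document.xml"))
```

Calling `first_page_text("report.docx")` (the file name is only an example) would return the text that precedes the first manual page break, one paragraph per line.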

As a workaround, you can estimate the number of lines per page and trim the text yourself, but as far as I know there is no "easy" method that does everything in one line of code.
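
The lines-per-page workaround could be sketched roughly as follows. The function name and the default values of `chars_per_line` and `lines_per_page` are assumptions chosen for illustration; real values depend on the font, margins, and page size, and the result is only an approximation of what Word would render. The input is a list of paragraph strings that you have already extracted by some other means.

```python
import textwrap


def approximate_first_page(paragraphs, chars_per_line=90, lines_per_page=45):
    """Keep paragraphs until an estimated line budget for one page is used up.

    Both defaults are guesses, not values taken from any Word document.
    """
    kept = []
    lines_left = lines_per_page
    for text in paragraphs:
        if lines_left <= 0:
            break
        # An empty paragraph still occupies one line on the page.
        wrapped = textwrap.wrap(text, width=chars_per_line) or [""]
        # Keep only as many wrapped lines as still fit on the page.
        kept.append(" ".join(wrapped[:lines_left]))
        lines_left -= len(wrapped)
    return kept
```

The last kept paragraph is cut off where the estimated page ends, and whitespace inside paragraphs is normalized by `textwrap.wrap`.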

Дмитрий Клименко answered Sep 07 '25 19:09
