Extract all tables from PDF in python [duplicate]

Question

I have a PDF and want to extract all tables from it. When I run the code below, I get an empty list.

import pdftables

filepath = 'File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf'
with open(filepath, 'rb') as fh:
    table = pdftables.get_tables(fh)
print(table)

Michael Dorner · Accepted Answer

I assume that the PDF has more than one page? This should work:

from pdftables.pdf_document import PDFDocument
from pdftables.pdftables import page_to_tables

filepath = ...
page_number = ...
with open(filepath, 'rb') as file_object:
    pdf_doc = PDFDocument.from_fileobj(file_object)
    pdf_page = pdf_doc.get_page(page_number)
    tables = page_to_tables(pdf_page)
    print(tables)

You can iterate over several pages, too:

for page_number, page in enumerate(pdf_doc.get_pages()):
    tables = page_to_tables(page)
    print(tables)
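
As a minimal sketch, the two snippets above could be combined into one helper that gathers the result for every page. It relies only on the `PDFDocument.from_fileobj`, `get_pages` and `page_to_tables` calls already shown in this answer; the names `collect_tables` and `extract_all_tables` are hypothetical, and it assumes the file has to stay open while pages are read, so the loop runs inside the `with` block.

```python
def collect_tables(pages, extract):
    """Apply extract to each page and return a list of (page_index, result) pairs."""
    results = []
    for page_index, page in enumerate(pages):  # page_index is 0-based
        results.append((page_index, extract(page)))
    return results


def extract_all_tables(filepath):
    """Hypothetical helper: run page_to_tables over every page of one PDF."""
    # Imported here so collect_tables can be used without pdftables installed.
    from pdftables.pdf_document import PDFDocument
    from pdftables.pdftables import page_to_tables

    with open(filepath, 'rb') as file_object:
        pdf_doc = PDFDocument.from_fileobj(file_object)
        # Assumption: pages are read from the open file, so iterate before it closes.
        return collect_tables(pdf_doc.get_pages(), page_to_tables)
```

Calling `extract_all_tables(filepath)` would then give one entry per page, which makes it easier to see which pages yield tables and which do not.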

Tags:

python

pdf

pdftables

Neeraj Sharma
