
using GCS path in PyPDF2 PdfFileReader

I am using the Python library PyPDF2 to read a PDF file with PdfReader. It works fine for a local PDF file. Is there a way to read a PDF file stored in a Google Cloud Storage bucket (gs://bucket_name/object_name)?

from PyPDF2 import PdfReader

with open('testpdf.pdf', 'rb') as f1:
    reader = PdfReader(f1)
    number_of_pages = len(reader.pages)

Instead of 'testpdf.pdf', how can I provide my Google Cloud Storage object location? Please let me know if anyone has tried this.

asked Dec 06 '25 18:12 by san


1 Answer

You can use the gcsfs library to open files in a GCS bucket as file-like objects and pass them straight to PdfReader. For example:

import gcsfs
from pypdf import PdfReader  # or `from PyPDF2 import PdfReader`, as in the question

gcs_file_system = gcsfs.GCSFileSystem(project="PROJECT_ID")
gcs_pdf_path = "gs://bucket_name/object.pdf"

# Open the object in binary mode; the with block closes it afterwards
with gcs_file_system.open(gcs_pdf_path, "rb") as f_object:
    # Pass the open file object to PdfReader
    reader = PdfReader(f_object)

    # Get the number of pages
    num = len(reader.pages)
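
If you would rather not depend on gcsfs, a similar result is probably achievable with the google-cloud-storage client by downloading the object into memory and wrapping the bytes in io.BytesIO. The following is only a sketch under some assumptions: the google-cloud-storage and pypdf packages are installed, Application Default Credentials are configured, and the PDF is small enough to hold in memory. The helper names split_gcs_uri and count_pdf_pages are hypothetical, made up for illustration.

```python
import io


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name).

    Hypothetical helper, not part of any library.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object name.
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must include a bucket and an object: {uri!r}")
    return bucket_name, object_name


def count_pdf_pages(uri):
    """Download a PDF from GCS into memory and return its page count."""
    # Imported here so split_gcs_uri can be used without these packages.
    from google.cloud import storage
    from pypdf import PdfReader

    bucket_name, object_name = split_gcs_uri(uri)
    client = storage.Client()  # assumes Application Default Credentials
    blob = client.bucket(bucket_name).blob(object_name)

    # download_as_bytes() reads the whole object into memory at once.
    pdf_bytes = blob.download_as_bytes()

    # PdfReader accepts any seekable binary stream, so BytesIO works here.
    reader = PdfReader(io.BytesIO(pdf_bytes))
    return len(reader.pages)
```

Calling count_pdf_pages("gs://bucket_name/object.pdf") would then play the same role as the gcsfs snippet above. For large files the gcsfs approach may be preferable, since it does not require loading the whole object first.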

answered Dec 09 '25 13:12 by Shivam123


