
GCP Dataflow / Apache Beam problem: importing another Python file into the main .py

I have a problem with a GCP Dataflow project. I have written a Dataflow pipeline in Python and it works well on its own. Now I want to import this pipeline file into another Python file that contains some classes and functions, but as soon as I add the import, even without using anything from it, the whole thing stops working.

This is the error I get when I publish a message to the Pub/Sub topic:

File "dataflow_simple.py", line 87, in process NameError: global name 'pvalue' is not defined

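For context, the error comes from a DoFn's process method in dataflow_simple.py that uses pvalue. A simplified sketch of that kind of code (the class and field names are placeholders, not my actual code):

# dataflow_simple.py (simplified sketch, placeholder names)
import apache_beam as beam
from apache_beam import pvalue  # if this module-level import is missing,
                                # process() fails with exactly this NameError


class SplitFn(beam.DoFn):
    def process(self, element):
        if element.get('valid'):
            yield element
        else:
            # the kind of line that raises
            # "NameError: global name 'pvalue' is not defined" on the worker
            yield pvalue.TaggedOutput('errors', element)
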
What can I do to import another file with some classes and use it?

Asked Oct 30 '25 by Alexander Tolmachev

1 Answer

What are you trying to achieve? If the goal is to have one file defining the functions and classes and another one defining the pipeline, then you should do it the other way around: import the functions and classes into the file that contains the pipeline, rather than importing the pipeline into the other file.

If this is indeed what you are trying to do, organize your files in this way, and add a setup.py:

Dataflow
|----my_module
     |----__init__.py
     |----functions.py
     |----classes.py
|----setup.py
|----my_pipe.py

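A minimal setup.py for this layout could look like the sketch below (the package name and version are placeholders; add any extra PyPI dependencies to install_requires):

# setup.py -- minimal sketch; name and version are placeholders
import setuptools

setuptools.setup(
    name='my-dataflow-job',
    version='0.0.1',
    packages=setuptools.find_packages(),  # picks up my_module and its __init__.py
    install_requires=[],                  # extra PyPI dependencies, if any
)
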
Then in my_pipe.py:

from my_module.functions import ...
from my_module.classes import ...

Still in my_pipe.py, pass the path to setup.py when building the pipeline options. This ensures that all of your module's files are packaged and shipped to Dataflow with the job:

options = beam.options.pipeline_options.PipelineOptions(
    ...,
    setup_file='/path/to/setup.py')
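
Putting it all together, my_pipe.py could look roughly like the sketch below (the Pub/Sub topic, the transform labels and SplitFn are placeholders for your own code):

# my_pipe.py -- illustrative sketch; topic, labels and SplitFn are placeholders
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

from my_module.classes import SplitFn  # your own DoFn from my_module/classes.py


def run():
    options = PipelineOptions(
        streaming=True,                  # reading from Pub/Sub implies streaming
        setup_file='/path/to/setup.py',  # ships my_module to the Dataflow workers
    )

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromPubSub(topic='projects/<project>/topics/<topic>')
         | 'Decode' >> beam.Map(lambda msg: json.loads(msg.decode('utf-8')))
         | 'Process' >> beam.ParDo(SplitFn()))


if __name__ == '__main__':
    run()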

Reference

Answered Nov 02 '25 by totooooo

