I have a problem in a GCP Dataflow project. I have written a Dataflow pipeline in Python, and it works well on its own. I want to import this pipeline file into another Python file that contains some classes and functions, but as soon as I import it (even without using it), my code stops working.
Error text when I publish a message to the Pub/Sub topic:
File "dataflow_simple.py", line 87, in process
NameError: global name 'pvalue' is not defined
How can I import another file with some classes and use it?
What are you trying to achieve? If the goal is to have one file defining the functions and classes and another defining the pipeline, then you should do it the other way around: have the file that contains the pipeline import the functions and classes.
If this is indeed what you are trying to do, organize your files as follows, and add a setup.py:
Dataflow
|----my_module
|    |----__init__.py
|    |----functions.py
|    |----classes.py
|----setup.py
|----my_pipe.py
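The setup.py at the root can be minimal. A sketch for the layout above (the package name, version, and empty dependency list are illustrative; adjust them to your project):

```python
# setup.py - illustrative sketch for packaging my_module
import setuptools

setuptools.setup(
    name='my_module',
    version='0.0.1',
    # find_packages() discovers my_module/ through its __init__.py
    packages=setuptools.find_packages(),
    # list any extra PyPI dependencies your functions and classes need
    install_requires=[],
)
```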
Then, in my_pipe.py:
from my_module.functions import ...
from my_module.classes import ...
Still in my_pipe.py, pass the path to setup.py when building the pipeline options. This ensures that all files in the package are staged to the workers when the job is submitted to Dataflow:
options = beam.options.pipeline_options.PipelineOptions(
    ...,
    setup_file='/path/to/setup.py')