
Iterative processing in Dataflow

As shown here, Dataflow pipelines are represented by a fixed DAG. I'm wondering whether it's possible to implement a pipeline in which processing continues until a dynamically evaluated condition, based on the data computed so far, is satisfied.

Here's some pseudo code to illustrate what I'd like to implement:

    PCollection pco = null
    while(true):
        pco = pco.apply(someTransform())
        if (conditionSatisfied(pco)):
            break
    pco.Write()
scordata asked Aug 31 '25 16:08


1 Answer

It seems like you really want iterative computations. Right now Dataflow does not provide support for that, but we are aware that it is a very important use case and we are working on finding the right set of APIs to express it.

For now your workarounds are:

  • Iteratively run whole pipelines (run the pipeline, inspect its output, run it again if the condition is not satisfied, and so on). This has the obvious downside of pipeline setup and teardown overhead on every run.
  • Build a pipeline with a hard-coded number of iterations by calling .apply() in a loop unconditionally, then run the whole pipeline.
  • A combination of the two, e.g. run fixed 5-iteration pipelines until you're satisfied with the result.
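The combined workaround can be sketched as a driver loop that lives outside the pipeline. This is only a minimal illustration under stated assumptions: `run_until_converged`, `run_pipeline`, `condition_satisfied`, and `halve_pipeline` are hypothetical names, and `halve_pipeline` is a plain-Python stand-in for building and running a real fixed-iteration Dataflow pipeline and reading back its output.

```python
from typing import Callable, List, Tuple


def run_until_converged(
    data: List[float],
    run_pipeline: Callable[[List[float], int], List[float]],
    condition_satisfied: Callable[[List[float]], bool],
    iterations_per_run: int = 5,
    max_runs: int = 20,
) -> Tuple[List[float], int]:
    """Rerun a fixed-iteration pipeline until the condition holds.

    Returns the final data and the number of pipeline runs used.
    """
    for runs in range(1, max_runs + 1):
        # One "pipeline run" applies the transform a fixed number of times.
        data = run_pipeline(data, iterations_per_run)
        # The condition is checked outside the pipeline, between runs.
        if condition_satisfied(data):
            return data, runs
    raise RuntimeError("condition not satisfied after max_runs runs")


def halve_pipeline(data: List[float], iterations: int) -> List[float]:
    # Hypothetical stand-in for a real pipeline: each iteration halves every value.
    for _ in range(iterations):
        data = [x / 2 for x in data]
    return data


result, runs = run_until_converged(
    [1024.0, 512.0],
    halve_pipeline,
    condition_satisfied=lambda d: max(d) < 1.0,
    iterations_per_run=5,
)
print(result, runs)
```

Because the condition is only checked between runs, the loop can overshoot by up to `iterations_per_run - 1` iterations. A smaller batch size reduces that overshoot but pays the setup and teardown cost more often.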
jkff answered Sep 02 '25 13:09