
How to merge two DataStreams in Apache Flink

I'm using Flink to process my streaming data.

I have two data sources: A and B.

// A
DataStream<String> dataA = env.addSource(sourceA);
// B
DataStream<String> dataB = env.addSource(sourceB);

I use map to process the data coming from A and B.

DataStream<String> res = mergeDataAAndDataB();   // how to merge dataA and dataB?

Say sourceA is sending "aaa", "bbb", "ccc", ... and sourceB is sending "A", "B", "C", ....

I want to merge them pairwise into "Aaaa", "Bbbb", "Cccc", ... to produce a new DataStream<String>.

How can I achieve this?

Yves asked Aug 30 '25 15:08



1 Answer

There are two kinds of stream merging in Flink.

dataA.union(dataB)

will create one new stream that has the elements of both streams, blended in some arbitrary way, perhaps "aaa", "bbb", "A", "ccc", "B", "C", which isn't what you've asked for -- just mentioning it for completeness.

What you do want is to create a connected stream, via

dataA.connect(dataB)

which you can then process with a RichCoFlatMapFunction or a KeyedCoProcessFunction to compute a sort of join that glues the strings together.
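To make the "sort of join" concrete, here is a minimal sketch of the buffering logic such a two-input function would need. It is plain Java with no Flink dependency, so the class PairingBuffer and its methods onA and onB are hypothetical names, not Flink API. The assumption is that the n-th element of dataB should be glued to the n-th element of dataA, and that whichever side arrives first has to wait for its partner.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Flink): pairs the n-th element of one
// input with the n-th element of the other, whichever arrives first.
class PairingBuffer {
    // Elements still waiting for a partner. At most one queue is non-empty.
    private final Deque<String> pendingA = new ArrayDeque<>(); // e.g. "aaa"
    private final Deque<String> pendingB = new ArrayDeque<>(); // e.g. "A"

    // Would be called from flatMap1 (or processElement1) for each dataA element.
    void onA(String a, Consumer<String> out) {
        String b = pendingB.poll();
        if (b == null) {
            pendingA.add(a);      // no partner yet, so buffer it
        } else {
            out.accept(b + a);    // "A" + "aaa" -> "Aaaa"
        }
    }

    // Would be called from flatMap2 (or processElement2) for each dataB element.
    void onB(String b, Consumer<String> out) {
        String a = pendingA.poll();
        if (a == null) {
            pendingB.add(b);
        } else {
            out.accept(b + a);
        }
    }

    public static void main(String[] args) {
        PairingBuffer buffer = new PairingBuffer();
        List<String> merged = new ArrayList<>();
        // Same arbitrary interleaving as in the union example above.
        buffer.onA("aaa", merged::add);
        buffer.onA("bbb", merged::add);
        buffer.onB("A", merged::add);
        buffer.onA("ccc", merged::add);
        buffer.onB("B", merged::add);
        buffer.onB("C", merged::add);
        System.out.println(merged); // [Aaaa, Bbbb, Cccc]
    }
}
```

In a real job the two queues would presumably live in Flink managed state (for example ListState on a keyed stream) rather than in instance fields, so that they survive failures and work with parallelism greater than one. Plain fields like these are only reasonable for experimenting with a single parallel instance.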

You'll find a tutorial on the topic of connected streams in the Flink documentation, and an example that's reasonably close in the training exercises that accompany the tutorials.

David Anderson answered Sep 02 '25 13:09
