I've been trying to set up a MySQL to Elasticsearch data pipeline for real-time data replication. The MySQL database has around 10 different tables that are highly normalized. But in Elasticsearch, I'm in need to have all of the data from these tables in a single index, which would be similar to the output from a big compound JOIN query. Tried a lot to find out, please help 🙂
(Changing the DB schema isn't feasible as there are a lot of other dependent services. )
For example :
Input from MySQL:
Table: main_profile
+--------+------+
|  name  | city |  
+--------+------+
| Edward |    1 | 
| Jake   |    9 |
+--------+------+
Table: city_master
+---------+----------+
| city_id |   name   |
+---------+----------+
|       1 | New York |
|      9  | Tampa    |
+---------+----------+
Document stored in Elasticsearch:
{
   "0": {
      "name": "Edward",
      "city": "New York"
   },
   "1": {
      "name": "Jake",
      "city": "Tampa"
   }
}
you can use Kafka Streams to do aggregation from two different topics to build a unfied message. Please check an example for Debezium source https://github.com/debezium/debezium-examples/tree/master/kstreams
The target is MongoDB in the example but the principle is the same.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With