I am running a Spark job and I need to read from an HDFS table which is in, let's say, HadoopCluster-1. I then want to write the aggregated DataFrame into a table that is present in another cluster, HadoopCluster-2. What would be the best way to do this?
Is this a good way to achieve my goal?
If you want to read a Hive table from cluster1 as a DataFrame and, after transforming it, write it as a Hive table in cluster2, you can try the approach below.
First, start the HiveServer2 and metastore services on each cluster that Spark will reach over JDBC:
hive --service hiveserver2
hive --service metastore
Make sure Hive is properly configured with a username/password. You can leave both empty, but then you will get an error on connecting; you can resolve that by referring to this link.
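For reference, here is a minimal hive-site.xml sketch that lets HiveServer2 accept connections without authentication (NONE is one valid value of this property; NOSASL, LDAP, and KERBEROS are among the alternatives):
<!-- hive-site.xml (sketch): disable HiveServer2 authentication -->
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>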
Now read the Hive table from cluster1 as a Spark DataFrame and write it to the Hive table on cluster2 after the transformation.
// Spark (Scala) code
val sourceJdbcMap = Map(
  "url"      -> "jdbc:hive2://<source_host>:<port>", // default HiveServer2 port is 10000
  "driver"   -> "org.apache.hive.jdbc.HiveDriver",
  "user"     -> "<username>",
  "password" -> "<password>",
  "dbtable"  -> "<source_table>")

val targetJdbcMap = Map(
  "url"      -> "jdbc:hive2://<target_host>:<port>", // default HiveServer2 port is 10000
  "driver"   -> "org.apache.hive.jdbc.HiveDriver",
  "user"     -> "<username>",
  "password" -> "<password>",
  "dbtable"  -> "<target_table>")

// Read the cluster1 table into a DataFrame over JDBC
val sourceDF = spark.read.format("jdbc").options(sourceJdbcMap).load()

// Placeholder: replace with your actual transformation
val transformedDF = sourceDF // transformation goes here...

// Write the result to the cluster2 table; "append" avoids the default
// error-if-exists behavior, since the target table already exists
transformedDF.write.format("jdbc").options(targetJdbcMap).mode("append").save()
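As a minimal sketch of the transformation step, assuming a hypothetical source table with category and amount columns (the question only says "aggregate", so these names are placeholders):
// Hypothetical aggregation; "category" and "amount" are made-up column names
import org.apache.spark.sql.functions._

val transformedDF = sourceDF
  .groupBy("category")
  .agg(sum("amount").alias("total_amount"))
Also make sure the Hive JDBC driver is on Spark's classpath, for example by passing the hive-jdbc standalone jar to spark-submit via --jars.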