How to read Azure Table Storage data from Apache Spark running on HDInsight

Is there any way to do that from a Spark application running on Azure HDInsight? We are using Scala.

Azure Blobs are supported (through WASB). I don't understand why Azure Tables aren't.

Thanks in advance

asked by Jose Parra

2 Answers

You can actually read from Table Storage in Spark. Here is a project by a Microsoft engineer that does just that:

https://github.com/mooso/azure-tables-hadoop

You probably won't need all the Hive stuff, just the classes at the root level (one way to wire them into a build is sketched after this list):

  • AzureTableConfiguration.java
  • AzureTableInputFormat.java
  • AzureTableInputSplit.java
  • AzureTablePartitioner.java
  • AzureTableRecordReader.java
  • BaseAzureTablePartitioner.java
  • DefaultTablePartitioner.java
  • PartitionInputSplit.java
  • WritableEntity.java
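
If you go that route, one option (a sketch, not the project's official packaging) is to copy those .java files into src/main/java of an sbt project and add the Azure Storage SDK they depend on. The artifact versions below are assumptions; match them to the repo's pom.xml:

// build.sbt fragment -- versions are illustrative assumptions
libraryDependencies ++= Seq(
  // Table Storage client used by the classes above
  "com.microsoft.azure" % "azure-storage" % "2.0.0",
  // Hadoop InputFormat APIs; already provided on the HDInsight cluster
  "org.apache.hadoop" % "hadoop-client" % "2.7.3" % "provided"
)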

You can read with something like this:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
// The two classes below come from the azure-tables-hadoop project linked
// above; adjust the import to the package you compiled them into.
import com.microsoft.hadoop.azure.{AzureTableInputFormat, WritableEntity}

// Yields an RDD[(Text, WritableEntity)], one pair per table entity.
val rdd = sparkContext.newAPIHadoopRDD(
  getTableConfig(tableName, account, key),
  classOf[AzureTableInputFormat],
  classOf[Text],
  classOf[WritableEntity])

// Builds the Hadoop configuration that AzureTableInputFormat reads its
// table name, account URI, and storage key from.
def getTableConfig(tableName: String, account: String, key: String): Configuration = {
  val configuration = new Configuration()
  configuration.set("azure.table.name", tableName)
  configuration.set("azure.table.account.uri", account)
  configuration.set("azure.table.storage.key", key)
  configuration
}

You will then have to write a decoding function to transform each WritableEntity into the class you want.
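
For example, here is a minimal sketch of such a decoder. It assumes WritableEntity exposes its columns through a getProperties map of EntityProperty values, the way the Azure Storage SDK's DynamicTableEntity does; check the class in the repo for the actual accessors. The Person type and its property names are purely hypothetical:

// Hypothetical target type; replace with whatever your rows contain.
case class Person(name: String, age: Int)

// Assumes getProperties returns a java.util.Map[String, EntityProperty]
// and that EntityProperty offers getValueAsString / getValueAsInteger,
// as in the Azure Storage SDK.
def toPerson(entity: WritableEntity): Person = {
  val props = entity.getProperties
  Person(props.get("Name").getValueAsString,
         props.get("Age").getValueAsInteger)
}

// Drop the Text key and decode each entity.
val people = rdd.map { case (_, entity) => toPerson(entity) }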

It worked for me!

answered by Lucian


Currently, Azure Tables are not supported directly. Only Azure Blobs expose the HDFS interface that Hadoop and Spark require.
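
For contrast, reading from Blob storage through WASB is just a matter of using the wasb:// URI scheme; the container, account, and path below are placeholders:

// Hypothetical container/account/path -- WASB exposes blobs as HDFS paths.
val lines = sparkContext.textFile(
  "wasb://mycontainer@myaccount.blob.core.windows.net/data/input.txt")
lines.take(10).foreach(println)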

answered by Asad Khan