Is there any way to do that from a Spark application running on Azure HDInsight? We are using Scala.
Azure Blobs are supported (through WASB). I don't understand why Azure Tables aren't.
Thanks in advance
You can actually read from Table Storage in Spark; here's a project by someone at Microsoft that does just that:
https://github.com/mooso/azure-tables-hadoop
You probably won't need all the Hive stuff, just the classes at the root level.
You can read with something like this:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
// AzureTableInputFormat and WritableEntity come from the azure-tables-hadoop project linked above.
import com.microsoft.hadoop.azure.{AzureTableInputFormat, WritableEntity}

// Build a Hadoop configuration pointing at the table, then read it as an RDD of
// (row key, entity) pairs through the project's input format.
val entitiesRdd = sparkContext.newAPIHadoopRDD(
  getTableConfig(tableName, account, key),
  classOf[AzureTableInputFormat],
  classOf[Text],
  classOf[WritableEntity])

def getTableConfig(tableName: String, account: String, key: String): Configuration = {
  val configuration = new Configuration()
  configuration.set("azure.table.name", tableName)
  configuration.set("azure.table.account.uri", account)
  configuration.set("azure.table.storage.key", key)
  configuration
}
You will have to write a decoding function to transform each WritableEntity into the class you want.
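For illustration, a minimal decoding step could look like the sketch below. It assumes a WritableEntity exposes its data like a DynamicTableEntity from the Azure Storage SDK (getPartitionKey, getRowKey, and getProperties returning EntityProperty values); check the WritableEntity class in the linked project for the exact accessors. The Person case class and the "Name"/"Age" property names are made up for the example.

// Hypothetical target class for this example.
case class Person(partitionKey: String, rowKey: String, name: String, age: Int)

// Assumes WritableEntity behaves like a DynamicTableEntity; verify against the
// class in the azure-tables-hadoop project before relying on these accessors.
def toPerson(entity: WritableEntity): Person = {
  val props = entity.getProperties
  Person(
    entity.getPartitionKey,
    entity.getRowKey,
    props.get("Name").getValueAsString,
    props.get("Age").getValueAsInteger)
}

// Map the (row key, entity) pairs produced by newAPIHadoopRDD into your own type.
val people = entitiesRdd.map { case (_, entity) => toPerson(entity) }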
It worked for me!
Currently Azure Tables are not supported. Only Azure Blobs expose the HDFS interface (WASB) required by Hadoop and Spark.
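For comparison, reading Blob storage through WASB is just a matter of pointing Spark at a wasb:// path; the container, account, and file names below are placeholders.

// Placeholder names; WASB paths take the form
// wasb://<container>@<account>.blob.core.windows.net/<path>
val lines = sparkContext.textFile(
  "wasb://mycontainer@myaccount.blob.core.windows.net/data/input.txt")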