Hadoop recently introduced Encryption at Rest (HDFS-6134). Is it also supported in Spark? That is, can Spark process data that is stored in encrypted form in HDFS?
Yes, Spark can access the data without any changes to the application code. The data is encrypted and decrypted transparently to applications, which means all your Java APIs and command-line interfaces work as before. The framework takes care of encryption for you.
Here is a quote from the documentation:
HDFS implements transparent, end-to-end encryption. Once configured, data read from and written to HDFS is transparently encrypted and decrypted without requiring changes to user application code.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
You will, however, need to add or modify some configuration. Here's a worked example.
See also blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs
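For the configuration change mentioned above, the main client-side setting is usually the key provider, which tells HDFS clients (including Spark executors) where to fetch encryption zone keys. A minimal sketch of a `core-site.xml` entry follows. It assumes a Hadoop KMS instance is already running, and the host `kms-host.example.com` and port `9600` are placeholders, not values from the original answer. Older Hadoop 2.x releases used the property name `dfs.encryption.key.provider.uri` instead, so check the documentation for your version.

```xml
<!-- core-site.xml: point HDFS clients at the key management server. -->
<!-- The host and port below are placeholders; substitute your own KMS address. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms-host.example.com:9600/kms</value>
</property>
```

With the key provider in place, an administrator typically creates a key with `hadoop key create <keyname>` and turns an empty directory into an encryption zone with `hdfs crypto -createZone -keyName <keyname> -path <path>`. Spark jobs then read and write paths inside that zone like any other HDFS path, provided the user running the job is authorized to use the key.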