I am new to the Hadoop ecosystem tools. Can anyone help me understand the difference between Hive, Beeline and Impala?
Thanks in advance!
Apache Hive :
1] Apache Hive is a data warehouse infrastructure built on top of the Hadoop platform for data-intensive tasks such as querying, analysis, processing and visualization.
2] Hive generates query expressions at compile time.
3] Every Hive query suffers from a "cold start" problem.
4] Under the hood, Hive translates queries into MapReduce jobs, which adds overhead.
5] Hive is the more universal, versatile and pluggable tool.
6] For an upgrade project where compatibility and speed are equally important, Hive is an ideal choice.
Cloudera Impala :
1] Impala is an excellent choice for programmers running queries on HDFS and Apache HBase, as it doesn't require data to be moved or transformed.
2] Impala does runtime code generation for "big loops" using LLVM.
3] Impala avoids startup overhead because its daemon processes are started at boot time and are always ready to process a query.
4] Impala responds quickly through massively parallel processing.
5] Impala uses its raw processing power to deliver very fast analytic results.
6] Impala is an ideal choice when starting a new project.
Beeline :
1] Hive CLI connects directly to the Hive Driver and requires that Hive be installed on the same machine as the client.
2] However, Beeline connects to HiveServer2 and does not require the installation of Hive libraries on the same machine as the client.
3] Beeline is a thin client that uses the Hive JDBC driver and executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
4] Cloudera's Sentry security works through HiveServer2, which the Hive CLI bypasses. So queries run through the Hive CLI will not follow the policies from Sentry. According to the Cloudera docs you should not use Hive CLI and WebHCat. Use beeline or impala-shell instead.
5] Connect with Beeline, where url is a JDBC connection string pointing to the HiveServer2 host:
terminal> beeline -u url -n username -p password
OR
terminal> beeline
beeline> !connect jdbc:hive2://HiveServer2Host:Port 
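As a rough illustration of point 5, here is a minimal Python sketch that assembles the JDBC url and the matching beeline command line. The helper names (hs2_jdbc_url, beeline_command), the host hs2.example.com and the credentials are hypothetical and only there for the example. Port 10000 is the usual HiveServer2 default, and -e is the Beeline option for running a single query; adjust both to your cluster.

```python
import shlex


def hs2_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC url of the form jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host, username, password, port=10000, database="default", query=None):
    """Return the argument list for: beeline -u url -n username -p password [-e query]."""
    args = [
        "beeline",
        "-u", hs2_jdbc_url(host, port, database),
        "-n", username,
        "-p", password,
    ]
    if query is not None:
        # -e runs one query and exits instead of opening the interactive prompt
        args += ["-e", query]
    return args


# Hypothetical host and credentials, for illustration only
cmd = beeline_command("hs2.example.com", "alice", "secret", query="SHOW TABLES;")
print(shlex.join(cmd))  # shell-quoted command line you could paste into a terminal
```

Note that this only builds the command; it does not start Beeline. Also keep in mind that a password passed with -p can be seen by other users in the process list.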
Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine. Hortonworks and Amazon do not support Impala. Update: Hortonworks merged with Cloudera, and the new company is named Cloudera. Amazon and MapR now also support Impala. Impala does not use MapReduce under the hood and works faster than Hive.
Apache Hive is a database built on top of Hadoop for providing data summarization, query, and analysis. It is supported by all Hadoop vendors. It is very reliable, scales virtually without limit and works with very big data. It uses MapReduce framework primitives under the hood, even if configured to run on the Tez execution engine. It can use the Tez or MR (deprecated in Hive 2.x) execution engines.
Beeline is a Hive client. See here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_dataintegration/content/beeline-vs-hive-cli.html