
What is the difference between Hadoop and Spark? [closed]

As Spark is gaining traction in the market nowadays, I can see Spark's major use cases over Hadoop, such as:

  1. Iterative algorithms in machine learning
  2. Interactive data mining and data processing
  3. Spark is a fully Apache Hive-compatible data warehousing system that can run up to 100x faster than Hive.
  4. Stream processing: log processing and fraud detection in live streams for alerts, aggregates and analysis (a small streaming sketch follows this list)
  5. Sensor data processing: where data is fetched and joined from multiple sources, in-memory datasets are really helpful because they are easy and fast to process.
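To make use case 4 concrete, here is a minimal Spark Structured Streaming sketch. The socket source, host/port, and the "FRAUD" marker are assumptions for illustration only, not anything from the question:

```python
# Minimal sketch (assumed socket source and "FRAUD" marker): count suspicious
# events as they arrive from a live text stream, e.g. to drive alerts.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Read a live text stream from a socket (hypothetical source for this sketch)
events = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Flag suspicious events and keep a running count
alerts = events.filter(col("value").contains("FRAUD")).groupBy().count()

# Print the running count to the console on every update
query = (
    alerts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```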

My question is:

  1. Is Spark going to replace Hadoop in the coming days?
  2. Does Hadoop work concurrently while Spark runs in parallel? (Is that true?)
asked Oct 27 '25 by Roshan Bagdiya


2 Answers

Spark differs from Hadoop in the sense that it lets you integrate data ingestion, processing and real-time analytics in one tool. Moreover, Spark's map/reduce framework differs from standard Hadoop Map/Reduce because in Spark intermediate map/reduce results are cached, and an RDD (an abstraction for a fault-tolerant distributed collection) can be kept in memory when the same results need to be reused (iterative algorithms, group by, etc.).
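A minimal PySpark sketch of that caching point (the HDFS path and the "ERROR"/"timeout" filters are hypothetical): once an RDD is cached, several actions can reuse it without re-reading the input, which is what makes iterative workloads cheaper than re-running a full Map/Reduce pass each time.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()
sc = spark.sparkContext

# RDD: a fault-tolerant, distributed collection
lines = sc.textFile("hdfs:///data/events.log")          # hypothetical path
errors = lines.filter(lambda l: "ERROR" in l).cache()   # keep result in memory

# Both actions below reuse the cached RDD instead of re-scanning the file
print(errors.count())
print(errors.filter(lambda l: "timeout" in l).count())

spark.stop()
```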

My answer is really superficial and does not answer your question completely, but it points out some of the main differences (there are many more in reality). The official Spark and Databricks sites are well documented, and your question is already answered there:

https://databricks.com/spark/about

http://spark.apache.org/faq.html

answered Oct 29 '25 by eugenio calabrese


Hadoop today is a collection of technologies, but at its essence it is a distributed file system (HDFS) and a distributed resource manager (YARN). Spark is a distributed computational framework that is poised to replace Map/Reduce, another distributed computational framework that:

  1. used to be synonymous with Hadoop
  2. ships with Hadoop out of the box for backward compatibility (before YARN, the Map/Reduce framework also served as Hadoop's resource manager)

Specifically, Spark is not going to replace Hadoop, but it will probably replace Map/Reduce. Hadoop, Map/Reduce and Spark are all distributed systems (and all run in parallel).
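A minimal sketch of that layering (the paths and cluster settings are assumptions for illustration): HDFS provides storage, YARN provides resource management, and Spark takes over from Map/Reduce as the computation layer running on top of both.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("wordcount-on-yarn")
    .master("yarn")                      # schedule executors through YARN
    .getOrCreate()
)

# Read input from HDFS (Hadoop's storage layer), compute with Spark
counts = (
    spark.sparkContext.textFile("hdfs:///data/books/*.txt")  # hypothetical path
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

counts.saveAsTextFile("hdfs:///output/wordcount")  # write results back to HDFS
spark.stop()
```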

answered Oct 29 '25 by Arnon Rotem-Gal-Oz


