New posts in bigdata

Parquet predicate pushdown

Is Data Lake and Big Data the same?

bigdata data-lake

Apache Hadoop vs Google Bigdata

Mini batch-training of a scikit-learn classifier where I provide the mini batches

python scikit-learn bigdata

NumPy reading file with filtering lines on the fly

How to do a join in Elasticsearch -- or at the Lucene level

PySpark: counterpart of like() method in DataFrame

Can large datasets be used with Excel 2013? [closed]

excel bigdata excel-2013

What do I need to know about working with huge databases?

Extend numpy mask by n cells to the right for each bad value, efficiently

python numpy bigdata

It appears I've run out of 32-bit address space. What are my options?

python numpy bigdata

Apache Spark: impact of repartitioning, sorting and caching on a join

Processing a very large text file with lazy Texts and ByteStrings

Send KafkaProducer from local machine to Hortonworks sandbox on VirtualBox

Implementing custom Spark RDD in Java

apache-spark bigdata

Spark Scala: understanding reduceByKey(_ + _)

How to process a range of HBase rows using Spark?

PySpark: how to duplicate a row n times in a DataFrame?

python pyspark bigdata

In a Spark join, does table order matter like in Pig?

Creating a comparable and flexible fingerprint of an object