
New posts in bigdata

DynamoDB UpdateItem only with global secondary index

Parquet predicate pushdown

Are Data Lake and Big Data the same?

bigdata data-lake

Apache Hadoop vs Google Bigdata

Mini batch-training of a scikit-learn classifier where I provide the mini batches

python scikit-learn bigdata

NumPy reading file with filtering lines on the fly

How to do a join in Elasticsearch -- or at the Lucene level

pyspark: counterpart of like() method in dataframe

Can large datasets be used with Excel 2013? [closed]

excel bigdata excel-2013

What do I need to know about working with huge databases?

Extend numpy mask by n cells to the right for each bad value, efficiently

python numpy bigdata

It appears I've run out of 32-bit address space. What are my options?

python numpy bigdata

Apache Spark: impact of repartitioning, sorting and caching on a join

Processing a very large text file with lazy Texts and ByteStrings

Send KafkaProducer from local machine to hortonworks sandbox on virtualbox

Implementing custom Spark RDD in Java

apache-spark bigdata

Spark Scala Understanding reduceByKey(_ + _)

How to process a range of hbase rows using spark?

Pyspark: how to duplicate a row n times in dataframe?

python pyspark bigdata

In spark join, does table order matter like in pig?