
NodeJS - reading Parquet files

Does anyone know a way of reading parquet files with NodeJS?

I tried node-parquet: it is very hard (but possible) to install and works most of the time, but it fails when reading numeric data types.

I also tried parquetjs, but it can only read Parquet files created by its own library. It cannot read anything created with Spark or Python.

Thanks

Joe asked Dec 08 '25 23:12


1 Answer

Does anyone know a way of reading parquet files with NodeJS?

I found several libraries, but most of them are dead or unmaintained:

  • parquetjs - https://github.com/ironSource/parquetjs/issues/128
  • parquets - https://github.com/kbajalc/parquets/issues/38
  • parquetjs-lite - https://github.com/ZJONSSON/parquetjs
  • node-parquet - https://github.com/skale-me/node-parquet/issues/62

I also tried parquetjs, but it can only read Parquet files created by its own library. It cannot read anything created with Spark or Python.

I have not tried this library, but Parquet has a defined specification, so in principle it should be possible to read a Parquet file created by Python or Spark from JavaScript.
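To make the point about the spec concrete, here is a minimal sketch that checks only the outer framing the Parquet format defines: the 4-byte magic `PAR1` at the start and end of the file, and the 4-byte little-endian footer length stored just before the trailing magic. The function names `inspectParquetBuffer` and `inspectParquetFile` are made up for this example, and the sketch does not decode the footer metadata or any column data, so it is a sanity check rather than a reader.

```javascript
const fs = require('fs');

// Magic bytes that open and close an unencrypted Parquet file.
const MAGIC = Buffer.from('PAR1', 'ascii');

// Hypothetical helper: inspect the fixed-size framing of a Parquet file held in a Buffer.
function inspectParquetBuffer(buf) {
  // Smallest possible layout: header magic (4) + footer length (4) + trailer magic (4).
  if (buf.length < 12) {
    return { isParquet: false, footerLength: null };
  }
  const head = buf.subarray(0, 4);
  const tail = buf.subarray(buf.length - 4);
  if (!head.equals(MAGIC) || !tail.equals(MAGIC)) {
    return { isParquet: false, footerLength: null };
  }
  // The footer length is a 32-bit little-endian integer just before the trailing magic.
  const footerLength = buf.readUInt32LE(buf.length - 8);
  return { isParquet: true, footerLength };
}

// Hypothetical convenience wrapper; reads the whole file, so only suitable for small files.
function inspectParquetFile(path) {
  return inspectParquetBuffer(fs.readFileSync(path));
}
```

If a file written by Spark or Python passes this check but a JavaScript library still refuses it, the problem is more likely an unsupported encoding or compression codec than a different file format.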

Another option:

  • DuckDB - I suggest trying this library. DuckDB is an in-process, embedded database library.

The code snippet below uses DuckDB to read Parquet data directly from disk.

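const duckdb = require('duckdb');

// Open an in-memory database; the Parquet file itself is read from disk.
const db = new duckdb.Database(':memory:');

// Query the file directly with read_parquet and print the matching rows.
db.all("SELECT * FROM read_parquet('D:\\sample\\userdata1.parquet') WHERE Country = 'Canada' LIMIT 3", (err, rows) => {
  if (err) {
    throw err;
  }
  console.log(rows);
});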
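If you prefer promises over callbacks, the same call can be wrapped along these lines. This is only a sketch: `sqlString`, `buildParquetQuery` and `queryParquet` are names invented for this example, not part of the duckdb package, and the `where` option is pasted into the SQL as-is, so it must come from trusted code and never from user input. The `duckdb` module is loaded inside `queryParquet` so the query-building helpers can be used on their own.

```javascript
// Quote a value as a SQL string literal by doubling any single quotes.
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build a SELECT over a Parquet file. `where` is raw SQL (trusted input only).
function buildParquetQuery(file, { where, limit } = {}) {
  let sql = `SELECT * FROM read_parquet(${sqlString(file)})`;
  if (where) {
    sql += ` WHERE ${where}`;
  }
  if (Number.isInteger(limit)) {
    sql += ` LIMIT ${limit}`;
  }
  return sql;
}

// Run the query against an in-memory DuckDB database and resolve with the rows.
function queryParquet(file, options) {
  const duckdb = require('duckdb'); // loaded lazily; requires `npm install duckdb`
  const db = new duckdb.Database(':memory:');
  return new Promise((resolve, reject) => {
    db.all(buildParquetQuery(file, options), (err, rows) => {
      if (err) {
        reject(err);
      } else {
        resolve(rows);
      }
    });
  });
}
```

With that in place, the query from the snippet above becomes `queryParquet('D:\\sample\\userdata1.parquet', { where: "Country = 'Canada'", limit: 3 }).then(console.log)`.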

DuckDB has many features built around Parquet:

  • Run SQL queries directly on Parquet files on disk, in S3, or behind an HTTP endpoint. Very large Parquet files can also be loaded into DuckDB's own format and queried as tables, and Parquet data can be joined with other formats such as CSV.
  • Write Parquet files to disk or to an S3 bucket.
  • Read a single file, multiple files, or a whole folder via a glob expression.
  • Read the schema, metadata, and footer statistics.
  • Projection pushdown and filter pushdown for Parquet.
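A few of the features listed above, written out as SQL you could pass to `db.all`. This is a sketch rather than a complete tour: the `parquetQueries` object and the `quote` helper are invented for this example, and the file names in the usage note are placeholders. The SQL functions themselves (`read_parquet`, `parquet_schema`, `parquet_metadata`, and `COPY ... TO ... (FORMAT PARQUET)`) are DuckDB's own.

```javascript
// Quote a value as a SQL string literal by doubling any single quotes.
function quote(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Hypothetical collection of query builders, one per feature.
const parquetQueries = {
  // Read many files at once; `glob` can be a pattern such as 'data/*.parquet'.
  readMany: (glob) => `SELECT * FROM read_parquet(${quote(glob)})`,

  // Column names and types as stored in the file.
  schema: (file) => `SELECT * FROM parquet_schema(${quote(file)})`,

  // Row-group and column-chunk metadata, including footer statistics.
  metadata: (file) => `SELECT * FROM parquet_metadata(${quote(file)})`,

  // Write the result of a query to a new Parquet file.
  // `selectSql` is raw SQL and must come from trusted code.
  exportTo: (selectSql, outFile) =>
    `COPY (${selectSql}) TO ${quote(outFile)} (FORMAT PARQUET)`,
};
```

For example, `db.all(parquetQueries.schema('userdata1.parquet'), callback)` would list the columns of a file, and `db.run(parquetQueries.exportTo(parquetQueries.readMany('data/*.parquet'), 'merged.parquet'))` would merge a folder of files into one, assuming those paths exist on your machine.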

Docs:

  • DuckDB node bindings
  • DuckDB Parquet Docs
  • Other DuckDB features
ns94 answered Dec 10 '25 13:12


