 

Sparklyr - How to change the parquet data types

Is there a way to change the data types of columns when reading parquet files? I'm using the spark_read_parquet function from sparklyr, but it doesn't have the columns option (from spark_read_csv) for doing so.

In csv files, I would do something like:

data_tbl <- spark_read_csv(sc, "data", path, infer_schema = FALSE, columns = list_with_data_types)

How could I do something similar with parquet files?
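For reference, a spec like list_with_data_types in the csv call above is just a named list mapping column names to Spark type strings. The names and types below are hypothetical placeholders, not from an actual dataset:

```r
library(sparklyr)

# Hypothetical column specification for spark_read_csv;
# column names and types are illustrative only
list_with_data_types <- list(
  id    = "integer",
  name  = "character",
  value = "double"
)

data_tbl <- spark_read_csv(sc, "data", path,
                           infer_schema = FALSE,
                           columns = list_with_data_types)
```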

Asked by Igor on Jan 27 '26 01:01


1 Answer

Specifying data types only makes sense when reading a data format that does not carry built-in metadata about column types. This is the case for csv or fwf files, which at most have column names in the first row. That is why the read functions for those formats offer that option.

This sort of functionality does not make sense for data formats that store column types in their own metadata, such as Parquet (or .Rds files in R).

Thus, in this case, you should:

a) read the Parquet file into Spark
b) make the necessary data transformations
c) save the transformed data into a Parquet file, overwriting the previous file
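The steps above can be sketched with sparklyr and dplyr as follows. The paths and column names are hypothetical, and the casts shown (as.integer, as.numeric) stand in for whatever transformations your data needs:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# a) read the Parquet file into Spark
data_tbl <- spark_read_parquet(sc, "data", "path/to/data.parquet")

# b) make the necessary data transformations
#    (hypothetical column names; casts are translated to Spark SQL)
data_tbl <- data_tbl %>%
  mutate(id    = as.integer(id),
         value = as.numeric(value))

# c) save the transformed data back to Parquet
spark_write_parquet(data_tbl, "path/to/data_casted.parquet",
                    mode = "overwrite")
```

Note that because Spark reads lazily, writing over the exact path you are still reading from can fail; a safer pattern is to write to a new path (as above) and then replace the original file.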

Answered by LucasMation on Jan 29 '26 18:01


