
Conversion of JSON to parquet format using Apache Parquet in C#

Tags: c#, parquet

How can I convert a JSON file from a MongoDB source to a Parquet file using C#?

I found a library called Parquet.Net, but I need something more dynamic. My data is very dynamic, and it is difficult to build a schema for it. If you have a solution to this problem, please let me know.

var file = File.ReadAllLines(@"C:\Users\NodeJS\Downloads\countries.json");
List<object> tt = new List<object>();
var fields = new HashSet<DataField>();

foreach (var item in file)
{
    var entity = JsonConvert.DeserializeObject<JObject>(item).ToObject<Dictionary<string, object>>();
    foreach (var t in entity)
    {
        fields.Add(new DataField(t.Key, t.Value.GetType()));
        tt.Add(t.Value);
    }
}

var schema = new Schema(fields);

using (Stream fileStream = System.IO.File.Create("convertJson.parquet"))
{
    ParquetConvert.Serialize(tt, fileStream, schema);
}
Asked by DEV-SOFT on Sep 12 '25 at 23:09


1 Answer

You could look into Cinchoo ETL, an open source library that can convert a JSON file to a Parquet file.

Install the NuGet package

install-package ChoETL.Parquet

Sample code

using ChoETL;

using (var r = new ChoJSONReader("*** Your JSON file ***"))
{
    using (var w = new ChoParquetWriter("*** Your parquet output file ***"))
    {
        w.Write(r); // write every record read from the JSON file
    }
}

For more information, please see the CodeProject article.

Sample fiddle: https://dotnetfiddle.net/fIJIfM
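
For background on why a hand-rolled approach needs more care: Parquet is a columnar format, and the attempt in the question appends every value from every record to a single list, so the values are never grouped by field. The following is only a minimal sketch of that grouping step, not how Cinchoo ETL or Parquet.Net work internally. It uses System.Text.Json alone and does not write Parquet. The names JsonColumns and Pivot are hypothetical, and it assumes one JSON object per line (as File.ReadAllLines in the question suggests) with no repeated keys inside a single object.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class JsonColumns
{
    // Pivots JSON Lines records into columns keyed by field name.
    // Fields missing from a record are stored as null, so every
    // column ends up with exactly one entry per record.
    public static Dictionary<string, List<string?>> Pivot(IEnumerable<string> jsonLines)
    {
        var columns = new Dictionary<string, List<string?>>();
        int rowCount = 0;

        foreach (var line in jsonLines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var doc = JsonDocument.Parse(line);
            foreach (var prop in doc.RootElement.EnumerateObject())
            {
                if (!columns.TryGetValue(prop.Name, out var col))
                {
                    // First time this field appears: back-fill nulls
                    // for the records that came before it.
                    col = Enumerable.Repeat<string?>(null, rowCount).ToList();
                    columns[prop.Name] = col;
                }

                // Strings are unquoted; numbers, booleans, nested objects
                // and arrays keep their raw JSON text.
                col.Add(prop.Value.ValueKind == JsonValueKind.Null ? null
                    : prop.Value.ValueKind == JsonValueKind.String ? prop.Value.GetString()
                    : prop.Value.GetRawText());
            }

            rowCount++;

            // Pad the columns whose field was absent from this record.
            foreach (var padded in columns.Values)
            {
                if (padded.Count < rowCount) padded.Add(null);
            }
        }

        return columns;
    }
}
```

Every value is kept as text here to sidestep type inference. A real conversion would still have to pick a Parquet type for each column and mark it nullable, which is the part a library such as Cinchoo ETL takes care of.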

Answered by Cinchoo on Sep 14 '25 at 13:09