How can I convert a JSON file from a MongoDB source to a Parquet file using C#?
I have found a library called Parquet.Net, but I need something more dynamic. My data is very dynamic and it is difficult to build a schema for it. If you have a solution to this problem, please let me know.
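using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Parquet;
using Parquet.Data;

// ReadAllLines treats each line as one JSON document (newline-delimited JSON)
var file = File.ReadAllLines(@"C:\Users\NodeJS\Downloads\countries.json");
List<object> tt = new List<object>();
var fields = new HashSet<DataField>();
foreach (var item in file)
{
    // Parse the document into key/value pairs
    var entity = JsonConvert.DeserializeObject<JObject>(item).ToObject<Dictionary<string, object>>();
    foreach (var t in entity)
    {
        // One field per key, typed from the value's runtime type (a null value throws here)
        fields.Add(new DataField(t.Key, t.Value.GetType()));
        // Note: tt collects individual values, not one record per document
        tt.Add(t.Value);
    }
}
var schema = new Schema(fields);
using (Stream fileStream = System.IO.File.Create("convertJson.parquet"))
{
    ParquetConvert.Serialize(tt, fileStream, schema);
}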
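One way to cope with very dynamic data is to infer the schema in a first pass before writing anything. The following is a minimal sketch, not part of Parquet.Net or of the code above: it assumes newline-delimited JSON, takes the union of top-level keys across all documents, and resolves type conflicts with assumed widening rules (integer plus float becomes double, any other conflict becomes string). `SchemaInference`, `Infer`, `MapToken` and `Widen` are hypothetical names; only Newtonsoft.Json, which the question already uses, is required.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical helper: infers one column type per top-level key across
// newline-delimited JSON documents.
public static class SchemaInference
{
    // Maps a JSON token kind to a CLR type. Nested objects and arrays are
    // assumed to be written as their JSON text, so they map to string.
    static Type MapToken(JTokenType kind)
    {
        switch (kind)
        {
            case JTokenType.Integer: return typeof(long);
            case JTokenType.Float: return typeof(double);
            case JTokenType.Boolean: return typeof(bool);
            case JTokenType.Date: return typeof(DateTime);
            default: return typeof(string); // strings, objects, arrays, others
        }
    }

    // Widens two observed types to one that can hold both (assumed rules).
    static Type Widen(Type a, Type b)
    {
        if (a == b) return a;
        if ((a == typeof(long) && b == typeof(double)) ||
            (a == typeof(double) && b == typeof(long)))
            return typeof(double);
        return typeof(string); // any other conflict falls back to text
    }

    public static Dictionary<string, Type> Infer(IEnumerable<string> lines)
    {
        var columns = new Dictionary<string, Type>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            foreach (var prop in JObject.Parse(line).Properties())
            {
                // Nulls carry no type information, so they do not change the schema.
                if (prop.Value.Type == JTokenType.Null) continue;
                var observed = MapToken(prop.Value.Type);
                columns[prop.Name] = columns.TryGetValue(prop.Name, out var seen)
                    ? Widen(seen, observed)
                    : observed;
            }
        }
        return columns;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "{\"name\": \"France\", \"population\": 67000000}",
            "{\"name\": \"Monaco\", \"population\": 3.9e4, \"capital\": null}",
            "{\"name\": \"Chile\", \"code\": \"CL\"}"
        };
        foreach (var column in Infer(lines))
            Console.WriteLine($"{column.Key}: {column.Value.Name}");
    }
}
```

Each inferred name/type pair could then be passed to `new DataField(name, type)` as in the question's code. Because a key may be missing from some documents, the Parquet columns would likely need to be nullable (for example `long?` rather than `long`), and a key whose value is always null does not appear in the result at all.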
You could consider looking into Cinchoo ETL, an open-source library that can convert a JSON file to a Parquet file.
Install the NuGet package:
Install-Package ChoETL.Parquet
Sample code:
using ChoETL;

// Read the JSON records and stream them straight into a Parquet file
using (var r = new ChoJSONReader("*** Your JSON file ***"))
{
    using (var w = new ChoParquetWriter("*** Your parquet output file ***"))
    {
        w.Write(r);
    }
}
For more information, please visit the CodeProject article.
Sample fiddle: https://dotnetfiddle.net/fIJIfM