We have a very large Dictionary<long,uint> (several million entries) as part of a high-performance C# application. When the application closes, we serialise the dictionary to disk using BinaryFormatter and MemoryStream.ToArray(). The serialisation returns in about 30 seconds and produces a file about 200MB in size. When we then try to deserialise the dictionary using the following code:
BinaryFormatter bin = new BinaryFormatter();
Stream stream = File.Open("filePathName", FileMode.Open);
Dictionary<long, uint> allPreviousResults =
(Dictionary<long, uint>)bin.Deserialize(stream);
stream.Close();
it takes about 15 minutes to return. We have tried alternatives and the slow part is definitely bin.Deserialize(stream), i.e. the bytes are read from the hard drive (a high-performance SSD) in under 1 second.
Can someone please point out what we are doing wrong, as we want the load time to be of the same order as the save time?
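For reference, the save side is essentially the following (simplified sketch; the dictionary variable name is illustrative):
BinaryFormatter bin = new BinaryFormatter();
using (MemoryStream ms = new MemoryStream())
{
    // allResults is the Dictionary<long, uint> described above
    bin.Serialize(ms, allResults);
    File.WriteAllBytes("filePathName", ms.ToArray());
}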
Regards, Marc
You may check out protobuf-net, or simply serialize it yourself, which will probably be the fastest you can get.
using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main()
    {
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        // Write each entry as a fixed-size (8 + 4 byte) pair.
        // File.Create truncates any existing file, unlike File.OpenWrite.
        using (var stream = File.Create("data.dat"))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var pair in dico)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }

        dico.Clear();

        // Read the pairs back until the end of the stream.
        using (var stream = File.OpenRead("data.dat"))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                var key = reader.ReadInt64();
                var value = reader.ReadUInt32();
                dico.Add(key, value);
            }
        }
    }
}
Size of the resulting file: 90,000,000 bytes (85.8 MB).
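If you want to squeeze a little more out of the load, one variation (a sketch continuing from the snippet above, not part of the measured code) is to write the entry count first, so the dictionary can be constructed with the right capacity and avoid rehashing while it fills:
// Variation: prefix the file with the entry count (illustrative).
using (var stream = File.Create("data.dat"))
using (var writer = new BinaryWriter(stream))
{
    writer.Write(dico.Count);          // count first...
    foreach (var pair in dico)
    {
        writer.Write(pair.Key);
        writer.Write(pair.Value);
    }
}

// ...so the reader can pre-size the dictionary.
using (var stream = File.OpenRead("data.dat"))
using (var reader = new BinaryReader(stream))
{
    int count = reader.ReadInt32();
    var loaded = new Dictionary<long, uint>(count);
    for (int i = 0; i < count; i++)
    {
        loaded.Add(reader.ReadInt64(), reader.ReadUInt32());
    }
}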
Just to show similar serialization (to the accepted answer) via protobuf-net:
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
class Test
{
    [ProtoMember(1)]
    public Dictionary<long, uint> Data { get; set; }
}

class Program
{
    public static void Main()
    {
        // Pre-compile the serializer so the first call doesn't pay the warm-up cost.
        Serializer.PrepareSerializer<Test>();

        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        var data = new Test { Data = dico };
        using (var stream = File.Create("data.dat"))
        {
            Serializer.Serialize(stream, data);
        }

        dico.Clear();

        // Merge reads the stream back into the existing Test instance.
        using (var stream = File.OpenRead("data.dat"))
        {
            Serializer.Merge<Test>(stream, data);
        }
    }
}
Size: 83 MB - but most importantly, you haven't had to do it all by hand, introducing bugs. It's fast too (and will be even faster in "v2").
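As a side note (not from the original answer), if you don't need to merge into an existing instance, you can also deserialize into a fresh object:
// Alternative: deserialize into a new Test instance instead of merging.
using (var stream = File.OpenRead("data.dat"))
{
    var loaded = Serializer.Deserialize<Test>(stream);
}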
You may want to use a profiler to see if, behind the scenes, the deserializer is performing a bunch of on-the-fly reflection.
For now, if you don't want to use a database, try storing your objects as a flat file in a custom format. For example, have the first line of the file give the total number of entries in the dictionary, allowing you to instantiate a dictionary with a predetermined size. Have the remaining lines hold a series of fixed-width key-value pairs representing all of the entries in your dictionary.
With your new file format, use a StreamReader to read your file line by line or in fixed blocks, and see if this allows you to read in your dictionary any faster.
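A minimal sketch of that idea (the type name, file layout and comma-separated pairs, rather than fixed-width ones, are illustrative assumptions):
using System.Collections.Generic;
using System.IO;

static class FlatFileStore
{
    // Writes the entry count on the first line, then one "key,value" pair per line.
    public static void Save(Dictionary<long, uint> dico, string path)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine(dico.Count);
            foreach (var pair in dico)
            {
                writer.WriteLine("{0},{1}", pair.Key, pair.Value);
            }
        }
    }

    // Reads the count first so the dictionary can be created with the right capacity.
    public static Dictionary<long, uint> Load(string path)
    {
        using (var reader = new StreamReader(path))
        {
            int count = int.Parse(reader.ReadLine());
            var dico = new Dictionary<long, uint>(count);
            for (int i = 0; i < count; i++)
            {
                var parts = reader.ReadLine().Split(',');
                dico.Add(long.Parse(parts[0]), uint.Parse(parts[1]));
            }
            return dico;
        }
    }
}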
There are several fast key-value NoSQL solutions out there, so why not try one of them? As an example, ESENT was suggested here on SO: ManagedEsent.
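A rough sketch of what that could look like with ManagedEsent's PersistentDictionary (assuming the Microsoft.Isam.Esent.Collections.Generic namespace from the EsentCollections library; the directory name is illustrative):
using System;
using Microsoft.Isam.Esent.Collections.Generic;

class Example
{
    static void Main()
    {
        // PersistentDictionary keeps its data on disk in the given directory,
        // so there is no explicit save/load step at startup or shutdown.
        using (var results = new PersistentDictionary<long, uint>("ResultsDb"))
        {
            results[42] = 7u;
            Console.WriteLine(results[42]);
        }
    }
}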