I am on a mission to eliminate as many allocations to the Large Object Heap (LOH) as I can in my applications. One of the biggest offenders is our code that computes the MD5 hash of a large string.
    public static string MD5Hash(this string s)
    {
        using (MD5CryptoServiceProvider csp = new MD5CryptoServiceProvider())
        {
            byte[] bytesToHash = Encoding.UTF8.GetBytes(s);
            byte[] hashBytes = csp.ComputeHash(bytesToHash);
            return Convert.ToBase64String(hashBytes);
        }
    }
Leave aside, for the sake of the example, the fact that the string itself is probably already in the LOH. Our goal is to prevent further allocations to that heap.
Also, the current implementation assumes UTF8 encoding (a big assumption), but really the goal is to generate a byte[] from a string.
MD5CryptoServiceProvider.ComputeHash has an overload that takes a Stream as input, so we can create a method:
    public static string MD5Hash(this Stream stream)
    {
        using (MD5CryptoServiceProvider csp = new MD5CryptoServiceProvider())
        {
            return Convert.ToBase64String(csp.ComputeHash(stream));
        }
    }
This is promising because we don't need a byte[] for ComputeHash to work. We need a stream object that will read bytes from a string as bytes are requested by ComputeHash.
This rather controversial question provides a method for creating a byte array from a string regardless of encoding. However, we want to avoid the creation of a large byte array.
This question provides a method of creating a stream from a string by reading the string into a MemoryStream, but internally that is just allocating a large byte[] array as well.
Neither really does the trick.
So how can you avoid the allocation of a large byte[]? Is there a Stream class that will read from another stream (or reader) as bytes are read?
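If you don't care about the encoding, then one thing you can do to prevent any further buffer allocation is to use some unsafe code: get a pointer to the raw characters of the string, wrap an instance of UnmanagedMemoryStream around it, and feed that to the MD5 computation. The bytes hashed this way are the string's in-memory UTF-16 representation, so the result will differ from a hash computed over its UTF-8 bytes.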
So something like this:
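    // Requires compiling with unsafe code enabled (/unsafe).
    public static string MD5Hash(this string s)
    {
        using (MD5CryptoServiceProvider csp = new MD5CryptoServiceProvider())
        {
            unsafe
            {
                // Pin the string so the GC cannot move it while we read its characters.
                fixed (char* input = s)
                {
                    // The stream reads the string's memory directly; no byte[] copy is made.
                    using (var stream = new UnmanagedMemoryStream((byte*)input, sizeof(char) * s.Length))
                        return Convert.ToBase64String(csp.ComputeHash(stream));
                }
            }
        }
    }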
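To illustrate what "not caring about the encoding" means here: .NET strings are stored as UTF-16 in memory, so on a little-endian machine hashing the raw characters corresponds to hashing the Encoding.Unicode bytes rather than the UTF-8 bytes used in the question. The following is a minimal sketch of that difference; the EncodingCheck class and Md5Base64 helper are hypothetical names introduced only for this illustration, and the sketch deliberately allocates byte arrays since it is a check, not a replacement.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, used only to compare hashes of two encodings.
public static class EncodingCheck
{
    // Returns the Base64-encoded MD5 hash of the given bytes.
    public static string Md5Base64(byte[] bytes)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(bytes));
        }
    }

    public static void Main()
    {
        string s = "hello";

        // Hash of the UTF-8 bytes (what the question's original method computes).
        string utf8Hash = Md5Base64(Encoding.UTF8.GetBytes(s));

        // Hash of the UTF-16LE bytes (the string's in-memory layout on little-endian machines).
        string utf16Hash = Md5Base64(Encoding.Unicode.GetBytes(s));

        // The two inputs are different byte sequences, so the hashes differ.
        Console.WriteLine(utf8Hash == utf16Hash); // prints "False"
    }
}
```

If existing hashes were computed from UTF-8 bytes, switching to the raw-memory approach would therefore change every hash value.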
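You can implement your own stream backed by a string. Basically, the only methods that need real work are Read and Write (the other abstract members, such as CanRead, CanSeek, CanWrite, Length, Position, Flush, Seek and SetLength, can be trivial for a forward-only, read-only stream), and Write can just throw a NotSupportedException since you should not write to this stream. According to the documentation: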
When you implement a derived class of Stream, you must provide implementations for the Read and Write methods. The asynchronous methods ReadAsync, WriteAsync, and CopyToAsync use the synchronous methods Read and Write in their implementations.
You probably want to also implement ReadByte:
The default implementations of ReadByte and WriteByte create a new single-element byte array, and then call your implementations of Read and Write.
Source: https://msdn.microsoft.com/pt-br/library/system.io.stream%28v=vs.110%29.aspx
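As a rough sketch of what such a string-backed stream could look like (not a definitive implementation): the class below, with the hypothetical name StringStream, encodes the string in small chunks on demand using a stateful Encoder, so only two small fixed-size buffers are allocated regardless of the string's length. The chunk size of 1024 characters is an arbitrary assumption; any size that keeps both buffers well under the LOH threshold should do.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical read-only, forward-only stream that encodes a string chunk by chunk.
public sealed class StringStream : Stream
{
    private const int ChunkChars = 1024;   // assumed chunk size; keeps buffers small
    private readonly string _source;
    private readonly Encoder _encoder;     // stateful: copes with surrogate pairs split across chunks
    private readonly char[] _chars = new char[ChunkChars];
    private readonly byte[] _bytes;
    private int _sourcePos;                // next char of _source to encode
    private int _bytePos;                  // next unread byte in _bytes
    private int _byteCount;                // number of valid bytes in _bytes

    public StringStream(string source, Encoding encoding)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (encoding == null) throw new ArgumentNullException("encoding");
        _source = source;
        _encoder = encoding.GetEncoder();
        _bytes = new byte[encoding.GetMaxByteCount(ChunkChars)];
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    // Encodes the next chunk of the string; returns false once the string is exhausted.
    private bool FillBuffer()
    {
        _bytePos = 0;
        _byteCount = 0;
        while (_byteCount == 0 && _sourcePos < _source.Length)
        {
            int n = Math.Min(ChunkChars, _source.Length - _sourcePos);
            _source.CopyTo(_sourcePos, _chars, 0, n);
            _sourcePos += n;
            bool flush = _sourcePos == _source.Length;   // flush encoder state on the last chunk
            _byteCount = _encoder.GetBytes(_chars, 0, n, _bytes, 0, flush);
        }
        return _byteCount > 0;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int copied = 0;
        while (copied < count)
        {
            if (_bytePos == _byteCount && !FillBuffer()) break;   // end of string
            int n = Math.Min(count - copied, _byteCount - _bytePos);
            Buffer.BlockCopy(_bytes, _bytePos, buffer, offset + copied, n);
            _bytePos += n;
            copied += n;
        }
        return copied;
    }

    // Overridden so single-byte reads do not allocate a one-element array.
    public override int ReadByte()
    {
        if (_bytePos == _byteCount && !FillBuffer()) return -1;
        return _bytes[_bytePos++];
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Combined with the Stream extension method from the question, usage might look like `using (var stream = new StringStream(s, Encoding.UTF8)) return stream.MD5Hash();`. Unlike the unsafe approach, this keeps the choice of encoding explicit, at the cost of copying each chunk once.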