I have a console application that allows the users to specify variables to process. These variables come in three flavors: string, double and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like and in whatever order so my system has to be able to handle that. To this end in my application I had been storing these as object and then casting/uncasting them as required. for example:
public class UnitResponse
{
    public object Value { get; set; }
}
My understanding was that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.
My question is: would it be more efficient to use the dynamic keyword to store these values? It might get around the boxing/unboxing issue, and if it is more efficient how would this impact performance?
EDIT
To provide some context and prevent the "are you sure you're using enough RAM to worry about this" in my worst case I have 420,000,000 datapoints to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data I keep about each variable (including a few booleans, etc.). So reducing memory does have a HUGE impact.
<< is the left shift operator. It is shifting the number 1 to the left 0 bits, which is equivalent to the number 1 .
%d is a format specifier, used in C Language. Now a format specifier is indicated by a % (percentage symbol) before the letter describing it. In simple words, a format specifier tells us the type of data to store and print. Now, %d represents the signed decimal integer.
OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"
Several thoughts:
Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).
It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really really gross and not at all safe but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials.
Normally I don't recommend using float over double because its a false economy; people often economise this way when they have ONE number, like the savings of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles is considerable.
Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records, there are only 100000 actual values for, say, each long, 100000 values for each double, and 100000 values for each string. In that case, you make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index, and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else.
Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes that way.
Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With