I have a HashSet<string> with ~50k members. I also have a list of objects that I'm iterating through one by one to determine whether each object's email exists in the set. If it does, I need to perform some action on the object.
var emailList = db.Emails.Select(s => s.EmailAddress.ToLower()).ToList();
var emailHash = new HashSet<string>(emailList);
var objects = db.Objects.ToList();
// everything is fine up to this point
foreach (var obj in objects) {
    if (!emailHash.Any(s => s.Equals(obj.Email))) { // This takes ~0.3s
        Console.WriteLine("Email: {0}", obj.Email);
    }
}
What can I do to speed up the evaluation of whether or not one string exists in a list of strings?
You are not using the HashSet correctly. LINQ's .Any() evaluates your predicate against each element stored in the HashSet in turn, which is a linear O(n) scan.
To check whether an item exists in a HashSet in constant time (O(1) on average), call emailHash.Contains and pass it the object's email.
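To make the cost of .Any() visible, here is a small sketch that counts how many times the predicate runs. The 50,000 generated addresses and the names AnyVsContains and CountAnyPredicateCalls are hypothetical, chosen only to roughly match the size in the question. Because Any() only sees an IEnumerable<string>, a lookup for an address that is not present has to visit every element, whereas Contains() hashes the key and probes a single bucket.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVsContains
{
    // Counts how many elements Enumerable.Any visits before it can answer.
    public static int CountAnyPredicateCalls(HashSet<string> set, string target)
    {
        int calls = 0;
        // Any() walks the set element by element and invokes the predicate
        // on each one until it returns true (or the set is exhausted).
        set.Any(s => { calls++; return s.Equals(target); });
        return calls;
    }

    public static void Main()
    {
        // Hypothetical data: 50,000 distinct addresses.
        var set = new HashSet<string>(
            Enumerable.Range(0, 50_000).Select(i => $"user{i}@example.com"));

        string missing = "nobody@example.com";

        // A miss forces Any() to examine every element: prints 50000.
        Console.WriteLine(CountAnyPredicateCalls(set, missing));

        // Contains() does a single hash lookup and never runs a predicate.
        Console.WriteLine(set.Contains(missing)); // prints False
    }
}
```

Doing that full scan once per object in the outer loop is what turns the whole check into roughly objects × 50k comparisons.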
One obvious change is to stop using the Enumerable.Any() LINQ method, which negates the advantage of a hash set by performing a sequential search.
Instead, use HashSet's built-in Contains(string) method:
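foreach (var obj in objects) { // "object" is a C# keyword, so it can't be a variable name
    // HashSet<T>.Contains hashes the key and checks one bucket: O(1) on average
    if (!emailHash.Contains(obj.Email)) {
        Console.WriteLine("Email: {0}", obj.Email);
    }
}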
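One more thing worth checking: the set is built from lowercased addresses (EmailAddress.ToLower()), but object.Email is looked up as-is, so a mixed-case address would not be found. A minimal sketch, assuming case-insensitive matching is what you want, is to give the HashSet a StringComparer.OrdinalIgnoreCase comparer. The Contact class, EmailLookup, BuildSet and FindMissing names below are hypothetical stand-ins for your own types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's entity type; only Email is used.
public class Contact
{
    public string Email { get; set; } = "";
}

public static class EmailLookup
{
    // The comparer makes hashing and equality ignore case, so
    // "Bob@Example.com" and "bob@example.com" are treated as the same key.
    public static HashSet<string> BuildSet(IEnumerable<string> addresses) =>
        new HashSet<string>(addresses, StringComparer.OrdinalIgnoreCase);

    // Returns the objects whose Email is NOT in the set, mirroring the
    // "!Contains" check above; each lookup is an average O(1) hash probe.
    public static List<Contact> FindMissing(HashSet<string> emailSet, IEnumerable<Contact> objects) =>
        objects.Where(obj => !emailSet.Contains(obj.Email)).ToList();
}

public static class Program
{
    public static void Main()
    {
        var emailSet = EmailLookup.BuildSet(new[] { "alice@example.com", "bob@example.com" });
        var objects = new List<Contact>
        {
            new Contact { Email = "Alice@Example.com" }, // matches despite the casing
            new Contact { Email = "carol@example.com" },
        };

        foreach (var obj in EmailLookup.FindMissing(emailSet, objects))
        {
            Console.WriteLine("Email: {0}", obj.Email); // prints "Email: carol@example.com"
        }
    }
}
```

With the comparer in place, the ToLower() call in the database query is no longer needed, and you avoid allocating a lowercased copy of every address.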