
Searching for a string in HashSet<string> Performance

Tags: c#, .net, c#-4.0

I have a HashSet<string> with ~50k members. I also have a list of objects that I iterate through one by one to determine whether each object's email exists in the set. If it does, I need to perform some action on the object.

var emailList = db.Emails.Select(s => s.EmailAddress.ToLower()).ToList();
var emailHash = new HashSet<string>(emailList);
var objects = db.Objects.ToList();
// everything is fine up to this point
// Note: "object" is a reserved word in C#, so the loop variable is named obj.
foreach (var obj in objects) {
   if (!emailHash.Any(s => s.Equals(obj.Email))) { // This takes ~0.3s
      Console.WriteLine("Email: {0}", obj.Email);
   }
}

What can I do to speed up checking whether a string exists in a collection of strings?

RobVious asked Oct 25 '25 06:10


2 Answers

You are not using the HashSet correctly. LINQ's .Any() evaluates your condition against the elements stored in the HashSet one by one, so each lookup is a linear O(n) scan rather than a hash lookup.

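To check whether an item exists in a HashSet in constant time (O(1) on average), use emailHash.Contains(obj.Email), where obj is the loop variable. (object is a reserved word in C#, so it cannot be used as a variable name without the @ prefix.)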
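
As a rough illustration of the difference, here is a minimal, self-contained sketch. The class name ContainsVsAnySketch, the two helper methods, and the sample addresses are made up for this example and stand in for the data loaded from db.Emails in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsVsAnySketch
{
    // Linear scan: Enumerable.Any walks the set element by element, O(n) per lookup.
    public static bool ExistsViaAny(HashSet<string> set, string email)
    {
        return set.Any(s => s.Equals(email));
    }

    // Hash lookup: HashSet<T>.Contains hashes the key and checks only the
    // matching bucket, which is O(1) on average.
    public static bool ExistsViaContains(HashSet<string> set, string email)
    {
        return set.Contains(email);
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for db.Emails.
        var emailHash = new HashSet<string> { "a@example.com", "b@example.com" };

        Console.WriteLine(ExistsViaAny(emailHash, "a@example.com"));      // True
        Console.WriteLine(ExistsViaContains(emailHash, "a@example.com")); // True
        Console.WriteLine(ExistsViaContains(emailHash, "z@example.com")); // False
    }
}
```

Both helpers return the same answer; only the cost per lookup differs. With ~50k entries and many objects to check, the linear scan is repeated for every object, which is where the time goes.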

Gerardo Grignoli answered Oct 26 '25 19:10


One obvious change is to stop using the Enumerable.Any() LINQ method, which negates the advantage of a hash set by performing a sequential search.

Instead, use HashSet's built-in Contains(string) method:

// "object" is a reserved word in C#, so the loop variable is named obj.
foreach (var obj in objects) {
   if (!emailHash.Contains(obj.Email)) {
      Console.WriteLine("Email: {0}", obj.Email);
   }
}
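
One thing to watch when switching to Contains: the question lower-cases the addresses before putting them in the set, but the lookup value is not lower-cased, so a differently cased address would not match. Assuming a case-insensitive match is what is wanted, one possible approach is to build the set with StringComparer.OrdinalIgnoreCase. The class name CaseInsensitiveLookupSketch, the BuildEmailSet helper, and the sample addresses below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveLookupSketch
{
    // Builds a set whose hashing and equality ignore case, so neither the
    // stored addresses nor the lookup value need to be lower-cased first.
    public static HashSet<string> BuildEmailSet(IEnumerable<string> emails)
    {
        return new HashSet<string>(emails, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for db.Emails.
        var emailHash = BuildEmailSet(new[] { "Alice@Example.com", "bob@example.com" });

        Console.WriteLine(emailHash.Contains("alice@example.com")); // True
        Console.WriteLine(emailHash.Contains("BOB@EXAMPLE.COM"));   // True
        Console.WriteLine(emailHash.Contains("carol@example.com")); // False
    }
}
```

The lookup is still a hash lookup, so the average O(1) cost of Contains is kept.
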
sstan answered Oct 26 '25 19:10