I am writing a multi-threaded program to scrape a certain site and collect ID's. It is storing these ID's in a shared static List<string> object.
When any item is added to the List<string>, it is first checked against a HashSet<string> which contains a blacklist of already collected ID's.
I do this as follows:
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
public static void AddIDToIDList(string ID)
{
lock (IDList)
{
if (IsIDBlacklisted(ID))
return;
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
lock (Blacklist)
{
if (Blacklist.Contains(ID))
return true;
}
return false;
}
The Blacklist is saved to a file after finishing and is loaded every time the program starts, therefore, it will get pretty large over time (up to 50k records). Is there a more efficient way to not only store this blacklist, but also to check each ID against it?
Thanks!
To improve performance try to use ConcurrentBag<T> collection. Also there is no need to lock BlackList because it's not being modified e.g.:
private static HashSet<string> Blacklist = new HashSet<string>();
private static ConcurrentBag<string> IDList = new ConcurrentBag<string>();
public static void AddIDToIDList(string ID)
{
if (Blacklist.Contains(ID))
{
return;
}
IDList.Add(ID);
}
Read operations are thread safe on HashSet, as long as Blacklist is not being modified you don't need to lock on it. Also you should lock inside the blacklist check so the lock is taken less often, this also will increase your performance.
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
public static void AddIDToIDList(string ID)
{
if (IsIDBlacklisted(ID))
return;
lock (IDList)
{
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
return Blacklist.Contains(ID);
}
If Blacklist is being modified the best way to lock around it is using a ReaderWriterLock (use the slim version if you are using a newer .NET)
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
private static ReaderWriterLockSlim BlacklistLock = new ReaderWriterLockSlim();
public static void AddIDToIDList(string ID)
{
if (IsIDBlacklisted(ID))
return;
lock (IDList)
{
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
BlacklistLock.EnterReadLock();
try
{
return Blacklist.Contains(ID);
}
finally
{
BlacklistLock.ExitReadLock();
}
}
public static bool AddToIDBlacklist(string ID)
{
BlacklistLock.EnterWriteLock();
try
{
return Blacklist.Add(ID);
}
finally
{
BlacklistLock.ExitWriteLock();
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With