My data structures knowledge is way rusty, and to be honest it was never my strongest point.
Right now we're about to build a queue-like component that has the following requirements:
I think that sums it up. so, obviously a single list or ordered list is out of the question, due to the fact that every time we add or remove objects from the collection it gets sorted again, and doing this in a single collection with a million objects is slow.
We tested a couple of approaches in the past, like linked lists, which proved to be fast for queueing, but slow for finding items (since we do have that scenario).
Right now we've come up with a structure like
SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, ..
You get the idea.
It's kind of a sweet spot of grouping levels, keeping each collection relatively small (around 300 items per dictionary).
so, for the first level, we will have a sorteddictionary, for which the keys are the ids of each master category, and the values will be a sorteddictionary of which the key will be the id of the child category...and so on.
Right now we've tested with 100, 1,000, 10,000, 100,000 and 1,000,000 items.
For the smaller range, up to 100k, the solution is fast. it can queue/dequeue/find in less than a second even for up to 300k, which is really above the 80-90% of the scenarios that we will handle.
When it comes to a million, it does become slower, taking about 3-4 seconds to queue the whole thing, and up to 10 seconds to deplete the queue.
So, my questions are:
Thanks.
In Java, dynamically allocated data structures (such as ArrayList , LinkedList , Vector , Stack , HashSet , HashMap , Hashtable ) are supported in a unified architecture called the Collection Framework, which mandates the common behaviors of all the classes.
A few remarks and general suggestions (I'm sorry that I don't have the time to test myself):
I believe that your general approach - nested (sorted)dictionaries - is sound. I use similar structures very often, to my satisfaction - not for performance reasons because my cases are always small, but for clarity and flexibility.
In your case it also addresses one of the performance issues, because sorting (at inserting and at deleting) takes time and smaller (sub)dictionaries obviously sort faster.
Yes, having class-instances as values (or keys) only uses the reference, so in that respect it doesn't really matter what size or structure your class has.
The increasing time for deleting (and adding) presumably is caused (primarily) by the resorting that is done every time you remove (or add) an item.
As regards the performance of adding items:
If your case enables you to feed the dictionaries with items in a sorted (ascending) order, you may want to switch to a SortedLIST, rather than a SortedDICTIONARY, because in the list adding items is O(1) rather than O(log n) if the new items will wind up at the end of the sorted collection.
A collection has an initial CAPACITY - the reserved space for items. Adding items beyond the current capacity of the collection results in (a) increasing the capacity-value of the collection; (b) reallocating space for the (whole) new capacity; (c) copying the values from the original (small) storage to the new one. Therefore, if you have some idea about how large your collections are going to be: initialize the collection with the appropriate capacity. This is not (yet) possible with a sortedDictionary, but the sortedLIST supports that feature.
As regards the performance of deleting items:
You may want to consider using an approach that uses a custom-class wrapping the sorted collection, such that it doesn't "really" delete items (thereby avoiding the resorting), until the whole collection is empty.
All in all, have a try with something along the following lines (using vb syntax; I'm sure you can translate it to C#,C+ or whatever language you wish to use.
Public Class MySortedCollection(Of myKeyType, myValueType)
  Public Sub New(Optional myCapacity As Integer = 0)
    If myCapacity <= 0 Then
      MyValues = New SortedList(Of myKeyType, myValueType)
      ValidItems = New Dictionary(Of myKeyType, Boolean)
    Else
      MyValues = New SortedList(Of myKeyType, myValueType)(myCapacity)
      ValidItems = New Dictionary(Of myKeyType, Boolean)(myCapacity)
    End If
    LiveItemsCount = 0
  End Sub
  Private MyValues As SortedList(Of myKeyType, myValueType)
  Private ValidItems As Dictionary(Of myKeyType, Boolean)
  Private LiveItemsCount As Integer
  Public ReadOnly Property Count As Integer
    Get
      Return LiveItemsCount
    End Get
  End Property
  Public Sub Clear()
    MyValues.Clear()
    ValidItems.Clear()
    LiveItemsCount = 0
  End Sub
  Public Sub Add(key As myKeyType, value As myValueType)
    MyValues.Add(key, value)
    ValidItems.Add(key, True)
    LiveItemsCount += 1
  End Sub
  Public Function Remove(key As myKeyType) As Integer
    If ValidItems(key) Then
      ValidItems(key) = False
      LiveItemsCount -= 1
      If LiveItemsCount = 0 Then
        MyValues.Clear()
        ValidItems.Clear()
      End If
    End If
    Return LiveItemsCount
  End Function
  Public Function TryGetValue(key As myKeyType, ByRef value As myValueType) As Boolean
    If MyValues.TryGetValue(key, value) Then
      If ValidItems(key) Then
        Return True
      Else
        value = Nothing
      End If
    End If
    Return False
  End Function
  Public Function TryGetAndDelete(key As myKeyType, ByRef value As myValueType) As Boolean
    If Me.TryGetValue(key, value) Then
      ValidItems(key) = False
      LiveItemsCount -= 1
      If LiveItemsCount = 0 Then
        MyValues.Clear()
        ValidItems.Clear()
      End If
      Return True
    End If
    Return False
  End Function
  // add more collection-wrapping methods as needed
End Class
You "pay" performance-wise for the overhead of the wrapping class as well as for the auxiliary dictionary that is used inside to keep track of the "real" items versus those that are considered deleted. You save however on the repeated sorting upon deleting an item. It depends of course on the actual case whether it will help (or harm...). And again, I haven't tested it myself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With