In Ruby, I am building a method which constructs and returns a (probably large) array which should contain no duplicate elements. Would I get better performance by using a set and then converting that to an array? Or would it be better to just call .uniq on the array I am using before I return it? Or what about using & to append items to the array instead of +=? And if I do use a set, would not having a <=> method on the object I am putting into the set have an effect on performance? (If you're not sure, do you know of a way to test this?)
The real answer is: write the most readable and maintainable code, and optimize it only after you've shown it is a bottleneck. If you can find an algorithm that runs in linear time, you won't have to optimize it. Here it's easy to find...
Not quite sure which methods you are suggesting, but using my fruity gem:
require 'fruity'
require 'set'
enum = 1000.times
compare do
  # build an array, then deduplicate at the end
  uniq { enum.each_with_object([]){|x, array| array << x}.uniq }
  # accumulate directly into a Set, then convert to an array
  set { enum.each_with_object(Set[]){|x, set| set << x}.to_a }
  # take the union with a one-element array at every step
  join { enum.inject([]){|array, x| array | [x]} }
end
# set is faster than uniq by 10.0% ± 1.0%
# uniq is faster than join by 394x ± 10.0
Clearly, it makes no sense building intermediate arrays like in the third method. Otherwise, it's not going to make a big difference since you will be in O(n); that's the main thing.
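For completeness, here is a minimal sketch of what such a readable linear-time method might look like (build_unique and the sample input are made up for illustration, not from the question):

require 'set'
# Hypothetical helper: the source could be any enumerable.
# Set insertion is O(1) on average, so the whole method stays O(n).
def build_unique(source)
  source.each_with_object(Set.new) { |item, set| set << item }.to_a
end
build_unique([1, 2, 2, 3, 1]) # => [1, 2, 3]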
BTW, Set, uniq and Array#| all use eql? and hash on your objects, not <=>. These need to be defined in a sane manner, because the default is that objects are never eql? unless they have the same object_id (see this question).
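For example, here is one way to define them sanely on your own class (Point is a made-up example, not from the question):

class Point
  attr_reader :x, :y
  def initialize(x, y)
    @x, @y = x, y
  end
  # Two points count as the same element if their coordinates match.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  # eql? objects must return the same hash, or Set/uniq won't spot duplicates.
  def hash
    [x, y].hash
  end
end

require 'set'
Set[Point.new(1, 2), Point.new(1, 2)].size   # => 1
[Point.new(1, 2), Point.new(1, 2)].uniq.size # => 1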
Have you tried using the Benchmark library? Tests are usually very easy to construct and will properly reflect how it works in your particular version of Ruby.
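A minimal sketch of such a test, assuming you just want to compare uniq against a Set (the input size and data here are arbitrary):

require 'benchmark'
require 'set'
data = Array.new(100_000) { rand(50_000) } # arbitrary data with duplicates
Benchmark.bm(6) do |bm|
  bm.report('uniq') { data.uniq }
  bm.report('set') { Set.new(data).to_a }
end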