This doesn't work
data Cutlery = Knife | Fork deriving (Show,Eq)
let x = [Knife,Fork]
let set1 = Set.fromList x
while defining
data Cutlery = Knife | Fork deriving (Show,Ord,Eq)
solves the issue but doesn't make sense. Is Data.Set different than the mathematical definition of a set?
A Data.Set captures the mathematical abstraction of a set, but it is not identical. The main difference is that a Data.Set requires its elements to be ordered, whereas a mathematical set only requires that its elements be comparable for equality.
The reason for requiring Ord is efficiency. It would be perfectly possible to build a set abstraction by defining
data Set a = Set [a]
i.e. under the hood it is just a list, and we make sure we never insert duplicate elements. The elem and insert operations would be
elem a (Set as) = any (a ==) as
insert a (Set as) | a `elem` as = Set as
| otherwise = Set (a:as)
However, this means that both elem and insert are O(n) operations. If we want to do any better than this, the standard approaches are
Ord instance)Hashable instance).The implementation chosen by the authors of Data.Set was to use a binary tree, which you can see by going to the source. The implementation is more or less
data Set a = Bin a (Set a) (Set a)
| Tip
Now you can write the elem function as
elem :: Ord a => a -> Set a -> Bool
elem = go
where
go _ Tip = False
go x (Bin y l r) = case compare x y of
LT -> go x l
GT -> go x r
EQ -> True
which is an O(log n) operation, rather than O(n). Insertions are trickier (as you need to keep the tree balanced) but similar.
In a hash set, you don't directly compare elements when inserting and removing them. Instead, each element is hashed to an integer, and stored in a location based on that integer.
In theory this doesn't require an Ord instance. In practice, you need some method of keeping track of multiple elements that hash to the same value, and the method chosen by the developers of Data.HashSet is to store multiple elements in a regular Data.Set, so it turns out you do need the Ord instance after all!
data HashSet a = HashSet (Data.IntMap.IntMap (Data.Set.Set a))
It could have been written instead as
data HashSet a = HashSet (Data.IntMap.IntMap [a])
instead, which removes out the Ord requirement at the cost of some inefficiency if there are many elements which has to the same value.
Is
Data.Setdifferent than the mathematical definition of a set?
Obviously, mathematical sets can be uncountable infinite – you won’t be able to represent that in all generality with a computer, or even a Turing machine.
But the answer you are looking for is this: Data.Set is a data type based on binary trees, and needs a total linear order on the elements to know whether to put and later find something on the left or right subtree of a node. So while it would be possible to implement a set datatype without an Ord constraint, this particular, more efficient implementation would not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With