Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What do you call the property of a list that describes the degree to which it contains duplicates?

I have a function that selects Cartesian products of lists such that the number of duplicate elements is highest:

import Data.List (nub)

f :: Eq a => [[a]] -> [[a]]
f xss = filter ((==) minLength . length . nub) cartProd
  where
    minLength = minimum (map (length . nub) cartProd)
    cartProd = sequence xss

For example:

*Main> f [[1,2,3],[4],[1,5]]
[[1,4,1]]

But:

*Main> sequence [[1,2,3],[4],[1,5]]
[[1,4,1],[1,4,5],[2,4,1],[2,4,5],[3,4,1],[3,4,5]]

Is there a name for the property that the result of my function f has?

like image 910
marc_r Avatar asked Dec 07 '25 11:12

marc_r


1 Answers

I believe your function is computing a minimum set cover:

Given a set of elements { 1 , 2 , . . . , n } (called the universe) and a collection S of sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe.

In your case, n is length xss. There is one set in S for each distinct element x of concat xss, namely the set { i | x `elem` (xss !! i) } of all indices that x occurs in. The minimum set cover then tells you which x to choose from each list in xss (sometimes giving you multiple choices; any choice will produce the same final nubbed length).

Here is a worked example for your [[1,2,3],[4],[1,5]]:

The universe is {1,2,3}.

There are five sets in the collection S; I'll name them S_1 through S_5:

  • S_1 = {1,3} because the first and third lists contain 1.
  • S_2 = {1} because the first list contains 2.
  • S_3 = {1} because the first list contains 3.
  • S_4 = {2} because the second list contains 4.
  • S_5 = {3} because the third list contains 5.

A minimum set cover for this is {S_1, S_4}. Because this is a set cover, this means every list contains either 1 or 4. Because it is minimal, no other choice of sets produces a smaller final collection of values. So, we can choose either 1 or 4 from each list to produce a final answer. As it happens, no list contains both 1 and 4 so there is only one choice, namely [1,4,1].

like image 186
Daniel Wagner Avatar answered Dec 10 '25 01:12

Daniel Wagner