 

Python remove partial duplicates from a list

I have a list of items that was created improperly: instead of copying each whole item once, the process made multiple partial copies of the same item. The partial duplicates are mixed with other duplicates and some unique items. For example, list a:

a = ['one two','one two three four','one two three','five six','five six seven','eight nine']

I want to remove the partial duplicates and keep only the longest version of each item. For example, I would like to produce list b:

b = ['one two three four', 'five six seven','eight nine']

Each item must remain intact. The result cannot become:

c = ['two one three four', 'vife six seven', 'eight nine']

Mario Tomas asked Mar 15 '26 19:03


1 Answer

You can use sets for this.
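
The idea, roughly, is to split each string into words and test whether one word set is contained in another. A minimal illustration of that check (the helper name `is_partial` exists only for this sketch):

```python
def is_partial(short, long):
    """Return True if every word of `short` also appears in `long`."""
    return set(short.split()) <= set(long.split())

print(is_partial('one two', 'one two three four'))   # True
print(is_partial('five six', 'one two three four'))  # False
```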

Try this code

a = ['one two','one two three', 'one two three four', 'five six', 'five six seven','eight nine']

# check for subsets
for i in range(len(a)):
   words_i = set(a[i].split())
   for j in range(len(a)):
      if i == j:  # same index
         continue
      if words_i <= set(a[j].split()):  # words of a[i] are a subset of a[j]
         a[i] = ""  # clear string
         break

a = [x for x in a if x]  # remove empty strings

print(a)

Output

['one two three four', 'five six seven', 'eight nine']
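
Note that sets ignore word order and repeated words, so 'two one' would also be treated as a partial copy of 'one two three'. If the partial copies are always leading fragments of the full item, as in the question's example, a stricter variant could compare word lists as prefixes instead. A sketch under that assumption (the function name `drop_partial_prefixes` is made up for illustration):

```python
def drop_partial_prefixes(items):
    """Keep only items whose word list is not a prefix of a longer item's.

    Assumes partial copies are leading fragments of the full item.
    Exact duplicates are collapsed to their first occurrence.
    """
    words = [item.split() for item in items]
    result = []
    seen = set()
    for i, w in enumerate(words):
        key = tuple(w)
        if key in seen:
            continue  # exact duplicate of an earlier item
        seen.add(key)
        # True if some other, longer item starts with the same words
        is_prefix = any(
            len(other) > len(w) and other[:len(w)] == w
            for j, other in enumerate(words) if j != i
        )
        if not is_prefix:
            result.append(items[i])
    return result

a = ['one two', 'one two three four', 'one two three',
     'five six', 'five six seven', 'eight nine']
print(drop_partial_prefixes(a))
```

For the list in the question this gives the same three items as the set-based version, but it leaves a reordered string such as 'two one' alone instead of removing it.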
Mike67 answered Mar 18 '26 10:03


