
Python compression string not quite right

Tags: python, string

I have the following code, which is explained in its docstring. How do I stop it from appending a 1 to single letters, which turns one character into two in the final compressed string?

For example, as shown in the docstring, it turns AAABBBBCDDDD into A3B4C1D4, but I want it to produce A3B4CD4. I'm new at this, so any comments are greatly appreciated.

class StringCompression(object):
    '''
    Run Length Compression Algorithm: Given a string of letters, such as
    nucleotide sequences, compress it using numbers to flag contiguous repeats.
    Ex: AAABBBBCDDDD -> A3B4C1D4


    >>> x = StringCompression('AAAAbC')
    >>> x.compress()
    'A4bC'
    '''
    def __init__(self, string):
        self.string = string

    def compress(self):
        '''Executes compression on the object.'''
        run = ''
        length = len(self.string)

        if length == 0:
            return ''

        if length == 1:
            return self.string #+ '1'

        last = self.string[0]

        count = 1

        i = 1

        while i < length:

            if self.string[i] == self.string[i - 1]:
                count += 1

            else:
                run = run + self.string[i - 1] + str(count)
                count = 1

            i += 1

        run = (run + self.string[i - 1] + str(count))

        return run

1 Answer

Here's an alternative solution using itertools.groupby and a generator:

from itertools import chain, groupby

x = 'AAABBBBCDDDD'

def compressor(s):
    for i, j in groupby(s):
        size = len(list(j))
        yield (i, '' if size==1 else str(size))

res = ''.join(chain.from_iterable(compressor(x)))

print(res)

A3B4CD4
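
If you would rather keep the explicit loop from the question, one possible fix is to append the count only when it is greater than 1, both inside the loop and when flushing the final run. The sketch below is a standalone function rather than the original class, and the names compress_runs and format_run are hypothetical, chosen only for illustration:

```python
def format_run(char, count):
    """Return the token for one run: the character, plus the count if > 1."""
    return char + (str(count) if count > 1 else '')


def compress_runs(s):
    """Run-length encode s, omitting the count for runs of length 1."""
    if not s:
        return ''

    parts = []
    count = 1
    # Compare each character with the one before it.
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            # The run of `prev` has ended, so emit it and start a new run.
            parts.append(format_run(prev, count))
            count = 1

    # The loop never emits the last run, so flush it here.
    parts.append(format_run(s[-1], count))
    return ''.join(parts)


print(compress_runs('AAABBBBCDDDD'))
```

This should print A3B4CD4 for the example string and return 'A4bC' for the docstring input 'AAAAbC'. Note that dropping the 1 only stays unambiguous as long as the input itself contains no digits.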
jpp answered Feb 25 '26 22:02
