Extracting data according to a list

Tags:

python

regex

I'm trying to extract from a string the codes that appear in this list:

check_list = ['E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12', 'IN1', 'IN2', 'IN4', 'IN10']

For example, for this string:

s1 = "apto E1-E10 tower 1-2 sanit"

I should get ['E1', 'E10'].

s2 = "apto IN2-IN1-IN4-E12-IN10 mamp"

For this one I should get ['IN2', 'IN1', 'IN4', 'E12', 'IN10'].

This is where it gets tricky, because the prefix (E or IN) is written only once and applies to every number that follows it:

s3 = "E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr."

I should get ['E2', 'E7', 'E3', 'E9', 'E12', 'IN1', 'IN4', 'IN10'].

Any advice on how to solve this?

Asked by Javier Cárdenas, Dec 30 '25 00:12


1 Answer

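The following should work. It finds every run that starts with E or IN followed by digits and hyphens, attaches that prefix to each number in the run, and keeps only the codes that are in your list: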

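import re

def extract_data(s):
    # Codes we are allowed to return; a set makes membership tests fast
    check_set = {'E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12',
                 'IN1', 'IN2', 'IN4', 'IN10'}
    result = []
    # Each match is a prefix (E or IN) followed by a run of digits and hyphens,
    # e.g. "E-2-7-3-9-12" or "IN2-"
    for match in re.finditer(r'\b(E|IN)[-\d]+', s):
        # Attach the run's prefix to every number in it: "E-2-7" -> E2, E7
        for digits in re.findall(r'\d+', match.group(0)):
            item = match.group(1) + digits
            if item in check_set:
                result.append(item)
    return result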

Examples:

>>> extract_data("apto E1-E10 tower 1-2 sanit")
['E1', 'E10']
>>> extract_data("apto IN2-IN1-IN4-E12-IN10 mamp")
['IN2', 'IN1', 'IN4', 'E12', 'IN10']
>>> extract_data("E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr.")
['E2', 'E7', 'E3', 'E9', 'E12', 'IN1', 'IN4', 'IN10']
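If the list of codes changes often, the hard-coded (E|IN) in the regex can drift out of sync with it. The sketch below is a hypothetical variant, extract_codes, that takes the list as a parameter and builds the prefix alternation from it. It assumes every code is uppercase letters followed by digits (as in check_list above); otherwise the matching logic is the same as extract_data.

```python
import re

def extract_codes(s, check_list):
    """Hypothetical variant of extract_data: the prefixes (E, IN, ...) are
    derived from check_list. Assumes every code is uppercase letters + digits."""
    check_set = set(check_list)
    # Letter prefix of each code, e.g. 'IN10' -> 'IN', 'E7' -> 'E'
    prefixes = {re.match(r'[A-Z]+', code).group(0) for code in check_list}
    # Sort longest first (then alphabetically) so the pattern is deterministic
    ordered = sorted(prefixes, key=lambda p: (-len(p), p))
    alternation = '|'.join(re.escape(p) for p in ordered)
    pattern = re.compile(r'\b(' + alternation + r')[-\d]+')
    result = []
    for match in pattern.finditer(s):
        prefix = match.group(1)
        # Every number in the run inherits the run's prefix
        for digits in re.findall(r'\d+', match.group(0)):
            code = prefix + digits
            if code in check_set:
                result.append(code)
    return result
```

Called as extract_codes(s3, check_list), it returns the same result as extract_data(s3). Codes that are not in the list are still dropped, so extract_codes("E-5-7", check_list) keeps only 'E7'.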
Answered by Andrew Clark, Dec 31 '25 15:12


