Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check following element in list in pandas dataframe

I have created the following pandas dataframe

import pandas as pd
import numpy as np

ds = {
      'col1' : 
          [
              ['U', 'U', 'U', 'U', 'U', 1, 0, 0, 0, 'U','U', None],
              [6, 5, 4, 3, 2],
              [0, 0, 0, 'U', 'U'],
              [0, 1, 'U', 'U', 'U'],
              [0, 'U', 'U', 'U', None]
              ]
      }

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)
                                      col1
0  [U, U, U, U, U, 1, 0, 0, 0, U, U, None]
1                          [6, 5, 4, 3, 2]
2                          [0, 0, 0, U, U]
3                          [0, 1, U, U, U]
4                       [0, U, U, U, None]

For each row in col1, I need to check if every element equals to U in the list is followed (from left to right) by any value apart from U and None: in that case I'd create a new column (called iCount) with value of 1. Else 0.

In the example above, the resulting dataframe would look like this:

                                      col1 iCount
0  [U, U, U, U, U, 1, 0, 0, 0, U, U, None]      1
1                          [6, 5, 4, 3, 2]      0
2                          [0, 0, 0, U, U]      0
3                          [0, 1, U, U, U]      0
4                       [0, U, U, U, None]      0

Only in the first row the value U is followed by a value which is neither U nor None (it is 1)

I have tried this code:

col5 = np.array(df['col1'])

for i in range(len(df)):
    iCount = 0

    for j in range(len(col5[i])-1):
        
        print(col5[i][j])
        
        if((col5[i][j] == "U") & ((col5[i][j+1] != None) & (col5[i][j+1] != "U"))):
            
            iCount += 1
            
        else:
            iCount = iCount
    

But I get this (wrong) dataframe:

                                      col1  iCount
0  [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       0
1                          [6, 5, 4, 3, 2]       0
2                          [0, 0, 0, U, U]       0
3                          [0, 1, U, U, U]       0
4                       [0, U, U, U, None]       0

Can anyone help me please?

like image 945
Giampaolo Levorato Avatar asked Sep 01 '25 15:09

Giampaolo Levorato


2 Answers

If you only want to test if there is at least one case in which a non-None follow a U, use itertools.pairwise and any:

from itertools import pairwise

def count_after_U(lst):
    return int(any(a=='U' and b not in {'U', None} for a, b in pairwise(lst)))

df['iCount'] = list(map(count_after_U, df['col1']))

Output:

                                            col1  iCount
0        [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                                [6, 5, 4, 3, 2]       0
2                                [0, 0, 0, U, U]       0
3                                [0, 1, U, U, U]       0
4                             [0, U, U, U, None]       0
5  [U, U, 4, U, U, 1, 0, U, U, None, 1, U, None]       1
6                                [U, None, 1, U]       0

If you also want to check the other values until the next U, use a custom function:

def any_after_U(lst):
    flag = False
    for item in lst:
        if item == 'U':
            flag = True
        else:
            if flag and item is not None:
                return 1
    return 0

df['iCount'] = list(map(any_after_U, df['col1']))

Example:

                                            col1  iCount
0        [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                                [6, 5, 4, 3, 2]       0
2                                [0, 0, 0, U, U]       0
3                                [0, 1, U, U, U]       0
4                             [0, U, U, U, None]       0
5  [U, U, 4, U, U, 1, 0, U, U, None, 1, U, None]       1
6                                [U, None, 1, U]       1

original answer before clarification

approach 1: considering only the first item after U

IIUC, use a custom python function:

from itertools import pairwise

def count_after_U(lst):
    return sum(a=='U' and b not in {'U', None} for a,b in pairwise(lst))

df['iCount'] = list(map(count_after_U, df['col1']))

Or, to be more flexible with the conditions:

def count_after_U(lst):
    flag = False
    iCount = 0
    for item in lst:
        if item == 'U':
            flag = True
        else:
            if flag and item is not None:
                iCount += 1
            flag = False
    return iCount

df['iCount'] = list(map(count_after_U, df['col1']))

Output:

                                      col1  iCount
0  [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                          [6, 5, 4, 3, 2]       0
2                          [0, 0, 0, U, U]       0
3                          [0, 1, U, U, U]       0
4                       [0, U, U, U, None]       0

More complex example:

                                            col1  iCount
0        [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                                [6, 5, 4, 3, 2]       0
2                                [0, 0, 0, U, U]       0
3                                [0, 1, U, U, U]       0
4                             [0, U, U, U, None]       0
5  [U, U, 4, U, U, 1, 0, U, U, None, 1, U, None]       2

approach 2: considering all values after U:

Just indent the flag reset in the previous approach to only reset it if a value was not yet found:

def count_after_U(lst):
    flag = False
    iCount = 0
    for item in lst:
        if item == 'U':
            flag = True
        else:
            if flag and item is not None:
                iCount += 1
                flag = False
    return iCount

df['iCount'] = list(map(count_after_U, df['col1']))

Example:

                                            col1  iCount
0        [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                                [6, 5, 4, 3, 2]       0
2                                [0, 0, 0, U, U]       0
3                                [0, 1, U, U, U]       0
4                             [0, U, U, U, None]       0
5  [U, U, 4, U, U, 1, 0, U, U, None, 1, U, None]       3
like image 97
mozway Avatar answered Sep 04 '25 16:09

mozway


Try this:

def calcUs(lst):
    cnt = 0
    for x, y in zip(lst, lst[1:]):
        if (x == 'U' and y != 'U' and y != None):
            cnt += 1
    return cnt
df['iCount'] = df['col1'].apply(lambda x: calcUs(x))
df

Output:

                                      col1  iCount
0  [U, U, U, U, U, 1, 0, 0, 0, U, U, None]       1
1                          [6, 5, 4, 3, 2]       0
2                          [0, 0, 0, U, U]       0
3                          [0, 1, U, U, U]       0
4                       [0, U, U, U, None]       0
like image 40
gtomer Avatar answered Sep 04 '25 15:09

gtomer