Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dataframe.lookup and map combination resulting in column label error

I have a large dataframe of around (1200, 10) of mostly string where I have to append a new column say 'Z' based an existing reference column say 'Y', whose values are 'A', 'B', 'C', or unknown (NaN or other), from this I need to select one of three corresponding columns in the df say 'D', 'E', 'F', or output NaN and append this value as column 'Z'. I currently have the following code:

df = pd.DataFrame({'T': {0: '.', 1: '.', 2: '.', 3: '.'}, 
                   'G': {0: '.', 1: '.', 2: '.', 3: '.'}, 
                   'D': {0: 4, 1: 1, 2: 5, 3: 3}, 
                   'E': {0: 6, 1: 2, 2: 7, 3: 2}, 
                   'F': {0: 8, 1: 3, 2: 9, 3: 1}, 
                   'K': {0: '.', 1: '.', 2: '.', 3:'.'}, 
                   'Y': {0: 'A', 1: 'B', 2: 'B', 3: np.nan}})

d = {'A': 'D', 'B': 'E', 'C': 'F'}
df['Z'] = df.lookup(df.index, df.Y.map(d))

The issue is that lookup breaks down where Y is an unknown value. and in the specific code, Y.unique() turns up something like (A, B, C, NaN, nan). So I was wondering if there would be a way to use a lookup-esque method that outputs Z to NaN where Y is NaN or unknown outside of the the given dict?


    T   G   D   E   F   K   Y   Z
0   .   .   4   6   8   .   A   4.0
1   .   .   1   2   3   .   B   2.0
2   .   .   5   7   9   .   B   7.0
3   .   .   3   2   1   .   NaN NaN
like image 562
Roy Han Avatar asked Nov 21 '25 14:11

Roy Han


1 Answers

You can use stack and reindex with zip for multiindexes:

df['Z'] = df.stack().reindex(zip(df.index, df.Y.map(d))).reset_index(level=1, drop=True)

Output:

   T  G  D  E  F  K    Y    Z
0  .  .  4  6  8  .    A    4
1  .  .  1  2  3  .    B    2
2  .  .  5  7  9  .    B    7
3  .  .  3  2  1  .  NaN  NaN

Details:

First, let's use stack, move the column header into the dataframe row index, creating a multiindex for the dataframe. Where, level 0 is the original row index, and level 1 are the column headers.

Now, we can use reindex to filter down to only those indexes we need. Multiindex are identified using tuples. (level0, level1) hence, we zip df.index and df.y.map(d) together creating the tuples used by reindex.

Lastly, we drop the level=1 of the multiindex creating a structure of the original index and assign a new column with those values.

like image 93
Scott Boston Avatar answered Nov 24 '25 03:11

Scott Boston



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!