I am currently trying to create a new column within a polars dataframe (df). Within my df, there are many many rows, and within this new column I only want my existing list to populate wherever certain conditions are met.
I know that in pandas, I could do something like the following (where 'output' is my new column and assuming list is the same length as the filtered df) :
df = pd.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
conditions = ((df['cond1']>=1) & (df['cond2'] == 'Yes'))
list_new = [1,2,3,4,5]
df.loc[conditions, 'output'] = list_new
How would I do this same thing but within polars? I want the total number of rows to stay the same, I just want to update the new column wherever the conditions are met
I tried a few variations of the following but nothing has seemed to work:
df_polars = df_polars.with_columns(pl.when(conditions).then(list_new).alias('output')
.with_row index() to get row indexes..filter() to get rows based on conditions.Now we get DataFrame with proper row indexes and new data:
conditions = [pl.col('cond1') >= 1, pl.col('cond2') == 'Yes']
df_output = (
    df.with_row_index()
    .filter(conditions).select('index', output=pl.Series(list_new)
)
┌───────┬────────┐
│ index ┆ output │
│ ---   ┆ ---    │
│ u32   ┆ i64    │
╞═══════╪════════╡
│ 3     ┆ 1      │
│ 4     ┆ 2      │
│ 5     ┆ 3      │
│ 6     ┆ 4      │
│ 7     ┆ 5      │
└───────┴────────┘
.join() to add column to original DataFrame:(
    df.with_row_index()
    .join(df_output, on='index', how='left')
    .drop('index')
)
┌───────┬───────┬────────┐
│ cond1 ┆ cond2 ┆ output │
│ ---   ┆ ---   ┆ ---    │
│ i64   ┆ str   ┆ i64    │
╞═══════╪═══════╪════════╡
│ -1    ┆ No    ┆ null   │
│ 0     ┆ No    ┆ null   │
│ 1     ┆ No    ┆ null   │
│ 2     ┆ Yes   ┆ 1      │
│ 2     ┆ Yes   ┆ 2      │
│ 3     ┆ Yes   ┆ 3      │
│ 4     ┆ Yes   ┆ 4      │
│ 5     ┆ Yes   ┆ 5      │
└───────┴───────┴────────┘
Can't avoid the use of with_row_index but one can also use DataFrame.update here
import polars as pl
from polars import col
df = pl.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
list_new = [1, 2, 3, 4, 5]
df = df.with_row_index()
subset = df.select(
   col('index').filter(col('cond1') >= 1, col('cond2') == 'Yes'),
   pl.Series('output', list_new)
)
print(
    df.with_columns(output=None)
    .update(subset, on='index', how='outer')
    .drop('index')
)
# shape: (8, 3)
# ┌───────┬───────┬────────┐
# │ cond1 ┆ cond2 ┆ output │
# │ ---   ┆ ---   ┆ ---    │
# │ i64   ┆ str   ┆ i64    │
# ╞═══════╪═══════╪════════╡
# │ -1    ┆ No    ┆ null   │
# │ 0     ┆ No    ┆ null   │
# │ 1     ┆ No    ┆ null   │
# │ 2     ┆ Yes   ┆ 1      │
# │ 2     ┆ Yes   ┆ 2      │
# │ 3     ┆ Yes   ┆ 3      │
# │ 4     ┆ Yes   ┆ 4      │
# │ 5     ┆ Yes   ┆ 5      │
# └───────┴───────┴────────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With