
With Python Polars, how to compare two frames (like with `==`) while returning True/False on comparisons with null

With Python Polars, how can I compare two DataFrames (like with ==) and get a per-cell comparison result, but with True/False returned for comparisons involving null? By default, df1 == df2 yields null in any cell where either df1 or df2 contains a null.

For example:

import polars as pl

df1 = pl.DataFrame(
    {
        "a": [1, 2, 3, None, 5],
        "b": [5, 4, 3, 2, None],
    }
)


df2 = pl.DataFrame(
    {
        "a": [1, 2, 3, 1, 5],
        "b": [5, 4, 30, 2, None],
    }
)


print(f"df1: {df1}")
print(f"df2: {df2}")

print(f"df1 == df2: {df1 == df2}")

Results in:

df1: shape: (5, 2)
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 5    │
│ 2    ┆ 4    │
│ 3    ┆ 3    │
│ null ┆ 2    │
│ 5    ┆ null │
└──────┴──────┘
df2: shape: (5, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ i64 ┆ i64  │
╞═════╪══════╡
│ 1   ┆ 5    │
│ 2   ┆ 4    │
│ 3   ┆ 30   │
│ 1   ┆ 2    │
│ 5   ┆ null │
└─────┴──────┘
df1 == df2: shape: (5, 2)
┌──────┬───────┐
│ a    ┆ b     │
│ ---  ┆ ---   │
│ bool ┆ bool  │
╞══════╪═══════╡
│ true ┆ true  │
│ true ┆ true  │
│ true ┆ false │
│ null ┆ true  │
│ true ┆ null  │
└──────┴───────┘

However, I'm trying to determine how to get the following result:

df1 compared to df2: shape: (5, 2)
┌───────┬───────┐
│ a     ┆ b     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ true  ┆ true  │
│ true  ┆ false │
│ false ┆ true  │ <- false b/c cell is null in one DF, and a value in the other
│ true  ┆ true  │ <- bottom-right cell is true
└───────┴───────┘     because df1 and df2 have the same value (null)
HumpbackWhale194 asked Oct 30 '25 03:10


2 Answers

It's a little more verbose than df1 == df2, but for a pure Polars solution you can use eq_missing(). In addition, you can use the fact that df[col] returns the column as a pl.Series:

df1.select(pl.col(c).eq_missing(df2[c]) for c in df1.columns)

┌───────┬───────┐
│ a     ┆ b     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ true  ┆ true  │
│ true  ┆ false │
│ false ┆ true  │
│ true  ┆ true  │
└───────┴───────┘
Roman Pekar answered Nov 01 '25 17:11


You can easily check how polars implements pl.DataFrame.__eq__ under the hood. A helper function similar to the polars implementation, but relying on pl.Expr.eq_missing instead (as already mentioned in the answer above), could look as follows.

def _compare_to_other_df_missing(df, other):
    """
    Compare a DataFrame with another DataFrame respecting `None == None`.

    This differs from default comparison where null values are propagated.
    """
    if df.columns != other.columns:
        msg = "DataFrame columns do not match"
        raise ValueError(msg)
    if df.shape != other.shape:
        msg = "DataFrame dimensions do not match"
        raise ValueError(msg)

    suffix = "__POLARS_CMP_OTHER"
    other_renamed = other.select(pl.all().name.suffix(suffix))
    combined = pl.concat([df, other_renamed], how="horizontal")
    expr = [pl.col(n).eq_missing(pl.col(f"{n}{suffix}")) for n in df.columns]

    return combined.select(expr)

>>> _compare_to_other_df_missing(df1, df2)
shape: (5, 2)
┌───────┬───────┐
│ a     ┆ b     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ true  ┆ true  │
│ true  ┆ false │
│ false ┆ true  │
│ true  ┆ true  │
└───────┴───────┘
Hericks answered Nov 01 '25 18:11


