I am using the polars library in python and have two data frames the look like this
import polars as pl
data1 = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df1 = pl.DataFrame(data1)
data2 = {
'B': [4, 5, 6],
'C': [7, 8, 9],
'D': [1, 2, 3]
}
df2 = pl.DataFrame(data2)
# the column B and C are same in both data frames
# TODO: Join/Concat the data frames into one.
The data2 can vary some time it can have 2 common columns, some time it can have 1 common column and some times more. and The result should look like. I was wondering if there is any built function, or some kind of flag in a function that exists already in polars, that i can use.
result = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
'D': [1, 2, 3]
}
I am not quite sure how to join
or concat
the polar data frames in order to achieve this.
You can concat with how="align"
but the resulting column order differs.
pl.concat([df1, df2], how="align")
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ B ┆ C ┆ A ┆ D │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 4 ┆ 7 ┆ 1 ┆ 1 │
│ 5 ┆ 8 ┆ 2 ┆ 2 │
│ 6 ┆ 9 ┆ 3 ┆ 3 │
└─────┴─────┴─────┴─────┘
You can see how it is implemented here.
It basically finds the common columns to use as on
and "outer joins" all the frames together.
dfs = [df1, df2]
cols = pl.concat(pl.Series(df.columns) for df in dfs)
join_cols = list(cols.filter(cols.is_duplicated()).unique())
result = dfs[0].join(dfs[1], how="outer", on=join_cols, suffix="")
# only needed for more than 2 frames
# for df in dfs[2:]:
# result = result.join(df, how="outer", on=join_cols, suffix="")
result = result.select(*cols.unique(maintain_order=True))
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A ┆ B ┆ C ┆ D │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1 ┆ 4 ┆ 7 ┆ 1 │
│ 2 ┆ 5 ┆ 8 ┆ 2 │
│ 3 ┆ 6 ┆ 9 ┆ 3 │
└─────┴─────┴─────┴─────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With