I'd like to apply a user-defined function, which takes a few inputs (corresponding to some columns in a polars DataFrame), to some columns of a polars DataFrame in Rust. The pattern I'm using is below. Is this the best practice?
use polars::prelude::*;

fn my_filter_func(col1: &Series, col2: &Series, col3: &Series) -> ReturnType {
    let n = col1.len();
    let it = (0..n).map(|i| {
        let v1 = match col1.get(i) {
            AnyValue::UInt64(val) => val,
            _ => panic!("Wrong type of col1!"),
        };
        // similar for col2 and col3
        // apply the user-defined function to v1, v2 and v3
    });
    // convert `it` to a collection of the required type
}
You can downcast each Series to the concrete type you want to iterate over (e.g. .f32() for a Float32 column), and then use Rust iterators to apply your logic.
use polars::prelude::*;

fn my_black_box_function(a: f32, b: f32) -> f32 {
    // do something
    a
}

fn apply_multiples(col_a: &Series, col_b: &Series) -> Float32Chunked {
    match (col_a.dtype(), col_b.dtype()) {
        (DataType::Float32, DataType::Float32) => {
            // downcast both Series to their typed ChunkedArray
            let a = col_a.f32().unwrap();
            let b = col_b.f32().unwrap();
            // walk both columns in lockstep; a null in either input gives a null output
            a.into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect()
        }
        _ => panic!("unexpected dtypes"),
    }
}
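The null handling in that iterator chain can be checked without polars at all. Below is a std-only sketch in which a plain Vec<Option<f32>> stands in for a Float32Chunked (an assumption made so the example runs on its own); zip_apply is a hypothetical helper name, and the multiplication inside my_black_box_function is only a placeholder body.

```rust
/// Apply a binary function element-wise over two nullable columns.
/// `Vec<Option<f32>>` stands in for a polars `Float32Chunked` here
/// (an assumption for illustration; polars itself is not used).
pub fn zip_apply<F>(a: &[Option<f32>], b: &[Option<f32>], f: F) -> Vec<Option<f32>>
where
    F: Fn(f32, f32) -> f32,
{
    // zip stops at the shorter input, so refuse mismatched lengths up front
    assert_eq!(a.len(), b.len(), "columns must have equal length");
    a.iter()
        .zip(b.iter())
        .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
            // both values present: call the user-defined function
            (Some(x), Some(y)) => Some(f(*x, *y)),
            // any null input yields a null output
            _ => None,
        })
        .collect()
}

/// Hypothetical stand-in for `my_black_box_function` (placeholder body).
pub fn my_black_box_function(a: f32, b: f32) -> f32 {
    a * b
}
```

Because Iterator::zip silently stops at the shorter input, the length assertion guards against truncating mismatched columns; whether that should panic or return an error is a design choice.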
You don't have to leave the lazy API to call my_black_box_function.
We can pack the columns we want to pass into a Struct data type and then apply a closure over that Series.
// needs the "lazy" and "dtype-struct" features of polars
use polars::prelude::*;

fn apply_multiples() -> Result<DataFrame> {
    df![
        "a" => [1.0f32, 2.0, 3.0],
        "b" => [3.0f32, 5.1, 0.3]
    ]?
    .lazy()
    // pack columns "a" and "b" into a single Struct column
    .select([as_struct(&[col("a"), col("b")]).map(
        |s| {
            let ca = s.struct_()?;
            // struct fields keep the names of the packed columns
            let a = ca.field_by_name("a")?;
            let b = ca.field_by_name("b")?;
            let a = a.f32()?;
            let b = b.f32()?;
            let out: Float32Chunked = a
                .into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect();
            Ok(out.into_series())
        },
        GetOutput::from_type(DataType::Float32),
    )])
    .collect()
}
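To illustrate the pack-then-unpack idea behind the struct approach, here is a toy std-only model with no polars dependency. StructColumn, pack_struct, field_by_name and map_struct are hypothetical names that only loosely mirror the polars calls above; the point is that fields are looked up by the names of the columns that were packed.

```rust
/// Toy std-only model of a struct-typed column: named fields of equal length.
/// All names here are hypothetical and only loosely mirror polars.
pub struct StructColumn {
    fields: Vec<(String, Vec<Option<f32>>)>,
}

/// Pack named columns into one struct column; every field must have the same length.
pub fn pack_struct(cols: Vec<(&str, Vec<Option<f32>>)>) -> Result<StructColumn, String> {
    let len = cols.first().map(|(_, v)| v.len()).unwrap_or(0);
    if cols.iter().any(|(_, v)| v.len() != len) {
        return Err("all fields must have the same length".to_string());
    }
    let fields = cols
        .into_iter()
        .map(|(name, values)| (name.to_string(), values))
        .collect();
    Ok(StructColumn { fields })
}

impl StructColumn {
    /// Look up a field by name, returning an error instead of panicking.
    pub fn field_by_name(&self, name: &str) -> Result<&[Option<f32>], String> {
        self.fields
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, v)| v.as_slice())
            .ok_or_else(|| format!("field '{}' not found", name))
    }
}

/// Unpack fields "a" and "b" and apply `f` row by row;
/// a null in either field gives a null in the output.
pub fn map_struct<F>(s: &StructColumn, f: F) -> Result<Vec<Option<f32>>, String>
where
    F: Fn(f32, f32) -> f32,
{
    let a = s.field_by_name("a")?;
    let b = s.field_by_name("b")?;
    Ok(a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(f(*x, *y)),
            _ => None,
        })
        .collect())
}
```

Looking up a field that was never packed (for instance "col_a" when the columns are named "a" and "b") returns an error rather than a wrong result, which is the failure mode to watch for when the looked-up names drift away from the packed column names.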
The solution that worked for me is map_multiple (my understanding: use it when there is no groupby/agg) or apply_multiple (my understanding: use it whenever you have a groupby/agg). Alternatively, you could use map_many or apply_many. See below.
use polars::prelude::*;
use polars::df;

fn main() {
    let df = df! [
        "names" => ["a", "b", "a"],
        "values" => [1, 2, 3],
        "values_nulls" => [Some(1), None, Some(3)],
        "new_vals" => [Some(1.0), None, Some(3.0)]
    ].unwrap();
    println!("{:?}", df);

    let df = df.lazy()
        .groupby([col("names")])
        .agg([total_delta_sens().sum()]);
    println!("{:?}", df.collect());
}

// cast every input column to Float64, replace nulls with 0 and add the columns element-wise
fn sum_fa(s: &mut [Series]) -> Result<Series> {
    let mut ss = s[0].cast(&DataType::Float64)?.fill_null(FillNullStrategy::Zero)?;
    for other in &s[1..] {
        let other = other.cast(&DataType::Float64)?.fill_null(FillNullStrategy::Zero)?;
        ss = ss.add_to(&other)?;
    }
    Ok(ss)
}

pub fn total_delta_sens() -> Expr {
    let s = [col("values"), col("values_nulls"), col("new_vals")];
    let o = GetOutput::from_type(DataType::Float64);
    map_multiple(sum_fa, s, o)
}
Here total_delta_sens is just a wrapper function for convenience; you don't have to use it. You can do this directly within your .agg([]) or .with_columns([]), provided sum_fa and o are in scope there (for example, with sum_fa defined at module level):
lit::<f64>(0.0).map_many(sum_fa, &[col("norm"), col("uniform")], o)
Inside sum_fa you can, as Ritchie already mentioned, downcast to a ChunkedArray and use .iter() or even .par_iter(). Hope that helps.
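For a concrete picture of what sum_fa computes, here is a std-only model without polars: a hypothetical Column enum stands in for integer and float Series, to_f64_filled mimics cast(&DataType::Float64) followed by fill_null(FillNullStrategy::Zero), and sum_fa_model adds the columns element-wise. It sketches the semantics only, not how polars implements them.

```rust
/// Hypothetical std-only stand-in for a polars Series holding
/// either integers or floats (nulls are `None`).
pub enum Column {
    Int(Vec<Option<i64>>),
    Float(Vec<Option<f64>>),
}

impl Column {
    /// Number of rows in the column.
    fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Float(v) => v.len(),
        }
    }

    /// Mimics `cast(&DataType::Float64)` followed by `fill_null(FillNullStrategy::Zero)`.
    fn to_f64_filled(&self) -> Vec<f64> {
        match self {
            Column::Int(v) => v.iter().map(|&x| x.map_or(0.0, |i| i as f64)).collect(),
            Column::Float(v) => v.iter().map(|&x| x.unwrap_or(0.0)).collect(),
        }
    }
}

/// Model of `sum_fa`: cast each column to f64, fill nulls with zero,
/// then add the columns element-wise.
pub fn sum_fa_model(cols: &[Column]) -> Result<Vec<f64>, String> {
    let first = cols.first().ok_or("need at least one column")?;
    let mut acc = first.to_f64_filled();
    for c in &cols[1..] {
        if c.len() != acc.len() {
            return Err("columns must have equal length".to_string());
        }
        for (a, x) in acc.iter_mut().zip(c.to_f64_filled()) {
            *a += x;
        }
    }
    Ok(acc)
}
```

With the three value columns from the example frame, the per-row sums are 3, 2 and 9, so the groupby on names followed by .sum() should give 12 for "a" and 2 for "b".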