What's the right way to populate a Rust ndarray array from an iterator over structs?

My use-case (likely a common one) is loading data from a database:

conn.prep_exec("select * from my_table", ())
    .unwrap()
    .map(|row| from_row::<MyRowStruct>(row.unwrap()))

The most naive (and very boilerplate-y, and very inefficient) way of doing this would be to collect the iterator into a Vec, use the length of the Vec (and the known number of fields) to initialize an Array, and then copy the values into it one by one.

There are two main issues with this:

  1. I would need to take each row's columns, which have already been arranged into named struct fields, and manually arrange them back out into array indices.
  2. The entire dataset would go through a needless copy, instead of being stored in memory once and then used as an ndarray array.

#1 seems like an opportunity for a derive macro, similar to how serde does it.

For #2, it seems like a Vec of structs couldn't be converted as-is to an ndarray array, since the memory layout would be different. So maybe the data would need to be "read in" at the iterator stage? Except ndarray arrays can't be grown dynamically. What I'm basically looking for is what Python's pandas library does (except that pandas includes the DB stuff under the same roof as the matrix stuff).

Are there trait-shenanigans I can play here (From, Serialize, Deserialize)? What's the standard practice? Surely I can't be the first person trying to do this?

Brandon asked Sep 14 '25 13:09



1 Answer

The relevant conversion is ndarray's From implementation that turns a Vec of fixed-size arrays into an Array2; unfortunately the ndarray documentation makes it easy to miss. To use it, convert each row into an array ([T; n], not ndarray's array type), then collect those into a Vec.

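use ndarray::Array2;

// Example row type; substitute your own struct.
struct MyRowStruct {
    a: f64,
    b: f64,
    c: f64,
}

// `iter` is any iterator yielding MyRowStruct values,
// e.g. the mapped query result from the question.
let arr: Array2<f64> = iter
    .map(|row| [row.a, row.b, row.c]) // one fixed-size [f64; 3] per row
    .collect::<Vec<_>>()              // Vec<[f64; 3]>
    .into();                          // becomes an Array2<f64> of shape (rows, 3)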

The Vec that the rows are collected into is used directly as the backing storage for the array, so the data is only stored once, like you wanted. This doesn't address your first issue, though; how to handle that depends on your concrete struct. I think you'd need to look at implementing FixedInitializer on it directly.
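
Short of a derive macro, one low-tech way to take some of the sting out of the first issue is to write the field order down once, in a From impl on the struct, instead of repeating it in every closure. The following is only a sketch under assumptions: it uses the standard library alone, MyRowStruct mirrors the example struct above, and rows_to_vec is a made-up helper name, not anything provided by ndarray.

```rust
// Hypothetical row type mirroring the example struct above.
#[derive(Debug, Clone, PartialEq)]
struct MyRowStruct {
    a: f64,
    b: f64,
    c: f64,
}

// Spell out the field order once, so call sites don't have to repeat it.
impl From<MyRowStruct> for [f64; 3] {
    fn from(row: MyRowStruct) -> Self {
        [row.a, row.b, row.c]
    }
}

// Collect any iterator of rows into a Vec of fixed-size arrays, which is
// the input the Vec-to-Array2 conversion works on.
// `rows_to_vec` is an illustrative name, not a library function.
fn rows_to_vec<I>(rows: I) -> Vec<[f64; 3]>
where
    I: IntoIterator<Item = MyRowStruct>,
{
    rows.into_iter().map(<[f64; 3]>::from).collect()
}
```

With ndarray in scope, something like `let arr: Array2<f64> = rows_to_vec(iter).into();` should then produce the same array as the closure-based version. This only works when every field has the same element type; a struct with mixed column types would need a different approach.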

nnnmmm answered Sep 16 '25 13:09
