I have implemented a Gauss-Newton optimization process which involves calculating the increment by solving a linearized system Hx = b. The H matrx is calculated by H = J.transpose() * W * J and b is calculated from b = J.transpose() * (W * e) where e is the error vector. Jacobian here is a n-by-6 matrix where n is in thousands and stays unchanged across iterations and W is a n-by-n diagonal weight matrix which will change across iterations (some diagonal elements will be set to zero). However I encountered a speed issue.
When I do not add the weight matrix W, namely H = J.transpose()*J and b = J.transpose()*e, my Gauss-Newton process can run very fast in 0.02 sec for 30 iterations. However when I add the W matrix which is defined outside the iteration loop, it becomes so slow (0.3~0.7 sec for 30 iterations) and I don't understand if it is my coding problem or it normally takes this long.
Everything here are Eigen matrices and vectors.
I defined my W matrix using .asDiagonal() function in Eigen library from a vector of inverse variances. then just used it in the calculation for H ad b. Then it gets very slow. I wish to get some hints about the potential reasons for this huge slowdown.
EDIT:
There are only two matrices. Jacobian is definitely dense. Weight matrix is generated from a vector by the function vec.asDiagonal() which comes from the dense library so I assume it is also dense. 
The code is really simple and the only difference that's causing the time change is the addition of the weight matrix. Here is a code snippet:
for (int iter=0; iter<max_iter; ++iter) {
    // obtain error vector
    error = ...  
    // calculate H and b - the fast one
    Eigen::MatrixXf H = J.transpose() * J;
    Eigen::VectorXf b = J.transpose() * error;
    // calculate H and b - the slow one
    Eigen::MatrixXf H = J.transpose() * weight_ * J;
    Eigen::VectorXf b = J.transpose() * (weight_ * error);
    // obtain delta and update state
    del = H.ldlt().solve(b);
    T <- T(del)   // this is pseudo code, meaning update T with del
}
It is in a function in a class, and weight matrix now for debug purposes is defined as a class variable that can be accessed by the function and is defined before the function is called.
Writing efficient matrix product expressions In general achieving good performance with Eigendoes no require any special effort: simply write your expressions in the most high level way. This is especially true for small fixed size matrices.
Bookmark this question. Show activity on this post. I tried to rewrite code from Fortran to C++ with a 2000*2000 matrix multiplication implements through Eigen library. I found that for loop in Eigen is much slower (>3x) than do loop in Fortran.
Indeed, in Eigenwe have implemented a set of highly optimized routines which are very similar to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and natural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
I found that for loop in Eigen is much slower (>3x) than do loop in Fortran. The codes are listed below:
I guess that weight_ is declared as a dense MatrixXf? If so, then replace it by w.asDiagonal() everywhere you use weight_, or make the later an alias to the asDiagonal expression:
auto weight = w.asDiagonal();
This way Eigen will knows that weight is a diagonal matrix and computations will be optimized as expected.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With