I am using the Eigen library in C++: I am currently calculating the covariance matrix myself as follows:
Eigen::MatrixXd covariance_matrix = Eigen::MatrixXd::Constant(21, 21, 0);
data mean = calc_mean(all_data)
for(int j = 0; j < 21; j++){
    for(int k = 0; k < 21; k++){
        for(std::vector<data>::iterator it = all_data.begin(); it!= all_data.end(); it++){
            covariance_matrix(j,k) += ((*it)[j] - mean[j]) * ((*it)[k] - mean[k]);
        }
        covariance_matrix(j,k) /= all_data.size() - 1;
    }
}
Is there an inbuilt/more optimized way to do this with the Eigen library? For example if I store my data in a MatrixXd where each row is an observation and each column a feature?
Thanks
The only difference in formula for Population Covariance and Sample Covariance lies in the fact that Population Covariance is calculated over the entire dataset(N) whereas Sample Covariance is calculated over a sample (N-1), so that the denominator of the Population Covariance is 1 larger than that of the Sample ...
Covariance in Excel: StepsStep 1: Enter your data into two columns in Excel. For example, type your X values into column A and your Y values into column B. Step 2: Click the “Data” tab and then click “Data analysis.” The Data Analysis window will open. Step 3: Choose “Covariance” and then click “OK.”
Using Eigen expressions will leverage SIMD and cache optimized algorithms, so yes it should definitely be faster, and in any case, much simpler to write:
MatrixXd centered = mat.rowwise() - mat.colwise().mean();
MatrixXd cov = (centered.adjoint() * centered) / double(mat.rows() - 1);
Moreover, assuming "data" is a typedef for a double[21], then you can use the Map<> feature to view your std::vector as an Eigen object:
Map<Matrix<double,Dynamic,21,RowMajor> > mat(&(all_data[0][0], all_data.size(), 21);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With