I want to split my data variable into separate variables a, b and c, and take the mean over each bin (along the 1st dimension). Is there a way to substantially (e.g. by an order of magnitude) improve the speed of this code? General feedback is welcome.
data=rand(20,1000); %generate data
bins=[5 10 5]; %given size of bins
start_bins=cumsum([1 bins(1:end-1)]);
end_bins=cumsum([bins]);
%split the data into 3 cell arrays and apply mean in 1st dimension
binned_data=cellfun(@(x,y) mean(data(x:y,:),1),num2cell(start_bins),num2cell(end_bins),'uni',0);
%data (explicitly) has to be stored in different variables
[a,b,c]=deal(binned_data{:});
whos a b c
  Name      Size            Bytes  Class     Attributes

  a         1x1000           8000  double
  b         1x1000           8000  double
  c         1x1000           8000  double
You can use splitapply (accumarray's slightly friendlier little brother):
% Your example
data = rand(20,1000); % generate data
bins = [5 10 5]; % given size of bins
% Calculation
bins = repelem(1:numel(bins), bins).'; % Bin sizes to group labels
binned_data = splitapply( @(x) mean(x,1), data, bins ); % mean along dim 1, so a one-row bin still returns a full row
The rows of binned_data are your a, b and c.
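If the separate variables are still needed, the rows can be pulled out afterwards. The following is a minimal sketch using the question's sizes. The name labels is introduced here for illustration so that bins is not overwritten, and mean(x,1) is used rather than @mean as a precaution for bins that contain a single row.

```matlab
data = rand(20,1000);                        % example data, as in the question
bins = [5 10 5];                             % bin sizes
labels = repelem(1:numel(bins), bins).';     % group label for every row of data
binned_data = splitapply(@(x) mean(x,1), data, labels);  % one row per bin

% Pull the rows out into the separate variables asked for in the question
a = binned_data(1,:);
b = binned_data(2,:);
c = binned_data(3,:);
```

Indexing the rows directly avoids the intermediate cell array that deal needs.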
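If the mean is taken along the 2nd dimension first (mean(data,2)), the data reduces to a column vector, and accumarray can then group that vector by bin. Note that this yields one cell per bin holding the row means of that bin, not the per-column bin means computed in the question: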
binned_data = accumarray(repelem(1:numel(bins), bins).', mean(data,2), [], @(x){x.'});
accumarray¹ does not work with matrix data. But you can use sparse, which automatically accumulates data values corresponding to the same indices:
ind_rows = repmat(repelem((1:numel(bins)).', bins), 1, size(data,2));
ind_cols = repmat(1:size(data,2), size(data,1), 1);
binned_data = sparse(ind_rows, ind_cols, data);
binned_data = bsxfun(@rdivide, binned_data, bins(:));
binned_data = num2cell(binned_data, 2).';
¹ But splitapply does. See @Wolfie's answer.
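The accumulation behaviour of sparse that this answer relies on can be seen on a tiny case. This is only an illustrative sketch; the variable names and values are made up.

```matlab
% Three values, two of which share the index (1,1)
rows = [1 1 2];
cols = [1 1 1];
vals = [10 20 5];
S = sparse(rows, cols, vals);   % values with duplicate indices are summed
F = full(S);                    % F is [30; 5]
```

In the answer's code the row index is the bin label, so every row of data belonging to the same bin is summed into one output row, which is then divided by the bin size.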
You can use matrix multiplication:
r = 1:numel(bins);
result = (r.' == repelem(r,bins)) * data .* (1./bins(:));
If you want the output as cell:
result = num2cell(result,2);
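To see why the multiplication works, it may help to look at the indicator matrix on a small case. The sketch below uses hypothetical bin sizes and magic(6) as stand-in data, and assumes implicit expansion is available (MATLAB R2016b or later, or Octave).

```matlab
bins = [2 3 1];                        % small hypothetical bin sizes
data = magic(6);                       % 6-by-6 stand-in data, one row per observation
r = 1:numel(bins);
M = double(r.' == repelem(r, bins));   % M(i,j) = 1 if row j of data belongs to bin i
% M = [1 1 0 0 0 0
%      0 0 1 1 1 0
%      0 0 0 0 0 1]
sums   = M * data;                     % row i holds the column sums over bin i
result = sums ./ bins(:);              % divide each row by its bin size to get the means
```

Each row of M selects the rows of one bin, so M * data sums them, and the division turns the sums into means.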
For large matrices it is better to use sparse matrix:
result = sparse(r.' == repelem(r,bins)) * data .* (1./bins(:));
Note: In versions of MATLAB before R2016b (which lack implicit expansion) you should use bsxfun:
result = bsxfun(@times,bsxfun(@eq, r.',repelem(r,bins)) * data , (1./bins(:)));
Here is the result of timing for three proposed methods in Octave:
Matrix multiplication: 0.00197697 seconds
Accumarray: 0.00465298 seconds
Cellfun: 0.00718904 seconds
EDIT: For a 200 x 100000 matrix:
Matrix multiplication: 0.806947 seconds (sparse: 0.2331 seconds)
Accumarray: 0.0398011 seconds
Cellfun: 0.386079 seconds
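The timing script itself is not shown above. A rough sketch of how the dense and sparse multiplication variants could be compared with tic/toc follows. The bin sizes, the repetition count nrep and the reduced column count are assumptions made here to keep memory use modest, so the numbers it prints will not match the figures quoted.

```matlab
data = rand(200, 10000);             % scaled-down stand-in for the 200 x 100000 case
bins = [50 100 50];                  % hypothetical bin sizes summing to 200 rows
r = 1:numel(bins);
nrep = 5;                            % repetitions to average over (assumed)

% Dense indicator matrix
tic;
for k = 1:nrep
    res_dense = (double(r.' == repelem(r, bins)) * data) ./ bins(:);
end
t_dense = toc / nrep;

% Sparse indicator matrix
tic;
for k = 1:nrep
    res_sparse = (sparse(double(r.' == repelem(r, bins))) * data) ./ bins(:);
end
t_sparse = toc / nrep;

fprintf('dense: %g s, sparse: %g s\n', t_dense, t_sparse);
```

Averaging over a few repetitions reduces the effect of one-off overheads, but results will still vary with the machine and with the MATLAB or Octave version.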