I am trying to put my dataset into the MATLAB [ranked,weights] = relieff(X,Ylogical,10, 'categoricalx', 'on') function to rank the importance of my predictor features. The dataset<double n*m> has n observations and m discrete (i.e. categorical) features. It happens that each observation (row) in my dataset has at least one NaN value. These NaNs represent unobserved, i.e. missing or null, predictor values in the dataset. (There is no corruption in the dataset, it is just incomplete.)
relieff() uses this function below to remove any rows that contain a NaN:
function [X,Y] = removeNaNs(X,Y)
% Remove observations with missing data
NaNidx = bsxfun(@or,isnan(Y),any(isnan(X),2));
X(NaNidx,:) = [];
Y(NaNidx,:) = [];
This is not ideal, especially for my case, since it leaves me with X=[] and Y=[] (i.e. no observations!)
In this case:
1) Would replacing all NaN's with a random value, e.g. 99999, help? By doing this, I am introducing a new feature state for all the predictor features so I guess it is not ideal.
2) or is replacing NaNs with the mode of the corresponding feature column vector (as below) statistically more sound? (I am not vectorising for clarity's sake)
function [matrixdata] = replaceNaNswithModes(matrixdata)
for i=1: size(matrixdata,2)
cv= matrixdata(:,i);
modevalue= mode(cv);
cv(find(isnan(cv))) = modevalue;
matrixdata(:,i) = cv;
end
3) Or any other sensible way that would make sense for "categorical" data?
P.S: This link gives possible ways to handle missing data.
I suggest to use a table instead of a matrix. Then you have functions such as ismissing (for the entire table), and isundefined to deal with missing values for categorical variables.
T = array2table(matrix);
T = standardizeMissing(T); % NaN is standard for double but this
% can be useful for other data type
var1 = categorical(T.var1);
missing = isundefined(var1);
T = T(missing,:); % removes lines with NaN
matrix = table2array(T);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With