I have a data matrix X (60x208) and a matrix of labels Y (1x208). I want to split X into two random subsets of column vectors: training (70% of the data) and testing (30% of the data), while still being able to identify which label in Y corresponds to each column vector. I couldn't find any function to do this. Any ideas?
EDIT: I should add that there are only two labels in Y: 1 and 2 (not sure if this makes a difference).
That's pretty easy to do. Use randperm to generate a random permutation of the indices from 1 up to the number of points you have, which is 208 in your case.
Once you have this sequence, use it to index into X and Y to extract the training and test data and labels. Do something like this:
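% Number of data points (columns of X)
num_points = size(X,2);
% Number of points that go into the training set (70%)
split_point = round(num_points*0.7);
% Random permutation of the column indices 1..num_points
seq = randperm(num_points);
% The first split_point shuffled indices form the training set
X_train = X(:,seq(1:split_point));
Y_train = Y(seq(1:split_point));
% The remaining shuffled indices form the test set
X_test = X(:,seq(split_point+1:end));
Y_test = Y(seq(split_point+1:end));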
split_point determines how many points go into the training set. It has to be rounded because 70% of the number of points is not always a whole number: with 208 points, 208*0.7 = 145.6, which rounds to 146 training points and leaves 62 test points. I also didn't hard-code 208, so this works with a data set of any size. X_train and Y_train contain the data and labels for your training set, while X_test and Y_test contain the data and labels for your test set.
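To see the sizes this produces, here is a minimal sketch that runs the same split on made-up data. The random X and Y are assumptions standing in for your real matrices, and the rng(0) call is my addition so the shuffle is repeatable from run to run; drop it if you want a different split each time.

```matlab
% Synthetic stand-ins for the real data (assumption: same shapes as in the question)
rng(0);                         % fix the seed so the same split comes out every run
X = rand(60, 208);              % 60 features x 208 data points
Y = randi(2, 1, 208);           % one label (1 or 2) per column of X

num_points  = size(X, 2);               % 208
split_point = round(num_points * 0.7);  % 145.6 rounds to 146
seq = randperm(num_points);

X_train = X(:, seq(1:split_point));
Y_train = Y(seq(1:split_point));
X_test  = X(:, seq(split_point+1:end));
Y_test  = Y(seq(split_point+1:end));

disp(size(X_train));   % 60-by-146
disp(size(X_test));    % 60-by-62
```

The training and test sets together always account for all 208 columns, since they come from disjoint parts of the same permutation.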
As such, the first column of X_train is the data point for the first element of your training set, and the first element of Y_train is the label for that point, and so on. The same holds for X_test and Y_test.
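If you want to convince yourself that the labels stay lined up with their columns, one way is to tag each column with its original position before splitting. This is only a sketch on made-up data: storing the column number in row 1 of X is a trick for this check, not something you'd do with your real matrix, and the names train_ids, test_ids, labels_match and no_overlap are just illustrative.

```matlab
% Synthetic data (assumption): row 1 of X holds the original column number,
% so every column can be traced back to its position after shuffling.
X = [1:208; rand(59, 208)];
Y = randi(2, 1, 208);

num_points  = size(X, 2);
split_point = round(num_points * 0.7);
seq = randperm(num_points);

X_train = X(:, seq(1:split_point));
Y_train = Y(seq(1:split_point));
X_test  = X(:, seq(split_point+1:end));
Y_test  = Y(seq(split_point+1:end));

% Original column number of each training and test point
train_ids = X_train(1, :);
test_ids  = X_test(1, :);

% Labels looked up in the original Y must equal the shuffled labels
labels_match = isequal(Y(train_ids), Y_train) && isequal(Y(test_ids), Y_test);
% No point may appear in both sets
no_overlap = isempty(intersect(train_ids, test_ids));

disp(labels_match);
disp(no_overlap);
```

Both flags should come out true, because X and Y are indexed with the same pieces of seq.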