
How to feed variable-length input data to a scikit-learn MLP classifier?

I want to run a simple MLP classifier (scikit-learn) on the following data set.

The data set consists of 100 files containing sound signals. Each file has two columns (two signals), and its rows are the samples of those signals. The number of rows varies from file to file, between 70 and 80 values, so each file has dimensions between 70 x 2 and 80 x 2. Each file represents one complete record.


The problem I am facing is how to train a simple MLP on variable-length data, with the training and test sets containing 75 and 25 files respectively.

One solution is to concatenate all the files into a single one, i.e. 7500 x 2, and train the MLP on that. But this discards the boundaries between records, so the important information in each signal is no longer usable.

ZEESHAN asked Oct 12 '25 22:10



1 Answer

Here are three approaches, in order of usefulness. The first is strongly recommended.

1st Approach - LSTM/GRU

Don't use a simple MLP. The data you're dealing with is sequential, and recurrent networks (LSTM/GRU) were created for exactly this purpose: they can process variable-length sequences.
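
Note that scikit-learn does not ship recurrent layers, so this approach needs another library. Below is a minimal sketch in PyTorch, assuming two classes and random stand-in data; `GRUClassifier`, `hidden_size=16` and `n_classes=2` are illustrative choices, not anything from the question. It pads a batch only for storage and then packs it, so the GRU reads each sequence up to its true length.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence


class GRUClassifier(nn.Module):
    """Hypothetical GRU classifier for (length, 2) signals of varying length."""

    def __init__(self, n_features=2, hidden_size=16, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, padded, lengths):
        # Packing tells the GRU where each sequence really ends,
        # so the padded zeros are never processed.
        packed = pack_padded_sequence(
            padded, lengths, batch_first=True, enforce_sorted=False
        )
        _, h_n = self.gru(packed)      # h_n: (num_layers, batch, hidden_size)
        return self.out(h_n[-1])       # one row of class scores per sequence


torch.manual_seed(0)
# Random stand-ins for three files with 70, 75 and 80 rows.
sequences = [torch.randn(n, 2) for n in (70, 75, 80)]
lengths = torch.tensor([len(s) for s in sequences])
padded = pad_sequence(sequences, batch_first=True)   # shape (3, 80, 2)

model = GRUClassifier()
logits = model(padded, lengths)                      # shape (3, 2)
```

Training would add a loss such as `nn.CrossEntropyLoss` and an optimizer loop over the 75 training files; that part is omitted here.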

2nd Approach - Embeddings

Find a function that transforms your data into a fixed-length representation, called an embedding. An example of a network that produces time-series embeddings is TimeNet. However, that essentially brings us back to the first approach.
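
As a much simpler stand-in for a learned embedding (this is hand-crafted feature extraction, not TimeNet), one could map every file to the same handful of summary statistics. The sketch below assumes each file is already loaded as a NumPy array of shape (length, 2); `summary_features` and the choice of mean, standard deviation, minimum and maximum are hypothetical.

```python
import numpy as np


def summary_features(signal):
    """Map a (length, 2) signal to 8 values: mean, std, min, max per column."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate([
        signal.mean(axis=0),
        signal.std(axis=0),
        signal.min(axis=0),
        signal.max(axis=0),
    ])


# Two toy records of different lengths (70 and 80 rows).
short = np.arange(140, dtype=float).reshape(70, 2)
long = np.arange(160, dtype=float).reshape(80, 2)

# Both records end up with the same number of features.
X = np.vstack([summary_features(short), summary_features(long)])  # shape (2, 8)
```

A matrix like `X` has one fixed-length row per file, so it can be passed directly to an MLP. Whether such coarse statistics keep enough of the signal to classify it depends on the data.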

3rd Approach - Padding

If you can find a reasonable upper bound for the sequence length, you can pad shorter series to the length of the longest one (pad with zeros at the beginning or end of the series, or interpolate/forecast the missing values), or cut longer series down to the length of the shortest one. Obviously, you will either introduce noise or lose information, respectively.
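
A minimal sketch of the padding variant with scikit-learn, assuming zero-padding at the end up to 80 rows (the maximum stated in the question). The random signals and labels stand in for the real files, and `pad_and_flatten`, `max_len` and the hidden layer size are hypothetical choices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def pad_and_flatten(signals, max_len=80):
    """Zero-pad each (length, 2) array at the end to (max_len, 2), then flatten."""
    rows = []
    for sig in signals:
        sig = np.asarray(sig, dtype=float)[:max_len]   # truncate if too long
        padded = np.zeros((max_len, sig.shape[1]))
        padded[: len(sig)] = sig                       # remaining rows stay zero
        rows.append(padded.ravel())
    return np.vstack(rows)


# Synthetic stand-in for the 100 files: random signals and random labels.
rng = np.random.default_rng(0)
signals = [rng.normal(size=(rng.integers(70, 81), 2)) for _ in range(100)]
labels = rng.integers(0, 2, size=100)

X = pad_and_flatten(signals)                 # shape (100, 160)
X_train, X_test = X[:75], X[75:]             # 75 / 25 split as in the question
y_train, y_test = labels[:75], labels[75:]

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)            # one predicted label per test file
```

Flattening turns each record into 160 input features, so the MLP treats every time step and column as an independent input and has no notion of their order.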

Aechlys answered Oct 14 '25 11:10
