I have a multilabel classification problem, I used the following code but the validation accuracy jumps to 99% in the first epoch which is weird given the complexity of the data as the input features are 2048 extracted from inception model (pool3:0) layer and the labels are [1000],(here is the link of a file contains samples of features and label : https://drive.google.com/file/d/0BxI_8PO3YBPPYkp6dHlGeExpS1k/view?usp=sharing ), is there something I am doing wrong here ??
Note: labels are sparse vector contain only 1 ~ 10 entry as 1 the rest is zeros
model.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy']) 
The output of prediction is zeros !
What wrong I do in training the model to bother the prediction ?
#input is the features file and labels file
def generate_arrays_from_file(path ,batch_size=100):
x=np.empty([batch_size,2048])
y=np.empty([batch_size,1000])
while True:
    f = open(path)
    i = 1  
    for line in f:
        # create Numpy arrays of input data
        # and labels, from each line in the file
        words=line.split(',')
        words=map(float, words[1:])
        x_= np.array(words[0:2048])
        y_=words[2048:]
        y_= np.array(map(int,y_))
        x_=x_.reshape((1, -1))
        #print np.squeeze(x_)
        y_=y_.reshape((1,-1))
        x[i]= x_
        y[i]=y_
        i += 1
        if i == batch_size:
            i=1
            yield (x, y)
    f.close()
model = Sequential()
model.add(Dense(units=2048, activation='sigmoid', input_dim=2048))
model.add(Dense(units=1000, activation="sigmoid", 
kernel_initializer="uniform"))
model.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=
['accuracy'])
model.fit_generator(generate_arrays_from_file('train.txt'),
                validation_data= generate_arrays_from_file('test.txt'),
                validation_steps=1000,epochs=100,steps_per_epoch=1000, 
                  verbose=1)
I think the problem with the accuracy is that your output are sparse.
Keras computes accuracy using this formula:
K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
So, in your case, having only 1~10 non zero labels, a prediction of all 0 will yield an accuracy of 99.9% ~ 99%.
As far as the problem not learning, I think the problem is that you are using a sigmoid as last activation and using 0 or 1 as output value. This is bad practice since, in order for the sigmoid to return 0 or 1 the values it gets as input must be very large or very small, which reflects on the net having very large (in absolute value) weights. Furthermore, since in each training output there are far less 1 than 0 the network will soon get to a stationary point in which it simply outputs all zeros (the loss in this case is not very large either, should be around 0.016~0.16).
What you can do is scale your output labels so that they are between (0.2, 0.8) for example so that the weights of the net won't become too big or too small. Alternatively you can use a relu as activation function.
Did you try to use the cosine similarity as loss function?
I had the same multi-label + high dimensionality problem.
The cosine distance takes account of the orientation of the model output (prediction) and the desired output (true class) vector.
It is the normalized dot-product between two vectors.
In keras the cosine_proximity function is -1*cosine_distance. Meaning that -1 corresponds to two vectors with the same size and orientation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With