
Softmax function in neural network (Python)

I am learning about neural networks and implementing one in Python. I first defined a softmax function, following the solution given in this question: Softmax function - python. Here is my code:

import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    s = e / np.sum(e, axis=0)
    return s

I was given test code to check whether the softmax function is correct. test_array is the test data and test_output is the expected output of softmax(test_array). Here is the test code:

# Test if your function works correctly.
test_array = np.array([[0.101,0.202,0.303],
                       [0.404,0.505,0.606]]) 
test_output = [[ 0.30028906,  0.33220277,  0.36750817],
               [ 0.30028906,  0.33220277,  0.36750817]]
print(np.allclose(softmax(test_array),test_output))

However, with the softmax function that I defined, testing the data gives a different result:

print(softmax(test_array))

[[ 0.42482427  0.42482427  0.42482427]
 [ 0.57517573  0.57517573  0.57517573]]

Could anyone point out what the problem is with the softmax function I defined?

asked by Jassy.W

2 Answers

The problem is in your sum. You are summing over axis 0 (down the columns, across examples), where you should keep axis 0 untouched.

To sum over all the entries of the same example, i.e., within the same row, you have to use axis 1 instead.

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    # sum within each row (each example) and keep the axis for broadcasting
    return e / np.sum(e, axis=1, keepdims=True)

keepdims=True preserves the summed axis as a size-1 dimension, so the row sums have shape (N, 1) rather than (N,), and e can be divided by them via broadcasting.
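A minimal sketch of the difference, using the question's test_array:

import numpy as np

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

e = np.exp(test_array)
print(np.sum(e, axis=1).shape)                 # (2,)  -- row sums as a flat vector
print(np.sum(e, axis=1, keepdims=True).shape)  # (2, 1) -- broadcasts against e's (2, 3)
# e / np.sum(e, axis=1) would raise a ValueError: (2, 3) and (2,) cannot broadcast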

In your example, e evaluates to:

[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]

then the sum for each example (denominator in the return line) is:

[[ 3.68403911]
 [ 4.98787384]]

The function then divides each row by its sum, which gives exactly the result in test_output.
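As a quick check (assuming numpy is imported as np and using the corrected function above):

print(np.allclose(softmax(test_array), test_output))  # True
print(softmax(test_array).sum(axis=1))                # [ 1.  1.] -- each row now sums to 1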

As MaxU pointed out, it is good practice to subtract the row maximum before exponentiating, in order to avoid overflow:

e = np.exp(A - np.max(A, axis=1, keepdims=True))
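A minimal sketch of the failure mode (the input values here are made up purely for illustration): without the shift, np.exp overflows float64 and the division produces nan. Subtracting the row max is mathematically a no-op for softmax, since the common factor exp(max) cancels in the ratio, but it keeps everything finite.

big = np.array([[1000.0, 1001.0, 1002.0]])

e_naive = np.exp(big)                                    # overflows: [[ inf  inf  inf]]
print(e_naive / np.sum(e_naive, axis=1, keepdims=True))  # [[ nan  nan  nan]]

e_safe = np.exp(big - np.max(big, axis=1, keepdims=True))  # exponents are [[-2. -1.  0.]]
print(e_safe / np.sum(e_safe, axis=1, keepdims=True))      # [[ 0.09003057  0.24472847  0.66524096]]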
answered by grovina


Try this:

In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return  e / e.sum(axis=1).reshape((-1,1))

In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])

Or better, use this version, which prevents overflow when large values are exponentiated:

def softmax(A):
    # subtract the per-row max before exponentiating for numerical stability
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return e / e.sum(axis=1).reshape((-1, 1))
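Continuing the same session, a quick check that the stable version matches the expected output and also handles large values (the large-value input is just for illustration):

In [329]: softmax(test_array)
Out[329]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])

In [330]: softmax(np.array([[1000., 1001., 1002.]]))  # would overflow without the max trick
Out[330]: array([[ 0.09003057,  0.24472847,  0.66524096]])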
answered by MaxU - stop WAR against UA