I am learning about neural networks and implementing one in Python. I first defined a softmax function, following the solution given in this question: Softmax function - python. Here is my code:
import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=0)
I was given test code to check whether the softmax function is correct. test_array is the test data and test_output is the expected output of softmax(test_array). Here is the test code:
# Test if your function works correctly.
test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])
test_output = [[0.30028906, 0.33220277, 0.36750817],
               [0.30028906, 0.33220277, 0.36750817]]
print(np.allclose(softmax(test_array),test_output))
However, with the softmax function I defined, softmax(test_array) returns:
print (softmax(test_array))
[[ 0.42482427 0.42482427 0.42482427]
[ 0.57517573 0.57517573 0.57517573]]
Could anyone point out what is wrong with the softmax function I defined?
The problem is in your sum: you are summing along axis 0, where you should keep axis 0 untouched.
To sum over all the entries in the same example, i.e. in the same row, you have to use axis 1 instead.
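The axis difference is easy to see by summing the test array from the question along each axis (an illustrative check):

```python
import numpy as np

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

col_sums = np.sum(test_array, axis=0)  # sums down each column -> shape (3,)
row_sums = np.sum(test_array, axis=1)  # sums across each row  -> shape (2,)

print(col_sums)  # ≈ [0.505 0.707 0.909]
print(row_sums)  # ≈ [0.606 1.515]
```

With axis=0 you normalize each column over the two examples, which is why every column of the wrong output sums to 1 instead of every row.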
def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)
Use keepdims=True to preserve the shape of the sum, so that e can be divided by it.
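A quick sketch of what keepdims changes, using the e matrix shown below:

```python
import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))

print(e.sum(axis=1).shape)                 # (2,)   -- dividing (2, 3) by (2,) raises a broadcasting error
print(e.sum(axis=1, keepdims=True).shape)  # (2, 1) -- broadcasts cleanly against (2, 3)
```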
In your example, e evaluates to:
[[ 1.10627664 1.22384801 1.35391446]
[ 1.49780395 1.65698552 1.83308438]]
then the sum for each example (denominator in the return line) is:
[[ 3.68403911]
[ 4.98787384]]
The function then divides each line by its sum and gives the result you have in test_output.
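To confirm, here is the corrected function run against the test data from the question; each row of the result should sum to 1:

```python
import numpy as np

def softmax(A):
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])
result = softmax(test_array)

print(result)             # matches test_output
print(result.sum(axis=1)) # ≈ [1. 1.]
```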
As MaxU pointed out, it is good practice to subtract the row maximum before exponentiating, in order to avoid overflow:
e = np.exp(A - np.max(A, axis=1, keepdims=True))
Try this:
In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return e / e.sum(axis=1).reshape((-1,1))
In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906, 0.33220277, 0.36750817],
[ 0.30028906, 0.33220277, 0.36750817]])
or better this version which will prevent overflow when large values are exponentiated:
def softmax(A):
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return e / e.sum(axis=1).reshape((-1, 1))
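To see why the max subtraction matters, compare the two versions on large inputs (an illustrative sketch: np.exp(1000) overflows to inf, and inf / inf evaluates to nan, while subtracting the row maximum leaves the result unchanged mathematically):

```python
import numpy as np

def softmax_naive(A):
    e = np.exp(A)
    return e / e.sum(axis=1).reshape((-1, 1))

def softmax_stable(A):
    # subtracting the row max shifts the exponents into a safe range
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return e / e.sum(axis=1).reshape((-1, 1))

big = np.array([[1000.0, 1000.0]])

with np.errstate(over='ignore', invalid='ignore'):
    print(softmax_naive(big))   # [[nan nan]] -- inf / inf
print(softmax_stable(big))      # [[0.5 0.5]]
```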