I've implemented the following neural network to solve the XOR problem in Python. My neural network consists of an input layer of 2 neurons, 1 hidden layer of 2 neurons and an output layer of 1 neuron. I am using the Sigmoid function as the activation function for the hidden layer and the linear (identity) function as the activation function for the output layer:
import numpy as np
def sigmoid(z):
    return 1/(1+np.exp(-z))
def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))
def init_weights(layers, epsilon):
    weights = []
    for i in range(len(layers)-1):
        w = np.random.rand(layers[i+1], layers[i]+1)
        w = w * 2*epsilon - epsilon
        weights.append(np.mat(w))
    return weights
def fit(X, Y, w, predict=False, x=None):
    w_grad = ([np.mat(np.zeros(np.shape(w[i]))) 
              for i in range(len(w))])
    for i in range(len(X)):
        x = x if predict else X[0]
        y = Y[0,i]
        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a
        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)
    return [w_grad[i]/len(X) for i in range(len(w))]
def predict(x):
    return fit(X, Y, w, True, x)
####
X = np.mat([[0,0],
            [0,1],
            [1,0],
            [1,1]])
Y = np.mat([0,1,1,0])
layers = [2,2,1]
epochs = 10000
alpha = 0.5
w = init_weights(layers, 1)
for i in range(epochs):
    w_grad = fit(X, Y, w)
    print w_grad
    for j in range(len(w)):
        w[j] -= alpha * w_grad[j]
for i in range(len(X)):
    x = X[i]
    guess = predict(x)
    print x, ":", guess
The backpropagation seems to all be correct; the only issue that comes to mind would be some problem with my implementation of the bias units. Either way, all predications for each input converge to approximately 0.5 each time I run the code. I've scoured the code and can't seem to find what's wrong. Can anyone point what's wrong with my implementation? I appreciate any feedback.
If for any reason it might help, here's the kind of output I'm getting:
[[0 0]] : [[ 0.5]]
[[0 1]] : [[ 0.49483673]]
[[1 0]] : [[ 0.52006739]]
[[1 1]] : [[ 0.51610963]]
Linearly separable data basically means that you can separate data with a point in 1D, a line in 2D, a plane in 3D and so on. A perceptron can only converge on linearly separable data. Therefore, it isn't capable of imitating the XOR function.
Our lab demonstrates that single human layer 2/3 neurons can compute the XOR operation.
Among various logical gates, the XOR or also known as the “exclusive or” problem is one of the logical operations when performed on binary inputs that yield output for different combinations of input, and for the same combination of input no output is produced.
It is absolutely possible to solve the XOR problem with only two neurons.
Your implementation of forward and backpropagation is more or less correct.  However, where you're going wrong is quite simple.  The first small error is to look inside your fit function - specifically the first statement inside your for loop:
x = x if predict else X[0]
You are saying that if you aren't predicting (i.e. performing training), the input example chosen during each iteration of Stochastic Gradient Descent must always be the first example, which is [0 0] (i.e. X[0]).  This is the reason why you are getting 0.5 for all of your predictions because you are only training using the first input.  You need to change this so that it reads the correct example, which is example i:
x = x if predict else X[i]
The last change you need to make is your s_prime function.  The derivative of the sigmoid function is indeed what you have there:
def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))
When you calculate the forward propagation, you have already computed the output activations of each neuron in a_s, so when you compute the local derivative at these neurons, you supply the output activations directly to s_prime so there is no need for you to compute the sigmoid of these again.
Therefore:
def s_prime(z):
    return np.multiply(z, 1.0-z)
Once I made these two changes, we now get this output:
[[0 0]] : [[ 0.00239857]]
[[0 1]] : [[ 0.99816778]]
[[1 0]] : [[ 0.99816596]]
[[1 1]] : [[ 0.0021052]]
You can see that this agrees with the expected output of the XOR gate more or less.  One last thing I can recommend is that 10000 iterations is far too long computationally given your current code structure.  I noticed that with the above corrections, we are able to reach the expected output in fewer iterations.  I've decreased the iterations to 1000 and I've bumped the learning rate alpha up to 0.75.  Changing these two things we now get:
[[0 0]] : [[ 0.03029435]]
[[0 1]] : [[ 0.95397528]]
[[1 0]] : [[ 0.95371525]]
[[1 1]] : [[ 0.04796917]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With