Predict if a number is odd or even using Logistic Regression (y = x % 2)

Given an array of numbers from 1 to 20 (X_train) and an array of binary labels, 0 or 1 (y_train), I pass them to the Logistic Regression algorithm and train the model. Trying to predict with the X_test below gives me incorrect results.

I created the sample train and test data as shown below. Please suggest what's wrong with the code.

import numpy as np
from sklearn import linear_model

# Features: the raw integers 1..20; labels alternate 1 (odd) and 0 (even)
X_train = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], dtype=float).reshape(-1, 1)
y_train = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
X_test = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 55, 88, 99, 100], dtype=float).reshape(-1, 1)

logreg = linear_model.LogisticRegression()
logreg.fit(X_train, y_train)
y_predict = logreg.predict(X_test)
print(y_predict)

Output:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0.]
asked Dec 13 '25 by rsa9

2 Answers

A similar question was already asked here; I would like to use that post as inspiration for my solution.

But first let me mention two things:

  1. Logistic regression is very beneficial in terms of time, performance, and explainability if there is some kind of linear relationship between your feature(s) and the label, but that is obviously not the case in your example. You want to estimate a discontinuous function that equals one if the input is odd and zero otherwise, which a single linear decision boundary cannot express (see the sketch after this list).

  2. Your data representation is not a good fit. I think this point is more critical for your prediction goal, since a better data representation directly leads to better predictions.
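
To make point 1 concrete, here is a minimal sketch (my own illustration, using the training data from the question) that inspects the single linear boundary the fitted model actually learns:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Same setup as in the question: raw integers with alternating labels
X_train = np.arange(1, 21, dtype=float).reshape(-1, 1)
y_train = (X_train.ravel() % 2 == 1).astype(float)

logreg = LogisticRegression().fit(X_train, y_train)

# One coefficient and one intercept define a single threshold on x,
# so the predicted class can switch at most once along the axis --
# it can never alternate odd/even/odd/even.
print("coefficient:", logreg.coef_, "intercept:", logreg.intercept_)
print("P(odd):", logreg.predict_proba(X_train)[:, 1].round(2))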

Next, I would like to share an alternative data representation. This new representation yields perfect prediction results, even for a simple, untuned logistic regression.

Code:

import numpy as np
from sklearn import linear_model

X_train = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
y_train = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
X_test = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 55, 88, 99, 100])

def convert_repr(x):
    # Encode x as a fixed-width 16-bit binary vector, e.g. 5 -> [0, ..., 1, 0, 1]
    return [int(bit) for bit in format(int(x), '016b')]

# Change the data representation: one feature per bit instead of the raw integer
X_train = np.array([convert_repr(x) for x in X_train])
X_test = np.array([convert_repr(x) for x in X_test])

logreg = linear_model.LogisticRegression()
logreg.fit(X_train, y_train)
y_predict = logreg.predict(X_test)
print(y_predict)

Output:

[1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]

As you can see, the data representation matters more than the actual model.
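
Why does this work? The last entry of each 16-bit vector is the number's least significant bit, which is exactly its parity, so the model only has to learn to weight that single feature. A quick check with convert_repr from the code above:

print(convert_repr(5))    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1] -> last bit 1, odd
print(convert_repr(6))    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0] -> last bit 0, even
print(convert_repr(99)[-1], convert_repr(100)[-1])   # 1 0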

answered Dec 16 '25 by ko3

It is an ill-posed task: for logistic regression to work, there has to be a point on the x-axis that separates the high-probability region of the target class from the low-probability region (which is exactly what you see in your test output). With alternating even/odd labels, no such point exists, so this can obviously not be a correct model, and learning is completely unstable.

You either need better features, or a more complex model that can handle the more complicated space. For the first option, see the sketch below.
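
As a concrete illustration of the better-features option (my own sketch, not part of the original answer): feeding the parity bit x % 2 as the single feature makes the two classes perfectly linearly separable, and the model generalizes to arbitrarily large numbers.

import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.arange(1, 21).reshape(-1, 1)
y_train = X_train.ravel() % 2                      # 1 = odd, 0 = even
X_test = np.array([1, 2, 55, 88, 99, 100]).reshape(-1, 1)

# Engineer the one feature that matters: the parity bit itself
logreg = LogisticRegression().fit(X_train % 2, y_train)
print(logreg.predict(X_test % 2))                  # [1 0 1 0 1 0]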

answered Dec 16 '25 by phipsgabler

