I am trying to implement linear regression using Python.
I did the following steps:
import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1
Then I try to obtain the co-efficients, and use the following:
regression_coeff = n.polyfit(x,y,1)
And then I get the following error:
raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x
I am unable to get my head around this, because when I print x and y, I can very clearly see that they are both 1D vectors.
Can someone please help?
Dataset can be found here: DataSets
The original code is:
import pandas as p
import numpy as n
data = p.read_csv('...\housing.csv', usecols = [1])
data1 = p.read_csv('...\housing.csv', usecols = [3])
x = data
y = data1
regression = n.polyfit(x, y, 1)
This should work:
np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
data is a DataFrame, and its values are 2D:
>>> data.values.shape
(546, 1)
flatten() turns it into a 1D array:
>>> data.values.flatten().shape
(546,)
which is what polyfit() needs.
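To see the shape difference without the original file, here is a minimal sketch using a small made-up DataFrame in place of the Housing data. The names frame and target and the numbers in them are assumptions for illustration only; the values lie on y = 2x + 1, so the fit is easy to check.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two single-column DataFrames (values are made up).
frame = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
target = pd.DataFrame({"bedrooms": [3.0, 5.0, 7.0, 9.0]})

# A single-column DataFrame still carries two dimensions: (rows, 1).
shape_2d = frame.values.shape
# flatten() collapses it to one dimension: (rows,).
shape_1d = frame.values.flatten().shape
print(shape_2d, shape_1d)

# Passing the DataFrames directly reproduces the error from the question.
try:
    np.polyfit(frame, target, 1)
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# With 1D inputs the degree-1 fit returns [slope, intercept].
slope, intercept = np.polyfit(frame.values.flatten(), target.values.flatten(), 1)
print(slope, intercept)
```

Printing x and y in the question looks one-dimensional because a one-column DataFrame displays as a single column of numbers; checking .shape or .ndim is the reliable way to tell.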
Simpler alternative:
df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():
data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols=[3]).squeeze()
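As a rough, self-contained sketch of the squeeze approach, the example below reads from an in-memory CSV instead of the real file. The column layout and values in csv_text are invented for illustration and are not the actual Housing.csv contents. It passes "columns" to squeeze() so that only the column axis is collapsed; a plain .squeeze() would return a scalar instead of a Series if the file happened to contain a single row.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for Housing.csv (made-up values).
csv_text = (
    "id,price,lotsize,bedrooms\n"
    "1,10.0,100,1\n"
    "2,20.0,200,2\n"
    "3,30.0,300,3\n"
)

# usecols is zero-based, so [1] selects "price" and [3] selects "bedrooms".
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze("columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze("columns")

# Both are now 1D Series, which np.polyfit accepts directly.
print(type(x).__name__, x.shape)

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```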