
Numpy Correlation Error for Python

I am trying to show correlation between two individual lists. Before installing Numpy, I parsed World Bank data for GDP values and the number of internet users and stored them in two separate lists. Here is the snippet of code. This is just for gdp07. I actually have more lists for more years and other data such as unemployment.

import numpy as np

file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    gdp07.append(fields [0])    

file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(fields2 [0])

print np.correlate(gdp07,netnum07,"full")

The error I get is this:

Traceback (most recent call last):
  File "Project3.py", line 83, in <module>
    print np.correlate(gdp07, netnum07, "full")
  File "/usr/lib/python2.6/site-packages/numpy/core/numeric.py", line 645, in correlate
    return multiarray.correlate2(a,v,mode)
ValueError: data type must provide an itemsize

Just for the record, I am using Cygwin with Python 2.6 on a Windows computer. I am only using Numpy along with its dependencies and other parts of its build (gcc compiler). Any help would be great. Thx

Asked by Nopiforyou on May 09 '26 02:05


1 Answer

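That error most likely appears because you are passing the data in as strings. According to the Python docs, strip() returns a string, and split() then returns a list of strings, so gdp07 and netnum07 end up holding strings rather than numbers.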

http://docs.python.org/library/stdtypes.html

Try parsing the data into whatever numeric type you need (int or float).

As you can see here, passing strings reproduces the error:

In [14]: np.correlate(["3", "2","1"], [0, 1, 0.5])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/dog/<ipython-input-14-a0b588b9af44> in <module>()
----> 1 np.correlate(["3", "2","1"], [0, 1, 0.5])

/usr/lib64/python2.7/site-packages/numpy/core/numeric.pyc in correlate(a, v, mode, old_behavior)
    643         return multiarray.correlate(a,v,mode)
    644     else:
--> 645         return multiarray.correlate2(a,v,mode)
    646 
    647 def convolve(a,v,mode='full'):

ValueError: data type must provide an itemsize

Try parsing the values first:

In [15]: np.correlate([int("3"), int("2"),int("1")], [0, 1, 0.5])
Out[15]: array([ 2.5])
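
To make the cause a little more visible, here is a minimal sketch that compares the dtype NumPy infers for a list of strings with the dtype after an explicit conversion. The variable names are made up for illustration, and it assumes a NumPy version that can parse numeric strings when given dtype=float.

```python
import numpy as np

# What split() yields: strings that only look like numbers.
raw = ["3", "2", "1"]

# Without a dtype, NumPy keeps them as a string array.
as_strings = np.array(raw)

# Asking for float makes NumPy parse each numeric string.
as_floats = np.array(raw, dtype=float)

print(as_strings.dtype.kind)  # 'S' or 'U', i.e. a string dtype
print(as_floats.dtype.kind)   # 'f', i.e. floating point

# With numeric data the call from the answer goes through.
result = np.correlate(as_floats, [0, 1, 0.5])
print(result)
```

A list comprehension such as [float(s) for s in raw] would do the same job before the data ever reaches NumPy.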



Applied to your script:

import numpy as np

file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    # Convert the first field to a number before storing it.
    # int() fails on values such as "1.5"; use float() if the data has decimals.
    gdp07.append(int(fields[0]))

file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(int(fields2[0]))

print np.correlate(gdp07,netnum07,"full")
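
Since the question mentions more lists for other years and data sets, the parsing step could be pulled into one helper. This is only a sketch: read_first_column is a hypothetical name, and it assumes the value of interest is always the first whitespace-separated field and that blank lines can be skipped.

```python
def read_first_column(path):
    """Return the first field of each non-blank line in `path` as a float.

    Hypothetical helper; assumes whitespace-separated numeric data.
    """
    values = []
    with open(path, "r") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines instead of raising IndexError
            values.append(float(fields[0]))
    return values
```

It could then be used as np.correlate(read_first_column('final_gdpnum.txt'), read_first_column('internetnum.txt'), "full"). If the goal is a single correlation coefficient rather than the cross-correlation sequence that np.correlate returns, np.corrcoef may be closer to what is wanted; that is a guess about the intent, not something the question states.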

Your other error looks like a character encoding problem. I hope this works; I can't reproduce it myself because my Linux box uses UTF-8 by default. I went by the IPython help(codecs) documentation and http://code.google.com/edu/languages/google-python-class/dict-files.html

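import codecs

# Pass the file name, not the file object, and an encoding name.
# 'utf-8-sig' decodes UTF-8 and skips a leading byte order mark (BOM), if any.
f = codecs.open('final_gdpnum.txt', 'r', 'utf-8-sig')
for line in f:
    fields = line.strip().split()
    gdp07.append(int(fields[0]))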
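
As a self-contained sketch of the encoding idea, assuming the trouble really is a UTF-8 byte order mark at the start of the file (that is a guess, since the second error is not shown): the helper name read_ints_skipping_bom is hypothetical, and it relies on the standard "utf-8-sig" codec.

```python
import codecs


def read_ints_skipping_bom(path):
    """Read the first field of each non-blank line as an int.

    Hypothetical helper; a leading UTF-8 BOM, if present, is dropped.
    """
    values = []
    # "utf-8-sig" decodes UTF-8 and strips a byte order mark at the start.
    with codecs.open(path, "r", "utf-8-sig") as handle:
        for line in handle:
            fields = line.strip().split()
            if fields:
                values.append(int(fields[0]))
    return values
```

Without the BOM handling, the first line would start with an invisible character and int() would reject it.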

Answered by user1462442 on May 12 '26 05:05


