I'm new to Python and I'm having trouble using lists.
Here is my problem: I have a datos.csv file with the following structure.
1,4.0,?,?,none,?
2,2.0,3.0,?,none,?
2,2.5,2.5,?,tc,39
Using this function I store the data in a list.
def main():
    lista = []
    with open('datos.csv','r') as f:
        for line in f:
            lista.append(line.strip().split(','))
    determinar_tipo(lista)

if __name__ == '__main__':
    main()
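As an aside, the standard csv module can do the same splitting. The following is only a sketch, assuming Python 3; the helper name leer_filas is hypothetical, and an in-memory string stands in for datos.csv so the snippet runs on its own:

```python
import csv
import io

# Stand-in for the contents of datos.csv (same three rows as above).
contenido = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,?\n2,2.5,2.5,?,tc,39\n"


def leer_filas(f):
    """Return every row of an open CSV file as a list of strings."""
    return [fila for fila in csv.reader(f)]


# With a real file this would be: with open('datos.csv', newline='') as f: ...
lista = leer_filas(io.StringIO(contenido))
print(lista)
```

Every field is still a string at this point, so the type detection described below is needed either way.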
Up to this point, I have no problem. The trouble starts when I have to determine the type of the elements.
Using this code, I can organize my list into columns:
for columna in itertools.izip(*lista):
This lets me treat the data as columns. Here is an example of what I receive from this 'for':
{'1','2','2'} {'4.0','2.0','2.5'} . . .
As you can see, it is the same data as in my csv file, but structured by columns.
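For reference, here is a minimal sketch of that transposition. It assumes Python 3, where itertools.izip no longer exists and the built-in zip is already lazy, and it hard-codes the rows instead of reading datos.csv:

```python
# Rows exactly as they come out of the CSV: every value is a string.
lista = [
    ['1', '4.0', '?', '?', 'none', '?'],
    ['2', '2.0', '3.0', '?', 'none', '?'],
    ['2', '2.5', '2.5', '?', 'tc', '39'],
]

# The * unpacks the rows into separate arguments, so zip() groups the
# first items of every row, then the second items, and so on.
columnas = list(zip(*lista))

for columna in columnas:
    print(columna)
```

Each column comes back as a tuple, so it can be indexed (columna[0]) or iterated like any other sequence.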
OK, here is my problem.
I have to determine the type of each column based on its contents. For example, given the first column {'1', '2', '3'}, I check the first element, and its type is the type of the column; in this case the column would be int.
Another example: with {'4.0', '2', '2.5'}, I check the type of the first element and determine that the column is float.
For the case {'?', '?', '?'} the type would be "undetermined".
However, the problem comes with the '?' signs. If the first element is one of these, I have to check the next element of the column. For example, in {'?', 'Word', '5'} that next element is 'Word', so the type of the column would be string.
This is the code I wrote to do this, but it does not determine the types correctly.
def determinar_tipo(lista):
    b = 0
    aux = []
    for columna in itertools.izip(*lista):
        if columna[0] != "?": #If it's a number or string I save it
            aux.append(columna[0])
            print columna[0]
        else: #If it's '?'
            if len(columna) > b:
                b = b + 1
                if columna[b] != "?":
                    aux.append(columna[b])
                    b = 0
                else:
                    b = b + 1
                    print b
    #Correct code
    for x in aux:
        try:
            var_type = type(int(x))
        except ValueError:
            try:
                var_type = type(float(x))
            except ValueError:
                var_type = type(x)
        print var_type
The first part of the code stores, in another list, the element that decides the type of each column. The second part checks the type of each element in that list.
In summary, I do not know how to make the 'for' pick the correct element of each column, so that the column's type is determined correctly.
This is the correct answer for my data:
1 , 4.0 , ? , ? , none , ?
2 , 2.0 , 3.0 , ? , none , ?
2 , 2.5 , 2.5 , ? , tc , 39
int , float , float , undetermined , string , int
I changed the functions' names so that they make more sense:
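import itertools

def determinar_tipo(valor):
    # '?' marks a missing value, so it carries no type information
    if valor == '?':
        return 'undetermined'
    try:
        int(valor)
    except ValueError:
        pass
    else:
        return int
    try:
        float(valor)
    except ValueError:
        return str
    else:
        return float

def determinar_tipos(lista):
    aux = []
    for columna in itertools.izip(*lista):
        i = 0
        # Skip leading '?' entries to find the first real value
        while i < len(columna) and columna[i] == '?':
            i += 1
        # If the whole column is '?', keep '?' so it is reported as undetermined
        aux.append(columna[i] if i < len(columna) else '?')
    for i, each in enumerate(aux):
        aux[i] = determinar_tipo(each)
    return aux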
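To try this on the sample data under Python 3, a sketch along these lines should work. It assumes the built-in zip in place of itertools.izip, and it condenses the while loop into next() with a default value, which is a substitution for illustration and not part of the code above:

```python
def determinar_tipo(valor):
    """Return int, float or str for a value, or 'undetermined' for '?'."""
    if valor == '?':
        return 'undetermined'
    # Try the stricter conversion first: '2' parses as int, '2.0' only as float.
    for tipo in (int, float):
        try:
            tipo(valor)
        except ValueError:
            continue
        return tipo
    return str


def determinar_tipos(lista):
    """Return one type per column, based on the first value that is not '?'."""
    tipos = []
    for columna in zip(*lista):
        # Falls back to '?' when the column contains nothing but '?'.
        primero = next((v for v in columna if v != '?'), '?')
        tipos.append(determinar_tipo(primero))
    return tipos


lista = [
    ['1', '4.0', '?', '?', 'none', '?'],
    ['2', '2.0', '3.0', '?', 'none', '?'],
    ['2', '2.5', '2.5', '?', 'tc', '39'],
]
tipos = determinar_tipos(lista)
print(tipos)
```

For the sample rows this yields int, float, float, 'undetermined', str and int, in that order, which matches the expected answer in the question.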
I took this approach. The important part is the generator. Let me know if it helps:
import itertools

lista = [
    ['1','4.0','?','?','none','?'],
    ['2','2.0','3.0','?','none','?'],
    ['2','2.5','2.5','?','tc','39']
]

def columnType(column):
    for val in column:
        if val != '?':
            try:
                float(val)
                if '.' in val:
                    yield 'float'
                else:
                    yield 'int'
            except ValueError:
                yield 'string'

for columna in itertools.izip(*lista):
    print columna, next(columnType(columna),'undetermined')
This gives the following result:
('1', '2', '2') int
('4.0', '2.0', '2.5') float
('?', '3.0', '2.5') float
('?', '?', '?') undetermined
('none', 'none', 'tc') string
('?', '?', '39') int
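To show why the generator matters, here is a small sketch of how next() with a default behaves. It assumes Python 3, and column_type is just a renamed copy of the generator above, slightly restructured, not a different technique:

```python
def column_type(column):
    """Yield a type name for every value in the column that is not '?'."""
    for val in column:
        if val == '?':
            continue
        try:
            float(val)
        except ValueError:
            yield 'string'
        else:
            yield 'float' if '.' in val else 'int'


gen = column_type(('?', '3.0', 'abc'))

# Each call to next() resumes the generator until its next yield.
first = next(gen, 'undetermined')   # skips '?', stops at '3.0'
second = next(gen, 'undetermined')  # continues with 'abc'
third = next(gen, 'undetermined')   # generator exhausted, default is returned

# A column made only of '?' never yields, so the default is used right away.
empty = next(column_type(('?', '?')), 'undetermined')

print(first, second, third, empty)
```

Because only the first next() call is made per column, the values after the first usable one are never inspected.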
EDIT: Alternative function following @MaartenFabré's suggestion, using a plain return instead of a generator:
import itertools

lista = [
    ['1','4.0','?','?','none','?'],
    ['2','2.0','3.0','?','none','?'],
    ['2','2.5','2.5','?','tc','39']
]

def columnType(column):
    for val in column:
        if val != '?':
            try:
                float(val)
                if '.' in val:
                    return 'float'
                else:
                    return 'int'
            except ValueError:
                return 'string'
    return 'undetermined'

for columna in itertools.izip(*lista):
    print columna, columnType(columna)