I am doing an assignment for a machine learning class in Python. I started learning Python just yesterday, so I am not familiar with common Python practices.
Part of my task is to load data from a CSV file (a 2D array, let's call it arr_2d) and normalize it.
I've found sklearn and numpy solutions online, but they expect a 2D array as input.
My approach after loading arr_2d is to parse its rows into an array of objects (data: [HealthRecord]).
My solution was code similar to this (note: roughly pseudocode):
result = []  # 2D array of property values, one row per attribute
for key in ['age', 'height', 'weight', ...]:
    # getattr takes the object first, then the attribute name
    tmp = [getattr(item, key) for item in data]
    result.append(tmp)
result now contains 3 lists of len(data) values each. I would use sklearn to normalize each row of result, then transpose it back and parse the normalized values into HealthRecord objects.
I find this overcomplicated, and I would like an easier way to do it, such as passing [HealthRecord] directly to sklearn.normalize.
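For illustration, the column-wise round trip described above could look roughly like the sketch below. It is not the sklearn call from the question: it swaps in a plain z-score computed with the standard library's statistics module, and the HealthRecord dataclass and the normalize_records helper are hypothetical names made up for this example.

```python
from dataclasses import dataclass, fields
from statistics import mean, pstdev


@dataclass
class HealthRecord:
    # Hypothetical stand-in for the real record class.
    age: float
    height: float
    weight: float


def normalize_records(records):
    """Return new records with every field z-scored across all records."""
    names = [f.name for f in fields(HealthRecord)]
    # One list per attribute: the "rotated" layout described above.
    columns = [[getattr(r, name) for r in records] for name in names]
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # Guard against constant columns, where sigma is 0.
        scaled.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    # zip(*scaled) rotates the columns back into one tuple per record.
    return [HealthRecord(*row) for row in zip(*scaled)]


records = [
    HealthRecord(60 * 365, 125, 65),
    HealthRecord(30 * 365, 195, 125),
    HealthRecord(13 * 365, 116, 53),
]
normalized = normalize_records(records)
```

The zip(*scaled) call does the "rotate it back" step, so no explicit index bookkeeping is needed.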
The code below shows my (simplified) loading and parsing:
class Person:
    age: int
    height: int
    weight: int

def arr_2_obj(data: list) -> Person:
    # data is a single row: [age, height, weight]
    person = Person()
    person.age = data[0]
    person.height = data[1]
    person.weight = data[2]
    return person
# age (days), height (cm), weight (kg)
# named rows rather than input, to avoid shadowing the built-in input()
rows = [
    [60*365, 125, 65],
    [30*365, 195, 125],
    [13*365, 116, 53],
    [16*365, 164, 84],
    [12*365, 125, 96],
    [10*365, 90, 46],
]
parsed = [arr_2_obj(row) for row in rows]
Note: the Person class above stands in for HealthRecord.
Thank you for any input or insights.
Edit: typo sci-learn -> sklearn
You can't pass a list of objects to sklearn directly. In practice, you're dealing with tabular data. The standard package for processing tabular data in Python (standard as in most popular, not part of the standard library) is pandas, so you can do something like:
import pandas as pd

# Build one row per object from its instance attributes
df = pd.DataFrame([d.__dict__ for d in data])
normalized_df = (df - df.mean()) / df.std()  # example normalization (z-score per column)
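If the normalized values need to end up in objects again, one possible round trip is sketched below. The make_person helper is a hypothetical name introduced for this example, and the three sample rows are taken from the question; treat it as a sketch rather than the only way to do it.

```python
import pandas as pd


class Person:
    # Same shape as the question's class: attributes are set after construction.
    age: int
    height: int
    weight: int


def make_person(age, height, weight):
    # Hypothetical helper that builds a Person from keyword arguments.
    person = Person()
    person.age, person.height, person.weight = age, height, weight
    return person


data = [
    make_person(60 * 365, 125, 65),
    make_person(30 * 365, 195, 125),
    make_person(13 * 365, 116, 53),
]

# vars(p) returns the instance attributes as a dict, one dict per row.
df = pd.DataFrame([vars(p) for p in data])
normalized_df = (df - df.mean()) / df.std()  # z-score per column

# Each row becomes a dict keyed by column name, which maps back onto an object.
normalized = [make_person(**row) for row in normalized_df.to_dict(orient="records")]
```

Note that df.std() computes the sample standard deviation (ddof=1), whereas sklearn's StandardScaler uses the population standard deviation (ddof=0), so the two give slightly different values on small datasets.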
If you insist on working with lists of objects instead of tables, you can write a class that performs the required conversions to shorten the notation, e.g. something like:
class ObjectList:
    def __init__(self, object_type, records):
        self.objects = [object_type(**record) for record in records]

    def to_data_frame(self):
        return pd.DataFrame([d.__dict__ for d in self.objects])

class PersonList(ObjectList):
    def __init__(self, records):
        super().__init__(Person, records)
The above assumes that class Person has an __init__ method accepting the arguments height, age and weight, and that each record is a dict keyed by those argument names.
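As a minimal sketch of that assumption, a dataclass is one convenient way to get such an __init__ without writing it by hand. The use of @dataclass and the two sample records here are illustrative choices, not something the classes above require.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Person:
    # @dataclass generates __init__(self, age, height, weight).
    age: int
    height: int
    weight: int


class ObjectList:
    def __init__(self, object_type, records):
        self.objects = [object_type(**record) for record in records]

    def to_data_frame(self):
        return pd.DataFrame([vars(d) for d in self.objects])


class PersonList(ObjectList):
    def __init__(self, records):
        super().__init__(Person, records)


# Each record is a dict whose keys match the constructor's argument names.
records = [
    {"age": 60 * 365, "height": 125, "weight": 65},
    {"age": 30 * 365, "height": 195, "weight": 125},
]
people = PersonList(records)
df = people.to_data_frame()
```

With this in place, people.objects holds the Person instances and df holds the same data as a table with one column per field.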
You can also try to shorten the notation further by overloading operators, but unless you're writing library code I don't see why you would want to.