Use sklearn's FunctionTransformer with string data?

I'm using sklearn's FunctionTransformer to preprocess some of my data, which are date strings such as "2015-01-01 11:09:15".

My custom function takes a string as input, but FunctionTransformer does not seem to handle strings. Its source does not implement fit_transform, so the call is routed to the parent class and ends up in fit:

    def fit(self, X, y=None):
        if self.validate:
            check_array(X, self.accept_sparse)  # <-- the error is raised here
        return self

check_array seems to work only with numeric ndarrays. I could of course do everything in pandas, but I wonder whether there is a better way to handle this in sklearn, especially since I may want to use a pipeline in the future.

Thanks!

peidaqi asked Oct 22 '25 17:10


1 Answer

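The validate parameter seems to be what you are looking for; setting validate=False skips the check_array call (in scikit-learn 0.22 and later, validate already defaults to False): http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html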

Here is an example where it may make sense to keep the value as a string rather than convert it to float, as mentioned in the comment. Let's say you want to add time zone info to your date string:

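import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def add_TZ(df):
    # Work on a copy so the caller's DataFrame is not modified in place.
    df = df.copy()
    df['date'] = df['date'].astype(str) + "Z"
    # The transformer returns whatever func returns, so return the frame.
    return df

data = {'date': ["2015-01-01 11:00:00", "2015-01-01 11:15:00", "2015-01-01 11:30:00"],
        'value': [4., 3., 2.]}

df = pd.DataFrame(data)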

With validation enabled (the default in older scikit-learn releases), this fails, as you noted, because of the check:

ft = FunctionTransformer(func=add_TZ, validate=True)
ft.fit_transform(df)

Output:

ValueError: could not convert string to float: '2015-01-01 11:30:00'
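
The error can be reproduced with check_array alone, which may make it clearer what validate is switching on and off. The sketch below assumes a small object-dtype array of date strings (made up for illustration) and relies on check_array's documented default dtype="numeric", which tries to cast object arrays to float; passing dtype=None leaves the strings alone.

```python
import numpy as np
from sklearn.utils import check_array

# Illustrative input: a 2D object array holding date strings.
X = np.array([["2015-01-01 11:00:00"], ["2015-01-01 11:15:00"]], dtype=object)

# Default dtype="numeric": object arrays are cast to float, which fails here.
try:
    check_array(X)
    numeric_check_failed = False
except ValueError:
    numeric_check_failed = True

# dtype=None keeps the input dtype, so the strings pass through unchanged.
kept = check_array(X, dtype=None)
```

So the failure comes from the numeric conversion in the validation step, not from FunctionTransformer itself.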

With validate=False, the same call works:

ft = FunctionTransformer(func=add_TZ, validate=False)
ft.fit_transform(df)

Output:

    date                    value
0   2015-01-01 11:00:00Z    4.0
1   2015-01-01 11:15:00Z    3.0
2   2015-01-01 11:30:00Z    2.0
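
Since the question mentions possibly using a pipeline later, here is a minimal sketch of the same idea inside a Pipeline. The helper date_features and the step name "dates" are made up for illustration; the only point is that a FunctionTransformer with validate=False can receive string columns as a pipeline step and hand numeric columns to whatever comes next.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def date_features(df):
    # Hypothetical helper: parse the 'date' strings and expose numeric parts.
    out = df.copy()
    parsed = pd.to_datetime(out["date"])
    out["hour"] = parsed.dt.hour
    out["minute"] = parsed.dt.minute
    # Drop the raw string column so later steps only see numeric data.
    return out.drop(columns="date")

df = pd.DataFrame({
    "date": ["2015-01-01 11:00:00", "2015-01-01 11:15:00", "2015-01-01 11:30:00"],
    "value": [4.0, 3.0, 2.0],
})

# validate=False skips check_array, so the string column reaches the function.
pipe = Pipeline([
    ("dates", FunctionTransformer(func=date_features, validate=False)),
])
features = pipe.fit_transform(df)
```

Further steps such as a scaler or an estimator could be appended after "dates", since the frame it returns is purely numeric.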
Marcus V. answered Oct 25 '25 08:10


