Use sklearn's FunctionTransformer with string data?

Question

I'm using sklearn's FunctionTransformer to preprocess some of my data, which are date strings such as "2015-01-01 11:09:15".

My custom function takes a string as input, but I found that FunctionTransformer cannot deal with strings. Its source code doesn't implement fit_transform, so the call is routed to the parent class, whose fit_transform ends up calling this fit:

     57     def fit(self, X, y=None):
     58         if self.validate:
---> 59             check_array(X, self.accept_sparse)
     60         return self

check_array seems to work only with numeric arrays. Of course I could do everything in pandas, but is there a better way to handle this in sklearn, especially since I may want to use a pipeline in the future?
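To see where the error comes from, here is a minimal sketch calling sklearn.utils.check_array directly. It assumes the input is an object-dtype array, which is roughly what a DataFrame with a string column and a float column becomes. With the default dtype="numeric", object input is cast to float64 and the date string makes that cast fail; passing dtype=None keeps the original dtype.

```python
import numpy as np
from sklearn.utils import check_array

# One row shaped like the question's data: a date string plus a float.
# dtype=object mimics a mixed-type DataFrame converted to an array.
X = np.array([["2015-01-01 11:09:15", 4.0]], dtype=object)

# Default dtype="numeric": object input is cast to float64, which fails on strings.
try:
    check_array(X)
except ValueError as exc:
    print("rejected:", exc)

# dtype=None keeps the object dtype, so the strings pass through untouched.
kept = check_array(X, dtype=None)
print(kept.dtype)  # object
```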

Thanks!

Marcus V. · Accepted Answer

Seems as if the validate parameter is what you are looking for: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html

Here is an example where it may make sense to keep the value as a string rather than converting it to float, as mentioned in the comment. Say you want to add time zone info to your date strings:

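import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def add_TZ(df):
    # Work on a copy so the caller's DataFrame is not modified in place,
    # and return it: FunctionTransformer returns whatever func returns.
    df = df.copy()
    df['date'] = df['date'].astype(str) + "Z"
    return df

data = {'date': ["2015-01-01 11:00:00", "2015-01-01 11:15:00", "2015-01-01 11:30:00"],
        'value': [4., 3., 2.]}

df = pd.DataFrame(data)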

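With validation switched on, this fails, as you noted, because of the check (validate=True was the default before scikit-learn 0.22; newer releases default to False):

ft = FunctionTransformer(func=add_TZ, validate=True)
ft.fit_transform(df)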

Output:

ValueError: could not convert string to float: '2015-01-01 11:30:00'

This works:

ft = FunctionTransformer(func=add_TZ, validate=False)
ft.fit_transform(df)

Output:

    date                    value
0   2015-01-01 11:00:00Z    4.0
1   2015-01-01 11:15:00Z    3.0
2   2015-01-01 11:30:00Z    2.0
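For the pipeline case raised in the question, here is a minimal sketch, assuming the goal is to turn the date strings into numeric features inside the pipeline itself. The helper date_features and the choice of hour/minute features are hypothetical, and LinearRegression is only a stand-in for whatever estimator follows.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def date_features(df):
    """Hypothetical helper: parse the 'date' strings into numeric columns."""
    dates = pd.to_datetime(df["date"])
    # Return a plain 2D float array so the next step's validation accepts it.
    return np.column_stack([dates.dt.hour, dates.dt.minute]).astype(float)

pipe = Pipeline([
    # validate=False lets the raw strings reach date_features unchecked.
    ("dates", FunctionTransformer(func=date_features, validate=False)),
    ("model", LinearRegression()),
])

df = pd.DataFrame({
    "date": ["2015-01-01 11:00:00", "2015-01-01 11:15:00", "2015-01-01 11:30:00"],
    "value": [4., 3., 2.],
})
pipe.fit(df[["date"]], df["value"])
print(pipe.predict(df[["date"]]))
```

Only the string-handling step needs validate=False; because it emits a numeric array, the downstream estimator can keep its usual input checks.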

Tags: python, pandas, machine-learning, scikit-learn

peidaqi