I am new to Python and pandas. I have a dataset with the following structure. It is a pandas DataFrame:
city time1 time2
a [1991, 1992, 1993] [1993,1994,1995]
time1 and time2 represent the coverage of the data in two sources. I would like to create a new column that indicates whether time1 and time2 have any intersection: True if they do, False otherwise. The task sounds very straightforward. I was thinking about using set operations on the two columns, but it did not work as expected. Would anyone help me figure this out?
Thanks!
I appreciate your help.
You can iterate through all the columns, convert the lists to sets, and check whether the intersection contains any values.
df1 = df.applymap(lambda x: set(x) if type(x) == list else set([x]))
df1.apply(lambda x: bool(x.time1 & x.time2), axis=1)
Here is a semi-vectorized version that should run much faster, since it applies & to whole arrays of sets instead of going row by row:
df1 = df[['time1', 'time2']].applymap(lambda x: set(x) if type(x) == list else set([x]))
(df1.time1.values & df1.time2.values).astype(bool)
And this is even a bit faster, since it maps each column directly:
change_to_set = lambda x: set(x) if type(x) == list else set([x])
time1_set = df.time1.map(change_to_set).values
time2_set = df.time2.map(change_to_set).values
(time1_set & time2_set).astype(bool)
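For reference, here is a minimal self-contained sketch that rebuilds the example frame from the question and stores the result in a new column. It is not one of the approaches above: it uses set.isdisjoint in a plain list comprehension, and it assumes every cell in time1 and time2 is already a list. The second row ("b") and the column name overlap are made up for illustration.

```python
import pandas as pd

# Example frame from the question; row "b" is a hypothetical non-overlapping case.
df = pd.DataFrame({
    "city": ["a", "b"],
    "time1": [[1991, 1992, 1993], [2000, 2001]],
    "time2": [[1993, 1994, 1995], [2005, 2006]],
})

# isdisjoint is True when the two collections share no element,
# so its negation is True exactly when the lists intersect.
df["overlap"] = [
    not set(t1).isdisjoint(t2)
    for t1, t2 in zip(df["time1"], df["time2"])
]

print(df[["city", "overlap"]])
```

Row a shares 1993 between the two lists, so overlap is True there, while the made-up row b gets False.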