I have two columns with dates (mm/dd/yy). I need to check whether DateColumn_A comes before DateColumn_B.
I used the following code and got this error: "TypeError: must be string, not Series". Please help a beginner.
Code:
Column_A = datetime.strptime(df['DateColumn_A'], '%m %d %y')
Column_B = datetime.strptime(df['DateColumn_B'], '%m %d %y')
for index, row in dataframe.iterrows():
    if row[Column_A] < row[Column_B]:
        print (index,row[Column_A])
    else:
        pass
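You can compare them like this. datetime.strptime accepts only a single string, so convert each whole column with pd.to_datetime instead:
import pandas as pd
Column_A = pd.to_datetime(df['DateColumn_A'], format='%m/%d/%y')
Column_B = pd.to_datetime(df['DateColumn_B'], format='%m/%d/%y')
# diff is a Series of Timedeltas, one per row
diff = Column_A - Column_B
a_is_later = diff > pd.Timedelta(0)  # True where Column_A is later than Column_B
b_is_later = diff < pd.Timedelta(0)  # True where Column_B is later than Column_A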
Elaborating on my comment above with an example.
First make sure the date columns you are comparing are actually dates. You can do that using the pandas to_datetime
function like so:
>>> df = df.apply(pd.to_datetime, errors='ignore')
>>> df.DateColumnA
0   2018-01-01
1   2018-05-01
Name: DateColumnA, dtype: datetime64[ns]
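If you would rather convert only the two date columns, here is a minimal sketch. It assumes the strings are in the mm/dd/yy form mentioned in the question. The sample values are made up for illustration, and one is deliberately invalid. Passing errors='coerce' turns anything that cannot be parsed into NaT, so bad rows can be inspected instead of being silently left as strings.

```python
import pandas as pd

# Hypothetical sample data in mm/dd/yy form; "13/45/18" is deliberately invalid.
df = pd.DataFrame({
    "DateColumnA": ["01/01/18", "05/01/18", "13/45/18"],
    "DateColumnB": ["02/01/18", "01/01/18", "03/01/18"],
})

date_cols = ["DateColumnA", "DateColumnB"]
for col in date_cols:
    # An explicit format avoids guessing; unparseable values become NaT.
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y", errors="coerce")

# Rows where at least one of the two dates failed to parse.
bad_rows = df[df[date_cols].isna().any(axis=1)]
print(df.dtypes)
print(bad_rows)
```

Checking bad_rows before comparing is worthwhile because any comparison involving NaT evaluates to False.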
The snippet below uses boolean indexing. df['DateColumnA'] < df['DateColumnB']
returns a Series of True and False values, and df.loc[df['DateColumnA'] < df['DateColumnB']]
is akin to saying "Give me the subset of the DataFrame where this condition is True".
>>> df
  DateColumnA DateColumnB
0  2018-01-01  2018-02-01
1  2018-05-01  2018-01-01
>>> df['DateColumnA'] < df['DateColumnB']
0     True
1    False
dtype: bool
>>> df.loc[df['DateColumnA'] < df['DateColumnB']]
  DateColumnA DateColumnB
0  2018-01-01  2018-02-01
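Since the original goal was validation, it may be more useful to see the rows that fail the check. Here is a rough sketch that reuses the two sample rows from this answer. The names is_before, violations and all_valid are my own and purely illustrative.

```python
import pandas as pd

# Same two sample rows as in the answer above.
df = pd.DataFrame({
    "DateColumnA": pd.to_datetime(["2018-01-01", "2018-05-01"]),
    "DateColumnB": pd.to_datetime(["2018-02-01", "2018-01-01"]),
})

# Boolean mask: True where DateColumnA is strictly before DateColumnB.
is_before = df["DateColumnA"] < df["DateColumnB"]

# Inverting the mask with ~ selects the rows that fail the check.
# Note: rows containing NaT compare as False, so they would land here too.
violations = df.loc[~is_before]

# A single yes/no answer for the whole DataFrame.
all_valid = bool(is_before.all())

print(violations)
print("All rows valid:", all_valid)
```

This avoids iterrows entirely. The comparison runs on whole columns at once, which is usually both shorter and faster than a Python-level loop.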