How can I extract the date from a string like "monkey 2010-07-10 love banana"? Thanks!
Extracting Dates from a Text File with the Datefinder Module. The Python datefinder module can locate dates in a body of text. Using the find_dates() method, it's possible to search text data for many different types of dates. Datefinder will return any dates it finds in the form of a datetime object.
Using strptime() , date and time in string format can be converted to datetime type. The first parameter is the string and the second is the date time format specifier. One advantage of converting to date format is one can select the month or date or time individually.
Using python-dateutil:
In [1]: import dateutil.parser as dparser  In [18]: dparser.parse("monkey 2010-07-10 love banana",fuzzy=True) Out[18]: datetime.datetime(2010, 7, 10, 0, 0) Invalid dates raise a ValueError:
In [19]: dparser.parse("monkey 2010-07-32 love banana",fuzzy=True) # ValueError: day is out of range for month It can recognize dates in many formats:
In [20]: dparser.parse("monkey 20/01/1980 love banana",fuzzy=True) Out[20]: datetime.datetime(1980, 1, 20, 0, 0) Note that it makes a guess if the date is ambiguous:
In [23]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True) Out[23]: datetime.datetime(1980, 10, 1, 0, 0) But the way it parses ambiguous dates is customizable:
In [21]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True, dayfirst=True) Out[21]: datetime.datetime(1980, 1, 10, 0, 0) If the date is given in a fixed form, you can simply use a regular expression to extract the date and "datetime.datetime.strptime" to parse the date:
import re from datetime import datetime  match = re.search(r'\d{4}-\d{2}-\d{2}', text) date = datetime.strptime(match.group(), '%Y-%m-%d').date() Otherwise, if the date is given in an arbitrary form, you can't extract it easily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With