I want to create a pandas dataframe df like:
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)
I want to give the "date" column dtype datetime64[ns] while creating the dataframe, not afterwards.
So not like this:
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)
df["date"] = pd.to_datetime(df["date"])
But like this:
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64,
        ),
        "value": [10, 20, 16, 31, 56],
    }
)
But that gives the error:
ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
Following that suggestion:
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64[ns],
        ),
        "value": [10, 20, 16, 31, 56],
    }
)
I get this error:
NameError: name 'ns' is not defined
So how can I set a specific column as type "datetime64[ns]"?
You can use pd.to_datetime within the dict, as follows:
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        # Parse the strings to datetimes before the DataFrame is built
        "date": pd.to_datetime(["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]),
        "value": [10, 20, 16, 31, 56],
    }
)
The date column now has dtype datetime64[ns], as df.info() shows:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   group   5 non-null      object
 1   date    5 non-null      datetime64[ns]
 2   value   5 non-null      int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
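As an alternative that stays closer to the attempt in the question, the dtype can also be passed to pd.Series as a string. The NameError above comes from Python evaluating np.datetime64[ns] as an index expression with an undefined name ns. Quoting the whole dtype avoids that. This is a minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            # The dtype is given as a string, so "ns" is never looked up as a name
            dtype="datetime64[ns]",
        ),
        "value": [10, 20, 16, 31, 56],
    }
)

# Check the resulting dtype of the column
print(df["date"].dtype)
```

Spelling out the dtype this way also pins the unit explicitly instead of leaving it to inference.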