I have a dataframe df_1 with a column year that denotes the year in which a crime occurred. For example, df_1 looks something like this:
   location     description  Weapon  Year
0  Howard Ave   Auto theft   Knife   2017
1  Craig Drive  Burglary     Gun     2014
2  King Ave     Assault      Hands   2017
I need to create a dataframe that has the number of crime occurrences by year from 2012-2017.
crime_year = pd.DataFrame(df_1.year.value_counts(), columns=["Year", "AggregateCrime"])
crime_yearindex = crime_year.sort_index(axis = 0, ascending=True)
crime_yearindex
When I print crime_yearindex, I just get the column headers and not the data itself. What may I be doing wrong?
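value_counts() returns a Series: the years end up in the index and the counts are the values. When you pass that Series to pd.DataFrame with columns=[...], pandas treats the names as columns to select rather than as new labels, and since neither name exists in the Series you get an empty frame with only the headers. Adding .reset_index().values after value_counts() turns the index into ordinary data as well, so both the years and the counts are passed in as values: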
crime_year = pd.DataFrame(df_1.Year.value_counts().reset_index().values, columns=["Year", "AggregateCrime"])
crime_yearindex = crime_year.sort_index(axis = 0, ascending=True)
crime_yearindex
Out[1225]:
   Year  AggregateCrime
0  2017               2
1  2014               1
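Note that sort_index in the snippet above sorts by the row labels (0, 1), not by year, which is why 2017 still comes before 2014. If you want the result ordered by year and covering every year from 2012 to 2017, a minimal sketch could look like the following. The sample frame is rebuilt from the three rows in the question, and filling years with no recorded crimes with 0 is my assumption about what you want.

```python
import pandas as pd

# Sample data rebuilt from the three rows shown in the question
df_1 = pd.DataFrame({
    "location": ["Howard Ave", "Craig Drive", "King Ave"],
    "description": ["Auto theft", "Burglary", "Assault"],
    "Weapon": ["Knife", "Gun", "Hands"],
    "Year": [2017, 2014, 2017],
})

# value_counts() gives a Series: index = year, values = number of rows
counts = df_1["Year"].value_counts()

# Reindex onto 2012..2017 so the years are in order;
# years with no rows get 0 (an assumption, not something the question states)
counts = counts.reindex(list(range(2012, 2018)), fill_value=0)

# Turn the index into a "Year" column and name the counts column
crime_year = counts.rename_axis("Year").reset_index(name="AggregateCrime")
print(crime_year)
```

Using rename_axis together with reset_index(name=...) sets both column names explicitly, so it does not rely on whatever default names value_counts() assigns in your pandas version.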
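You could use .groupby() to get yearly counts of crime occurrences.
In this case, df_1.groupby(by="Year").count() gives you, for every year, the number of non-null entries in each of the other columns. If you only want one count per year, df_1.groupby(by="Year").size() returns a single Series of row counts.
After that, you could use .loc to select specific years.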
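A rough sketch of the groupby route, again using a sample frame rebuilt from the rows in the question (the variable names yearly, selected and single are just mine for illustration):

```python
import pandas as pd

# Sample data rebuilt from the three rows shown in the question
df_1 = pd.DataFrame({
    "location": ["Howard Ave", "Craig Drive", "King Ave"],
    "description": ["Auto theft", "Burglary", "Assault"],
    "Weapon": ["Knife", "Gun", "Hands"],
    "Year": [2017, 2014, 2017],
})

# size() counts rows per group; groupby sorts the group keys by default,
# so the resulting index (the years) is in ascending order
yearly = df_1.groupby("Year").size()

# Label-based slice on the sorted index, inclusive on both ends
selected = yearly.loc[2012:2017]

# A single year can be looked up by its label
single = yearly.loc[2017]

print(selected)
print(single)
```

The slice only returns years that actually occur in the data, so years without any crimes are simply absent rather than shown as 0.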