I would like to make a word frequency distribution, with the words on the x-axis and the frequency count on the y-axis.
I have the following list:
example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080),
('contact', 4506), ('medicatie', 3797), ('uur', 3792),
('gaan', 3473), ('kwam', 3463), ('kamer', 3447),
('mee', 3278), ('gesprek', 2978)]
I tried to first convert it into a pandas DataFrame and then use the pd.hist() as in the example below, but I just can't figure it out and think it is actually straight forward but probably I'm missing something.
import numpy as np
import matplotlib.pyplot as plt
word = []
frequency = []
for i in range(len(example_list)):
word.append(example_list[i][0])
frequency.append(example_list[i][1])
plt.bar(word, frequency, color='r')
plt.show()
How to count item frequency in a tuple? You can use the Python tuple count() function to count the frequency of an element in a tuple. Pass the element for which you want to count the occurrences as an argument to the function.
Create Histogram In Matplotlib, we use the hist() function to create histograms. The hist() function will use an array of numbers to create a histogram, the array is sent into the function as an argument.
Using pandas:
import pandas as pd
import matplotlib.pyplot as plt
example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080), ('contact', 4506), ('medicatie', 3797), ('uur', 3792), ('gaan', 3473), ('kwam', 3463), ('kamer', 3447), ('mee', 3278), ('gesprek', 2978)]
df = pd.DataFrame(example_list, columns=['word', 'frequency'])
df.plot(kind='bar', x='word')

You can't pass the words into the matplotlib.pyplot.bar directly. However you could create an indices array for bar and then replace these indices with the words using matplotlib.pyplot.xticks:
import numpy as np
import matplotlib.pyplot as plt
indices = np.arange(len(example_list))
plt.bar(indices, frequency, color='r')
plt.xticks(indices, word, rotation='vertical')
plt.tight_layout()
plt.show()

The for-loop to create word and frequency could also be replaced by a simple zip und list unpacking:
word, frequency = zip(*example_list)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With