I have a list in Excel, a subset of which looks like this:
Food and Human Nutrition
Food and Human Nutrition with Placement
Food and Nutrition with Professional Experience
Food Marketing and Nutrition
Food Marketing and Nutrition with Placement
Food, Nutrition and Health
I'd like to find the n
most common words within this list. I tried with this formula to find the most common word:
=INDEX(rng,MODE(MATCH(rng,rng,0)))
The issue with this is that it views each cell as a single string, and as each of the 6 rows is different does not find a most common word. What I'd like it to do is output 'Food', 'Nutrition' and 'and' as the most common words, followed by 'Marketing', 'Placement', 'with' etc.
Here is a VBA macro that provides what you seem to want.
Read the comments closely in the code for assumptions that need to be made. And reference that needs to be set
Also, note that punctuation can cause the same word to be counted in different categories. If this might be a problem, we merely need to split the source data differently, either eliminating all punctuation before splitting on spaces, or by using Regular Expressions to do the split.
'Set Reference to Microsoft Scripting Runtime
Option Explicit
Sub UniqueWordCounts()
Dim wsSrc As Worksheet, wsRes As Worksheet
Dim rSrc As Range, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim vWords As Variant
Dim dWords As Dictionary
Dim I As Long, J As Long
Dim V As Variant, vKey As Variant
'Assume source data is in column 1, starting at A1
' Could easily be anyplace
Set wsSrc = Worksheets("sheet2")
With wsSrc
Set rSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
'Results to go a few columns over
Set wsRes = Worksheets("sheet2")
Set rRes = rSrc(1, 1).Offset(0, 2)
'Read source data into vba array (for processing speed)
vSrc = rSrc
'Collect individual words and counts into dictionary
Set dWords = New Dictionary
dWords.CompareMode = TextCompare
For I = 1 To UBound(vSrc, 1)
'Split the sentence into individual words
For Each vKey In Split(vSrc(I, 1))
If Not dWords.Exists(vKey) Then
dWords.Add Key:=vKey, Item:=1
Else
dWords(vKey) = dWords(vKey) + 1
End If
Next vKey
Next I
'Size results array
ReDim vRes(0 To dWords.Count, 1 To 2)
'Column headers
vRes(0, 1) = "Word"
vRes(0, 2) = "Count"
'Populate the columns
I = 0
For Each V In dWords.Keys
I = I + 1
vRes(I, 1) = V
vRes(I, 2) = dWords(V)
Next V
'Size results range
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
'Populate, format and sort the Results range
With rRes
.EntireColumn.Clear
.Value = vRes
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
.EntireColumn.AutoFit
.Sort key1:=.Columns(2), order1:=xlDescending, key2:=.Columns(1), order2:=xlAscending, MatchCase:=False, Header:=xlYes
End With
End Sub
If you know & want to use VBA, then it would be quite an interestring task. Thus, some custom formula like this =MostCommonWords(Range;Optional WordsNumber)
would give you this result:
This is the code, behind the formula:
Public Function MostCommonWords(inputRange As Range, _
Optional NumberOfWords As Long = 1) As String
Dim myCell As Range
Dim inputString As String, tempString As String, myResult As String
Dim myArr As Variant, myKey As Variant
Dim cnt As Long, topNumber As Long
Dim myColl As Object
Set myColl = CreateObject("Scripting.Dictionary")
For Each myCell In inputRange
tempString = LCase(Replace(myCell, ",", ""))
inputString = inputString & " " & tempString
Next myCell
myArr = Split(inputString)
For cnt = LBound(myArr) To UBound(myArr)
If myColl.exists(myArr(cnt)) Then
myColl(myArr(cnt)) = myColl(myArr(cnt)) + 1
Else
myColl.Add myArr(cnt), 1
End If
Next cnt
For cnt = 1 To NumberOfWords
topNumber = 0
myResult = vbNullString
For Each myKey In myColl
If topNumber < myColl(myKey) Then
topNumber = myColl(myKey)
myResult = myKey
End If
Next myKey
MostCommonWords = MostCommonWords & " " & myResult
myColl.Remove myResult
Next cnt
End Function
How does it work?
inputString
. myColl.Remove myResult
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With