Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way in Excel to count the occurrences of specific text within a string but also include previous character?

I have a spreadsheet containing 98 rows of different strings that are sequences of male/female visits recorded using only "M" and "F". I am trying to find the frequency of male followed by male visits in these sequences which would appear in the strings as "MM". However, using the methods I know of in Excel, it currently misses some occurrences because after it has found matching characters it then ignores those and moves onto the next, whereas I need it to always consider the last character. E.g, MMM would first be a male followed by a male, but is again then a male followed by another male. For example, one cell includes "MMMFMMMMM", using the methods I know of in Excel it counts it only as 3 occurrences because once it finds "MM" it then moves onto the rest of the string. But in my scenario this would actually be 6 occurrences of a male followed by a male, as for the sequence I always consider the last visitor. Is there any way to automate this using a formula or even VBA or macro etc in Excel? Please see the example below of some records from my A and B columns that I currently have and the C column that I would like to add for the frequency of MM visits in each sequence:

Day Sequence MM Count
1 MFFFM 0
2 FMMM 2
3 FF 0
4 MMMMM 4
5 M 0
6 MFMMMFF 2

So far I have tried:

For all cells: =SUM(LEN(B2:B7)-LEN(SUBSTITUTE(B2:B7,"MM","")))/LEN("MM") =SUMPRODUCT(--(ISNUMBER(FIND("MM", B2:B7)))) =COUNTIF(B2:B7, "*MM*")

For each cell: =LEN(B2)-LEN(SUBSTITUTE(B2,"MM","")) =MATCH(2,1/(MID(B2,SEQUENCE(LEN(B2)),2)="MM"))

However nearly all of these approaches only count each MM occurrence and then moves onto the next characters. Some of them also only count the cells that contain an occurrence of MM. Is there a method in Excel that can count the occurrences of MM in each cell's string whilst also considering the previous character in the string? Thanks so much for any input!

like image 926
Ellka Avatar asked Sep 06 '25 00:09

Ellka


2 Answers

Updated Answer:

With newer functions come newer answers, try REGEXREPLACE():

=LEN(REGEXREPLACE(B2:B7,"(?!MM).",))

Original Answer:

Nice question. One option could be:

enter image description here

Formula in C2:

=MAP(B2:B7,LAMBDA(s,SUM(LEN(TEXTSPLIT(s&"FM","F",,1))-1)))

Or, if you know the strings are no longer then 99 chars then avoid LAMBDA() and maybe use something like:

=MMULT(N(MID(B2:B7,SEQUENCE(,99),2)="MM"),ROW(1:99)^0)

If you happen to be forced to use legacy functions, maybe:

=MMULT(N(MID(B2,TRANSPOSE(ROW(1:99)),2)="MM"),ROW(1:99)^0)

And confirm through CtrlShiftEnter

like image 134
JvdV Avatar answered Sep 07 '25 23:09

JvdV


Another heavy handed option:

=IFERROR(REDUCE(0,SEQUENCE(LEN(B2)-1,,2),LAMBDA(a,b,IF(AND(MID(B2,b,1)="M",MID(B2,b-1,1)="M"),a+1,a))),0)

enter image description here

like image 36
Scott Craner Avatar answered Sep 07 '25 21:09

Scott Craner