 

I want to count every occurrence of a value/character in each row of a dataframe in R, INCLUDING when it is surrounded by other values/characters

I have read an Excel sheet into a data frame that contains various numbers and characters in each row/column (with some NAs). For each row, I want to count how many occurrences of "g" there are, for example. My problem is that some cells contain something like "g#", "g a" or "1g" and, thus, are not being included in the count. I want to count EVERY occurrence of "g", regardless of what else is in the cell, and then add this count as a new variable to the current data frame.

I have tried the following snippets. All of them correctly count the cells that are EXACTLY "g", but not every occurrence of "g".

My hunch is that I need a regular expression in one of the following snippets (I searched for a few hours to no avail). I also tried functions from the stringr package, such as str_count, but these seem to be applicable only to vectors.

oneelecsheet$countg <- rowSums(oneelecsheet == "g", na.rm = TRUE)

library(expss)
oneelecsheet$countg <- count_row_if("g", oneelecsheet)

oneelecsheet$countg <- apply(oneelecsheet, 1, function(x) length(which(x=="g")))

library(dplyr)
oneelecsheet$countg <- apply(oneelecsheet, 1, function(x) sum(x %in% "g"))
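All four attempts compare each whole cell with "g", so a cell such as "g#" or "1g" is never equal to "g". The sketch below illustrates this on a small made-up data frame (toy is hypothetical, not the asker's oneelecsheet) and contrasts it with grepl(), which tests whether "g" appears anywhere in a cell.

```r
# Hypothetical toy data standing in for the spreadsheet (includes an NA)
toy <- data.frame(x = c("g", "g#", "1g"),
                  y = c("g a", NA, "g"),
                  stringsAsFactors = FALSE)

# `==` tests whole-cell equality, so only cells that are exactly "g" count
exact <- rowSums(toy == "g", na.rm = TRUE)
exact

# grepl() is TRUE when "g" occurs anywhere in the cell (FALSE for NA cells)
anywhere <- rowSums(sapply(toy, grepl, pattern = "g"))
anywhere
```

Only the first and third rows contain a cell that is exactly "g", whereas grepl() also picks up "g#", "1g" and "g a".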
asked Oct 30 '25 10:10 by Kylie Baer



1 Answer

If there are multiple occurrences of "g" in a cell, how would you want to count them? For example, if a cell contains "ageeg", should it count as 1 or 2? Depending on the answer to that question, you can use any of the following.
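The difference on the example cell "ageeg" comes down to the two functions used below: grepl() only answers yes/no, while stringr::str_count() returns the number of matches. A minimal sketch, assuming the stringr package is installed:

```r
cell <- "ageeg"

# grepl() reports only whether the pattern occurs: at most one hit per cell
per_cell <- as.integer(grepl("g", cell))

# stringr::str_count() counts every non-overlapping match in the cell
every <- stringr::str_count(cell, "g")

c(per_cell = per_cell, every = every)
```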

1) If only one "g" has to be counted per cell

df$gcount <- colSums(apply(df, 1, grepl, pattern = "g"))

df
#       a     b gcount
#1 abcg#g  good      2
#2     gg   bad      1
#3     g@  ugly      2
#4  abcdg ageeg      2

If we want to avoid apply, we can use

rowSums(sapply(df, grepl, pattern = "g"))

Or (thanks to @thelatemail)

Reduce(`+`, lapply(df, grepl, pattern = "g"))
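Since the spreadsheet in the question contains NAs, it may help to see how the grepl() variants treat them. A minimal sketch on hypothetical data (df_na is made up for illustration): grepl() returns FALSE for a missing cell, so NAs simply contribute 0 and no na.rm argument is needed.

```r
# Hypothetical data with missing cells, mimicking the NAs in the spreadsheet
df_na <- data.frame(a = c("abcg#g", NA, "g@"),
                    b = c("good", "bad", NA),
                    stringsAsFactors = FALSE)

# grepl() gives FALSE (not NA) for missing cells, so they add 0 to the total
df_na$gcount <- rowSums(sapply(df_na, grepl, pattern = "g"))
df_na
```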

2) If every "g" has to be counted separately

df$gcount <- colSums(apply(df, 1, stringr::str_count, "g"))

df
#       a     b gcount
#1 abcg#g  good      3
#2     gg   bad      2
#3     g@  ugly      2
#4  abcdg ageeg      3

We can use the non-apply versions here too

rowSums(sapply(df, stringr::str_count, "g"))

Or

Reduce(`+`, lapply(df, stringr::str_count, "g"))
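Unlike grepl(), stringr::str_count() returns NA for a missing cell, so with data containing NAs (as in the question) the row totals above become NA unless the NAs are dropped. A minimal sketch on hypothetical data (df_na is made up), assuming stringr is installed:

```r
# Hypothetical data with missing cells
df_na <- data.frame(a = c("abcg#g", NA, "g@"),
                    b = c("good", "bad", NA),
                    stringsAsFactors = FALSE)

# One column of per-cell counts per data frame column; NA cells stay NA
counts <- sapply(df_na, stringr::str_count, "g")

# Without na.rm, any row with a missing cell gets an NA total
rowSums(counts)

# na.rm = TRUE treats missing cells as contributing nothing
df_na$gcount <- rowSums(counts, na.rm = TRUE)
df_na
```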

data

df <- data.frame(a = c("abcg#g", "gg", "g@", "abcdg"),
                 b = c("good", "bad", "ugly", "ageeg"))
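If adding stringr is not an option, a base-R alternative (not part of the answer above, just a sketch) counts a single literal character by comparing each cell's length before and after deleting that character. The helper count_char is hypothetical, it assumes a one-character target and non-missing cells, and fixed = TRUE makes gsub() treat the character literally, so regex metacharacters such as "." would also be counted correctly.

```r
# Same example data as above, with character (not factor) columns
df <- data.frame(a = c("abcg#g", "gg", "g@", "abcdg"),
                 b = c("good", "bad", "ugly", "ageeg"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: number of occurrences of one literal character `ch`
# in each element of `x` (length before minus length after removing it)
count_char <- function(x, ch) nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))

df$gcount <- rowSums(sapply(df, count_char, ch = "g"))
df
```

On this data the totals agree with the stringr::str_count() version of option 2.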
answered Nov 02 '25 02:11 by Ronak Shah
