I have read an Excel sheet into a data frame that contains various numbers and characters in each row/column (with some NAs). For each row, I want to count how many occurrences of "g" there are, for example. My problem is that some cells contain something like "g#", "g a", or "1g", and thus are not included in the count. I want to count EVERY occurrence of "g", regardless of what else is in the cell, and then add this count as a new variable to the current data frame.
I have tried the following code. All of these versions count cells that are EXACTLY "g", but not every cell that contains "g".
My hunch is that I need a regular expression in one of the following snippets. (I searched for a few hours to no avail.) I also tried functions from the stringr package, such as str_count, but these seem to apply only to vectors.
oneelecsheet$countg <- rowSums(oneelecsheet == "g", na.rm = TRUE)
library(expss)
oneelecsheet$countg <- count_row_if("g", oneelecsheet)
oneelecsheet$countg <- apply(oneelecsheet, 1, function(x) length(which(x=="g")))
library(dplyr)
oneelecsheet$countg <- apply(oneelecsheet, 1, function(x) sum(x %in% "g"))
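The attempts above miss cells like "g#" because == and %in% test whether the whole cell equals "g", whereas a pattern match such as grepl() tests whether "g" appears anywhere in the cell. A minimal comparison on a made-up vector (the values are illustrative, not taken from the original sheet):

```r
# Illustrative cell values, including one NA
x <- c("g", "g#", "g a", "1g", "abc", NA)

# Whole-value equality: only the exact string "g" matches; NA stays NA
x == "g"
# [1]  TRUE FALSE FALSE FALSE FALSE    NA

# Substring match: TRUE whenever "g" occurs anywhere; NA gives FALSE
grepl("g", x)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```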
If there are multiple occurrences of "g" in a cell, how do you want to count them? For example, should a value like "ageeg" count as 1 or 2? Depending on the answer, you can use one of the following.
1) If only one "g" has to be counted per cell
df$gcount <- colSums(apply(df, 1, grepl, pattern = "g"))
df
# a b gcount
#1 abcg#g good 2
#2 gg bad 1
#3 g@ ugly 2
#4 abcdg ageeg 2
If we want to avoid apply (which first converts the whole data frame to a character matrix), we can use
rowSums(sapply(df, grepl, pattern = "g"))
Or (thanks to @thelatemail)
Reduce(`+`, lapply(df, grepl, pattern ="g"))
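Since the question mentions NA cells, here is a small check of how this per-cell approach treats them, on a hypothetical data frame df_na: grepl() returns FALSE for a missing value, so NA cells simply add 0 to the row total and no na.rm argument is needed.

```r
# Hypothetical sample data with some NA cells
df_na <- data.frame(a = c("g#", "1g", NA),
                    b = c("g a", NA, "xyz"))

# At most one match per cell; grepl() gives FALSE for NA, so NA counts as 0
df_na$gcount <- rowSums(sapply(df_na, grepl, pattern = "g"))
df_na$gcount
# [1] 2 1 0
```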
2) If every "g" has to be counted separately
# str_count() returns NA for NA cells, so na.rm = TRUE counts them as 0
df$gcount <- colSums(apply(df, 1, stringr::str_count, "g"), na.rm = TRUE)
df
# a b gcount
#1 abcg#g good 3
#2 gg bad 2
#3 g@ ugly 2
#4 abcdg ageeg 3
We can use the non-apply versions here too (again treating NA cells as 0)
rowSums(sapply(df, stringr::str_count, "g"), na.rm = TRUE)
Or
Reduce(`+`, lapply(df, function(x) { n <- stringr::str_count(x, "g"); replace(n, is.na(n), 0L) }))
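If you would rather not depend on stringr, the every-occurrence count can also be done in base R by comparing each cell's length before and after deleting the "g"s. A minimal sketch, assuming a literal single-character pattern; count_char_per_row is a hypothetical helper name, not part of any package, and NA cells are treated as 0:

```r
# Hypothetical helper: counts every occurrence of a single literal
# character in each row of a data frame, using only base R
count_char_per_row <- function(df, ch = "g") {
  per_col <- lapply(df, function(x) {
    x <- as.character(x)  # handle factor or numeric columns
    # characters removed by deleting ch = number of occurrences of ch
    n <- nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))
    n[is.na(n)] <- 0L     # NA cells contribute 0
    n
  })
  Reduce(`+`, per_col)    # add the per-column counts row by row
}

df <- data.frame(a = c("abcg#g", "gg", "g@", "abcdg", NA),
                 b = c("good", "bad", "ugly", "ageeg", "g a"))
df$gcount <- count_char_per_row(df[c("a", "b")])
df$gcount
# [1] 3 2 2 3 1
```

fixed = TRUE makes gsub() treat ch literally, so a character such as "." is not read as a regex wildcard. For a multi-character string, the length difference would have to be divided by nchar(ch).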
Sample data
df <- data.frame(a = c("abcg#g", "gg", "g@", "abcdg"),
b = c("good", "bad", "ugly", "ageeg"))