I have a column consisting of several Country Offices associated with a company, and I would like to shorten, for example, China Country Office and Bangladesh Country Office to just China or Bangladesh. In other words, I want to remove the words "Office" and "Country" from the column called Imp_Office.
I tried using the tm package, with reference to an earlier post, but nothing happened.
What I wrote:
library(tm)
stopwords = c("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","",
MY_df$Imp_Office))
This gave the following error message:
Error in gsub(paste0(stopwords, collapse = "|", "", MY_df$Imp_Office)) :
  argument "x" is missing, with no default
I also tried using the function readLines:
stopwords = readLines("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","",
MY_df$Imp_Office))
But this didn't help either.
I have considered the possibility of using some other string manipulation method, but I don't need to detect, replace or remove whitespace - so I am kind of lost here.
Thank you.
First, let's set up a dataframe with a column like what you describe:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
df <- tibble(Imp_Office = c("China Country Office",
                            "Bangladesh Country Office",
                            "China",
                            "Bangladesh"))
df
#> # A tibble: 4 x 1
#> Imp_Office
#> <chr>
#> 1 China Country Office
#> 2 Bangladesh Country Office
#> 3 China
#> 4 Bangladesh
Then we can use str_remove_all() from the stringr package to remove the bits of text that you don't want. The pattern includes the leading space, so no stray whitespace is left behind.
df %>%
  mutate(Imp_Office = str_remove_all(Imp_Office, " Country| Office"))
#> # A tibble: 4 x 1
#> Imp_Office
#> <chr>
#> 1 China
#> 2 Bangladesh
#> 3 China
#> 4 Bangladesh
Created on 2018-04-24 by the reprex package (v0.2.0).
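As for the original gsub() attempt, the error appears to come from a misplaced parenthesis: the replacement string and the column end up inside paste0(), so gsub() only receives a pattern and complains that "x" is missing. A minimal base R sketch of the same idea follows. The MY_df contents are made up for illustration, and the word-boundary pattern is one possible choice, not the only one.

```r
# Hypothetical stand-in for the asker's MY_df
MY_df <- data.frame(
  Imp_Office = c("China Country Office", "Bangladesh Country Office", "China"),
  stringsAsFactors = FALSE
)

stopwords <- c("Office", "Country", "Regional")

# Build the pattern on its own line so paste() is closed
# before gsub() receives its replacement and x arguments
pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")

# Remove the stop words as whole words
cleaned <- gsub(pattern, "", MY_df$Imp_Office, perl = TRUE)

# Collapse the leftover runs of spaces and trim both ends
MY_df$Imp_Office <- trimws(gsub(" +", " ", cleaned))

MY_df$Imp_Office
#> [1] "China"      "Bangladesh" "China"
```

Note that readLines() reads lines from a file or connection, so it is not needed here; a plain character vector is enough for the stop words.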