Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove specific words in a column

I have a Column consisting of several Country Offices associated a with a company, where I would like to shorten fx: China Country Office and Bangladesh Country Office, to just China or Bangladesh- In other words removing the words "Office" and "Country" from the column called Imp_Office.

I tried using the tm-package, with reference to an earlier post, but nothing happened.

what I wrote:

library(tm)
stopwords = c("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","", 
MY_df$Imp_Office))

Where I got the following error message:

  Error in gsub(paste0(stopwords, collapse = "|", "", MY_df$Imp_Office)) 
    : 
      argument "x" is missing, with no default

I also tried using the function readLines:

stopwords = readLines("Office", "Country","Regional")
MY_df$Imp_Office <- gsub(paste0(stopwords, collapse = "|","", 
MY_df$Imp_Office))

But this didn't help either

I have considered the possibility of using some other string manipulation method, but I don't need to detect, replace or remove whitespace - so I am kind of lost here.

Thank you.

like image 917
BloopFloopy Avatar asked Sep 05 '25 16:09

BloopFloopy


1 Answers

First, let's set up a dataframe with a column like what you describe:

library(tidyverse)


df <- data_frame(Imp_Office = c("China Country Office",
                                "Bangladesh Country Office",
                                "China",
                                "Bangladesh"))
df
#> # A tibble: 4 x 1
#>   Imp_Office               
#>   <chr>                    
#> 1 China Country Office     
#> 2 Bangladesh Country Office
#> 3 China                    
#> 4 Bangladesh

Then we can use str_remove_all() from the stringr package to remove any bits of text that you don't want from them.

df %>%
    mutate(Imp_Office = str_remove_all(Imp_Office, " Country| Office"))
#> # A tibble: 4 x 1
#>   Imp_Office
#>   <chr>     
#> 1 China     
#> 2 Bangladesh
#> 3 China     
#> 4 Bangladesh

Created on 2018-04-24 by the reprex package (v0.2.0).

like image 164
Julia Silge Avatar answered Sep 08 '25 11:09

Julia Silge