I have the following data:
temp<-c("AIR BAGS:FRONTAL" ,"SERVICE BRAKES HYDRAULIC:ANTILOCK",
    "PARKING BRAKE:CONVENTIONAL",
    "SEATS:FRONT ASSEMBLY:POWER ADJUST",
    "POWER TRAIN:AUTOMATIC TRANSMISSION",
    "SUSPENSION",
    "ENGINE AND ENGINE COOLING:ENGINE",
    "SERVICE BRAKES HYDRAULIC:ANTILOCK",
    "SUSPENSION:FRONT",
    "ENGINE AND ENGINE COOLING:ENGINE",
    "VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES")
I would like to create a new vector that retains only the text before the first ":" in the cases where a ":" is present, and the whole word when ":" is not present.
I have tried to use:
temp=data.frame(matrix(unlist(str_split(temp,pattern=":",n=2)), 
+                        ncol=2, byrow=TRUE))
but it does not work in the cases where there is no ":"
I know this question is very similar to: truncate string from a certain character in R, which used:
sub("^[^.]*", "", x)
But I am not very familiar with regular expressions and have struggled to reverse that example to retain only the beginning of the string.
You can solve this with a simple regex:
sub("(.*?):.*", "\\1", x)
 [1] "AIR BAGS"                  "SERVICE BRAKES HYDRAULIC"  "PARKING BRAKE"             "SEATS"                    
 [5] "POWER TRAIN"               "SUSPENSION"                "ENGINE AND ENGINE COOLING" "SERVICE BRAKES HYDRAULIC" 
 [9] "SUSPENSION"                "ENGINE AND ENGINE COOLING" "VISIBILITY"     
How the regex works:
"(.*?):.*" Look for a repeated set of any characters .* but modify it with ? to not be greedy. This should be followed by a colon and then any character (repeated)"\\1"
The bit to understand is that any regex match is greedy by default. By modifying it to be non-greedy, the first pattern match can not include the colon, since the first character after the parentheses is a colon. The regex after the colon is back to the default, i.e. greedy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With