Why does stringr::str_match on a column return a matrix?

Question

I'm using tidyverse to load the data, so I have a tibble which you can reproduce like:

library(tidyverse)

df_1 <- tibble(
  id = c(1, 2, 3),
  subject_id = c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2", "ABCD-FOO3-1001-CD-742W5")
)

Now I want to modify subject_id to extract just the first two character groups, i.e.:

"ABCD-FOO1-G001-YX-732E5" -> "ABCD-FOO1"

When I run the following code:

df_1 %>% mutate(subject_id = stringr::str_match(subject_id, "[^-]*-[^-]*"))

each element of the subject_id column is a tibble itself:

> class(df_1[1, "subject_id"])
[1] "tbl_df"     "tbl"        "data.frame"

How do I make sure subject_id is a character vector instead of a tibble?

loki · Accepted Answer

Here is a take on how to avoid this, rather than on why it happens.

As we learn from ?str_match:

For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. [...]
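To see what that means in practice, here is a minimal sketch that inspects the return value directly. The vector x and the second pattern with a capture group are made up for illustration and are not part of the question.

```r
library(stringr)

# Illustrative input, taken from two of the subject IDs in the question
x <- c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2")

# The pattern has no capture groups, so the matrix has a single column
# holding the complete match
m <- str_match(x, "[^-]*-[^-]*")
is.matrix(m)
dim(m)

# Adding one capture group adds one more column to the matrix
m2 <- str_match(x, "([^-]*)-[^-]*")
dim(m2)
m2[, 2]
```

Because mutate() accepts a matrix as a column, the result ends up as a matrix column inside the tibble rather than as a plain character vector.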

So we need to pull the first column from the matrix:

df_1 %>% mutate(subject_id = stringr::str_match(subject_id, "[^-]*-[^-]*") %>% .[,1])
# # A tibble: 3 x 2
#      id subject_id
#   <dbl> <chr>     
# 1     1 ABCD-FOO1 
# 2     2 ABCD-FOO2 
# 3     3 ABCD-FOO3
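If only the complete match is needed, one possible alternative (not what the code above uses) is stringr::str_extract, which returns a character vector instead of a matrix. A sketch, assuming the same data as in the question and storing the result in a new name df_2 chosen here for illustration:

```r
library(dplyr)
library(stringr)
library(tibble)

df_1 <- tibble(
  id = c(1, 2, 3),
  subject_id = c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2", "ABCD-FOO3-1001-CD-742W5")
)

# str_extract returns the first complete match per element as a character
# vector, so no column subsetting is needed afterwards
df_2 <- df_1 %>%
  mutate(subject_id = str_extract(subject_id, "[^-]*-[^-]*"))

class(df_2$subject_id)
```

Note that the result has to be assigned, since mutate() does not modify df_1 in place.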

Also keep in mind that in your class() example you are subsetting a tibble with [. Subsetting a tibble this way always returns a tibble, even if the result has only one cell. For comparison, see class(df_1[1, "id"]), which is also a tibble although id is a plain numeric column. For more on that, have a look at this chapter from R for Data Science.
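A short sketch of the difference, using the original df_1 from the question. The variable names cell, col and first are only there for illustration:

```r
library(tibble)

df_1 <- tibble(
  id = c(1, 2, 3),
  subject_id = c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2", "ABCD-FOO3-1001-CD-742W5")
)

# Single-bracket subsetting keeps the tibble structure, even for one cell
cell <- df_1[1, "subject_id"]
class(cell)

# $ or [[ pulls the column out as a plain vector
col <- df_1$subject_id
class(col)

# To check a single value, index into the extracted vector
first <- df_1[["subject_id"]][1]
class(first)
```

So to check the type of a column, inspect class(df_1$subject_id) rather than a one-cell subset of the tibble.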

Tags:

r

stringr

tibble
