I am working with a data frame in R in which a column contain gene IDs separated by bars that look like the following:
geneIDs <- c("100/1000/100008586","1277/63923/8516","1133/1132/1956/8516")
#> geneIDs
#> [1] "100/1000/100008586" "1277/63923/8516" "1133/1132/1956/8516"
I need to convert each of the different geneIDs to Gene Symbol based on a data.frame that contains in each row the geneID and its correspondent Gene Symbol, as depicted bellow:
#> head(gene_symbols)
ENTREZID SYMBOL
1 1 A1BG
2 10 NAT2
3 100 ADA
4 1000 CDH2
5 10000 AKT3
6 100008586 GAGE12F
Using the first element from the geneIDs as an example, my expected outcome would look like:
#> geneIDs
#> [1] "ADA/CDH2/GAGE12F"
Thank you very much in advance!
Possible solution:
geneIDs <- c("100/1000/100008586","1277/63923/8516","1133/1132/1956/8516")
lookupTable <- structure(list(ENTREZID = c(1L, 10L, 100L, 1000L, 10000L, 100008586L
), SYMBOL = c("A1BG", "NAT2", "ADA", "CDH2", "AKT3", "GAGE12F"
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
)) %>%
mutate(ENTREZID = as.character(ENTREZID))
as_tibble(x = geneIDs) %>%
mutate(value = strsplit(geneIDs, split = "/")) %>%
unnest_longer(value) %>%
left_join(lookupTable, by = c("value" = "ENTREZID"))
Which gives:
# A tibble: 10 × 2
value SYMBOL
<chr> <chr>
1 100 ADA
2 1000 CDH2
3 100008586 GAGE12F
4 1277 NA
5 63923 NA
6 8516 NA
7 1133 NA
8 1132 NA
9 1956 NA
10 8516 NA
Or to return exactly what you specified:
geneString <- as_tibble(x = geneIDs) %>%
mutate(value = strsplit(geneIDs, split = "/")) %>%
unnest_longer(value) %>%
left_join(lookupTable, by = c("value" = "ENTREZID")) %>%
filter(!is.na(SYMBOL)) %>%
pull(SYMBOL)
paste(geneString, collapse = "/")
"ADA/CDH2/GAGE12F"
You could split the strings at the / and match each to the ENTREZID column to look up the SYMBOL. Replace any non-matches with the original string fragment, and paste the result together, collapsing with "/"
sapply(strsplit(geneIDs, '/'), function(x) {
y <- gene_symbols$SYMBOL[match(x, gene_symbols$ENTREZID)]
y[is.na(y)] <- x[is.na(y)]
paste0(y, collapse = '/')
})
#> [1] "ADA/CDH2/GAGE12F" "1277/63923/8516" "1133/1132/1956/8516"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With