In my data.table, I use tstrsplit with the keep= argument to split the ValueId column. In this case, though, I do not know in advance which value to pass to keep; I would like to take it from the Level column instead, row by row.
All my attempts have failed. Is this possible? Maybe not in data.table?
Here is a reprex:
library(data.table)
foo <- data.table(Level = c(2,2,3,4,3),
ValueId = c("11983:1055521", "11983:1055521-5168:290668-198:100798", "11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771-5162:290728-5166:290620",
"11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771", " 11983:1055521-5168:290676-198:100794-92:91781-139:95090-135:95353"))
# hard-coded keep: works
foo[, newvar := tstrsplit(ValueId, "-", fixed = TRUE, keep = 4)]
# keep taken from the Level column: this is the attempt that fails
foo[, newvar := tstrsplit(ValueId, "-", fixed = TRUE, keep = Level)]
Thanks !!
You can use mapply with [ to extract, for each row, the substring returned by strsplit at the position given in foo$Level.
mapply(`[`, strsplit(foo$ValueId, "-", fixed = TRUE), foo$Level)
#[1] NA "5168:290668" "198:100798" "92:91604" "198:100794"
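A minimal base R sketch of the same idea, in case a guaranteed character vector is preferred. The helper name pick_piece and the toy ids/lvl vectors are made up for illustration and are not part of data.table; it assumes every position is a positive integer.

```r
# Hypothetical helper (not part of data.table): return the k-th piece of each
# string in x after splitting on sep, or NA when a string has fewer pieces.
pick_piece <- function(x, k, sep = "-") {
  pieces <- strsplit(as.character(x), sep, fixed = TRUE)
  # vapply() insists on exactly one character value per row, whereas
  # mapply() silently falls back to a list when result lengths differ
  # (e.g. for an index of 0).
  vapply(seq_along(pieces), function(i) pieces[[i]][k[i]], character(1))
}

# Toy data (assumed for illustration only)
ids <- c("a:1", "a:1-b:2-c:3", "a:1-b:2-c:3-d:4")
lvl <- c(2, 2, 4)

res <- pick_piece(ids, lvl)
print(res)
```

With the data.table from the question, this could presumably be used as foo[, newvar := pick_piece(ValueId, Level)].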
There are a couple of problems. One of them is in the tstrsplit function itself, which is defined as:
function (x, ..., fill = NA, type.convert = FALSE, keep, names = FALSE)
{
if (!isTRUEorFALSE(names) && !is.character(names))
stop("'names' must be TRUE/FALSE or a character vector.")
ans = transpose(strsplit(as.character(x), ...), fill = fill,
ignore.empty = FALSE)
if (!missing(keep)) {
keep = suppressWarnings(as.integer(keep))
chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
length(ans)
if (!isTRUE(chk))
stop("'keep' should contain integer values between ",
min(1L, length(ans)), " and ", length(ans),
".")
ans = ans[keep]
}
if (type.convert)
ans = lapply(ans, type.convert, as.is = TRUE)
if (isFALSE(names))
return(ans)
else if (isTRUE(names))
names = paste0("V", seq_along(ans))
if (length(names) != length(ans)) {
str = if (missing(keep))
"ans"
else "keep"
stop("length(names) (= ", length(names), ") is not equal to length(",
str, ") (= ", length(ans), ").")
}
setattr(ans, "names", names)
ans
}
<bytecode: 0x0000019bffd6da98>
<environment: namespace:data.table>
The important thing to note is the if block that checks keep against the number of pieces returned (length(ans)) and stops when keep is out of range. In your example, the first row has only one piece, so its 4th piece is NA. Your hard-coded call still works because strsplit is vectorized: all rows are split in one call, the shorter rows are padded with NA, and length(ans) is the largest number of pieces found in any row (9 here), so the check passes and the if block doesn't get triggered. You can try this out by changing that 4 to 40, and you'll get this message:
Error in tstrsplit(ValueId, "-", fixed = TRUE, keep = 40) : 'keep' should contain integer values between 1 and 9.
because in that case no row has that many pieces.
So what you need to do is redefine the tstrsplit function so that it returns NA instead of stopping:
tstrsplitNA <- function (x, ..., fill = NA, type.convert = FALSE, keep)
{
  # split every string, then transpose into one vector per piece position
  ans = transpose(strsplit(as.character(x), ...), fill = fill,
                  ignore.empty = FALSE)
  if (!missing(keep)) {
    keep = suppressWarnings(as.integer(keep))
    chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
      length(ans)
    # keep out of range: fall back to NA instead of calling stop()
    if (!isTRUE(chk))
      ans = NA_character_
    ans = ans[keep]
  }
  if (type.convert)
    ans = lapply(ans, type.convert, as.is = TRUE)
  ans
}
That isn't enough though, because strsplit is vectorized: foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level)] doesn't run the function row by row. It feeds the entirety of your ValueId column to strsplit and then transposes the result, so keep = Level selects whole columns of pieces rather than one piece per row, which returns gibberish relative to what you want.
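To make that concrete, here is a rough base R sketch. A padded character matrix stands in for the transposed list that tstrsplit builds (one column per piece position); the toy ids/lvl vectors and all variable names are assumptions for illustration only.

```r
# Toy data (assumed for illustration only)
ids <- c("a:1", "a:1-b:2-c:3", "a:1-b:2-c:3-d:4")
lvl <- c(2, 2, 4)

pieces <- strsplit(ids, "-", fixed = TRUE)
width <- max(lengths(pieces))

# Pad every split to the same width with NA: one row per input string,
# one column per piece position.
mat <- t(vapply(pieces,
                function(p) c(p, rep(NA_character_, width - length(p))),
                character(width)))

# Using lvl as a column selector keeps whole columns: a 3 x 3 block,
# not one value per row.
by_column <- mat[, lvl]
print(dim(by_column))

# Indexing with (row, column) pairs picks exactly one cell per row.
per_row <- mat[cbind(seq_along(ids), lvl)]
print(per_row)
```

The per-row selection at the end is what is wanted, and it is what running the function one row at a time achieves.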
You can tell data.table to do the operation row by row by grouping with the by argument on Level and ValueId:
foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level), by=c('Level','ValueId')]
foo
Level ValueId newvar
1: 2 11983:1055521 NA
2: 2 11983:1055521-5168:290668-198:100798 5168:290668
3: 3 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771-5162:290728-5166:290620 198:100798
4: 4 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771 92:91604
5: 3 11983:1055521-5168:290676-198:100794-92:91781-139:95090-135:95353 198:100794