I am experiencing difficulty with the perl expression \\L\\1 in very particular circumstances on R-dev (2017-06-06 and 2017-06-16 r72796 builds):
bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8")
leading_spaces <- 2
is_field <- grepl("=", bib, fixed = TRUE)
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE)))
widest_field <- max(field_width[is_field])
out <- bib
# Vectorized gsub:
for (line in seq_along(bib)){
  # Replace every field line with
  # two spaces + field name + spaces required for widest field + space
  if (is_field[line]){
    spaces_req <- widest_field - field_width[line]
    out[line] <-
      gsub("^\\s*(\\w+)\\s*[=]\\s*\\{",
           paste0(paste0(rep(" ", leading_spaces), collapse = ""),
                  "\\L\\1",
                  paste0(rep(" ", spaces_req), collapse = ""),
                  " = {"),
           bib[line],
           perl = TRUE)
  }
}
# Add commas:
out[is_field] <- gsub("\\}$", "\\},", out[is_field], perl = TRUE)
out[9]
#> R-dev " author"
#> R 3.4.0 " author = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
To reproduce, it is necessary to:
1. Read the lines from a file with readLines and specify the encoding. (Recreating the vector with dput won't reproduce the problem.)
2. Use \\L or \\U in the perl replacement.

Is this a change in R 3.5.0, or have I been misusing \\L in this instance?
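The two conditions above can be combined into a reproducer that does not need network access. This is only a sketch: the one-line `bib_line` is a hypothetical stand-in for the real .bib file, and it assumes the problem is triggered by any UTF-8-marked string, which the original report does not confirm.

```r
# Hypothetical stand-in for one line of the .bib file;
# \u00e9 is a non-ASCII character, so the string gets marked as UTF-8.
bib_line <- "AUTHOR = {Am\u00e9lie Hunter}"

# Write the line to a temporary file as UTF-8 bytes.
tmp <- tempfile(fileext = ".bib")
writeLines(enc2utf8(bib_line), tmp, useBytes = TRUE)

# Condition 1: read from a file and specify the encoding.
x <- readLines(tmp, encoding = "UTF-8")
Encoding(x)

# Condition 2: use \\L in the perl replacement.
res <- gsub("^\\s*(\\w+)\\s*[=]\\s*\\{", "  \\L\\1 = {", x, perl = TRUE)

# What an unaffected build is expected to return.
expected <- "  author = {Am\u00e9lie Hunter}"
identical(res, expected)  # FALSE would suggest the truncation is present
```

Comparing against a fixed expected string makes it easy to rerun the same check on different R builds.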
UPDATE
The patch correcting this behaviour was applied in r74274.
ORIGINAL ANSWER
There is clearly some unexpected behavior.
With a plain \1 in the replacement, it works, outputting:
[1] " author = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
However, whenever \U or \L is used with \1, everything after the case-converted group is dropped, including a second backreference:
"\\U\\1": [1] " AUTHOR"
"\\U\\1\\E\\2": [1] " AUTHOR"
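For comparison, this is how the case-conversion escapes are expected to behave in a perl replacement on an unaffected build. It is a minimal sketch on a plain ASCII string typed in directly (the input "tony wood" is made up for illustration), so it does not go through readLines and should not trigger the problem.

```r
# \\U upper-cases what follows, \\E ends the conversion,
# so only the first group is changed and the rest is kept.
upper_first <- gsub("(\\w+) (\\w+)", "\\U\\1\\E \\2", "tony wood", perl = TRUE)
upper_first

# \\L lower-cases the first group; text outside the match is kept as is.
lower_first <- gsub("^(\\w+)", "\\L\\1", "AUTHOR = {Tony Wood}", perl = TRUE)
lower_first
```

In both calls the text after the converted group survives, which is what the truncated outputs above are missing.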
A gsubfn solution still works (here, an example with toupper()):
library(gsubfn)
bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8")
leading_spaces <- 2
is_field <- grepl("=", bib, fixed = TRUE)
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE)))
widest_field <- max(field_width[is_field])
out <- bib
# Vectorized gsub:
for (line in seq_along(bib)){
  # Replace every field line with
  # two spaces + field name + spaces required for widest field + space
  if (is_field[line]){
    spaces_req <- widest_field - field_width[line]
    out[line] <-
      gsubfn("^\\s*(\\w+)\\s*=\\s*\\{",
             function(y) paste0(
               paste0(rep(" ", leading_spaces), collapse = ""),
               toupper(y),
               paste0(rep(" ", spaces_req), collapse = ""),
               " = {"
             ),
             bib[line], engine="R"
      )
  }
}
# Add commas:
out[is_field] <- gsub("\\}$", "},", out[is_field], perl = TRUE)
out[9]
Output:
[1] " AUTHOR = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
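If an extra package is not wanted, a base-R workaround along the same lines may also be possible: extract the field name with a plain \\1 and change its case outside the regex with tolower(). The helper `align_field` below is a hypothetical name, not part of the original code, and the sketch assumes that only \\L and \\U are affected, as observed above. It is meant to be called on field lines only (those selected by is_field).

```r
# Hypothetical helper: lower-case the field name and pad it to a common width.
align_field <- function(x, widest_field, leading_spaces = 2) {
  pattern <- "^\\s*(\\w+)\\s*=\\s*\\{"
  # Field name, extracted with a plain \\1 and lower-cased outside the regex.
  name <- tolower(sub(paste0(pattern, ".*$"), "\\1", x, perl = TRUE))
  # Everything after the opening brace.
  rest <- sub(pattern, "", x, perl = TRUE)
  paste0(strrep(" ", leading_spaces),
         formatC(name, width = widest_field, flag = "-"),  # left-justify, pad right
         " = {", rest)
}

aligned <- align_field("AUTHOR={Tony Wood}", widest_field = 9)
aligned
```

Because sub(), tolower() and paste0() are all vectorized, the same call could be applied to bib[is_field] in one step instead of looping over lines.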
My sessionInfo details:
> sessionInfo()
R Under development (unstable) (2017-06-19 r72808)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gsubfn_0.6-6 proto_1.0.0
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 tcltk_3.5.0