
Substitute double backslash from input with single backslash in Lua

Tags: escaping, lua

Suppose I have a variable str to which I assign the value test\\ttest (or it could just be \\, for this case). What I want to do is replace the double backslashes with single backslashes.

The purpose is clear: I want to output the \t escape sequence (horizontal tab), whereas at the moment it is output as the plain text \t.

It's also clear that I can't use:

str:gsub("\\","\")

Because that would cause a syntax error, since \" is recognized as an escape sequence. I tried every way I could come up with. I also tried using loadstring() (and nested loadstring() calls as well), but that failed too.

Please, don't say to do:

str:gsub("\\t","\t")

Of course, it would work, but it's not what I need. I need to replace a double backslash with a single backslash.

noize asked Nov 19 '25 05:11


1 Answer

I suspect you are being confused by quoting, because string.gsub can replace backslash characters:

C:...> lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> a="test\\\\ttest"
> =a
test\\ttest
> =a:gsub([[\\]],[[\]])
test\ttest       1
>

The backslash is used as a character escape in double- and single-quoted strings, but not in long strings written with the [[...]] notation. In the usual string constant, a backslash consumes one or more following characters and replaces the whole sequence with a single byte in the internal string value. So "\\" is a single-byte string containing a single backslash, "\" is a syntax error, and "\"" is a single-byte string containing a double quotation mark.
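One way to check these claims is to compare string lengths with the # operator. This is only a minimal sketch; the variable names are illustrative and do not come from the question.

```lua
-- Quoted literals: the backslash starts an escape sequence.
local one_backslash = "\\"       -- escape for a single backslash
local one_quote     = "\""       -- escape for a double quotation mark

-- Long-bracket literals: no escape processing at all.
local two_backslashes = [[\\]]   -- exactly the two characters written

print(#one_backslash)              -- 1
print(#one_quote)                  -- 1
print(#two_backslashes)            -- 2
print(two_backslashes == "\\\\")   -- true
```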

Adding to the confusion is that Lua patterns as understood by string.gsub (and its siblings) use % characters for quoting and for naming special patterns. This is one of the more visible differences between Lua patterns and the regular expressions supported by other languages. To a Lua pattern, a backslash is just an ordinary character.
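A short sketch of that difference, using made-up sample strings: a magic character such as the dot needs a % in front of it to be matched literally, while a backslash needs no escaping at the pattern level.

```lua
-- In a pattern, "." is magic (any character); "%." matches a literal dot.
local dotted = "a.b.c"
print((dotted:gsub("%.", "-")))   -- a-b-c
print((dotted:gsub(".", "-")))    -- -----

-- A backslash is not magic in a pattern, so it needs no "%" escape.
local path = [[C:\temp\file]]
print((path:gsub([[\]], "/")))    -- C:/temp/file
```

The extra parentheses around each gsub call drop the match count so that print shows only the resulting string.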

So when I set the value of a above, I used extra backslashes to get the string value to have two total. I could have written a=[[test\\ttest]] to the same effect. The call to gsub was written with the simple pattern that replaced doubled backslashes with singles. As can be seen, it succeeded and the result is the string test\ttest (along with a count of matches as the second return value).
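The second return value is easy to overlook outside the interactive prompt. As a rough sketch (the names fixed and count are only illustrative), it can be captured explicitly or discarded with parentheses:

```lua
local a = [[test\\ttest]]

-- Capture both results: the new string and the number of matches.
local fixed, count = a:gsub([[\\]], [[\]])
print(fixed)    -- test\ttest
print(count)    -- 1

-- Parentheses truncate the call to its first result only.
local only_string = (a:gsub([[\\]], [[\]]))
print(only_string == fixed)   -- true
```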

In short, the substitution you ask for in the question "just works" as expected.

But reading between the lines, that isn't quite what you wanted. It appears you are trying to convert the string test\\ttest to test<TAB>test. If that single conversion is what you wanted, then just write it as such: a:gsub([[\\t]],"\t"). (Note that I used quotes so that the string literal will interpret the \t to mean an ASCII character in the replacement value.)
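Written out as a script rather than an inline call, that conversion might look like the sketch below. The variable tabbed is an illustrative name; the byte check simply confirms that a real tab character (code 9) ended up in the string.

```lua
local a = [[test\\ttest]]              -- two backslashes followed by "t"

-- Pattern: two literal backslashes and a "t" (long brackets, no escapes).
-- Replacement: a quoted literal, so "\t" becomes one tab character.
local tabbed = a:gsub([[\\t]], "\t")   -- assigning to one local drops the count

print(#a)                -- 11
print(#tabbed)           -- 9
print(tabbed:byte(5))    -- 9
```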

The more general case is more difficult, because you not only have to handle the normal single-letter escapes for tab, bell, backspace, carriage return, newline, and so forth, but you also have to handle the one to three digit decimal code sequence.
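For reference, this is how the compiler itself treats those sequences inside quoted literals. A small sketch of the behavior any general solution would have to reproduce:

```lua
-- Decimal escapes take one to three digits.
print("\65" == "A")        -- true
print("\065" == "A")       -- true
print("\9" == "\t")        -- true

-- At most three digits are consumed; a fourth is ordinary text.
print("\0651" == "A1")     -- true

-- Each single-letter escape becomes exactly one byte.
print(#"\a\b\f\n\r\t\v")   -- 7
```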

Update: The temptation to write something that handles all backslash escapes as the Lua compiler does for string literals proved too strong.

function unbackslashed(s)
    local ch = {
        ["\\a"] = '\\007', --'\a' alarm             Ctrl+G BEL
        ["\\b"] = '\\008', --'\b' backspace         Ctrl+H BS
        ["\\f"] = '\\012', --'\f' formfeed          Ctrl+L FF
        ["\\n"] = '\\010', --'\n' newline           Ctrl+J LF
        ["\\r"] = '\\013', --'\r' carriage return   Ctrl+M CR
        ["\\t"] = '\\009', --'\t' horizontal tab    Ctrl+I HT
        ["\\v"] = '\\011', --'\v' vertical tab      Ctrl+K VT
        ["\\\n"] = '\\010',--     newline
        ["\\\\"] = '\\092',--     backslash
        ["\\'"] = '\\039', --     apostrophe
        ['\\"'] = '\\034', --     quote
    }
    return s:gsub("(\\.)", ch)
        :gsub("\\(%d%d?%d?)", function(n)
            return string.char(tonumber(n))
        end)
end

Such a function could prove useful when parsing user-supplied text in which you want to honor backslash escapes. String literals in source code are already handled by the compiler.

Another caution is that if you find yourself with partially translated strings, you might actually be suffering from a lack of clarity of design. Actually needing a function like this outside of parsing user input is an indication that there might be a deeper problem with your design.

The function unbackslashed works by first replacing every recognized sequence of the form backslash followed by a single character with its equivalent numeric form. A second pass converts all numeric forms into their literal characters. Two passes were required because the string patterns understood by string.gsub do not support alternation, as a full regular expression engine does. Otherwise the pattern to match could have been written along the lines of Perl's /\\([0-9]{1,3})|\\(.)/ and the substitution performed in one pass.
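Alternation can also be emulated with a replacement function, which gives a single-pass variant. The sketch below is not the answer's method: unescape_once and the simple table are hypothetical names, and leaving out-of-range codes (above 255) untouched is an assumption made here rather than what the Lua compiler does.

```lua
-- Hypothetical single-pass helper, shown only as an alternative sketch.
local simple = {
    a = "\a", b = "\b", f = "\f", n = "\n", r = "\r", t = "\t", v = "\v",
    ["\n"] = "\n", ["\\"] = "\\", ["'"] = "'", ['"'] = '"',
}

local function unescape_once(s)
    -- Match a backslash, the character after it, and up to two more digits.
    local result = s:gsub("\\(.)(%d?%d?)", function(c, rest)
        if c:find("%d") then
            -- Numeric escape: c plus rest forms a one to three digit code.
            local code = tonumber(c .. rest)
            if code <= 255 then
                return string.char(code)
            end
            return nil  -- assumption: leave out-of-range codes unchanged
        end
        local replacement = simple[c]
        if replacement then
            -- Digits after a letter escape are ordinary text; put them back.
            return replacement .. rest
        end
        return nil      -- unknown escape: gsub keeps the original match
    end)
    return result
end

print(unescape_once([[\65\066C]]))               -- ABC
print(unescape_once([[a\\tb]]))                  -- a\tb
print(unescape_once([[x\ty]]) == "x\ty")         -- true
```

Because gsub never rescans replaced text, the backslash produced from an escaped backslash cannot be mistaken for the start of a new escape.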

RBerteig answered Nov 21 '25 18:11


