I'm cleaning text strings in R. I want to remove all the punctuation except apostrophes and hyphens. This means I can't use the [:punct:] character class (unless there's a way of saying [:punct:] but not '-). 
In other words, all of the following must come out: ! " # $ % & ( ) * + , . / : ; < = > ? @ [ \ ] ^ _ { | } ~ and the backtick.
For most of the above, escaping is not an issue. But for square brackets, I'm really having issues. Here's what I've tried:
gsub('[abc]', 'L', 'abcdef') #expected behaviour, shown as sanity check
# [1] "LLLdef"
gsub('[[]]', 'B', 'it[]') #only 1 substitution, ie [] treated as a single character
# [1] "itB"
gsub('[\[\]]', 'B', 'it[]') #single escape, errors as expected
Error: '\[' is an unrecognized escape in character string starting "'[\["
gsub('[\\[\\]]', 'B', 'it[]') #double escape, single substitution
# [1] "itB"
gsub('[\\]\\[]', 'B', 'it[]') #double escape, reversed order, NO substitution
# [1] "it[]"
I'd prefer not to use fixed=TRUE with gsub, since that would prevent me from using a character class. So, how do I include square brackets in a regex character class?
ETA additional trials:
gsub('[[\\]]', 'B', 'it[]') #double escape on closing ] only, single substitution
# [1] "itB"
gsub('[[\]]', 'B', 'it[]') #single escape on closing ] only, expected error
Error: '\]' is an unrecognized escape in character string starting "'[[\]"
ETA: the single substitution was caused by not setting perl=TRUE in my gsub calls, i.e.:
gsub('[[\\]]', 'B', 'it[]', perl=TRUE) #double escape on closing ] only, both brackets substituted
# [1] "itBB"
You can use [:punct:] if you combine it with a negative lookahead:
(?!['-])[[:punct:]]
This way a [:punct:] character is only matched if it is not in ['-]. The negative lookahead assertion (?!['-]) enforces this condition: it fails when the next character is a ' or a -, and then the whole expression fails at that position. Lookaheads are a Perl-style feature, so in R this pattern needs perl=TRUE.
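For reference, here is a minimal R sketch of the lookahead approach. It assumes perl = TRUE, since R's default regex engine does not support lookaheads. The helper name strip_punct and the sample string are made up for illustration.

```r
# Hypothetical helper: drop all punctuation except apostrophes and hyphens.
# perl = TRUE is required because the lookahead (?!...) is PCRE syntax.
strip_punct <- function(x) {
  gsub("(?!['-])[[:punct:]]", "", x, perl = TRUE)
}

strip_punct("it's a well-known [test], isn't it?")
# [1] "it's a well-known test isn't it"
```

The lookahead consumes no characters, so it acts purely as a filter on what the [[:punct:]] that follows is allowed to match.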
Inside a character class you only need to escape the closing square bracket:
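Try '[[\\]]'. In R the backslash itself has to be doubled inside a string literal (a single backslash, as in '[[\]]', is rejected as an unrecognized escape), and the escaped ] is only treated as a class member when gsub is called with perl=TRUE.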
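A short R sketch of both options, using the question's 'it[]' test string. As the question's results suggest, R's default engine does not let a backslash escape ] inside a class, so the class ends at the first ] and "[]" is matched as a unit. The usual POSIX-style workaround, shown here as an alternative rather than as part of the original answer, is to put ] first in the class.

```r
x <- 'it[]'

# PCRE: inside a class, only the closing bracket needs escaping
gsub('[[\\]]', 'B', x, perl = TRUE)
# [1] "itBB"

# Default engine: put ] first in the class so it is taken literally
gsub('[][]', 'B', x)
# [1] "itBB"
```

Either form can be extended with the other punctuation characters to build an explicit class that leaves out ' and -.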