I'm developing a web app where users can respond to blog entries. This is a security problem because they can submit dangerous data that will be rendered to other users (and executed as JavaScript in their browsers).
They can't format the text they send. No "bold", no colors, no nothing. Just simple text. I came up with this regex to solve my problem:
[^\\w\\s.?!()]
So anything that is not a word character (a-z, A-Z, 0-9, _), whitespace, ".", "?", "!", "(" or ")" will be replaced with an empty string. Then every quotation mark will be replaced with: "&quot;".
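For reference, here is a minimal Python sketch of the filter as described, to make its behaviour concrete. The function names are made up for illustration, and `re.ASCII` is an assumption so that `\w` means exactly a-z, A-Z, 0-9 and underscore; the original may well run in another language with slightly different `\w` semantics.

```python
import re

# Everything outside this allow-list is deleted. re.ASCII restricts \w to
# [a-zA-Z0-9_]; without it, Python's \w also matches non-ASCII letters.
DISALLOWED = re.compile(r"[^\w\s.?!()]", re.ASCII)

def strip_disallowed(text: str) -> str:
    """Delete every character that is not on the allow-list."""
    return DISALLOWED.sub("", text)

def sanitize(text: str) -> str:
    """Apply the two steps in the order described in the question."""
    stripped = strip_disallowed(text)
    # The quotes are already gone at this point, so this replace finds nothing.
    return stripped.replace('"', "&quot;")

print(sanitize('<script>alert("x")</script>'))  # scriptalert(x)script
print(sanitize("It's 50% off, isn't it?"))       # Its 50 off isnt it?
```

Two things stand out when running it: the quotation mark is not on the allow-list, so the second step never has anything to replace, and ordinary punctuation such as apostrophes and commas is lost from legitimate comments.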
I check the data on the front end and I check it on my server.
Is there any way somebody could bypass this "solution"?
I'm wondering how StackOverflow handles this. There is a lot of formatting here, so they must be doing a good job with it.
If you just want simple text, don't worry about filtering specific HTML tags. You want the equivalent of PHP's htmlspecialchars(). A good way to use it is print htmlspecialchars($var, ENT_QUOTES); This function performs the following encodings:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
This solves the problem of XSS at the lowest level, and you don't need some complex library or regex that you don't understand (and that is probably insecure; after all, complexity is the enemy of security).
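If your server isn't PHP, the same five replacements are easy to reproduce. Below is a rough Python sketch of what htmlspecialchars() does with ENT_QUOTES; `escape_html` is a name invented for this example, not a library function.

```python
# Replacement table mirroring the encodings listed above (ENT_QUOTES behaviour).
# "&" must come first so the ampersands introduced by the later
# replacements are not escaped a second time.
_REPLACEMENTS = (
    ("&", "&amp;"),
    ('"', "&quot;"),
    ("'", "&#039;"),
    ("<", "&lt;"),
    (">", "&gt;"),
)

def escape_html(text: str) -> str:
    """Encode the five HTML-significant characters and leave the rest alone."""
    for raw, entity in _REPLACEMENTS:
        text = text.replace(raw, entity)
    return text

print(escape_html('<script>alert("x")</script>'))
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
print(escape_html("Tom & Jerry's"))
# Tom &amp; Jerry&#039;s
```

Nothing is deleted, so the comment shows up for other users exactly as it was typed. In practice you would probably reach for the standard library's `html.escape(text, quote=True)` instead, which covers the same five characters but writes the single quote as `&#x27;`.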
Make sure to TEST YOUR XSS FILTER by running a free XSS scanner.
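A scanner is the real test, but a tiny smoke test catches the obvious regressions before you get that far. The sketch below is only an illustration: the payload list is a small hand-picked sample, `is_inert` is a hypothetical helper, and Python's `html.escape` stands in for whatever escaping function you actually use.

```python
import html

# A few well-known probe strings; real scanners try far more variants.
PAYLOADS = [
    "<script>alert(1)</script>",
    '"><img src=x onerror=alert(1)>',
    "' onmouseover='alert(1)",
    "<svg/onload=alert(1)>",
]

def is_inert(escaped: str) -> bool:
    """True if no character that can open a tag or close an attribute survives."""
    return not any(ch in escaped for ch in "<>\"'")

for payload in PAYLOADS:
    escaped = html.escape(payload, quote=True)
    assert is_inert(escaped), payload
print("all payloads neutralised")
```

This only checks output that ends up in HTML text or inside a quoted attribute value, so treat a pass as a sanity check and not as proof.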
I agree with Tomalak, and just wanted to add a few points.