
How to reject names (people and companies) using whitelists with C# regexes?

Tags: regex, xss

I've run into a few problems using a C# regex to implement a whitelist of allowed characters on web inputs. I am trying to avoid SQL injection and XSS attacks. I've read that whitelists of the allowable characters are the way to go.

The inputs are people names and company names.

Some of the problems are:

  1. Company names that contain ampersands, like "Jim & Sons". The ampersand is important, but it is risky.

  2. Unicode characters in names. We have Asian customers, for example, who enter their names using their own character sets. I need to whitelist all of these.

  3. Company names can have all kinds of slashes, like "S/A" and "S\A". Are those risky?

I find myself wanting to allow almost every character after seeing all the data that is in the DB already (and being entered by new users).

Any suggestions for a good whitelist that will handle these (and other) issues?

NOTE: It's a legacy system, so I don't have control of all the code. I was hoping to reduce the number of attacks by preventing bad data from getting into the system in the first place.

jm. asked Dec 07 '25 12:12


1 Answer

This SO thread has a lot of good discussion on protecting yourself from injection attacks.

In short:

  1. Filter your input as best as you can
  2. Escape your strings using framework-based methods
  3. Parameterize your SQL statements
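
As a rough illustration of steps 2 and 3, here is a minimal sketch. It assumes an already-open ADO.NET connection, a hypothetical Companies table with a Name column, and a provider that accepts @name-style parameter markers (the marker syntax varies by provider). NameStorage, InsertCompany and ToHtml are made-up names, not part of any framework.

```csharp
using System.Data.Common;
using System.Net;

public static class NameStorage
{
    // Step 3: the value travels as a parameter, so characters such as
    // quotes, ampersands and slashes are never parsed as SQL.
    // "Companies" and "Name" are hypothetical table/column names.
    public static int InsertCompany(DbConnection openConnection, string companyName)
    {
        using (DbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = "INSERT INTO Companies (Name) VALUES (@name)";

            DbParameter p = cmd.CreateParameter();
            p.ParameterName = "@name"; // marker syntax is provider-specific (assumption)
            p.Value = companyName;
            cmd.Parameters.Add(p);

            return cmd.ExecuteNonQuery();
        }
    }

    // Step 2: encode when writing the stored value into HTML,
    // using the framework's encoder instead of a hand-written one.
    public static string ToHtml(string storedName)
    {
        return WebUtility.HtmlEncode(storedName);
    }
}
```

With this split, "Jim & Sons" can be stored unchanged and is only turned into "Jim &amp; Sons" at the point where it is rendered into a page.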

In your case, you can limit the name field to a small character set. The company field will be more difficult, and you need to balance your users' need for freedom of entry against your need for site security. As others have said, trying to write your own custom sanitization methods is tricky and risky. Keep it simple and protect yourself through your architecture - don't simply rely on strings being "safe", even after sanitization.

EDIT:

To clarify - if you're trying to develop a whitelist, it's not something that the community can hand out, since it's entirely dependent on the data you want. But let's look at an example of a regex whitelist, perhaps for names. Say I've whitelisted A-Z, a-z and space.

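// Requires: using System.Text.RegularExpressions;
Regex reWhiteList = new Regex(@"^[A-Za-z ]+\z"); // \z (unlike $) also rejects a trailing newline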

That checks to see if the entire string is composed of those characters. Note that a string with a number, a period, a quote, or anything else would NOT match this regex and thus would fail the whitelist.

if (reWhiteList.IsMatch(strInput))
{
   // it's OK, proceed to step 2
}
else
{
   // it's not OK: inform the user that they entered invalid characters and ask them to try again
}
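
For the Unicode names, ampersands and slashes raised in the question, the same kind of check can be widened with Unicode categories. This is only a sketch: the punctuation set and the 100-character cap are assumptions to adjust against your real data, and NameWhitelist and IsAllowed are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class NameWhitelist
{
    // \p{L} = any Unicode letter, \p{M} = combining marks, \p{N} = digits.
    // The extra punctuation (space . , ' & / \ -) and the length limit
    // are assumptions, not a recommended standard.
    // \A and \z anchor to the very start and end of the input.
    private static readonly Regex Allowed =
        new Regex(@"\A[\p{L}\p{M}\p{N} .,'&/\\-]{1,100}\z");

    public static bool IsAllowed(string input)
    {
        return input != null && Allowed.IsMatch(input);
    }
}
```

Under these assumptions, "Jim & Sons", "S/A", "S\A" and "山田太郎" pass, while "<script>" and the empty string fail. Note that .NET regexes work on UTF-16 code units, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) would be rejected by this pattern. A whitelist this broad mostly keeps out markup characters; it does not replace parameterized queries or output encoding.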

Hopefully this helps some more! With names and company names you'll have a tough-to-impossible time developing a rigorous pattern to check against, but you can do a simple allowable character list, as I showed here.

patjbs answered Dec 11 '25 21:12


