
Username uniqueness validation - Design Approach

Tags: algorithm, hash

This is a general design problem. I want to validate a username field for uniqueness when the user enters a value and tabs out. I do an Ajax validation and get a response from the server. This is all very standard. Now, what if I have a HUGE user database? How should I handle that? For example, I want to find out whether the username "foozbarz" is present among 150 million usernames.

  1. Database queries are out of the question. [EDIT] Read the username database once and populate the cache/hash for faster lookups (to clarify Emil Vikström's point).
  2. In-memory databases won't help either.
  3. Keep an in-memory hash (or cache/memcache) of all usernames - usernames can be hashed easily and lookups will be very fast. But there are some problems with this:
     a. Size of the hash - can we optimize to reduce it?
     b. Hash/cache refresh frequency (users might get added while we are validating).
  4. Shard the username table based on some criterion (e.g. A-B in table username_1 and so on) - thanks to piotrek for this suggestion.

Or is there a better approach?

Asked by Ved, Oct 20 '25 02:10


2 Answers

Why don't you simply partition the data? If you have, or plan to have, 150M+ users, I assume you have (or will have) the budget for this. If you are just starting out (with, say, 2k users), do it the traditional way with a simple indexed lookup in the database. When you have so many users that you observe performance issues, and you have measured that the database is the cause (and not, e.g., the web server), then you simply add another database. The first one holds users with names from a to m and the second one holds the rest. You may choose another criterion, such as a hash of the name, to keep the data balanced. When you need more capacity, you add more databases.

But if you don't have that many users right now, I advise you not to do any premature optimization. There are many things that may become a bottleneck with this amount of data.
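As a rough illustration of the hash criterion, here is a minimal sketch of routing a username to one of several databases. The function name `shard_for`, the shard count, and the lowercase normalisation are assumptions made for the example, not something the question specifies.

```python
import hashlib


def shard_for(username: str, num_shards: int) -> int:
    """Map a username to a shard index in the range [0, num_shards)."""
    # Assumption: usernames are compared case-insensitively, so normalise
    # first to make "Foo" and "foo" land on the same shard.
    key = username.strip().lower().encode("utf-8")
    # Use a stable digest. Python's built-in hash() for strings is
    # randomised per process, so it is unsuitable for routing.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Example: pick which of 4 hypothetical databases to query.
print(shard_for("foozbarz", 4))
```

Note that with plain modulo routing, changing the number of shards moves most names to a different shard, so adding a database later means migrating data or switching to a scheme such as consistent hashing.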

Answered by piotrek, Oct 21 '25 18:10


You are most likely right about doing some kind of hashing where you store the taken names; obviously, a name that is not in the hash is free.
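One way to picture this is a set of short, fixed-size digests instead of the full names. The sketch below is only an illustration under assumptions: the class name `TakenNames`, the 8-byte digest size, and the lowercase normalisation are all made up for the example. At 8 bytes each, 150 million digests are about 1.2 GB of raw data before any container overhead, and a plain Python `set` adds a lot on top of that, so a real deployment would likely want a more compact structure.

```python
import hashlib


class TakenNames:
    """In-memory set of truncated digests of taken usernames (sketch)."""

    def __init__(self):
        self._digests = set()

    @staticmethod
    def _digest(username: str) -> bytes:
        # Assumption: names are compared case-insensitively.
        key = username.strip().lower().encode("utf-8")
        # 8-byte BLAKE2b digest; truncation makes collisions possible,
        # so a hit means "probably taken", not "certainly taken".
        return hashlib.blake2b(key, digest_size=8).digest()

    def add(self, username: str) -> None:
        self._digests.add(self._digest(username))

    def probably_taken(self, username: str) -> bool:
        # A miss means the name is definitely not in the set.
        return self._digest(username) in self._digests


taken = TakenNames()
taken.add("foozbarz")
print(taken.probably_taken("FoozBarz"))
```

Because a digest collision can only make a free name look taken, never the other way round, the "not in the hash means free" property still holds for whatever has been loaded into the set.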

What you shouldn't do is rely on that validation. A lot of time can pass between the user checking whether a name is free and the user pressing Register.

To be fair, the first thing to consider is whether you REALLY need to worry about reaching 150 million users. Scalability is often an issue, but unless this happens overnight, you can probably swap in a better solution before you get there.

Secondly, there is your worry about two users both getting a THIS NAME IS FREE and then one of them taking it. First of all, the chances of that happening are pretty damn low. Beyond that, the only ways I can think of ‘solving’ this, so that a user never clicks OK with a validated name and gets a USERNAME TAKEN, are:

a) Remember what each user validated last, store that, and if someone else registers that name in the meantime, use AJAX to change the name field to taken and notify the user. Don't do this. It wastes a lot of cycles and takes far too much effort to implement.

b) Lock a username for a short period of time when a user validates it. This results in a lot of free usernames coming up as taken when they actually aren't. You probably don't want this either.

The easiest solution is to simply add the name to the hash table when the user actually clicks OK, but before doing that, check again whether the name exists. If it does, just send the user back with USERNAME TAKEN. The chances of someone racing someone else for a name are really, really slim, and I doubt anyone will make a big fuss over how your validator (which did its job, since the name was free at the point of checking) ‘lied’ to the user.
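The check-again-on-submit step could look roughly like the sketch below. It is a variation on the idea rather than a literal transcription: instead of a separate lookup followed by an insert, it lets a uniqueness constraint in the store reject the duplicate, so the check and the insert cannot be separated by another registration. The `users` table, the `register` function, and the use of SQLite are assumptions for the sake of a runnable example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database purely for illustration.
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY implies uniqueness, so duplicate inserts are rejected.
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
    return conn


def register(conn: sqlite3.Connection, username: str) -> bool:
    """Try to claim a username. Returns False if it is already taken."""
    # Assumption: names are compared case-insensitively.
    name = username.strip().lower()
    try:
        # The connection context manager commits on success and rolls
        # back if the statement raises.
        with conn:
            conn.execute("INSERT INTO users (username) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Someone registered this name first: report USERNAME TAKEN.
        return False


conn = make_db()
print(register(conn, "foozbarz"))
print(register(conn, "foozbarz"))
```

The Ajax validation then stays a convenience for the user, while the final decision is made at the moment of registration.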

Basically your only issue is how you want to store the nicknames.

Answered by Mateusz Kowalczyk, Oct 21 '25 17:10


