I am using Ruby on Rails 3.2.2 and MySQL. I would like to know whether it is advisable to store, in the database table backing one class, a record for every combination of instances of two other classes.
That is, I have User and Article models. To store all user-article authorization objects, I would like to implement an ArticleUserAuthorization model so that, given N users and M articles, there are N*M ArticleUserAuthorization records.
This way, I can declare and use ActiveRecord::Associations as follows:
class Article < ActiveRecord::Base
  has_many :user_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :users, :through => :user_authorizations
end

class User < ActiveRecord::Base
  has_many :article_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :articles, :through => :article_authorizations
end
However, storing all combinations this way will result in a very large database table, potentially containing billions of rows. Furthermore, I am planning to create all authorization records when a User or an Article object is created (that is, to create all of the previously mentioned "combinations" at once or, better, in "delayed" batches; either way, this process creates billions more rows) and to do the reverse when one is destroyed (deleting billions of rows). I am also planning to read and update those rows all at once when a User or Article object is updated.
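To make the growth concrete, here is a minimal back-of-the-envelope sketch in plain Ruby. The method names and the user and article counts are hypothetical, chosen only for illustration; they are not figures from my application.

```ruby
# Rows needed when every user-article combination gets its own record.
def full_matrix_rows(user_count, article_count)
  user_count * article_count
end

# Rows written (or deleted) when a single User or Article is created
# (or destroyed): one row per existing record on the other side.
def rows_touched_per_change(other_side_count)
  other_side_count
end

# Hypothetical sizes, not taken from a real system.
users    = 100_000
articles = 1_000_000

puts full_matrix_rows(users, articles)  # 100000000000
puts rows_touched_per_change(users)     # 100000 rows for one new Article
```

So even at these assumed sizes, each new Article would mean writing one row per existing User.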
So, my doubts are:
Note: I would use this approach because, in order to retrieve only "authorized objects" when retrieving User or Article objects, I think I need "atomic" user authorization rules (that is, one user authorization record for each user and article object), since the system is not based on user groups like "admin", "registered" and so on. So, I thought that having an ArticleUserAuthorization table would avoid running authorization methods on each retrieved object (note: those methods involve some MySQL querying that could hurt performance; see my previous question for a sample "authorization" method implementation). Instead, I could "simply" access or join the ArticleUserAuthorization table to retrieve only "user authorized" objects.
The fact of the matter is that if you want article-level permissions per user, then you need a way to relate Users to the Articles they can access. At a minimum, this requires N*A records (where A is the number of uniquely permissioned articles).
The 3NF approach to this would be, as you suggested, to have a UsersArticles set... which would be a very large table (as you noted).
Consider that this table would be accessed very frequently. This seems to me like one of the situations in which a slightly denormalized approach (or even NoSQL) is more appropriate.
Consider the model that Twitter uses for their user follower tables:
Jeff Atwood on the subject
And High Scalability Blog
One lesson from those pieces, learned at Twitter, is that querying followers from a normalized table puts tremendous stress on a Users table. Their solution was to denormalize followers so that a user's followers are stored with that individual user's settings.
Denormalize a lot. Single handedly saved them. For example, they store all a user IDs friend IDs together, which prevented a lot of costly joins. - Avoid complex joins. - Avoid scanning large sets of data.
I imagine a similar approach could be used to serve article permissions and avoid a tremendously stressed UsersArticles single table.
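As a rough illustration of that idea, here is a minimal plain-Ruby sketch in which each user carries its own set of authorized article IDs, so a permission check needs no join against a separate table. The class and method names (UserWithPermissions, SketchArticle, grant, revoke, visible) are hypothetical and stand in for real models; in a Rails app the ID list might live in a serialized column or an external store, which is an assumption on my part and not something I have benchmarked.

```ruby
require 'set'

# Hypothetical stand-in for an Article record.
SketchArticle = Struct.new(:id, :title)

# Hypothetical stand-in for a User record that stores its own
# authorized article IDs (the denormalized part).
class UserWithPermissions
  attr_reader :id, :authorized_article_ids

  def initialize(id, authorized_article_ids = [])
    @id = id
    @authorized_article_ids = Set.new(authorized_article_ids)
  end

  # Grant access to one article; only this user's data changes.
  def grant(article_id)
    @authorized_article_ids.add(article_id)
    self
  end

  # Revoke access to one article.
  def revoke(article_id)
    @authorized_article_ids.delete(article_id)
    self
  end

  # Membership check against the in-memory set, with no join.
  def authorized?(article_id)
    @authorized_article_ids.include?(article_id)
  end

  # Keep only the articles this user may see, preserving input order.
  def visible(articles)
    articles.select { |article| authorized?(article.id) }
  end
end

articles = [
  SketchArticle.new(1, 'First'),
  SketchArticle.new(2, 'Second'),
  SketchArticle.new(3, 'Third')
]

user = UserWithPermissions.new(42, [1, 3])
p user.visible(articles).map(&:id)  # [1, 3]
```

The trade-off is that revoking access to an article for everyone means touching every user that lists it, so this fits best when reads far outnumber permission changes.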