I would like to rank or sort a collection of items (with size potentially greater than 100,000) where items in the collection have no intrinsic (comparable) value, instead all I have is the comparisons between any two items which have been provided by users in a subjective manner.
Example: Consider a collection with elements [a, b, c, d] and comparisons by users b > a, a > d, d > c. The correct order of this collection would be [b, a, d, c].
This example is simple, however there could be more complicated cases:
c > b. In which case that would cause a conflict with the ordering above.b > a, d > c. In which case the ordering is ambiguous. It could be [b, a, d, c] or [d, c, b, a]. In this case either ordering is acceptable.If possible it would be nice to somehow take into account multiple instances of the same comparison and give those with higher occurrences more weight. But a solution without this condition would still be acceptable.
A similar application of this algorithm was used by Zuckerberg's FaceMash application where he ranked people based on comparisons (if I understood it correctly), but I have not been able to find what that algorithm actually was.
Is there an algorithm which already exists that can solve the problem above? I would not like to spend effort trying to come up with one if that is the case. If there is no specific algorithm, is there perhaps certain types of algorithms or techniques which you can point me to?
A simple ranking algorithm would give a higher rank to a document that contained all of the keywords in the query and a lower rank to one that contained only some of the keywords. This simple formula can be modified to take into account the keyword weights stored in the search engine's database.
2 Ranking Methods The ranking methods presented here are: average ranks (AR), success rate ratios (SRR) and significant wins (SW).
Graph-based ranking algorithms are essentially a way of deciding the importance of a vertex within a graph, based on information drawn from the graph structure.
This is a problem that has already occurred in another arena: competitive games! Here, too, the goal is to assign each player a global "rank" on the basis of a series of 1 vs. 1 comparisons. The difficulty, of course, is that the comparisons are not transitive (I take "subjective" to mean "provided by a human being" in your question). Kasparov beats Fischer beats (don't know another chess player!) Bob beats Kasparov, potentially.
This renders useless algorithms that rely on transitivity (i.e. a > b and b > c => a > c) as you end up with (likely) a highly cyclic graph.
Several rating systems have been devised to tackle this problem.
The most well-known system is probably the Elo algorithm/score for competitive chess players. Its descendants (for instance, the Glicko rating system) are more sophisticated and take into account statistical properties of the win/loss record---in other words, how reliable is a rating? This is similar to your idea of weighting more heavily records with more "games" played. Glicko also forms the basis for the TrueSkill system used on Xbox Live for multiplayer video games.
You may be interested in the minimum feedback arc set problem. Essentially the problem is to find the minimum number of comparisons that "go the wrong way" if the elements are linearly ordered in some ordering. This is the same as finding the minimum number of edges that must be removed to make the graph acyclic. Unfortunately, solving the problem exactly is NP-hard.
A couple of links that discuss the problem:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.8157&rep=rep1&type=pdf
http://en.wikipedia.org/wiki/Feedback_arc_set
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With