I have a very basic question. I'd like to do low-rank matrix factorization and I was looking at the Vowpal Wabbit documentation on the topic. My question is:
Is there a difference between these two approaches? (implementation or otherwise)
$ vw --lrq ab5
or
$ vw -q ab --rank 5
Here, a and b are feature namespaces and 5 is the latent-factor dimensionality.
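For context, a and b refer to single-character namespace names in vw's input format. A minimal sketch of what such input lines might look like (the feature names are made up for illustration):
# label |namespace features... |namespace features...
1 |a user_17 age_25 |b movie_42 genre_comedy
0 |a user_99 age_31 |b movie_7 genre_drama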
Possible follow-up:
If these are equivalent, will --rank also work for higher-order interactions?
--rank and --lrq are two separate, and very different, implementations of matrix factorization/decomposition in vowpal wabbit.
"Matrix factorization", sometimes referred to as "Matrix decomposition" is a general term in ML, there are many ways to approximate a matrix using simpler factors (sometimes with loss of info).
Although they bear some similarities, in that they both try to capture the strongest latent interactions between two feature subsets, they are equivalent neither in implementation nor in the quality of the model they produce. Their performance strongly depends on the problem at hand.
--rank was the first implementation of MF in vw, by Jake Hofman. It was inspired by SVD (Singular Value Decomposition).
--lrq was implemented a few years later, by Paul Mineiro. It was inspired by libfm.
On data-sets which are hard to generalize from (e.g. MovieLens 1M, where a user has at most one rating per movie), --lrq seems to perform better. It seems to use better defaults, converges faster, is more efficient, and generates much smaller on-disk models. --rank may perform better on other data-sets, where there are more examples per user/item to generalize from.
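For example, on MovieLens-style data with user features in a namespace u and item features in a namespace i, an --lrq invocation might look like this sketch (the file name is hypothetical; --lrqdropout turns on dropout training for the latent features):
# rank-7 latent interaction between namespaces u and i,
# with dropout regularization on the latent weights
vw --lrq ui7 --lrqdropout ml-1m.train.vw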
You can tell the two implementations produce different results by running an example, e.g. pick a data-set under the test directory and run the two algos on it:
vw --lrq aa3 test/train-sets/0080.dat
versus:
vw --rank 3 -q aa test/train-sets/0080.dat
Feel free to add --holdout_off -c --passes 1000 to make them run longer, so you can compare run-times between the two.
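Put together, the two longer-running invocations would be:
# --lrq: rank-3 latent interaction of namespace a with itself, 1000 passes over a cache
vw --lrq aa3 --holdout_off -c --passes 1000 test/train-sets/0080.dat
# --rank: same rank and interaction, via the SVD-inspired implementation
vw --rank 3 -q aa --holdout_off -c --passes 1000 test/train-sets/0080.dat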
You would note that the two use a different number of features per example (--lrq is more minimalistic and will only look at the subset you explicitly tell it to), and that convergence and the final average loss are better with --lrq. If you store the model with -f modelname, you'd note that it is much smaller with --lrq, especially on big data-sets.
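To compare the on-disk sizes, you can write each model out and inspect the files (the model file names here are arbitrary):
vw --lrq aa3 test/train-sets/0080.dat -f lrq.model
vw --rank 3 -q aa test/train-sets/0080.dat -f rank.model
ls -l lrq.model rank.model   # the --lrq model should be noticeably smaller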
OTOH, if you try a data-set like test/train-sets/ml100k_small_train in the source tree, with a rank of 10 between the namespaces u (user) and i (item), you would get a better loss with --rank than with --lrq. This shows that which one is better depends on the data-set at hand.
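Concretely, that comparison would look something like:
vw --rank 10 -q ui test/train-sets/ml100k_small_train
vw --lrq ui10 test/train-sets/ml100k_small_train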
To your second question: --rank will not allow higher-order interactions. If you try to add --cubic you'll get an error:
vw (gd_mf.cc:139): cannot use triples in matrix factorization
but it will allow multiple/additional -q (quadratic) interactions.
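For instance, something like this is accepted (a sketch, assuming namespaces a, b and c exist in your data and data.vw is your input file):
# --rank with two quadratic interaction pairs
vw --rank 5 -q ab -q ac data.vw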
--lrq is less fussy, so you may add higher-order interaction options to it.
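So a combination like the following should be accepted (again a sketch, with hypothetical namespaces and file name):
# rank-5 latent quadratic term plus an explicit cubic interaction
vw --lrq ab5 --cubic abc data.vw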
Generally, --lrq is more agnostic and independent of other vw options, while --rank uses its own standalone SGD code and doesn't accept other options (like --normalized or --adaptive). Also, memory requirements for --rank are higher.
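For example, --lrq composes with vw's usual update-rule flags (a sketch; these flags explicitly select the adaptive, normalized, importance-invariant updates):
vw --lrq ab5 --adaptive --normalized --invariant -l 0.1 data.vw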
Again, results will depend on the data, additional options, and specific interactions.
References:
- --rank example in the vw wiki
- --lrq demo in the source tree
- libfm (by Steffen Rendle), after which --lrq was designed, with many further references