I have a sparse matrix of shape 570000 x 3000. I tried nimfa for NMF (using the default nmf method, with max_iter set to 65), but found it very slow. Has anyone used a faster library for NMF?
                 
                                                                            
I have used libNMF before. It's written in C and is very fast. There is a paper documenting the algorithm and code.
The paper also lists several alternative NMF packages in a number of different languages, which I have copied here for future reference:
- The Mathworks [3, 33] 
- Matlab
- http://www.mathworks.com/access/helpdesk/help/toolbox/stats/nnmf.
 
- Cemgil [5] 
- Matlab 
- http://www.cmpe.boun.edu.tr/~cemgil/bnmf
 
- Cichocki et al. [6]
- Matlab
- http://www.bsp.brain.riken.jp/ICALAB/nmflab.
 
- Cichocki et al. [7]
- Matlab
- http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470746661.html
 
- Hansen et al. [14]
- Matlab
- http://isp.imm.dtu.dk/toolbox/nmf/index.html
 
- Hoyer [16]
- Matlab
- http://www.cs.helsinki.fi/u/phoyer/software.html
 
- Kim et al. [19]
- Matlab
- http://userweb.cs.utexas.edu/users/dmkim/Source/software/nnma/index.html
 
- Lin [25]
- Matlab/Python
- http://www.csie.ntu.edu.tw/~cjlin/nmf/index.html
 
- Schmidt et al. [30]
- Matlab
- http://mikkelschmidt.dk/index.php?id=2
 
- Gaujoux [10]
- R 
- http://cran.r-project.org/web/packages/NMF/index.html
 
- Liu [26]
- R
- http://cran.r-project.org/web/packages/NMFN/index.html
 
- Battenberg et al. [2]
- Python
- http://www.eecs.berkeley.edu/~ericb
 
- Schmitt et al. [31]
- Python 
- http://www.procoders.net/?p=409
 
- Dhillon et al. [8]
- C++
- http://www.kyb.mpg.de/bs/people/suvrit/work/progs/nnma.html
 
- Greene et al. [13]
- C++
- http://mlg.ucd.ie/nmf
 
- Pathak et al. [28]
- C++
- http://www.insight-journal.org/browse/publication/152
 
- Wang et al. [34]
- C++ 
- http://www.biomedcentral.com/1471-2105/7/175
 
Disclaimer: I have not tried any of these other packages (aside from MATLAB's).
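
If it helps to have a baseline to time against, here is a minimal sketch of NMF with the standard Lee-Seung multiplicative updates (Frobenius loss) on a SciPy sparse matrix. This is an assumption on my part about what a reasonable baseline looks like: it is not libNMF's code, not nimfa's implementation, and not taken from any of the packages listed above. The function name `nmf_multiplicative` and the `rank`, `eps`, and `seed` parameters are made up for illustration; `max_iter=65` just mirrors the setting in the question.

```python
import numpy as np
import scipy.sparse as sp


def nmf_multiplicative(V, rank, max_iter=65, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius loss.
    V may be a SciPy sparse matrix; W (n x rank) and H (rank x m) are dense.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    Vt = V.T.tocsr()  # transpose once, outside the loop

    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H); computed as (V^T W)^T so the
        # sparse matrix stays on the left of the product.
        WtV = np.asarray(Vt @ W).T
        H *= WtV / ((W.T @ W) @ H + eps)

        # W <- W * (V H^T) / (W H H^T); H H^T is only rank x rank,
        # so the dense n x m product W @ H is never formed.
        VHt = np.asarray(V @ H.T)
        W *= VHt / (W @ (H @ H.T) + eps)

    return W, H


if __name__ == "__main__":
    # Small random sparse matrix as a stand-in for the real data.
    V = sp.random(2000, 300, density=0.01, format="csr", random_state=0)
    W, H = nmf_multiplicative(V, rank=10)
    print(W.shape, H.shape)
```

The only operations that touch `V` are sparse-times-dense products, so the cost per iteration scales with the number of non-zeros plus the size of `W` and `H`, rather than with the full 570000 x 3000 dense shape.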