I have been looking for a maximum entropy classification implementation which can deal with an output size of 500 classes and 1000 features. My training data has around 30,000,000 lines. I have tried MegaM, the 64-bit R maxent package, and the maxent tool from the University of Edinburgh, but as expected, none of them can handle this much data. However, the data set doesn't seem unusually large for NLP tasks of this nature. Are there any techniques I should be employing, or any suggestions for a toolkit I could use? I am trying to run this on a 64-bit Windows machine with 8GB of RAM, using Cygwin where required.
Vowpal Wabbit is currently regarded as the fastest large-scale learner, and its one-against-all mode handles the multiclass case. LibLinear is an alternative, but I'm not sure if it can handle a matrix of 3e10 elements (30,000,000 examples × 1,000 features). See the sketch below for getting your data into VW's format.
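Here is a minimal sketch of converting data into VW's input format for one-against-all training. The source file layout (an integer label followed by 1000 feature values per line) and the file names are assumptions; adjust the parsing to your data. VW's `--oaa` mode expects integer labels in 1..k, hence the shift from 0-based labels.

```python
# Assumed input layout per line: "label v1 v2 ... v1000" (label 0-based).
# Train afterwards with something like:
#   vw --oaa 500 -d train.vw --loss_function logistic -f model.vw
# (raising the hash size with -b may help avoid feature collisions).

def to_vw_line(label, values):
    """Format one example as 'label | f0:v0 f1:v1 ...', skipping zeros."""
    feats = " ".join(f"f{i}:{v}" for i, v in enumerate(values) if v != 0.0)
    return f"{label} | {feats}"

with open("train.txt") as src, open("train.vw", "w") as dst:
    for line in src:
        parts = line.split()
        label = int(parts[0]) + 1  # assumption: shift 0-based labels to 1..500
        values = [float(v) for v in parts[1:]]
        dst.write(to_vw_line(label, values) + "\n")
```

Since VW streams examples from disk and hashes features, memory use stays roughly constant regardless of the number of training lines, which is what makes it viable on an 8GB machine.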
Note that the term "MaxEnt" is used almost exclusively by NLP people; machine learning folks call it logistic regression or logit, so if you search for that you might find many more tools than when you search for MaxEnt.
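For instance, searching under "logistic regression" turns up scikit-learn's SGDClassifier, which can fit a logistic model out-of-core via partial_fit, so the 30M lines never have to sit in RAM at once. Below is a minimal sketch under the same assumed file layout as above; the chunk size, file name, and 0-based labels are assumptions, and the loss is spelled "log_loss" in recent scikit-learn versions ("log" in older ones).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_CLASSES = 500  # assumption: labels are integers 0..499

# loss="log_loss" makes SGDClassifier optimize logistic regression
clf = SGDClassifier(loss="log_loss")

def chunks(path, chunk_size=100_000):
    """Yield (X, y) arrays from an assumed 'label v1 ... v1000' text file."""
    rows, labels = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            labels.append(int(parts[0]))
            rows.append([float(v) for v in parts[1:]])
            if len(rows) == chunk_size:
                yield np.array(rows), np.array(labels)
                rows, labels = [], []
    if rows:
        yield np.array(rows), np.array(labels)

all_classes = np.arange(N_CLASSES)
for X, y in chunks("train.txt"):
    # classes= must list every label, since no single chunk sees all 500
    clf.partial_fit(X, y, classes=all_classes)
```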