How to get Coarse-grained Part of Speech Tags?

I have a data set annotated by the Collins parser. Right now, I keep the POS tag of each word in the data set as a feature. The problem is that I don't need fine-grained POS tags, so I have merged some of them. For example, I group VBD, VBP, VBZ, and VBG under the category "Verb", and I group NNP and NNS under the category "Noun".
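For concreteness, the merging described above could be sketched as a plain lookup table. The names `COARSE` and `coarsen` are hypothetical, and the table only covers the tags named in this question; any other tag passes through unchanged.

```python
# Hypothetical lookup table covering only the tags named in the question.
# Extend it with further fine-grained tags as needed.
COARSE = {
    "VBD": "Verb", "VBP": "Verb", "VBZ": "Verb", "VBG": "Verb",
    "NNP": "Noun", "NNS": "Noun",
}

def coarsen(tag):
    """Return the merged category for a tag, or the tag itself if unmapped."""
    return COARSE.get(tag, tag)

# Example (word, tag) pairs; the words are made up for illustration.
tagged = [("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB")]
coarse = [(word, coarsen(tag)) for word, tag in tagged]
```

Unmapped tags such as RB are left as they are, which matches keeping the remaining tags fine-grained.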

Here is the list of POS tags I am left with after merging:

VB, NN, TO, JJ, IN, EX, RB, WP, PRP, MD, UH, WRB, WDT, RP, CD, POS, DT, PRP$, WP$, CC, RBR

Now, my question is where can I find a list of coarse-grained POS tags? Is there any standard coarse-grained POS tag list?

In my system, I get better results if I don't merge the other POS tags. Is it acceptable to keep my current list, or should I merge those as well?

Thanks in advance,

asked Oct 23 '25 16:10 by user1419243



1 Answer

You can use Petrov's universal tagset. It consists of 12 coarse-grained tags and can improve POS tagging performance considerably. For details, see Universal POS tagset. You can also download the code and the mappings for several taggers at POS mapping.
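As a rough illustration of how such a mapping is applied, here is a minimal Python sketch. The table below is a partial Penn Treebank to universal mapping written from memory, so treat it as an assumption and check it against the official mapping files; punctuation tags are left out, and the names `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical.

```python
# Partial Penn Treebank -> universal tagset table (an assumption; verify
# against the official mapping file). Punctuation tags, which map to ".",
# are omitted here.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "WRB": "ADV",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
    "DT": "DET", "PDT": "DET", "WDT": "DET", "EX": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT", "TO": "PRT", "POS": "PRT",
    "UH": "X", "FW": "X", "LS": "X", "SYM": "X",
}

def to_universal(tag, default="X"):
    """Map a fine-grained tag to a universal tag; unknown tags fall back to X."""
    return PTB_TO_UNIVERSAL.get(tag, default)

# Example (word, tag) pairs; the sentence is made up for illustration.
sentence = [("She", "PRP"), ("runs", "VBZ"), ("quickly", "RB")]
universal = [(word, to_universal(tag)) for word, tag in sentence]
```

If NLTK is available, `nltk.tag.map_tag('en-ptb', 'universal', tag)` should give the same kind of result without a hand-written table, though it needs NLTK's `universal_tagset` data package installed.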

answered Oct 26 '25 10:10 by Denzil



