 

Word classification algorithms: pros and cons

As a college project, I am required to build software that, given some comments about a virtual construction site, detects its current state (just started, in construction, terminated).

For example, given the comments:

  • "Happy to hear we can walk through the English Channel bridge"
  • "Yesterday I went to the newly built bridge to have a trip to France with my friends"
  • "They just finished the site and there are already cracks in the 5th miles. What a letdown!"

The system should detect that construction of the "English Channel bridge" has ended.

At the moment I'm trying to choose which word classification algorithm to use for this project. I searched online for the best one. I've read about SVC but, since I'm not really an expert in this field, I am unsure whether SVC is a good fit for my scenario.

What I'm trying to obtain is not the solution to my problem, but a list of available algorithms with their pros and cons.

asked Jan 28 '26 15:01 by Ada



1 Answer

You are formulating your problem incorrectly, making it difficult for people to give you a list of pros and cons.

The problem you are describing is not really a word classification problem since you are not classifying words. What you are trying to do is:

  1. Named Entity Recognition for construction projects.
  2. Classification of each construction named entity into one of your 3 states, based on the context in which it is mentioned.

The algorithm is not the real issue. Most classification algorithms (logistic regression, decision trees, SVM, etc.) will work.
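To illustrate why the choice of algorithm matters less than the data, here is a minimal sketch of the second step as an ordinary text classification task. It uses a tiny multinomial Naive Bayes classifier written with the standard library only; that is a stand-in chosen for brevity, not one of the algorithms named above. The training comments are invented for illustration, and the `train` and `predict` helpers are hypothetical names. Any off-the-shelf classifier could be swapped in once real annotated data exists.

```python
import math
import re
from collections import Counter, defaultdict

# Toy, hand-written examples (invented for illustration, not real data).
TRAIN = [
    ("they just broke ground on the new bridge", "just_started"),
    ("work on the tower begins next week", "just_started"),
    ("cranes everywhere, the bridge is still being built", "in_construction"),
    ("the tower is half built and the road is closed", "in_construction"),
    ("they just finished the bridge and it is open", "terminated"),
    ("yesterday I drove across the newly built bridge", "terminated"),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(examples):
    """Count words per label; returns (word_counts, label_counts, vocab)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict(model, "Happy to hear they finished the bridge"))
```

With six invented sentences this proves nothing about accuracy; the point is only that the classifier itself is a few dozen lines, while the labelled examples are the part that has to be produced by hand.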

The problem you actually have (but, judging by your question, don't realize) is that you have no training data, either for finding construction project named entities or for classifying those entities into your 3 categories once you have found them.
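For concreteness, an annotated training example for this task might look like the record below: the comment text, the character span of the construction-site mention, and one of the three states. This is a hypothetical format (the field names, the `STATES` tuple and the `Annotation` class are assumptions, not a standard), shown only to make clear what would need to be labelled by hand.

```python
from dataclasses import dataclass

# The three states from the question, as hypothetical label identifiers.
STATES = ("just_started", "in_construction", "terminated")

@dataclass(frozen=True)
class Annotation:
    text: str    # the full comment
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    state: str   # one of STATES

    def mention(self) -> str:
        """Return the annotated substring of the comment."""
        return self.text[self.start:self.end]

    def is_valid(self) -> bool:
        """Check that the span lies inside the text and the state is known."""
        in_bounds = 0 <= self.start < self.end <= len(self.text)
        return in_bounds and self.state in STATES

comment = "Happy to hear we can walk through the English Channel bridge"
mention = "English Channel bridge"
start = comment.index(mention)
example = Annotation(comment, start, start + len(mention), "terminated")
print(example.mention(), "->", example.state)
```

One record like this covers both steps: the span is the training signal for entity recognition, and the state is the training signal for classification.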

My suggestion would be to use one of the freely available NER toolkits/libraries, add dictionary features related to construction projects (words like bridge, tower, etc.), and see how well you can do on the first part of your task.
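As a rough illustration of what a dictionary feature is, the sketch below flags tokens that appear in a small hand-made list of construction words and emits one feature dict per token, a shape that some CRF-based sequence labellers accept. The word list (beyond bridge and tower), the feature names and the `token_features` function are all hypothetical; a real toolkit will have its own way of plugging such features in.

```python
import re

# Hypothetical dictionary (gazetteer) of construction-related words.
CONSTRUCTION_WORDS = {"bridge", "tower", "tunnel", "dam", "stadium"}

def token_features(sentence):
    """Build one feature dict per token, including dictionary lookups."""
    tokens = re.findall(r"\w+", sentence)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "token": tok,
            "lower": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            # Dictionary feature: is this token itself a construction word?
            "in_construction_dict": tok.lower() in CONSTRUCTION_WORDS,
            # Context feature: a word right before a dictionary word may be part of the name.
            "next_in_dict": i + 1 < len(tokens) and tokens[i + 1].lower() in CONSTRUCTION_WORDS,
        })
    return features

for f in token_features("we can walk through the English Channel bridge"):
    print(f)
```

The features only give a model hints; whether "English Channel" belongs to the entity still has to be learned from annotated examples.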

More important considerations are:

  1. How much time/money do you have to get annotated data?
  2. What sort of performance do you need?
  3. What language/libraries are you willing to consider? (The least important question, IMHO.)

I'm sorry, I realize this is probably not the answer you want to hear but I suspect it is the answer you need. ;)

answered Jan 30 '26 11:01 by ozborn



