• Project Title: Phrase Feature Extraction for Text Classification

• BASIS Advisor: Ms. Bhattacharya

• Internship Location: Hacker Dojo

• Onsite Mentor: Dr. Jeyendran Balakrishnan

Natural Language Processing (NLP) is a field of artificial intelligence, closely tied to machine learning, that focuses on enabling computers to understand and interpret human language; my project works with English text. The main goal of my Senior Project is to use machine learning and NLP to identify the main topic of news articles published on the Internet. For example, given an article on the FIFA World Cup, my program should report that the article is about soccer or, more generally, sports.

I will write my program in Java using the Stanford CoreNLP library. To measure and improve its accuracy, I will test the program on a variety of articles and record how often it identifies the correct topic. I will then use these results to decide what changes to make to the program. By the end of my Senior Project, I want to understand how machine learning applies to NLP and to be able to use open-source NLP libraries such as Stanford CoreNLP.
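To make the idea of phrase features concrete, here is a minimal sketch of one way topic classification could work. It assumes a multinomial Naive Bayes classifier over single words and two-word phrases (bigrams), which is a swapped-in, commonly used technique and not necessarily the method this project will use. The class name `PhraseTopicClassifier`, its methods, and the toy training sentences are hypothetical illustrations, not part of the project or of Stanford CoreNLP. The simple regex tokenizer stands in for CoreNLP so that the example runs with only the Java standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: multinomial Naive Bayes over word and two-word-phrase (bigram) features.
// Not the project's actual implementation and not part of Stanford CoreNLP.
class PhraseTopicClassifier {
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>(); // topic -> feature -> count
    private final Map<String, Integer> featureTotals = new HashMap<>();              // topic -> total feature count
    private final Map<String, Integer> docCounts = new HashMap<>();                  // topic -> training articles
    private final Set<String> vocabulary = new HashSet<>();                          // all features seen in training
    private int totalDocs = 0;

    // Lowercase the text, split on non-letters, and return unigrams followed by bigrams.
    // Tokens or lemmas from a CoreNLP pipeline could replace this naive regex tokenizer.
    static List<String> extractFeatures(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        List<String> features = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            features.add(words.get(i) + " " + words.get(i + 1)); // two-word phrase
        }
        return features;
    }

    // Add one labeled article to the per-topic counts.
    void train(String text, String topic) {
        totalDocs++;
        docCounts.merge(topic, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(topic, t -> new HashMap<>());
        for (String feature : extractFeatures(text)) {
            counts.merge(feature, 1, Integer::sum);
            featureTotals.merge(topic, 1, Integer::sum);
            vocabulary.add(feature);
        }
    }

    // Return the topic with the highest log-probability, using add-one (Laplace) smoothing.
    String predict(String text) {
        List<String> features = extractFeatures(text);
        String bestTopic = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String topic : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(topic) / totalDocs); // log prior
            Map<String, Integer> counts = featureCounts.get(topic);
            int total = featureTotals.getOrDefault(topic, 0);
            for (String feature : features) {
                int count = counts.getOrDefault(feature, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                bestTopic = topic;
            }
        }
        return bestTopic;
    }

    // Fraction of test articles whose predicted topic matches the expected label.
    double accuracy(List<String> texts, List<String> labels) {
        int correct = 0;
        for (int i = 0; i < texts.size(); i++) {
            if (labels.get(i).equals(predict(texts.get(i)))) {
                correct++;
            }
        }
        return texts.isEmpty() ? 0.0 : (double) correct / texts.size();
    }

    public static void main(String[] args) {
        // Toy, hand-written sentences for illustration only; not real project data.
        PhraseTopicClassifier clf = new PhraseTopicClassifier();
        clf.train("The World Cup final ended after a penalty shootout", "sports");
        clf.train("The striker scored a goal in the World Cup match", "sports");
        clf.train("The central bank raised interest rates to slow inflation", "business");
        clf.train("Stock markets fell as interest rates climbed", "business");
        List<String> tests = List.of(
            "Fans celebrated the goal in the World Cup semifinal",
            "Investors worried that interest rates would rise");
        List<String> labels = List.of("sports", "business");
        for (String t : tests) {
            System.out.println(clf.predict(t) + " <- " + t);
        }
        System.out.println("accuracy = " + clf.accuracy(tests, labels));
    }
}
```

Bigrams such as "world cup" or "interest rates" carry more topical signal than the individual words "world" or "rates", which is the motivation for phrase features. In a CoreNLP-based version, `extractFeatures` could build the same phrases from the tokens or lemmas a CoreNLP pipeline produces, and the `accuracy` method mirrors the plan of testing the program on labeled articles and measuring how many it classifies correctly.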