Project Title: Phrase Feature Extraction for Text Classification
BASIS Advisor: Ms. Bhattacharya
Internship Location: Hacker Dojo
Onsite Mentor: Dr. Jeyendran Balakrishnan
Natural Language Processing (NLP) is a field of artificial intelligence that draws heavily on machine learning to understand and interpret human language; my project applies it to English. The main goal of my Senior Project is to identify the main topic of news articles published on the Internet using machine learning and NLP. For example, if an article were about the FIFA World Cup, my program would say that its topic is soccer or, more generally, sports. I will be coding in Java. I will collect most of my data by running my program on various articles and checking how accurately it identifies their topics. I will use these results to decide how to improve my program, which uses the Stanford CoreNLP library. By the end of my Senior Project, I want to understand the field of machine learning as it applies to NLP and to know how to use open source NLP libraries such as the Stanford CoreNLP library.
My Posts
Week 12: Creating Web App
This is the final week for my senior project and it has been a wonderful journey for me. I have learned a lot about the fundamentals of Computer Science and applied them to this project. During the last week of my senior project, I started developing my web application. I used the Google Cloud platform […]
Week 11: Testing Out A New Classifier
During Week 11, I met my external advisor to discuss what progress I have made in the previous week. He advised me to test out a different classifier, the RandomForestClassifier, to see if I could obtain better results from it. I integrated QuickML’s RandomForestClassifier into my project but did not get good results. I got […]
Week 10: Cleaning Up My Code
During Week 10, I met my external advisor to discuss the progress I have made thus far, and he gave me advice on what I should do this week. I first began cleaning up my code by using abstractions and methods. Because I started creating methods in my classifier class, my program started running much faster […]
Week 9: Testing the Classifier
During Week 9, I talked to my external advisor about the progress I have made and he gave me good advice on what I should be doing this week. I began tuning my classifier and trying to find which methods of the classifier get better results. I used methods such as setMem(), which changes the […]
Week 8: Trying a New Technique
During Week 8, I met with my external advisor to review the progress I had made during the last week. While I was getting decent results using my previous technique, my advisor told me to use the n-fold cross-validation technique to train and test on my data. Using this technique, I can test every single […]
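To illustrate the general idea of n-fold cross-validation, here is a minimal sketch in Java. It is not the code from the project: the class name `KFoldSketch`, the methods `split` and `trainingIndices`, and the choice of 10 articles and 5 folds are all hypothetical, and the articles are represented only by their index numbers. The sketch assigns items to folds round-robin; real code would normally shuffle the data first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-fold cross-validation index splitting.
class KFoldSketch {

    /** Assigns item indices 0..n-1 to k folds in round-robin order. */
    static List<List<Integer>> split(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    /** Returns every index that is NOT in the held-out fold. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int heldOut) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int articles = 10; // assumed data set size
        int k = 5;         // assumed number of folds
        List<List<Integer>> folds = split(articles, k);
        // Each fold takes one turn as the test set; the rest is training data.
        for (int f = 0; f < k; f++) {
            List<Integer> test = folds.get(f);
            List<Integer> train = trainingIndices(folds, f);
            System.out.println("fold " + f + ": train=" + train + " test=" + test);
        }
    }
}
```

Because each fold is held out exactly once, every item is used for testing once and for training in all the other rounds, and the per-fold accuracies can then be averaged.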
Week 7: Gathering More Data
During Week 7, I met my external advisor to review the progress I have made in the last week. I needed to gather more articles because I had only a small amount of data for training and testing, and I needed more to get more accurate results. I went on websites such as New […]
Week 6: Saving Features in Files
During Week 6, I met my external advisor to discuss the progress of my project. He gave me a few tips for making my code more readable and more organized. In the beginning of the week, I came up with the idea to save all the features in files to save time running my FeatureExtractor […]
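The idea of saving features in files so they do not have to be recomputed can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `FeatureCacheSketch`, `extractFeatures`, and `loadOrExtract` are hypothetical names, the stand-in extractor simply splits text on whitespace instead of running an NLP pipeline, and the cache format (one feature per line) is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of caching extracted features in a file.
class FeatureCacheSketch {

    /** Stand-in for a slow feature extraction step: lowercases and splits on whitespace. */
    static List<String> extractFeatures(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    /** Reads features from the cache file if it exists; otherwise extracts and saves them. */
    static List<String> loadOrExtract(Path cacheFile, String text) throws IOException {
        if (Files.exists(cacheFile)) {
            // Cache hit: skip extraction entirely.
            return Files.readAllLines(cacheFile, StandardCharsets.UTF_8);
        }
        List<String> features = extractFeatures(text);
        // One feature per line (assumed format).
        Files.write(cacheFile, features, StandardCharsets.UTF_8);
        return features;
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempFile("features", ".txt");
        Files.delete(cache); // start with no cache file
        System.out.println(loadOrExtract(cache, "World Cup final")); // extracts and saves
        System.out.println(loadOrExtract(cache, "World Cup final")); // read back from the file
        Files.delete(cache);
    }
}
```

A cache like this is only valid while the extraction logic stays the same; if the extractor changes, the saved files need to be deleted and regenerated.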
Week 5: Creating the Classifier
During Week 5, I met with my external advisor to discuss the progress I have made thus far. He helped me by giving me a few links on how to implement a classifier, which is the machine learning model, in my project. I first started using a pattern matcher my external advisor to split […]
Week 4: Connecting my Programs Together
During Week 4, I connected my TrainingDataReader class from last week to my FeatureExtractor class, which is the NLP portion of my project. I was getting the data from the TrainingDataReader class and testing it out on my FeatureExtractor class. In the beginning of the week, I talked to my external advisor on a video call […]
Week 2 and Week 3: Working with Data
During Week 2, I was sick and was not able to work on my senior project. During Week 3, I first met with my external advisor and he gave me a set of articles as data to test my programs to see how accurate they are. In the beginning of the week, I worked on […]
Week 1: Beginning My Research
During the first week of my senior project, I began by reading the book “Foundations of Statistical Natural Language Processing” to review how Natural Language Processing (NLP) works. I also reread my summer project code to review what I did over the summer and apply those concepts to my senior project. I […]