Project Title: Using Machine Learning to Predict Student Performance
BASIS Advisor: Dr. Brown
Internship Location: Texas State University
Onsite Mentor: Dr. Jelena Tesic
Education is extremely important in the modern world. Students who do well in school tend to do well later in life, while students who do poorly tend to have problems later on. Today’s students will eventually make up the nation’s workforce, and it has been shown that nations with better education systems also tend to perform better economically. However, it can be difficult to detect the signs that a student will face trouble, and this is a problem because early intervention is essential in helping a student do better in school and gain the full benefits of education. Machine learning techniques are useful for analyzing data and can sometimes find correlations that are difficult for humans to notice. Working with Professor Tesic from Texas State University, I will create a program that learns from student test data to detect, at a higher-than-chance rate, students who might be at risk in the future. Its predictions can then be used to provide help early, before the students are already failing.
My Posts
Week 11
My project is mostly finished. I just tried out a few final parameter settings for my algorithms this week. Then, I copied all of the final results down. I’m writing my PowerPoint for my final presentation now. I have things such as the best results for each data set, a basic explanation of machine learning, […]
Week 10
Finally, my project is nearing its conclusion. I am using data sets from here: https://www.kaggle.com/spscientist/students-performance-in-exams/downloads/students-performance-in-exams.zip/1, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171207, https://archive.ics.uci.edu/ml/datasets/student+performance#, and https://www.kaggle.com/aljarah/xAPI-Edu-Data. Unfortunately, I was unable to obtain a data set from the principal I was meeting earlier, but at least the data sets I was able to use are enough for what I am working on so […]
Week 9
I’ve pretty much finished with the Kaggle data set I was using. A few algorithms I was using were previously underperforming, so I tested them with different parameters and eventually found settings that raised the accuracy to fairly good levels (around 70-80% correctly classified so far). Also, it seems that the algorithms are being run […]
Week 8
I’ve been making some good progress. I have some student performance datasets which my algorithms are working on. The accuracy, so far, seems to be roughly in the 70% range, from the results I am getting in Weka. It could likely be improved, but it is a good start. I’ve also been looking at this […]
Week 7
I’ve managed to make a lot of progress! I have found information on Google for implementing Logistic Regression, Naive Bayes Classifier, Support Vector Machines, Random Forests, and AdaBoost algorithms in Weka. These algorithms were the ones used in https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171207, and I was trying to replicate its results to see if I was on the right […]
Week 6
For this week, I was on vacation. I didn’t do anything related to my project. April Fools! Actually, I did a couple of things related to my project this week. First, I continued working on the machine learning course, with the current topic being neural networks. Also, based on some discussions with my advisors, I […]
Week 5
Things have been going okay so far. I am currently continuing the machine learning course I was working on previously. I have had some problems with one of the programming activities in this course. The activity involved implementing the gradient descent algorithm. At first, the program I wrote just gave errors and did not work […]
Week 4
I have finished the first part of the Coursera machine learning course I recently started. In this part of the course, taught by Andrew Ng, I learned about the history and purpose of machine learning, the difference between supervised and unsupervised learning, model and cost functions, and linear algebra. I will start working on the […]
Week 3 – More analysis
I have finished the Weka tutorial I was doing earlier. I have also done some work in document analysis, which could be useful for the data sets for this project. The document analysis included training an algorithm so it could tell whether a passage was about a given topic or not. I have also joined […]
Week 2 – Machine Learning Research
This week, I learned even more about the Weka machine learning system. I am working on a tutorial that teaches about the various features of Weka. Two important types of machine learning algorithms are nearest-neighbor learning and decision trees. Nearest-neighbor learning involves classifying data points using the instances closest to them, and decision trees are […]
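To illustrate the nearest-neighbor idea described in this post, here is a minimal sketch in Python. It is not the Weka setup used in the project; the function name `knn_predict`, the two features (hours studied, previous test score), and all of the data values are made up for illustration. A new student is classified by a majority vote among the k closest training instances.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest instances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among them is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hours studied, previous test score) -> outcome.
points = [
    (1.0, 50.0), (2.0, 55.0), (1.5, 45.0),
    (8.0, 90.0), (7.0, 85.0), (9.0, 95.0),
]
labels = ["at risk", "at risk", "at risk", "on track", "on track", "on track"]

prediction = knn_predict(points, labels, (2.0, 52.0), k=3)
print(prediction)
```

In practice the features would usually be scaled first, since a feature with a larger numeric range (here, the test score) otherwise dominates the distance.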
Preparation
There are two important parts to my project: finding a data set of student performance and studying how to do machine learning. I am working on both of these right now. I need data sets for both training the machine learning program and then testing it once it has been trained. For training, I am […]