My project has two main parts: finding a data set of student performance and learning how to apply machine learning. I am working on both of these right now.
I need two data sets: one to train the machine learning program and one to test it once it has been trained. For training, I am using the Student Performance Data Set from the UCI Machine Learning Repository, which is available online. Finding a test data set was more difficult, because little data on this topic is publicly available. Eventually I contacted a school district, which can provide the anonymous data that I need as long as I state which format it should be in. My next step is to find out which attributes in the UCI data set are the most important, so that I can request the district data in the same format and make the two data sets easier to analyze together.
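As a rough illustration of what finding the most important attributes could look like, the sketch below ranks attributes by information gain, which is one common measure and not necessarily the one I will end up using. The column names (`study_time`, `internet_access`, `passed`) and the four rows are invented for this example and are not taken from the UCI data set or from the school district.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Drop in entropy of `target` after splitting the rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, target):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    attributes = [name for name in rows[0] if name != target]
    scored = [(name, information_gain(rows, name, target)) for name in attributes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy rows, invented purely for illustration.
TOY_ROWS = [
    {"study_time": "high", "internet_access": "yes", "passed": "yes"},
    {"study_time": "high", "internet_access": "no", "passed": "yes"},
    {"study_time": "low", "internet_access": "yes", "passed": "no"},
    {"study_time": "low", "internet_access": "no", "passed": "no"},
]

if __name__ == "__main__":
    for name, gain in rank_attributes(TOY_ROWS, "passed"):
        print(f"{name}: {gain:.3f}")
```

In this toy example the attribute that splits the students cleanly into passing and failing groups ranks first. The attributes that rank highest on the real training data would be the ones to ask the school district for.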
To write the program itself, I am using the Weka machine learning software. I still need to learn how to use it, so I am following a tutorial. So far, I have managed to analyze some sample programs, but I still have a lot to do. I also plan to read some of the machine learning articles and papers that I have found.