Week 9 marks ¾ of the way to the end of senior projects! This week, I played around with three different black-box models to figure out which would be best suited for my hybrid model.
The first black-box model is called XGBoost, a gradient-boosted decision tree method that is commonly used by practitioners. I started by running the covtype dataset through this model. This dataset, hosted by the UC Irvine Machine Learning Repository, contains data about forest cover. Each observation has a label from 1 to 7 indicating the forest cover type: for example, type 1 is spruce/fir, type 2 is lodgepole pine, type 3 is ponderosa pine, and so on.

Since the data I downloaded from UC Irvine's website was in its rawest form, I had to dig for the normalized version of the data and use a package called "e1071" to process it. After I loaded the data into R, I separated it into testing and training sets (roughly a 20-80 split). Then, I ran the training data through XGBoost.

I repeated this process for two other black-box models: adaboost and randomForest. Since I had used the e1071 package to process my data, it wasn't in the correct format for these two models. Therefore, I had to use the function as.matrix() to convert all the data into matrix form and as.integer() to convert all the labels into integers from 1 to 7. There are still some errors with adaboost and randomForest, which I'll be tackling in Week 10.
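The split-then-train workflow above can be sketched roughly as follows. This is a minimal illustration, not my exact script: it assumes the covtype data is already loaded into a data frame called `covtype` whose last column, `Cover_Type`, holds the labels 1-7 (that column name is an assumption for the example).

```r
library(xgboost)

set.seed(42)

# Roughly 80-20 train-test split
n <- nrow(covtype)
train_idx <- sample(n, size = floor(0.8 * n))
train <- covtype[train_idx, ]
test  <- covtype[-train_idx, ]

# xgboost wants a numeric feature matrix and, for multiclass
# objectives, zero-based labels -- hence as.matrix() and as.integer()
train_x <- as.matrix(train[, -ncol(train)])
train_y <- as.integer(train$Cover_Type) - 1  # classes 0-6

model <- xgboost(
  data      = train_x,
  label     = train_y,
  objective = "multi:softmax",  # predict one of the 7 cover types
  num_class = 7,
  nrounds   = 50,
  verbose   = 0
)

# Evaluate on the held-out 20%
test_x <- as.matrix(test[, -ncol(test)])
pred   <- predict(model, test_x)
mean(pred == as.integer(test$Cover_Type) - 1)  # test accuracy
```

The same as.matrix()/as.integer() conversions are the ones I needed for the other two models, since their fitting functions also refuse the format e1071 left the data in.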
Thanks for following along, see you next week!