Hi everyone, I finished goal number three from my last post!
Quick refresher: In my last post, I mentioned that I had three main goals to fulfill. My third one was to make a histogram representing the frequencies of classes, families, and subfamilies.
It was difficult to figure out at first because I wanted to create the graphs with matplotlib, which I had not used before. I chose it over R (another programming language) so that I could keep working in Python. It took a lot of webpages (it seems like I’m always reading one webpage or another…) to learn the basics, and several more to figure out how to customize my graphs so they looked nicer and more presentable.
In the end, I was able to figure it out! I made three vertical histograms and was one happy person. Here’s one of them:
Looks cool, right? That’s what I thought, too, because in the program, you can zoom in and out to see what you want to see. But the thing is, my external adviser informed me that he needed to be able to see all of the data (i.e., names and frequencies) without zooming in and out. Makes sense when you consider what the subfamilies histogram looks like:
So, I ventured to make the graph horizontal.
It looked promising at first, but the subfamilies (the largest data set of the three) still looked really cluttered and largely unreadable. In addition, my external adviser wanted the frequencies in order from highest to lowest. Since most of the frequencies towards the end were just one (for families and subfamilies), I selected the top 20 data points for both the subfamilies and families to display. I left the classes data set alone because it had fewer than 20 total. After a bit more messing around, I got it to work!
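For anyone curious, here is a minimal sketch of the idea: count the names, sort from highest to lowest frequency, keep the top 20, and draw horizontal bars. It is an illustration under assumptions, not the exact script from this project. The function names `top_frequencies` and `plot_top_frequencies` are made up for this example, and breaking ties alphabetically is my own choice.

```python
from collections import Counter


def top_frequencies(labels, n=20):
    """Return up to n (name, count) pairs, highest count first."""
    counts = Counter(labels)
    # Sort by count descending, then by name so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


def plot_top_frequencies(pairs, title):
    """Draw a horizontal bar chart with the largest count at the top."""
    # Imported here so the counting helper above works without matplotlib.
    import matplotlib.pyplot as plt

    names = [name for name, _ in pairs]
    counts = [count for _, count in pairs]

    # Give each bar some vertical room so the names stay readable.
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(names) + 1))
    ax.barh(names, counts)
    ax.invert_yaxis()  # barh puts the first item at the bottom by default
    ax.set_xlabel("Frequency")
    ax.set_title(title)
    fig.tight_layout()
    return fig
```

With a hypothetical list of subfamily names called `subfamily_labels`, something like `plot_top_frequencies(top_frequencies(subfamily_labels), "Top 20 subfamilies").savefig("subfamilies.png")` would write the chart to a file. A data set with fewer than 20 distinct names, like the classes, simply comes back whole.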
After I got the histogram work done, my mentor said that we could start on the “real” stuff next week. Not entirely sure what that entails yet, but I can’t wait!
Meanwhile, I’m still reading those articles.
Have a great day!