After downloading a subset of the data (approximately 1.2 GB), I used a script to convert the .h5 (HDF5) files into CSV files that I could open in PyCharm.
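The conversion script isn't shown here. Below is a minimal sketch of one way to do it. It assumes the third-party `h5py` library is installed and that each .h5 file holds one song as compound (table-like) datasets at paths such as `metadata/songs` and `analysis/songs`. Those paths and every function name below are assumptions for illustration, not taken from the script I actually used. The sketch reads the first row of each dataset, decodes byte strings, and writes one CSV row per file.

```python
import csv
import io
from pathlib import Path

# Assumed dataset paths inside each .h5 file (hypothetical); adjust to match your files.
DATASET_PATHS = ("metadata/songs", "analysis/songs", "musicbrainz/songs")


def row_to_dict(row):
    """Turn one row of a compound HDF5 dataset into a plain dict."""
    record = {}
    for name in row.dtype.names:
        value = row[name]
        if isinstance(value, bytes):  # fixed-length strings come back as bytes
            value = value.decode("utf-8", errors="replace")
        record[name] = value
    return record


def write_rows(rows, stream):
    """Write dicts as CSV; the header is the union of keys in first-seen order."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval="" leaves a blank cell when a file lacks one of the fields.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


def rows_to_csv_text(rows):
    """Same as write_rows, but return the CSV as a string."""
    buffer = io.StringIO()
    write_rows(rows, buffer)
    return buffer.getvalue()


def convert_folder(folder, csv_path, dataset_paths=DATASET_PATHS):
    """Convert every .h5 file under `folder` into one row of a single CSV file."""
    import h5py  # third-party; imported here so the CSV helpers work without it

    rows = []
    for path in sorted(Path(folder).rglob("*.h5")):
        record = {"source_file": path.name}
        with h5py.File(path, "r") as h5:
            for dataset_path in dataset_paths:
                if dataset_path in h5:
                    record.update(row_to_dict(h5[dataset_path][0]))
        rows.append(record)
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        write_rows(rows, out)
```

Calling something like `convert_folder("data/subset", "songs.csv")` (both paths hypothetical) would produce one CSV with a row per song. Collecting every file into a single table keeps the later cleaning steps to one file.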
After that, I spent most of the week learning how to clean data from various websites and online courses. I am dropping many of the dataset's fields because they are outdated or unrelated to how popular a song is today. Cleaning is still in progress, and I'll keep you guys updated soon.
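Dropping fields can be sketched with Python's built-in `csv` module. The names in `COLUMNS_TO_DROP` are hypothetical placeholders, since I haven't listed exactly which fields I'm removing. With pandas, `DataFrame.drop(columns=...)` does the same job.

```python
import csv
import io

# Hypothetical placeholders: substitute the fields you decide are outdated or irrelevant.
COLUMNS_TO_DROP = {"legacy_id", "old_catalog_code"}


def drop_columns(csv_text, columns_to_drop=COLUMNS_TO_DROP):
    """Return a copy of the CSV text without the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or []) if name not in columns_to_drop]
    output = io.StringIO()
    # extrasaction="ignore" skips the values belonging to dropped columns.
    writer = csv.DictWriter(output, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return output.getvalue()
```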
I also noticed that each song in the dataset comes with a hotness feature, so my earlier plan to scrape Billboard charts as a popularity indicator is no longer necessary.
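Using hotness as the popularity label might look like the sketch below. The column name `hotness` is a hypothetical placeholder that should match the real CSV header. The sketch also assumes some songs have a blank or NaN hotness value and skips them, since those rows would have no usable label.

```python
import csv
import io
import math

HOTNESS_COLUMN = "hotness"  # hypothetical column name; match it to your CSV header


def popularity_labels(csv_text, column=HOTNESS_COLUMN):
    """Return (rows, labels), skipping songs whose hotness is blank or NaN."""
    rows, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank or non-numeric entry: no usable label
        if math.isnan(value):
            continue
        rows.append(row)
        labels.append(value)
    return rows, labels
```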
This week was mostly taken up by learning, so there isn’t a heap to report on the blog, but there should be more next week once I start making progress on the machine learning model.