I am cleaning the data using the tips my internal advisor gave me. I am also starting to write a machine learning algorithm that classifies a song's artist and popularity. This is the first step toward identifying, and later predicting, song popularity.
I also began to have trouble with the PyCharm CE environment, so I switched to using the command line with Sublime Text. I find this much simpler to work with.
Last but not least, I discovered an error in the dataset while talking with my internal advisors. The HDF5 files are structured like a tree, and the script I had written to generate the CSV file was not accurately picking up the release year for every song. To work around this, I reran the script to generate another 10,000 songs and used Excel to remove all entries without a year in their tags.
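The year filtering could also be scripted rather than done by hand in Excel. The following is only a minimal sketch in Python, not the script described above. It assumes the CSV has a column named `year` and that a missing year shows up as a blank cell or as `0`; both the column name and the missing-value encoding are assumptions, and the sample rows are made-up placeholders.

```python
import csv
import io


def has_valid_year(row, year_field="year"):
    """Return True if the row carries a usable, positive year value.

    Assumption: a missing year is stored as an empty cell or as 0.
    """
    raw = (row.get(year_field) or "").strip()
    try:
        return int(float(raw)) > 0
    except (ValueError, OverflowError):
        # Blank cells, text, NaN, and infinity all count as "no year".
        return False


def drop_rows_without_year(rows, year_field="year"):
    """Keep only the rows that have a valid year."""
    return [row for row in rows if has_valid_year(row, year_field)]


# Placeholder data standing in for the generated CSV file.
sample_csv = io.StringIO(
    "title,artist,year\n"
    "Song A,Artist 1,1999\n"
    "Song B,Artist 2,0\n"
    "Song C,Artist 3,\n"
)

rows = list(csv.DictReader(sample_csv))
cleaned = drop_rows_without_year(rows)
print(f"kept {len(cleaned)} of {len(rows)} rows")
```

To run this on a real file, the `io.StringIO` sample would be replaced with `open(path, newline="")`, and `csv.DictWriter` could write the cleaned rows back out.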
Till Next Time,