Monday: I began searching for datasets I could use for this project and came across the Million Song Dataset and the Free Music Archive. The issue with both datasets was that each is over 100 GB, which would be difficult to store on my computer. I ended up choosing the Million Song Dataset because it offered more song attributes that I could analyze.
Tuesday: After choosing the Million Song Dataset, I needed to browse its contents to determine whether it contained enough Bollywood songs to build the project around. To do this, I downloaded DB Browser for SQLite and briefly learned SQL so I could query the SQLite metadata databases distributed alongside the dataset's h5 files. The dataset did not contain enough Bollywood songs, and other datasets I checked on sites such as Kaggle also lacked sufficient content to base the project solely on Bollywood music. Hence, my external advisor and I decided to expand the scope of the project to world music.
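As a rough illustration of the kind of SQL check described above, the sketch below counts tracks whose artist is tagged with a given term. It assumes the Million Song Dataset's companion SQLite files: a `songs` table (with `track_id` and `artist_id`) in track_metadata.db and an `artist_term` table (with `artist_id` and `term`) in artist_term.db. Those table and column names, the helper names, and the use of "bollywood" as a term are assumptions for illustration, not a record of the exact queries used.

```python
import sqlite3


def open_msd_metadata(metadata_db: str, term_db: str) -> sqlite3.Connection:
    """Open the track metadata database and attach the artist-term database.

    Assumed layout: track_metadata.db holds a `songs` table and
    artist_term.db holds an `artist_term` table; adjust if the local copy differs.
    """
    conn = sqlite3.connect(metadata_db)
    # ATTACH exposes the second file's tables under the `terms` schema name.
    conn.execute("ATTACH DATABASE ? AS terms", (term_db,))
    return conn


def count_tracks_with_term(conn: sqlite3.Connection, term: str) -> int:
    """Count tracks whose artist carries `term` (exact, case-sensitive match)."""
    (count,) = conn.execute(
        """
        SELECT COUNT(*) FROM songs
        WHERE artist_id IN (
            -- the IN subquery counts each track once, even if a term repeats
            SELECT artist_id FROM terms.artist_term WHERE term = ?
        )
        """,
        (term,),
    ).fetchone()
    return count
```

A hypothetical call would be `count_tracks_with_term(open_msd_metadata("track_metadata.db", "artist_term.db"), "bollywood")`. Tagging is done per artist rather than per track in this assumed schema, so a count like this is only a rough indicator of how many Bollywood songs are present.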
Wednesday/Thursday: The first problem I ran into when starting the programming side of the project was the format of the Million Song Dataset. It is distributed as HDF5 (.h5) files, which I found harder to analyze in Python than a CSV file would be. I therefore began writing a script to convert the h5 files to CSV format; the script is still a work in progress.
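One possible shape for such a conversion script is sketched below; it is not the script being written for this project. It assumes the `h5py` package and the per-track Million Song Dataset layout, in which each file holds one-row compound datasets at `metadata/songs`, `analysis/songs`, and `musicbrainz/songs`. The selected columns, the `FIELDS` mapping, and all function names are hypothetical choices for illustration.

```python
import csv
from pathlib import Path

# Hypothetical column selection: CSV column -> (HDF5 dataset path, field name).
FIELDS = {
    "track_id": ("analysis/songs", "track_id"),
    "title": ("metadata/songs", "title"),
    "artist_name": ("metadata/songs", "artist_name"),
    "duration": ("analysis/songs", "duration"),
    "tempo": ("analysis/songs", "tempo"),
    "loudness": ("analysis/songs", "loudness"),
    "key": ("analysis/songs", "key"),
    "mode": ("analysis/songs", "mode"),
    "year": ("musicbrainz/songs", "year"),
}


def to_plain(value):
    """Convert h5py/NumPy scalars into plain Python values for the csv module."""
    if isinstance(value, bytes):  # fixed-length HDF5 strings come back as bytes
        return value.decode("utf-8", errors="replace")
    if hasattr(value, "item"):  # NumPy numeric scalars such as numpy.float64
        return value.item()
    return value


def track_to_row(records):
    """Build one CSV row from {dataset path: first record of that dataset}."""
    return {column: to_plain(records[path][field])
            for column, (path, field) in FIELDS.items()}


def read_track(h5_path):
    """Read the single song record stored in one per-track .h5 file."""
    import h5py  # imported here so the pure-Python helpers work without h5py

    paths = {path for path, _ in FIELDS.values()}
    with h5py.File(h5_path, "r") as f:
        # Indexing a compound dataset with [0] copies that row into memory.
        records = {path: f[path][0] for path in paths}
    return track_to_row(records)


def convert_directory(root, out_csv):
    """Write one CSV row per .h5 file under `root`; returns the row count."""
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(FIELDS))
        writer.writeheader()
        # rglob yields paths lazily, so rows are streamed rather than buffered.
        for h5_path in Path(root).rglob("*.h5"):
            writer.writerow(read_track(h5_path))
            count += 1
    return count
```

A hypothetical run would be `convert_directory("path/to/msd/data", "tracks.csv")`. Writing one flat row per track keeps the output loadable with a single `pandas.read_csv` call, at the cost of dropping array-valued fields (such as segment timbres) that do not fit in a single CSV cell.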
Friday: I bought a 2 TB external hard drive to store portions of the Million Song Dataset. I also continued work on the script to convert the h5 files into CSV format.