Week 2: Learning and Data Cleaning

Mar 03, 2019


After downloading a subset of the data (approximately 1.2 GB), I used a script to convert the .h5 files into CSV files that I could read in PyCharm.
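For anyone curious what a conversion like this can look like, here is a minimal sketch, not the exact script I used. It assumes each .h5 file stores its fields in table-like (compound) datasets with one row per song, and it relies on the third-party `h5py` package. The function names `flatten_h5` and `write_csv` and the paths in the usage comment are made up for illustration.

```python
import csv


def flatten_h5(path):
    """Return one flat dict with the fields of the first row of every table in a .h5 file."""
    import h5py  # third-party; imported lazily so write_csv works without it

    record = {}

    def visit(name, obj):
        # Assumption: the interesting data lives in compound datasets with at least one row.
        if isinstance(obj, h5py.Dataset) and obj.dtype.names and obj.shape and obj.shape[0] > 0:
            first = obj[0]
            for field in obj.dtype.names:
                value = first[field]
                if isinstance(value, bytes):
                    value = value.decode("utf-8", errors="replace")
                # Prefix with the dataset path so equal field names do not collide.
                record[f"{name}/{field}"] = value

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return record


def write_csv(records, out_path):
    """Write a list of dicts to CSV, using the sorted union of their keys as the header."""
    fieldnames = sorted({key for record in records for key in record})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return fieldnames


# Hypothetical usage (paths are placeholders):
# import glob
# records = [flatten_h5(p) for p in glob.glob("data/**/*.h5", recursive=True)]
# write_csv(records, "songs.csv")
```

Missing fields are written as empty cells, which makes them easy to spot during cleaning.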

After that, I spent most of my week learning how to clean data through various websites and courses. I am dropping many fields from the dataset because they are outdated and unrelated to judging popularity today. I am still making progress on cleaning the data and will keep you guys updated with more soon.
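Dropping fields can be sketched in a few lines. This is only an illustration: the helper name `drop_fields` and the sample rows and column names (`legacy_id` and so on) are invented for the example and are not the real fields in my dataset.

```python
def drop_fields(rows, unwanted):
    """Return copies of the rows without the unwanted keys; the input is left untouched."""
    unwanted = set(unwanted)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


# Made-up sample rows, shaped like what csv.DictReader would return.
rows = [
    {"title": "Song A", "year": "1999", "legacy_id": "123", "hotness": "0.61"},
    {"title": "Song B", "year": "2005", "legacy_id": "456", "hotness": "0.42"},
]

cleaned = drop_fields(rows, ["legacy_id"])
```

Returning new dicts instead of deleting keys in place keeps the original rows around in case a dropped field turns out to be useful later.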

I noticed that each song in the dataset comes with a hotness feature, so my previous idea of scraping Billboard songs to indicate popularity is no longer necessary.
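Using that feature as the thing to predict could look roughly like the sketch below. The column name `hotness` is a placeholder (the real field may be named differently), the helper `split_features_and_target` is hypothetical, and skipping songs without a usable value is my assumption about how to handle gaps.

```python
import math


def split_features_and_target(rows, target="hotness"):
    """Separate each row into its features and a numeric target, skipping unusable rows."""
    features, targets = [], []
    for row in rows:
        try:
            value = float(row.get(target, ""))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        if math.isnan(value):
            continue  # "nan" parses as a float but carries no information
        features.append({k: v for k, v in row.items() if k != target})
        targets.append(value)
    return features, targets


# Made-up rows for illustration.
sample = [
    {"title": "Song A", "hotness": "0.61"},
    {"title": "Song B", "hotness": ""},
    {"title": "Song C", "hotness": "nan"},
]
sample_features, sample_targets = split_features_and_target(sample)
```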

This week was mostly taken up by learning, so there isn’t a heap of information on my blog, but there will be next week when I make progress on the machine learning model.

2 Replies to “Week 2: Learning and Data Cleaning”

  1. Anjali S. says:

    Learning is just as important as data collection/model building, and I can’t wait to read what you have next week!

  2. Ray L. says:

    It is important to have useful data in your dataset so that the model can accurately predict the outcomes for these songs. It is very smart to drop all of the unused metrics, as that can save space and allow you to have a bigger dataset.
