Week 2: Fit for the job?

Mar 04, 2019

This week was largely focused on a common problem for those working with AI – overfitting and underfitting data.

A rundown, using a direct parallel in math. Let’s say we have 200 points on a scatter plot, and we need a function that hits every one of them. If these points were generated randomly, we would most likely need a polynomial with 200 parameters (a + bx + cx^2 + dx^3 + … + zx^199). That is the maximum number of parameters we need to pass through all 200 points exactly. However, if that data is NOT random (but is still non-linear), and if we allow a bit of leeway (if the function is, say, within 0.1 of a point, it still counts), then the function very likely needs far fewer than the maximum 200 parameters.
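To make the parallel concrete, here’s a tiny numerical sketch (my own illustration, not from my project, and scaled down from 200 points to 20 so the polynomial fit stays numerically stable):

```python
import numpy as np

# 20 points of non-random (but non-linear) data on a scatter plot.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)

# A degree-19 polynomial (20 parameters) can hit all 20 points exactly...
exact = np.polynomial.Polynomial.fit(x, y, deg=19)
print(np.abs(exact(x) - y).max())  # ~0: passes through every point

# ...but with a 0.1 leeway, far fewer parameters already "count".
small = np.polynomial.Polynomial.fit(x, y, deg=5)
print(np.abs(small(x) - y).max())  # comfortably under 0.1
```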

That, in a nutshell, is overfitting. Only here, the function is the model our AI learns, and the data points are the training data. What this means in practice is a student who memorizes facts, but can’t actually apply them. The algorithm will learn the training data very quickly – yes – but fail miserably in real-world scenarios, where there are often exceptions and nuances. This is sometimes caused by having too many nodes in the neural network.

So what’s underfitting? Underfitting is the opposite: say the data really needs 100 parameters to describe, but the function only uses 50. This one’s a little more straightforward, and it’s easy to see how that could cause inaccuracies. It is sometimes caused by having too few nodes in your neural network.

What are some solutions to these issues? Well, of course, you could increase or decrease the number of nodes in your network, but that isn’t always an option, for a variety of reasons (dependencies, and there being no “perfect” number of nodes). In the case of underfitting, increasing the size of your dataset is usually the best way to solve it. For overfitting, purposely dropping random nodes for each training iteration (dropout) and regularizing the weights are good approaches – see the sketch below.
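Here’s roughly what those two overfitting fixes look like in Keras (a minimal sketch of my own, not code from my project; the 10,000-dimensional input is an assumption, matching the IMDB setup later in this post):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10000,)),
    # Weight regularization: an L2 penalty nudges weights toward small values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    # Dropout: temporarily drops half the nodes on each training iteration.
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # output layer: the final decision
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```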

While I had some prior knowledge of this area, I hadn’t really seen it (or any AI training, for that matter) in practice before. Here’s a look at some testing I did with the IMDB natural language processing dataset.

First things first, these are all OVERFITTING models. Some more than others.

That out of the way, this graph is a little hard to interpret, so let’s break it down.

Here we have 3 neural networks, which are created the same way (2 “processing layers” that actually do the work, and 1 output layer that makes the final decision), except for the number of nodes they have. The smaller model (orange) has 4 nodes per processing layer, the baseline model (blue) has 16, and the bigger model (green) has 50. The x-axis is how many epochs (iterations of training) the algorithm goes through, and the y-axis is binary cross-entropy (basically, how INCORRECT its predictions are). What this means is:

Lower on the y-axis is better.
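Since the screenshots don’t show the code, here’s roughly how these three models could be built and trained (reconstructed from the description above, not my original notebook; the top-10,000-words multi-hot encoding is an assumption, matching the standard Keras IMDB setup):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_WORDS = 10000  # assumed vocabulary size

# Each review arrives as a list of word indices.
(train_data, train_labels), (test_data, test_labels) = \
    tf.keras.datasets.imdb.load_data(num_words=NUM_WORDS)

def multi_hot(sequences, dim=NUM_WORDS):
    # Turn each review into a 0/1 vector marking which words appear.
    out = np.zeros((len(sequences), dim))
    for i, seq in enumerate(sequences):
        out[i, seq] = 1.0
    return out

x_train, x_test = multi_hot(train_data), multi_hot(test_data)

def build_model(nodes):
    # 2 "processing" layers plus 1 output layer, varying only the node count.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_WORDS,)),
        layers.Dense(nodes, activation="relu"),
        layers.Dense(nodes, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

histories = {}
for name, nodes in [("smaller", 4), ("baseline", 16), ("bigger", 50)]:
    # validation_data produces the dotted "real world" curves in the graph.
    histories[name] = build_model(nodes).fit(
        x_train, train_labels, epochs=20, batch_size=512,
        validation_data=(x_test, test_labels), verbose=0)
```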

So what’s going on here? You’ll notice the bigger model and the baseline model eventually hit 0 cross-entropy on the training data, at around 10 and 15 epochs respectively. What this means is that after around 10 or 15 passes over the data, they can tell you the correct answer for the training examples 100% of the time. However, looking at the dotted lines (the validation data), we can see their inaccuracy there is very high. Just like we mentioned before, they essentially memorized the answers to their homework but failed the actual test.

How about the smaller model? Well, it doesn’t quite hit 0 cross-entropy during training, but it does much better in actuality. However, it still doesn’t quite get where we want it, and it still follows the pattern of its larger friends. So this one is still overfitting.
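For completeness, here’s a sketch of how the solid (training) and dotted (validation) curves in a graph like this can be plotted (again my own reconstruction, using the histories dict from the sketch above):

```python
import matplotlib.pyplot as plt

for name, hist in histories.items():
    epochs = range(1, len(hist.history["loss"]) + 1)
    # Solid line: training loss; dashed line: validation ("real world") loss.
    line, = plt.plot(epochs, hist.history["loss"], label=f"{name} (training)")
    plt.plot(epochs, hist.history["val_loss"], linestyle="--",
             color=line.get_color(), label=f"{name} (validation)")

plt.xlabel("Epochs")
plt.ylabel("Binary cross-entropy (lower is better)")
plt.legend()
plt.show()
```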

(I can’t find my screenshots for my underfitting models. Will try to remember to upload them here later.)

Thanks for reading!

 

3 Replies to “Week 2: Fit for the job?”

  1. Dennis Woo says:

    I am in awe of your computer science knowledge. This blog post was very well explained, and you’ve managed to convey your intent to someone who has virtually no AI understanding.

    I cannot wait to hear more! Looking forward to seeing how you apply this to cars.

  2. Alex Y. says:

    Very specific analysis! Looking forward to hearing more from this.

  3. Aarushi N. says:

    Is there a way to reduce the inaccuracy that you’ve found?
