Week 4: Understanding the Code and Initial Results
Hi everyone! Welcome to my fourth blog. It’s hard to believe that we’re almost halfway through this adventure! In this blog, we will explore my seizure patient detection pipeline step-by-step. My pipeline has four major building blocks: loading and registering the data, creating classes for evaluation, training the model with these custom classes, and graphing the loss. Let’s get right into it.
Data Preprocessing and Loading
As noted in Blog #2, we first need a way of understanding the seizure patient video data. Over 50 videos of seizure patients were annotated, and in week 2 I split these videos into 60% training, 20% validation, and 20% testing. The training set is what the model learns from: this is where it learns to identify patients in videos based on labeled data. The validation set lets us tune the model. Sometimes a model learns the training set so well that it essentially memorizes it; this is called overfitting. Overfitting is bad because it means the model won't perform well on videos and patient data outside the training set, so we use the validation set to tune the model and check whether it is overfitting. The testing set is similar to the validation set in that the model has never seen these videos or patients before, but it is reserved for measuring the model's final performance after tuning and validation. To load the training data that I previously annotated and merged (see Blog #2 for how I did this), I registered the dataset using the COCO tools provided by Detectron2. The code below loads the training dataset of 30 videos:
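In sketch form, the registration step looks roughly like this (the dataset name "seizure_train" and the paths are placeholders for my actual files, and I have left out the Dynaconf settings handling for brevity):

```python
from pathlib import Path

from detectron2.data import DatasetCatalog
from detectron2.data.datasets import register_coco_instances

# Placeholder paths for where my COCO-format annotations and video frames live.
train_json = Path("data/train/annotations.json")
train_imgs = Path("data/train/frames")

# Register the annotated frames from the 30 training videos as a COCO dataset.
register_coco_instances("seizure_train", {}, str(train_json), str(train_imgs))

# Loading the registered dataset returns one record per annotated frame.
train_records = DatasetCatalog.get("seizure_train")
print(f"Annotated training frames: {len(train_records)}")
```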
The output of this code cell is the number of annotated frames across the 30 videos: exactly 17,686, which works out to roughly 600 annotated frames per video (600 × 30 = 18,000). For the training data, we only use the annotated frames in each video. In the full notebook, the code first imports libraries such as detectron2, pathlib, and Dynaconf, then registers the data with Detectron2's register_coco_instances function, pointing it to the annotation and image paths. The next step in the data preprocessing and loading phase is to set up our validation dataset. We use the same code as above but swap in the annotation and image paths of the validation set:
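The validation version of the cell is essentially the same sketch, just pointed at the other split (again with placeholder paths and a placeholder dataset name):

```python
from pathlib import Path

from detectron2.data import DatasetCatalog
from detectron2.data.datasets import register_coco_instances

# Same registration step, now pointing at the validation split (placeholder paths).
val_json = Path("data/val/annotations.json")
val_imgs = Path("data/val/frames")

register_coco_instances("seizure_val", {}, str(val_json), str(val_imgs))

val_records = DatasetCatalog.get("seizure_val")
print(f"Annotated validation frames: {len(val_records)}")
```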
As you can see, there are 6,854 annotations in the validation set, roughly 600 frames per video across 10 videos. The seizure patient videos used in this validation dataset are completely different from the 30 videos used in the training set. Note that each video shows a different patient in a different environment.
Classes for Evaluating Model
Before we can train our model on the data we just loaded, we need a custom class that can track the loss of our model on both the training and validation datasets. The Detectron2 library provides a DefaultTrainer and evaluator, but these will not let us check for overfitting. The article by Marcelo Ortega (linked in the Sources) provides custom classes that record both the total loss and the validation loss. Here is the code that achieves this:
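In rough outline, the classes look like the sketch below. The names LossEvalHook and MyTrainer follow Ortega's article, but this version is simplified: it skips details such as multi-GPU loss reduction and the COCO evaluator setup from the original.

```python
import torch
from detectron2.data import DatasetMapper, build_detection_test_loader
from detectron2.engine import DefaultTrainer, HookBase


class LossEvalHook(HookBase):
    """Periodically computes the loss on the validation set during training."""

    def __init__(self, eval_period, model, data_loader):
        self._period = eval_period
        self._model = model
        self._data_loader = data_loader

    def _do_loss_eval(self):
        # Run the (training-mode) model over every validation batch and
        # average the summed losses, without updating any weights.
        losses = []
        with torch.no_grad():
            for inputs in self._data_loader:
                loss_dict = self._model(inputs)
                losses.append(sum(loss_dict.values()).item())
        mean_loss = sum(losses) / max(len(losses), 1)
        self.trainer.storage.put_scalar("validation_loss", mean_loss)

    def after_step(self):
        # Evaluate every `eval_period` iterations and on the final iteration.
        next_iter = self.trainer.iter + 1
        if next_iter == self.trainer.max_iter or (
            self._period > 0 and next_iter % self._period == 0
        ):
            self._do_loss_eval()


class MyTrainer(DefaultTrainer):
    """DefaultTrainer plus a hook that logs validation loss for overfitting checks."""

    def build_hooks(self):
        hooks = super().build_hooks()
        # Insert the validation-loss hook just before the last (writer) hook,
        # using a loader over cfg.DATASETS.TEST (our validation set).
        hooks.insert(
            -1,
            LossEvalHook(
                self.cfg.TEST.EVAL_PERIOD,
                self.model,
                build_detection_test_loader(
                    self.cfg,
                    self.cfg.DATASETS.TEST[0],
                    DatasetMapper(self.cfg, is_train=True),
                ),
            ),
        )
        return hooks
```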
This code enables us to build what’s called a “hook.” Hooks are the mechanism by which we store the total loss and validation loss during training. Every fixed number of iterations (the evaluation period), the hook computes a validation loss. For example, say we train the model for 50,000 iterations with an evaluation period of 1,000 iterations. Then every 1,000 training iterations the model computes one validation loss, so the whole run produces 50 validation losses (50,000 / 1,000 = 50). Plotting these validation losses gives us an idea of whether the model is overfitting. Let’s dive more into training and tuning in the next section.
Training and Tuning of the Model
Now that we have our data and a way to train our model, let’s train it! We will use the MyTrainer class defined above instead of Detectron2’s Default Trainer in order to keep track of validation loss (and overfitting). The code below accomplishes the task of training:
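A condensed sketch of that training cell is shown below. The model zoo YAML (mask_rcnn_R_50_FPN_3x) and the hyperparameter values are illustrative rather than my exact settings, and interpreting "batch size" as the ROI heads batch size is an assumption:

```python
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from a Mask R-CNN config and pretrained weights from the model zoo.
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
)
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)

# Datasets registered earlier; the "test" slot holds the validation set
# so that the LossEvalHook tracks validation loss during training.
cfg.DATASETS.TRAIN = ("seizure_train",)
cfg.DATASETS.TEST = ("seizure_val",)
cfg.TEST.EVAL_PERIOD = 100                      # check validation loss every 100 iters

# Example hyperparameters (illustrative values, not necessarily my exact settings).
cfg.SOLVER.BASE_LR = 0.002                      # learning rate
cfg.SOLVER.IMS_PER_BATCH = 2                    # images per training batch
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64   # ROI batch size per image
cfg.SOLVER.MAX_ITER = 5000                      # total training iterations
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1             # assuming a single "patient" class

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = MyTrainer(cfg)          # the custom trainer from the previous section
trainer.resume_or_load(resume=False)
trainer.train()
```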
Let’s walk through this code. We first build the object detection model from a config file (cfg). If you read last week’s blog, you will remember that Detectron2 is powered by instance segmentation architectures such as Mask R-CNN. To retrieve an existing Mask R-CNN model, we load its YAML config file from the Detectron2 model zoo. The next step is to set the key parameters for training and tuning. We first specify our training and test datasets. Note that cfg.DATASETS.TEST is not the true test set here, since we are using the validation set to identify overfitting. We also set the evaluation period of our model (100 in this case), which controls how often we check the validation loss.
We can now tune the performance of our model with the remaining parameters. The two most important ones here are the batch size and the learning rate. As in any deep learning problem, we want the highest accuracy (average precision) possible. The rate at which the model moves toward the minimum loss (and highest accuracy) is controlled by the learning rate. A high learning rate (e.g., greater than 0.01) means the model takes large steps; this is not ideal because it can approach the loss minimum too fast and overshoot it. A very low learning rate is not optimal either, since the model then takes too long to converge and may never reach a meaningful loss minimum within our training budget. You may be wondering: why do we want to hit the loss minimum? Low loss is good because it translates to high accuracy and performance. Beyond the learning rate, we can also adjust the batch size: the number of examples the model sees before each update during training. Batch size values usually fall between 1 and 512. A larger batch size generally gives more stable updates and faster progress per iteration (at the cost of memory), while a small batch size makes each update noisier. When we optimize the model, both the learning rate and the batch size serve as hyperparameters: knobs and dials that we can adjust to improve the performance of our object detection model.

Since we are discussing fine-tuning, it’s worth understanding how the loss is actually minimized. Detectron2 uses SGD (Stochastic Gradient Descent) as its default optimizer. Gradient descent is the process of following the slope of the model’s multivariable loss function downhill toward a minimum. The diagram below explains a simple example with the idea of a cost function (similar to a loss function):
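As a concrete toy example (separate from the diagram, and not Detectron2's actual optimizer), here is gradient descent on a simple one-variable cost function, showing how a too-large learning rate overshoots the minimum:

```python
# Toy illustration of gradient descent on cost(w) = (w - 3)^2, minimum at w = 3.
def gradient_descent(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3)              # derivative of (w - 3)^2
        w = w - learning_rate * grad    # step downhill, scaled by the learning rate
    return w

print(gradient_descent(0.1))   # ends near 3: a reasonable learning rate converges
print(gradient_descent(1.1))   # blows up: too large a rate overshoots the minimum
```

With a learning rate of 0.1 the value settles close to 3, while 1.1 bounces further away each step, which is exactly the overshooting behavior described above.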
After we set the parameters for our model (the cfg file), we can begin training! Training for more iterations naturally takes more time; even 5,000 iterations can take upwards of 10 hours. Thus, I like to train my models overnight 🙂
Evaluating Model Performance and Overfitting
After our model finishes training, we can evaluate it to understand how it did. In this case, we will have a CSV file of the validation losses and total losses. Using libraries such as Matplotlib, we can plot the validation loss.
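Assuming the losses were exported to a CSV with columns like iteration, total_loss, and validation_loss (placeholder names), the plotting code is roughly:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder file and column names: iteration, total_loss, validation_loss.
df = pd.read_csv("losses.csv")

plt.plot(df["iteration"], df["validation_loss"], label="Validation loss")
plt.plot(df["iteration"], df["total_loss"], label="Total (training) loss")
plt.xlabel("Training iteration")
plt.ylabel("Loss")
plt.legend()
plt.title("Training vs. validation loss")
plt.show()
```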
This graph shows 50 different validation losses over 5,000 total training iterations, which means the evaluation period (cfg.TEST.EVAL_PERIOD) was 100 iterations (5,000 / 100 = 50). The parameters used to obtain this graph were a learning rate of 0.002 and a batch size of 64. This graph is not ideal since it shows overfitting; in the ideal case, the validation loss would decrease initially and settle at a suitable minimum. Another graph of interest is the total loss alongside this validation loss:
This graph certainly has the overall structure we want, since the total loss (training loss) decreases over time. However, the plot is quite buggy and not correctly formatted. Note that I created this graph with the Matplotlib library. Next week, I will work to further tune my model and debug these graphs!
Conclusions
This week I did a deep dive into the code behind the model that I am using to detect seizure patients. We explored how I registered my dataset of seizure patient videos, created a custom class to look for overfitting, trained a model with custom parameters, and evaluated the object detection algorithm after training. Next week, I will play around more with parameters like the learning rate and batch size while checking for overfitting. Overall, this process is like a treasure hunt. Instead of searching for buried treasure, we are digging deep for the minimum loss and highest performance! Thanks for sticking through this blog and I hope to see you next week.
Sources
- Images 1, 2, 3, 4, 6, 7: Derived from my code
- Image 5 – Graph of Stochastic Gradient Descent: Kapil, Divakar. “Stochastic vs Batch Gradient Descent.” Medium, 21 June 2019, medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1.
- Ortega, Marcelo. “Training on Detectron2 with a Validation Set, and Plot Loss on It to Avoid Overfitting.” Medium, 22 Mar. 2020, medium.com/@apofeniaco/training-on-detectron2-with-a-validation-set-and-plot-loss-on-it-to-avoid-overfitting-6449418fbf4e.
Nice progress this week!
Just out of curiosity, is there any way to speed up the training time? I noticed that you said it takes upwards of 10 hours, and I was wondering if there are any workarounds for that.
Really looking forward to the results!