Hi everyone! Thanks for stopping by to check out this week’s blog. This week, I will look at more of my work on the post-processing side and discuss the overall social impact of my project. As usual, before we dive into the specifics, let’s review the broad objectives of my project. The goal is to design an automated computer vision pipeline that can detect seizure patients in full videos of their activity in a clinical setting. Specifically, I am fine-tuning object detection networks on a dataset of seizure patient videos. This is a difficult task because in many videos the patient is covered by headgear and additional materials such as blankets and wiring. In fact, in some videos, even I find it difficult to identify the exact location of the patient! To summarize my work this week, I made progress towards calculating the average precision on patient sub-videos and added code for a grayscale data augmentation. These updates are discussed in greater detail below:
Remember that post-processing refers to the practice of normalizing or cleaning the output of our model so that the results are easy to reproduce and easy to understand. For my task, post-processing is useful because it lets me visualize the results of my model and compare them against other models and similar computer vision approaches. In my project, there are three broad parts of post-processing: video and sub-video tracking, plotting metrics, and saving prediction results. I have already implemented a sub-video tracking system. Note that sub-videos are smaller segments of full videos. This sub-video tracking system lets me take a model that I have already trained and create a new MP4 video with predictions of the patient’s location overlaid. To give you an idea of what this looks like, imagine a video of a patient resting. The model draws a bounding box (a set of points that form a rectangle) that encloses the patient. This bounding box is added to every frame of the video, and thus we have a working patient tracking system. As mentioned in previous blogs, the accuracy of the predicted bounding box varies from video to video. Based on this high variance in performance, it’s clear that while our model generally performs well on easier videos, it is less accurate on medium and hard videos. Note that medium and hard videos are seizure patient videos where the patient is almost entirely covered in blankets and is in low lighting.
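To make the overlay step concrete, here is a minimal sketch in Python with NumPy. It is only an illustration, not the actual project code: the function names `draw_box` and `overlay_predictions`, the `(x_min, y_min, x_max, y_max)` box format, and the one-box-per-frame layout are all assumptions. Writing the resulting frames to an MP4 file would need a video library and is left out here.

```python
import numpy as np

def draw_box(frame, box, color=(0, 255, 0), thickness=2):
    """Return a copy of an H x W x 3 uint8 frame with a rectangle outline drawn on it.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates (an assumed format).
    """
    out = frame.copy()
    h, w = out.shape[:2]
    # Clamp the box to the frame so slightly out-of-range predictions do not fail.
    x1, y1 = max(0, int(box[0])), max(0, int(box[1]))
    x2, y2 = min(w, int(box[2])), min(h, int(box[3]))
    t = thickness
    out[y1:min(y1 + t, y2), x1:x2] = color   # top edge
    out[max(y1, y2 - t):y2, x1:x2] = color   # bottom edge
    out[y1:y2, x1:min(x1 + t, x2)] = color   # left edge
    out[y1:y2, max(x1, x2 - t):x2] = color   # right edge
    return out

def overlay_predictions(frames, boxes):
    """Draw one predicted box per frame; frames whose box is None pass through unchanged."""
    return [f if b is None else draw_box(f, b) for f, b in zip(frames, boxes)]
```

Drawing on a copy keeps the original frames untouched, which is handy when the same sub-video is rendered with predictions from several models for comparison.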
Now that we have covered sub-video tracking, let’s discuss the other two parts: plotting metrics and saving prediction results. Plotting metrics helps us understand how the model is doing on unseen data, and saving the model’s output means we do not have to rerun the model every time we need its predictions. This brings me to my main question: what is the goal of these post-processing steps? In short, they will help me quantify the performance and output of my model. Since I have not found any existing approaches for seizure patient detection via video data, I will need to form my own claim about the performance and abilities of my object detection model. To form this claim, I will need to understand the performance of my model in terms of numerical data like average precision (measured via IoU, or intersection over union: the area where the real and predicted bounding boxes overlap divided by the area of their union), alongside the overall performance across different types of videos. These post-processing steps are how I will find out how the model is doing in different situations. Let’s dive into what exactly I did this week.
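Since IoU underlies the average precision numbers, a small worked example may help. The sketch below assumes boxes are given as `(x_min, y_min, x_max, y_max)`; the function name `iou` is just for illustration and is not taken from the project code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero for boxes that do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one pixel in each direction, `(0, 0, 2, 2)` and `(1, 1, 3, 3)`, share an area of 1 out of a union of 7, so their IoU is 1/7. Average precision then counts a prediction as correct when its IoU with the real box exceeds a chosen threshold such as 0.5.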
This week, the first major task that I accomplished was the addition of a grayscale transform. This is a useful data augmentation that converts a small number of training images from color to grayscale. It gives our model more examples of gray images to train on, which should improve its performance on unseen grayscale data as well. The code for the transform is given below:
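As a rough illustration of what such a transform can look like, here is a minimal sketch in Python with NumPy. It is not the project's actual code: the function name `random_grayscale`, the probability parameter `p`, and the choice of the common 0.299 / 0.587 / 0.114 luma weights are all assumptions made for this example.

```python
import numpy as np

def random_grayscale(image, p=0.1, rng=None):
    """With probability p, convert an H x W x 3 RGB uint8 image to grayscale.

    The result keeps three identical channels so the model input shape is unchanged.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Most images are returned untouched; only a fraction p are converted.
    if rng.random() >= p:
        return image
    # Weighted sum of R, G, B (assumed weights; other standards use different values).
    weights = np.array([0.299, 0.587, 0.114])
    gray = image.astype(np.float64) @ weights
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```

Keeping `p` small matches the idea of converting only a small number of training images, so the model still sees mostly color data.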
Beyond the addition of this custom grayscale data augmentation, I continued to work on the code for extracting the average precision (AP) from sub-videos. This process essentially involves reverse-engineering the work in Blog #2 (converting and cleaning COCO data). Remember that COCO data are JSON files with a standard format for storing image annotations. The goal of this task is to create the COCO file from scratch. Currently, I am still working on this part of the project!
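To show what building a COCO file from scratch involves, here is a minimal sketch in Python. It is an assumption-heavy illustration rather than the in-progress project code: the function name `build_coco`, the input layout (one dict per frame with `file_name`, `width`, `height`, and `boxes`), and the single `patient` category are all hypothetical.

```python
import json

def build_coco(frames, category_name="patient"):
    """Build a COCO-style detection dict from per-frame boxes.

    frames: list of dicts with 'file_name', 'width', 'height', and
    'boxes' (a list of (x_min, y_min, x_max, y_max) tuples).
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": category_name}],
    }
    ann_id = 1
    for image_id, frame in enumerate(frames, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": frame["file_name"],
            "width": frame["width"],
            "height": frame["height"],
        })
        for x1, y1, x2, y2 in frame["boxes"]:
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [x1, y1, w, h],  # COCO stores boxes as [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco

def save_coco(coco, path):
    """Write the COCO dict to a JSON file."""
    with open(path, "w") as f:
        json.dump(coco, f)
```

Note the box conversion: COCO expects `[x, y, width, height]`, so corner-style boxes need to be converted before they are written out.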
Project Implications and Conclusions
At the beginning of this blog, I brought up the idea of the societal implications of my project. As I enter the final weeks of my project, it’s always nice to step back and look at the bigger picture so that we can give value to the smaller technical pieces of this pipeline. In the literature review that I conducted in the early stages of my work, I did not find any seizure patient detection systems in use. Typically, a nurse needs to monitor the patient in the clinic 24/7. Why does the nurse need to monitor the patient? Of course, seizures are the main event of interest, but there is also the possibility of other events that can be mistaken for seizures (false positives), such as a fall or a patient bumping into the railing. Being able to alert a clinician of an event immediately, rather than relying on manual monitoring, can help improve the quality of life for seizure patients, their families, and hospital/ICU staff. Thus, an automated patient detection system has clear value in the clinical environment.
In this week’s blog, we covered brief updates as I wrap up post-processing for my model and looked at both the technical and big-picture motivations as we head into the final weeks of this journey. Thanks for stopping by and I hope to see you next week!
- Image 1: From my own code
- “Grayscale Image.” Grayscale Image – Rosetta Code, rosettacode.org/wiki/Grayscale_image.