Week 1: Deconstructing Computer Vision, Patient Tracking, and Seizure Detection

Feb 25, 2021


Hi everyone! Thanks for visiting my first Senior Project blog. My project aims to analyze videos of seizure patients, tracking the patient using object detection algorithms while also identifying anomalies in the patient’s movement. Before discussing what object detection is and how I am applying it in my project, we first need to cover the techniques it is grounded in. To get started, here is a quote that I believe best provides a lens into the topic:

There is great potential to use computer vision technology in a constructive and benevolent way.

– Dr. Fei-Fei Li

What is Computer Vision and how does it work?

If you have an iPhone, you most likely have access to a login feature known as Face ID. Simply put, Face ID is a facial-recognition technology that can identify you using your distinct facial features. It uses sensors and a dot projector to create a 3D map of your face. But how does Face ID actually work? Using Apple’s neural engine, Face ID relies on mathematical models and deep learning to compare the existing 3D map of your face with new scans. Face ID is an incredible piece of technology and part of a growing ecosystem of computer vision applications. At this point, you may be wondering: what is computer vision? Computer vision is a growing subfield of computer science that enables computers to “see.” But what does it mean to see? Computer vision algorithms enable machines to gain a high-level understanding of digital images and videos. For example, see the following image of a dog:

As humans, through tens of thousands of years of evolution, our visual cortices have become quite robust at identifying objects in images. Thus, it is quite simple for you and me to see that there is a dog in the image above. For a computer, however, it is a difficult problem, since it involves extracting high-dimensional data from an image. A computer vision task generally involves three stages. The first is identifying domain knowledge. For the image of a dog, that means understanding what a dog is and how it differs from its surroundings. The second is defining features. Features are how we distinguish between dogs and objects that are not dogs. For example, the paws, tail, and fur of the dog are high-level features that would enable the machine to classify the object as a dog. The third is detecting these features in an image. For example, we can feed an image of a dog to our machine, and the machine would turn the image into a matrix of red-green-blue (RGB) pixel values. The machine would then analyze that matrix of pixel values with a computer vision algorithm to define and detect features. We could then appropriately classify the picture above as indeed containing a dog:
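To make the pixel-matrix idea concrete, here is a tiny sketch in Python using NumPy. The 4×4 “image” is made up for illustration (a real photo of a dog would just be a much larger array), but it shows that an image is simply a height × width × 3 grid of RGB values, and that a simple feature, a jump in brightness between neighboring pixels, can be computed directly from that grid:

```python
import numpy as np

# A tiny "image": a 4x4 grid of RGB pixels, each channel an integer 0-255.
# The left half is brownish (dog-colored), the right half light gray (background).
image = np.array([
    [[120,  90,  60], [125,  95,  65], [200, 200, 200], [205, 205, 205]],
    [[118,  88,  58], [122,  92,  62], [198, 198, 198], [202, 202, 202]],
    [[119,  89,  59], [124,  94,  64], [199, 199, 199], [204, 204, 204]],
    [[121,  91,  61], [126,  96,  66], [201, 201, 201], [206, 206, 206]],
], dtype=np.uint8)

print(image.shape)  # → (4, 4, 3): height x width x RGB channels

# One very simple "feature": the horizontal change in brightness.
# Large jumps often mark object boundaries (e.g., dog vs. background).
gray = image.mean(axis=2)                    # collapse RGB to one brightness value
horizontal_edges = np.abs(np.diff(gray, axis=1))
print(horizontal_edges.round(1))             # large values at the color boundary
```

The big values in the middle column of `horizontal_edges` fall exactly where the brown pixels meet the gray ones, which is the kind of low-level signal feature detectors build upon.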

Now that we have completed a walk-through of a simple computer vision task, let’s dive into the computer vision components of my project. Given the recent popularity of machine learning, deep learning algorithms have made their way into computer vision and have produced state-of-the-art results. Specifically, convolutional neural networks (neural networks specialized for processing images) have enabled advances in domains such as visual surveillance, autonomous vehicles, medical image analysis, and human assistance. By using a convolutional neural network, we can explore applications beyond classification. In the previous example, we explored how one would tell whether a given image contained a dog, but what if we wanted to answer questions like “Where was the dog in the image?”, “What other objects were in the image?”, and “How would one describe the image?”. Each of these questions falls into a specific sub-task of computer vision. Some of these tasks include Semantic Segmentation, Object Detection, and Image Captioning. Visual examples of these tasks are given in the graphic below:
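To illustrate the operation that gives convolutional neural networks their name, here is a minimal NumPy sketch of a 2D convolution. The vertical-edge kernel below is hand-written for clarity; a real convolutional network would learn many such kernels from data rather than having them specified by hand:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image, computing a weighted sum at
    each position -- the core operation inside a convolutional layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 6x6 grayscale image: dark region on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge kernel: it responds strongly where brightness changes
# from left to right and gives zero over flat regions.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = convolve2d(image, kernel)
print(response)  # nonzero only at the dark-to-bright boundary
```

Stacking many learned kernels, interleaved with nonlinearities and pooling, is what lets a CNN progress from low-level edges to high-level features like paws, tails, and fur.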

How Computer Vision connects to my Project

As a student interested in deep learning, computer vision, and their direct applications in a medical environment, a problem that piqued my interest was that of seizure detection and localization. Seizure detection is an ongoing field of study in neurology that aims to create algorithms to identify seizures. Seizure patients are typically monitored through two modalities: EEG signals and video EEG. An EEG signal is a recording of the patient’s brain waves (often visualized as a spectrogram), while video EEG is a video monitor of the patient’s physical movement. Thus, seizures can be detected through both EEG signals and video EEG. Before beginning my senior project, I spent two summers working on similar video EEG experiments in Dr. Daniel Rubin’s lab at Stanford. I worked with frameworks such as OpenPose (a Pose Estimation library) to track patient movements and also helped to annotate data for video analysis. I specifically used an annotator known as CVAT to identify patients in over 50 existing videos. These videos contain at least 10,000 frames of patient positions. A public example is presented below:
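As a rough illustration of how tracked keypoints can reveal movement anomalies, here is a toy sketch with made-up coordinates, not the actual pipeline from the lab. Suppose a pose estimator like OpenPose gave us the (x, y) position of one wrist in each frame; frame-to-frame displacement then forms a crude movement signal we can threshold:

```python
import numpy as np

# Hypothetical per-frame (x, y) positions of one wrist keypoint, as a pose
# estimator might output. Purely illustrative data, not real patient video.
wrist = np.array([
    [100.0, 200.0], [101.0, 201.0], [100.5, 200.5],  # small, ordinary motion
    [130.0, 240.0], [ 95.0, 170.0], [135.0, 235.0],  # sudden large jerks
    [101.0, 201.0], [100.0, 200.0],                  # back to small motion
])

# Frame-to-frame displacement magnitude as a crude "movement signal".
speed = np.linalg.norm(np.diff(wrist, axis=0), axis=1)

# Flag frames whose movement exceeds a simple mean-plus-one-std cutoff;
# a real system would use a learned model rather than a fixed threshold.
threshold = speed.mean() + speed.std()
anomalous_frames = np.where(speed > threshold)[0] + 1
print(anomalous_frames)  # → [4 5]
```

The flagged frames are exactly the ones with the large jerky displacements, which is the basic intuition behind using tracked movement for seizure analysis.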

Typically, these videos are of lower resolution than the example above and are monochrome. In most instances, there is significant equipment (wired electrodes and collodion paste) on the patient’s head, along with an oxygen monitor, a suction machine, and safety rails/pads to keep the patient safe. Now you may be wondering: how does computer vision connect to classifying seizures in these videos of patients? In my project, I aim to use computer vision, specifically object detection algorithms, to track the patient. Many state-of-the-art object detection algorithms are publicly available to experiment with, and I’ve started to implement these algorithms in a secure Jupyter Notebook environment. Next week, we will dive deeper into how these object detection algorithms work, including Facebook AI’s Detectron2 and DE:TR. I hope you enjoyed!
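One common building block for linking a detector’s output across video frames is intersection-over-union (IoU), the standard overlap measure between two bounding boxes: if the patient’s box in one frame overlaps strongly with a box in the next, it is likely the same person. The sketch below uses made-up coordinates and is an illustration of the metric, not my actual tracking code:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical detections of the patient in consecutive frames.
frame_t  = (50, 40, 150, 140)   # 100x100 box
frame_t1 = (60, 50, 160, 150)   # same box shifted 10 px right and down

print(round(iou(frame_t, frame_t1), 3))  # → 0.681, a strong overlap
```

An IoU near 1 means the boxes almost coincide; trackers often treat detections above some IoU cutoff (e.g., 0.5) as the same object carried forward from the previous frame.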

Image Sources

All credit goes to the authors of the following images:

  • Dog and Classified Dog (TowardsDataScience): Source
  • Segmentation, Object Detection, and Image Captioning (MIT): Source
  • Patient Image (SFGATE): Source
