Project Title: The Dark Matter of Genes
BASIS Advisor: Ms. Jefferson
Internship Location: Stanford University
Onsite Mentor: Dr. Gireesh Bogu
Much of the DNA that makes up who we are doesn't actually code for proteins. This is called non-coding DNA, also known as “the dark matter of genes.” But what, exactly, does non-coding DNA do? What is its purpose in our genes? My external advisor has been seeking answers to these questions, and my role is to use Python, a computer programming language, to find patterns in the data. One mark of this dark matter is extended repeats of the nucleobases (the A’s, G’s, T’s, and C’s that make up our DNA). I will conduct my research at Stanford University, and my long-term project goal is to find relationships between the repeats and the coding and non-coding portions of the DNA, to better understand the repeats' function in our genes and to discover why they exist.
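To give a feel for what "finding extended repeats" can look like in Python, here is a minimal sketch. It is not the project's actual code: the function name `find_repeats`, the `unit_len` and `min_copies` parameters, and the example sequence are all hypothetical, made up for illustration. It simply uses a regular expression to spot places where a short unit of bases repeats back to back.

```python
import re


def find_repeats(sequence, unit_len=1, min_copies=5):
    """Return (start, unit, copies) for each run in which a unit of
    `unit_len` bases repeats at least `min_copies` times in a row.

    Hypothetical helper for illustration only.
    """
    # ([ACGT]{n}) captures one unit; \1{m,} requires m or more further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


# Made-up example sequence: six copies of "AT", then seven A's.
example = "GGCATATATATATATCGAAAAAAAT"
single_base_runs = find_repeats(example, unit_len=1)
two_base_runs = find_repeats(example, unit_len=2)
print(single_base_runs)
print(two_base_runs)
```

Positions are 0-based offsets into the string. Real genome data is far larger and messier than this toy string, so a sketch like this only shows the basic idea.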
My Posts
week eleven
Hi! Everything is working as it should! Not much has changed from my last post. I’m still running tophat2 and featureCounts. I only have one tiny error with a couple of files, but it’s not too big of a deal. Now I’m just waiting for tophat2 to finish and then I can move on with […]
week ten
Hi everyone! Right now it’s mostly a waiting game. I’m running tophat2 on my samples to map the reads to the human genome. I have two samples I’m working with: a control group and a Crohn’s disease (test) group. Each run takes a minimum of around six to seven hours, and some have taken more […]
week nine
Hi everyone, I hope you had a fun week since my last post! I fixed that “small error” I mentioned last week. TopHat is super old and has fallen into disrepair. In fact, it’s been replaced by a shiny new piece of software that gets regular maintenance (unlike TopHat). I had originally installed the latest version of […]
week eight
Hi. I got the laptop! I spent the week installing some software onto it. I needed TopHat and Bowtie (of course), which seemed pretty straightforward at first. Downloading them was easy, but actually installing them was a bit more of a headache. With the help of a nice person and some nice websites, I figured […]
week seven
Hi everyone, this week was all about waiting. See, we managed to get that loaner laptop (yay!) but it came with a condition: Reimage it before using. Since I am not allowed to do that on my own, we were required to leave it in the tech lab for the weekend, so hopefully I’ll be […]
week six
Hi guys! This week I really really started doing stuff. It’s been a bit slow, mainly because of my reading speed with those papers, but real progress is being made here! For the most part, I figured out how to use DESeq and played around with it. It’s what I’ll be using to analyze the data, […]
week five
Hello everyone! This week I began my studies in R. I started by using a paper I found that explained the various functions of R and tested a bunch of commands. My favorite thus far was the hist() function because it literally took my two weeks’ effort of figuring out how to plot a nice […]
week four
Hi! At the beginning of this week, I decided that my program was (1) too slow, and (2) not user-friendly enough. So, I decided to make my program more interactive by adding a “menu” option and putting stuff in functions. This way, the data can be processed just once, manipulated only when needed, and accessed […]
week three
Hi everyone, I finished goal number three from my last post! Quick refresher: In my last post, I mentioned that I had three main goals to fulfill. My third one was to make a histogram representing the frequencies of classes, families, and subfamilies. It was difficult for me to figure out how to do it […]
week two
Hi guys, this week I started coding! I used the Genome Browser to download some data to work with. In truth, it’s a bit more than just “some” data… It’s a whopping 89,513 KB file of data!! It took a ridiculously long time to download and an even longer time to figure out how to use the […]
week one
Hi guys, I’m going to be working with code and genes! The first week of my senior project consisted of reading research papers to familiarize myself with the terminology and to learn more about genetics. There’s a really cool Genome Browser from UCSC that I’m going to get data from. That’s a sample of what […]