  • Project Title: "All Data is Good Data - Or Not" An Analysis Within the GTEx Consortium

  • BASIS Advisor: Mr. Thomas

  • Internship Location: Stanford SCGPM

  • Onsite Mentor: Mr. Akshay Sanghi

In complex genetic analysis, databases of gene expression data are invaluable to the progress of modern medical research. One such database is the Genotype-Tissue Expression (GTEx) database, which contains “normal” genotypic data from thousands of patients, organized by tissue of origin. It is widely used for big data analyses of DNA and RNA data to study tissue-specific genetic variation. However, the phenotypic data associated with GTEx reveals that not all recorded patients have a “normal” phenotype. For my project, I will work with the Stanford Center for Genomics and Personalized Medicine to determine, through statistical analysis of gene expression data, whether the GTEx dataset truly represents the gene expression of normal tissues. After a complete analysis of the 28 tissues present in this dataset, together with the gene expression of the patients described in the phenotypic data, we will better understand the composition of this database. Knowing whether the GTEx dataset is truly normal will help validate studies that already use it, make future studies involving its data more accurate, and allow comparisons to be made involving the diseased tissue within the dataset.