8- Data for Dummies

Apr 19, 2019

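This week I read a lot about PCA (principal component analysis) plots and decided to create one for the thyroid tissue dataset within GTEx. Basically, a PCA plot summarizes the variation between samples in a dataset by projecting them onto the few directions along which they differ the most. Samples with similar profiles land close together, which makes groups in the data easier to spot.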

For the thyroid data I used, I separated the samples into “normal” phenotype samples and samples with thyroid-related autoimmune disease, based on the associated clinical data. Then, I processed the data and created this:
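For anyone who wants to try something similar, here is a minimal sketch of how a PCA projection can be computed in Python with NumPy. It is not necessarily the exact processing behind the plot: the numbers are randomly generated stand-ins rather than GTEx values, and the function name `pca_project`, the log2 transform, and the group sizes are all assumptions made for illustration.

```python
import numpy as np

def pca_project(expr, n_components=2):
    """Project samples (rows) onto the top principal components."""
    # Log-transform to tame the wide dynamic range of expression-like values.
    logged = np.log2(expr + 1.0)
    # Center each gene (column) so PCA captures variation, not mean level.
    centered = logged - logged.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Sample coordinates along each principal axis.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components], vt[:n_components]

# Synthetic stand-in data: 60 samples x 200 genes (NOT real GTEx values).
rng = np.random.default_rng(0)
expr = rng.gamma(shape=2.0, scale=50.0, size=(60, 200))

# Hypothetical labels: first 45 samples "normal", last 15 "autoimmune".
labels = np.array(["normal"] * 45 + ["autoimmune"] * 15)

# Raise 20 genes in the "autoimmune" group so there is some structure to find.
expr[45:, :20] *= 4.0

scores, explained, axes = pca_project(expr)
print("variance explained by PC1 and PC2:", explained)
```

Each row of `scores` is one sample's position on the plot, so coloring the points by `labels` in a scatter plot gives the same kind of picture as the one in this post.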


The blue samples represent “normal” thyroid tissue, and the red ones represent samples with an autoimmune disorder. The PCA plot compresses variation across many dimensions of the data into two axes. What I saw here is that the thyroid samples with autoimmune disease do cluster slightly away from the normal samples, but the bigger pattern is that the thyroid samples as a whole seem to form two separate clusters.

Next week I’ll be reading more about PCA and the tools I can use to identify the clusters properly. Hopefully I’ll be able to explain it in a less vague way soon :). I’ll also try to pull out the sources of variation in this graph and see what’s driving the clusters.
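As a rough idea of what that could look like, here is a small Python sketch under some loud assumptions: the data are synthetic, the gene names are made up, k-means with two groups is just one possible clustering tool, and ranking genes by their PC1 loadings is just one simple way to look for what drives a split. The helpers `two_means` and `top_genes` are hypothetical names, not functions from any library.

```python
import numpy as np

def two_means(points, n_iter=50):
    """Tiny Lloyd's k-means for k=2; returns a 0/1 label per row."""
    # Deterministic start: the samples at the two extremes of the first axis.
    start = [points[:, 0].argmin(), points[:, 0].argmax()]
    centers = points[start].astype(float)
    for _ in range(n_iter):
        # Distance from every point to both centers, then nearest-center labels.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        new_centers = np.array([points[assign == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def top_genes(axis_loadings, gene_names, n=5):
    """Names of the n genes with the largest absolute loading on one axis."""
    order = np.argsort(-np.abs(axis_loadings))[:n]
    return [gene_names[i] for i in order]

# Synthetic stand-in data: 40 samples x 50 genes (NOT real GTEx values).
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=(40, 50))
# Shift 5 genes in the second half of the samples to create two groups.
data[20:, :5] += 6.0
gene_names = [f"gene_{i}" for i in range(50)]  # made-up names

# PCA via SVD on the gene-centered matrix.
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u[:, :2] * s[:2]

clusters = two_means(scores)
drivers = top_genes(vt[0], gene_names)
print("cluster sizes:", np.bincount(clusters))
print("genes with the largest PC1 loadings:", drivers)
```

On real data the number of clusters would need to be checked rather than assumed, and a large loading only flags a gene as worth a closer look.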

2 Replies to “8- Data for Dummies”

  1. Eva P. says:

    That sounds cool Shreya! I’m kind of wondering why the two clusters overlap so much if one has the disease and one doesn’t? Maybe I’m misunderstanding, but I thought they would be farther apart?

  2. Cindy K. says:

    The graph looks exciting! I’m curious about the two groups of “normal” thyroid tissue and I can’t wait to hear more about what you find!
