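This week I read a lot about PCA (principal component analysis) plots and decided to create one for the thyroid tissue dataset in GTEx. Basically, a PCA plot shows how the samples in a dataset vary from one another by projecting them onto the few directions that capture the most variation. PCA doesn’t cluster anything by itself, but similar samples tend to land close together on the plot, which makes groups easier to spot.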
For the thyroid data I used, I separated the samples into “normal” phenotype samples and samples with thyroid-related autoimmune disease, based on the associated clinical data. Then, I processed the data and created this:
The blue points represent “normal” thyroid tissue, and the red points represent samples with autoimmune disease. The plot condenses variation across many dimensions into two axes. What I saw here is that the thyroid samples with autoimmune disease do cluster slightly away from the normal samples, but the bigger pattern is that the thyroid samples as a whole seem to fall into two separate clusters.
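For anyone curious about the mechanics, here is a minimal sketch of how a PCA projection like this can be computed. It is not the exact pipeline behind the plot above: the expression matrix is synthetic stand-in data, and the names (`pca_scores`, `expr`, `labels`), the group sizes, and the shifted genes are all made up for illustration.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples (rows) onto the top principal components."""
    # Center each gene (column) so PCA measures variation around the mean.
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of every sample along the first n_components axes.
    scores = centered @ vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in for an expression matrix (samples x genes).
rng = np.random.default_rng(0)
n_genes = 50
normal = rng.normal(0.0, 1.0, size=(40, n_genes))
disease = rng.normal(0.0, 1.0, size=(10, n_genes))
disease[:, :5] += 3.0  # hypothetical: a handful of genes shifted in disease

expr = np.vstack([normal, disease])
labels = np.array(["normal"] * 40 + ["autoimmune"] * 10)

scores, explained = pca_scores(expr)
```

Passing `scores[:, 0]` and `scores[:, 1]` to a scatter plot (for example matplotlib’s `plt.scatter`), colored by `labels`, gives the same kind of picture, and `explained` says how much of the total variation each axis accounts for.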
I’ll be reading more about PCA and what tools I can use to identify the clusters properly next week. Hopefully I’ll be able to explain it in a less vague way soon :). I’ll also try to pull out the sources of variation in this graph and see what’s driving the clusters.
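As a rough sketch of what those next steps could look like, the snippet below ranks genes by their loading on a principal component (one way to see what drives an axis) and splits the samples into two groups with a tiny k-means. Both are just one possible approach, not a settled method: the data is synthetic, and `top_loading_genes`, `two_means`, and the `GENE_*` names are hypothetical.

```python
import numpy as np


def top_loading_genes(expr, gene_names, component=0, k=5):
    """Return the k genes with the largest absolute loading on one PC."""
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[component]
    # Sort genes by how strongly they contribute to this component.
    order = np.argsort(np.abs(loadings))[::-1][:k]
    return [(gene_names[i], float(loadings[i])) for i in order]


def two_means(points, n_iter=20):
    """Tiny k-means with k=2, seeded from the extremes of the first axis."""
    seeds = [points[:, 0].argmin(), points[:, 0].argmax()]
    centers = points[seeds].astype(float)
    for _ in range(n_iter):
        # Distance from every point to each of the two centers.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(2):
            if np.any(assign == c):
                centers[c] = points[assign == c].mean(axis=0)
    return assign


# Synthetic data: two groups of samples that differ in the first five genes.
rng = np.random.default_rng(1)
n_genes = 40
group_a = rng.normal(0.0, 1.0, size=(30, n_genes))
group_b = rng.normal(0.0, 1.0, size=(30, n_genes))
group_b[:, :5] += 4.0  # hypothetical shift in five genes
expr = np.vstack([group_a, group_b])
gene_names = [f"GENE_{i}" for i in range(n_genes)]

# Which genes contribute most to the first principal component?
top = top_loading_genes(expr, gene_names)

# Cluster the samples using their coordinates on the first two PCs.
centered = expr - expr.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
assign = two_means(centered @ vt[:2].T)
```

On real data, the genes at the top of a list like `top` are a starting point for asking what separates the clusters, and comparing `assign` against the clinical labels shows whether the clusters line up with disease status or with something else.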