• Project Title: Elucidating Nanopore-Based Long-Read Sequencing Limitations by Computationally Investigating RNA Sequence- and Structure-Level Features

  • BASIS Advisor: Swetha Bhattacharya

  • Internship Location: Stony Brook University

  • Onsite Mentor: Robert Patro

The emergence of nanopore-based long-read sequencing technologies has addressed key limitations of short-read sequencing: by sequencing native DNA and RNA directly, nanopore platforms eliminate the time-consuming and costly DNA amplification and synthesis steps that short-read workflows require. Despite these advantages, read truncation when characterizing complex transcriptomes, such as the human transcriptome, reduces sequencing throughput and the reliability of long-read data, limiting the technology's possible applications and potential impact. To investigate the cause of read truncation, we analyzed poly(A)+ RNA reads and their reference transcripts from two human cell lines, performing sequence- and structure-level analyses on the reference sequence window surrounding each read's start position. Our findings show that read truncation is more prevalent in transcripts longer than 5,000 nucleotides. At the sequence level, truncated reads exhibit lower guanine-cytosine (GC) content around their start positions than full-length reads, whose start positions are GC-rich, suggesting decreased thermostability and reduced flexibility among the truncated reads. At the structural level, the extracted start sequences of truncated reads have a higher (less negative) minimum free energy (MFE), suggesting decreased structural stability within the RNA molecule. Our study concludes that less stable sequences, especially near their start positions, are more prone to read truncation during sequencing, and that future studies on methods to stabilize these molecules during nanopore-based sequencing are needed to improve yields of full-length reads.
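The start-window comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the abstract gives neither the window size nor the rule used to call a read truncated, so `FLANK`, `MAX_5P_GAP`, and all helper functions (`gc_content`, `start_window`, `is_truncated`, `compare_gc`, `truncation_rate_by_length`) are hypothetical. Reads are represented as (transcript ID, 0-based alignment start) pairs, as might be parsed from reads aligned to the transcriptome; a read is assumed to be truncated when its alignment begins more than `MAX_5P_GAP` nucleotides downstream of the transcript's 5' end. Only the 5,000-nucleotide length cutoff comes from the abstract.

```python
from statistics import mean

# Hypothetical parameters: the abstract does not state the window size or the
# truncation rule, so these values are illustrative only.
FLANK = 50         # nucleotides taken on each side of a read's start position
MAX_5P_GAP = 50    # reads starting within this many nt of the 5' end count as full-length
LONG_TX = 5_000    # transcript-length cutoff reported in the abstract


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def start_window(transcript: str, start: int, flank: int = FLANK) -> str:
    """Reference window around a 0-based alignment start, clipped to the transcript."""
    lo = max(0, start - flank)
    hi = min(len(transcript), start + flank)
    return transcript[lo:hi]


def is_truncated(start: int, max_gap: int = MAX_5P_GAP) -> bool:
    """Assumed rule: a read is truncated if its alignment begins far from the 5' end."""
    return start > max_gap


def compare_gc(transcripts: dict[str, str], reads: list[tuple[str, int]],
               flank: int = FLANK, max_gap: int = MAX_5P_GAP) -> dict[str, float]:
    """Mean start-window GC content for truncated vs. full-length reads.

    reads: (transcript_id, 0-based alignment start) pairs.
    """
    groups: dict[str, list[float]] = {"truncated": [], "full_length": []}
    for tx_id, start in reads:
        window = start_window(transcripts[tx_id], start, flank)
        key = "truncated" if is_truncated(start, max_gap) else "full_length"
        groups[key].append(gc_content(window))
    # Skip empty groups so mean() is never called on an empty list.
    return {k: mean(v) for k, v in groups.items() if v}


def truncation_rate_by_length(transcripts: dict[str, str], reads: list[tuple[str, int]],
                              max_gap: int = MAX_5P_GAP) -> dict[str, float]:
    """Fraction of truncated reads on transcripts at or below vs. above LONG_TX nt."""
    counts = {"short": [0, 0], "long": [0, 0]}  # [truncated, total] per length class
    for tx_id, start in reads:
        key = "long" if len(transcripts[tx_id]) > LONG_TX else "short"
        counts[key][0] += is_truncated(start, max_gap)
        counts[key][1] += 1
    return {k: t / n for k, (t, n) in counts.items() if n}


if __name__ == "__main__":
    # Toy data: tx1 has a GC-rich 5' region followed by an AT-rich body.
    transcripts = {"tx1": "GC" * 50 + "AT" * 100, "tx2": "A" * 6_000}
    reads = [("tx1", 0), ("tx1", 200), ("tx2", 1_000)]
    print(compare_gc(transcripts, reads))                 # {'truncated': 0.0, 'full_length': 1.0}
    print(truncation_rate_by_length(transcripts, reads))  # {'short': 0.5, 'long': 1.0}
```

The structural half of the analysis would fold each extracted window and compare MFE values between the two groups in the same way. The abstract does not name the folding tool used; one option, assumed here rather than taken from the study, is the ViennaRNA Python binding, where `RNA.fold(seq)` returns a (dot-bracket structure, MFE) pair.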