Detecting Alternatively Spliced Transcript Isoforms from Single‐Molecule Long‐Read Sequences without a Reference Genome
The International Journal of Health Planning and Management
Published online on March 18, 2017
Abstract
Alternative splicing (AS) is a major source of transcript and proteome diversity, but examining AS in species without well‐annotated reference genomes remains difficult. Studies in human and mouse have demonstrated the advantages of Iso‐Seq™ data for isoform‐level transcriptome analysis, including the study of AS and gene fusion. We applied Iso‐Seq™ to investigate AS in Amborella trichopoda, a phylogenetically pivotal species that is sister to all other living angiosperms. Our data show that, compared with RNA‐Seq data, the Iso‐Seq™ platform provides better recovery of large transcripts, identifies new gene loci, and corrects gene models. Reference‐based AS detection with Iso‐Seq™ data identifies AS in a higher fraction of multi‐exonic genes than a published RNA‐Seq analysis (45.8% vs. 37.5%). These data demonstrate that the Iso‐Seq™ approach is useful for detecting AS events. Using the Iso‐Seq‐defined transcript collection in Amborella as a reference, we further describe a pipeline for detecting AS isoforms from PacBio Iso‐Seq™ data without a reference sequence (de novo). This pipeline achieves an overall success rate of 66–76% in identifying AS events. It provides a method to accurately characterize and identify bona fide alternatively spliced transcripts in any non‐model system that lacks a reference genome sequence, and therefore has broad potential applications across the biology community.
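The abstract does not say how the de novo pipeline decides that two isoforms are alternative splice forms of each other. As a minimal illustration of the general idea, and explicitly an assumption rather than the authors' method, two full-length transcripts can be compared directly: if they agree exactly at their 5′ and 3′ ends and the longer one carries one extra contiguous block of at least `min_gap` bases, that block resembles a skipped exon or retained intron. The names `find_as_gap` and `min_gap`, the default threshold, and the toy sequences are all hypothetical. The sketch also assumes error-free sequences; real Iso-Seq reads would need an error-tolerant aligner instead of exact matching.

```python
from typing import Optional, Tuple


def find_as_gap(long_seq: str, short_seq: str, min_gap: int = 10) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: find one contiguous block present in long_seq but not in short_seq.

    Returns 0-based, half-open (start, end) coordinates of the block in long_seq,
    or None if the pair does not differ by a single block of at least min_gap bases.
    Assumes error-free sequences (exact character matching only).
    """
    gap = len(long_seq) - len(short_seq)
    if gap < min_gap:
        return None  # lengths too close, or short_seq is actually the longer one

    # Length of the shared prefix (5' end) of the two transcripts.
    p = 0
    while p < len(short_seq) and long_seq[p] == short_seq[p]:
        p += 1

    # Length of the shared suffix (3' end) of the two transcripts.
    s = 0
    while s < len(short_seq) and long_seq[-1 - s] == short_seq[-1 - s]:
        s += 1

    # Shared ends must jointly cover all of short_seq; otherwise the
    # transcripts differ by more than one contiguous block.
    if p + s < len(short_seq):
        return None

    # If the block boundaries are repetitive, several placements are
    # equivalent; starting it right after the full shared prefix picks
    # the rightmost one.
    return (p, p + gap)


# Toy example (invented sequences): isoform A = exon1+exon2+exon3, isoform B skips exon2.
exon1, exon2, exon3 = "ATGCCGTA", "GGTTACCAGT", "CCATGGAAT"
print(find_as_gap(exon1 + exon2 + exon3, exon1 + exon3))  # (8, 18): the span of exon2
```

Exact prefix/suffix matching keeps the comparison linear in transcript length, which is why it suits a sketch. Classifying the extra block (exon skipping, intron retention, alternative donor or acceptor) would need further evidence, such as splice-site motifs, that this example does not attempt.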