Workflow

A typical approach to detecting splicing variants combines genomic and transcriptomic data and then statistically correlates genomic variations with changes in the transcriptome (Jung et al., Nature Genetics, 2015; Shiraishi et al., Genome Research, 2018; PCAWG Transcriptome Core Group et al., Nature, 2020). However, this approach requires access to both genome and transcriptome sequencing data, which are often available only in well-coordinated projects such as TCGA. In contrast, the Sequence Read Archive hosts hundreds of thousands of publicly accessible transcriptome sequencing datasets.

Figure 1: Examples of read alignments around SSCVs, shown in the Integrative Genomics Viewer, for the SSCVs in SDHA (ENST00000264932.11: c.1751C>T) (left) and in PRKAR1A (ENST00000589228.6: c.892-129C>G) (right). Mismatch bases corresponding to the SSCVs are visible in the reads (Iida et al., bioRxiv, 2024).

The key observation behind the development of juncmut was "the frequent observation of mismatch bases, indicative of SSCVs, in the short reads of transcriptome sequence data" (Figure 1). This might initially seem contradictory, as one would expect SSCVs, particularly those on the intronic side of a new splice site, to be spliced out and hence absent from the short reads. Nevertheless, because splicing effects vary and SSCVs are incompletely dominant (owing to competition with pre-existing splice sites), transcriptome data often contain reads carrying the variant bases that support SSCVs. An overview of juncmut is shown in Figure 2.

Figure 2: Overview of the juncmut procedure (Iida et al., bioRxiv, 2024).
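
The key observation above, that reads covering a candidate position can carry a mismatch base even when other reads splice that position out, can be illustrated with a small sketch. This is not juncmut's actual implementation: the read representation (lists of ungapped aligned blocks), the function names base_counts_at and mismatch_fraction, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter


def base_counts_at(reads, pos):
    """Count the bases observed at 0-based reference position `pos`.

    Hypothetical read model: each read is a list of (block_start, block_seq)
    tuples. A spliced read has several blocks separated by skipped regions,
    so it contributes nothing at positions that fall inside a skipped region.
    """
    counts = Counter()
    for blocks in reads:
        for start, seq in blocks:
            if start <= pos < start + len(seq):
                counts[seq[pos - start]] += 1
                break  # a read covers a given position at most once
    return counts


def mismatch_fraction(reads, pos, ref_base):
    """Fraction of covering reads whose base at `pos` differs from `ref_base`."""
    counts = base_counts_at(reads, pos)
    depth = sum(counts.values())
    if depth == 0:
        return 0.0
    return (depth - counts[ref_base]) / depth


# Toy data (assumed): the reference base at position 102 is 'G'.
reads = [
    [(100, "ACGTT")],               # unspliced, carries the reference base
    [(101, "CTTT")],                # unspliced, carries a mismatch 'T'
    [(98, "GGACT")],                # ends at 102, carries a mismatch 'T'
    [(90, "AAAA"), (103, "TTGG")],  # spliced read that skips position 102
]

counts = base_counts_at(reads, 102)
fraction = mismatch_fraction(reads, 102, "G")
print(dict(counts), round(fraction, 3))
```

In this toy example three of the four reads cover position 102, and two of those three carry the mismatch base; the spliced read contributes no evidence at that position. A real analysis would read alignments from BAM files and handle indels, base quality, and strand, all of which this sketch ignores.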

Main Contributors

Ai Okada
Yuichi Shiraishi

Notes

Although we screened for SSCVs using both TCGA and SRA transcriptome data, we chose to exclude those from TCGA. The current algorithm cannot distinguish between germline and somatic variants, and there is a concern that releasing germline variants might inadvertently lead to the identification of individuals.

Database version and date

ClinVar: 2022-12-11
Cancer Gene Census: 2022-11-29
ACMG SF Gene: v3.2
ClinGen Dosage Sensitivity: 2023-01-05