Combining structural variation analysis platforms for better results

Date Posted: September 13, 2019

Most structural variation (SV) analysis is still done using short-read next-generation sequencing (NGS) data and common analysis algorithms. However, structural variants are typically longer than 1 kb and in many cases contain multiple repeats. NGS data consist of short reads of around 300-700 bp, so depending on the study, a large amount of SV goes undetected. A 2019 article in Nature Communications by Chaisson et al. showed that, because of the reliance on NGS and standard algorithms, up to 100,000 variants per sample are being missed.

Several recent studies have shown that combining SV analysis methods improves variant detection. In a 2018 article published in Nature Genetics, researchers combined whole genome mapping (optical mapping), Hi-C, and whole genome sequencing to detect SV in normal and cancer cells. The study found that although each method had its own benefits, “only integrative approaches can comprehensively identify SVs in the genome”. The combinatorial approach was effective in resolving more complex SV and in phasing SV to the correct haplotype. Perhaps the most interesting finding in the study was the indication that “noncoding SVs may be underappreciated mutational drivers in cancer genomes.”
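To make the idea of combining call sets concrete, here is a minimal Python sketch. It is not the method used in any of the studies discussed here. It assumes each platform produces a simple list of calls with a chromosome, start, end, and SV type, and it treats two calls as the same event when they share at least 50% reciprocal overlap, a commonly used heuristic. The `SV` record, the function names, the threshold, and the example coordinates are all hypothetical.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record; assumes end > start."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Shared length divided by the longer call's length (0.0 if no match)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def combine(ngs, mapping, min_overlap=0.5):
    """Split two call sets into shared, NGS-only, and mapping-only calls."""
    shared, ngs_only, matched = [], [], set()
    for a in ngs:
        # Greedy match: take the first unmatched mapping call that overlaps enough.
        hit = next(
            (i for i, b in enumerate(mapping)
             if i not in matched and reciprocal_overlap(a, b) >= min_overlap),
            None,
        )
        if hit is None:
            ngs_only.append(a)
        else:
            matched.add(hit)
            shared.append((a, mapping[hit]))
    mapping_only = [b for i, b in enumerate(mapping) if i not in matched]
    return shared, ngs_only, mapping_only


# Made-up example calls, for illustration only.
ngs_calls = [SV("chr1", 1000, 3000, "DEL"), SV("chr2", 500, 900, "DUP")]
mapping_calls = [SV("chr1", 1100, 3200, "DEL"), SV("chr3", 10000, 60000, "INV")]

shared, ngs_only, mapping_only = combine(ngs_calls, mapping_calls)
print(len(shared), "shared;", len(ngs_only), "NGS-only;", len(mapping_only), "mapping-only")
```

The calls that land in the mapping-only list are the kind of events a sequencing-only workflow would not report. A real pipeline would also need to handle insertions, breakpoint uncertainty, and platform-specific size limits.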

A 2019 paper posted on bioRxiv on leukemia patient samples combined optical mapping and whole-genome sequencing to discover a large amount of SV that was not detected with other methods. In fact, the study found a 5-fold increase in somatic rearrangements compared to what had previously been reported, with hundreds of newly discovered insertions and deletions. SV was found in “a number of leukemia associated genes as well as cancer driver genes not previously associated with leukemia and genes not previously associated with cancer.” As in the 2018 study described above, SV in intergenic regions that affects the expression of neighboring genes was also found. The authors conclude that their “results suggest that current genomic analysis methods fail to identify a majority of structural variants in leukemia samples and this lacunae (sic) may hamper diagnostic and prognostic efforts.”

Perhaps the mother of all studies (to date) is the one referenced above by Chaisson et al. This was a large multi-national effort of the Human Genome Structural Variation Consortium (HGSVC), with teams in both academia and industry, that looked at the most effective combinations of 10 different SV analysis methods using 3 parent-child trios, the parents of which were part of the 1000 Genomes Project. The study is impressive (and dense) and attempted to establish a gold standard for researchers seeking to maximize structural variation sensitivity. As in the studies above, although at a much larger scale, it found that single-method approaches to SV analysis, particularly NGS-based approaches, miss a significant number of SV calls.

More and more research is showing that NGS alone leaves a significant amount of SV undetected, and that adding an orthogonal platform such as whole genome mapping improves results. Yet much SV research is still performed using standalone sequence-based methods. Click here to learn more about whole genome mapping and Hitachi’s advanced cloud-based structural variation analysis tools, both as an orthogonal platform to complement NGS and as a standalone replacement for traditional cytogenetics and molecular cytogenetics platforms.

Subscribe to our blog to stay up to date on Hitachi High Technologies America’s work in SV.
