Structural Variation Analysis Algorithms
Date Posted: September 20, 2019
A quick survey finds more than 100 different algorithms used in structural variation (SV) data analysis. GitHub, a popular development platform that hosts open source software, has 128 different SV-related algorithms written in 10 different programming languages. omicX, a methodology search engine, has 127 hits in 12 different languages. Further, each commercial SV analysis platform provider has its own set of software tools, creating a confusing landscape without a system or standard.
A number of recent papers and articles have discussed this issue. One study evaluated 69 different SV detection algorithms in an attempt to identify which algorithms, or groups of algorithms, give the best results. Some researchers, in an effort to control project costs, are trying to optimize algorithms for higher sensitivity using NGS data alone. An article in Nature Reviews Genetics, appropriately titled "Piercing the dark matter: bioinformatics of long-range sequencing and mapping," surveys the current bioinformatics methods for several long-read sequencing technologies, along with Hi-C and optical mapping (whole genome mapping), and outlines challenges and opportunities for future work.
This fragmented landscape poses quite a challenge for researchers, and it is exacerbated by older algorithms that can be slow, inaccurate, or not designed for SV detection in the first place. For example, existing algorithms optimized for short-read sequencing data don't necessarily work with long-read sequencing data or whole genome mapping data. New algorithms may only be useful for detecting certain sizes or types of structural variants. Finally, combining algorithms may be required to achieve optimal results, as one study found. All of this assumes the researcher has access to the bioinformatics skills and computational infrastructure to deploy the software and manage the hardware or cloud environment needed to run the analysis.
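To make the idea of combining algorithms concrete, here is a minimal sketch in Python of one common way to do it: keep only the SV calls that at least two callers agree on, where "agree" means same chromosome, same SV type, and a 50% reciprocal overlap. This is an illustration under our own assumptions, not the method used in any of the studies mentioned above. The `SVCall` record, the caller names, the thresholds, and the greedy clustering are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """One SV call from one caller (hypothetical record, not a real file format)."""
    caller: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls
    differ in chromosome or SV type or do not overlap at all."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Greedily cluster calls against the first member of each cluster, then
    keep clusters supported by at least `min_callers` distinct callers."""
    clusters = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end))
    for call in ordered:
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            clusters.append([call])
    return [
        cluster for cluster in clusters
        if len({c.caller for c in cluster}) >= min_callers
    ]


# Made-up calls from three hypothetical callers.
calls = [
    SVCall("caller_a", "chr1", 1000, 2000, "DEL"),
    SVCall("caller_b", "chr1", 1100, 2050, "DEL"),
    SVCall("caller_c", "chr1", 5000, 5200, "DEL"),
    SVCall("caller_a", "chr2", 300, 900, "DUP"),
    SVCall("caller_c", "chr2", 350, 880, "INV"),
]
consensus = consensus_calls(calls)
for cluster in consensus:
    print(sorted(c.caller for c in cluster), cluster[0].chrom, cluster[0].svtype)
```

In this toy input only the chr1 deletion reported by both `caller_a` and `caller_b` survives. Requiring agreement tends to trade sensitivity for precision, so the right thresholds depend on the callers and the data type. Note also that this interval-based comparison does not suit insertions or breakpoint-only calls, which would need a different matching rule.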
No matter the approach, new algorithms are computationally intensive and have significant hardware requirements. This requires an investment in expensive hardware whose cost may or may not be proportional to the value of the results. Purchasing expensive, instrument-specific servers may fulfill short-term project needs, but those servers may sit idle as throughput ebbs, with no option to reconfigure them for other projects. Maintaining and configuring the hardware also diverts resources from running the study, resources that might be better spent on analyzing and interpreting results.
For those who may not have access to bioinformatics expertise or the computational infrastructure needed to run the latest algorithms, there is an opportunity for commercial providers to simplify the process. Hitachi High Technologies has several projects, including the Human Chromosome Explorer (HCE) for SV analysis from whole genome mapping data. Our goal is to offer researchers validated bioinformatics methods in a simple-to-use web interface. Because HCE leverages the cloud, researchers are freed from the costly burden of maintaining the advanced hardware required to analyze large datasets and can focus on the science. Contact us to request a demo of HCE.
Subscribe to our blog to stay up to date on Hitachi High Technologies America’s work in SV.