Sequence-based approaches to structural variation analysis

Date Posted: September 6, 2019
Historically, structural variation (SV) analysis has been performed using traditional cytogenetic techniques. Today, because of its low cost, easy access and base-pair resolution, much SV analysis is done using short-read, high-throughput sequencing, also called next generation sequencing (NGS). Although NGS data can be effective for some types of SV analysis, overall it can suffer from low sensitivity and a high rate of false positives. This is largely because NGS reads are at most a few hundred base pairs long, while many structural variants span more than 1 kb, so a single read rarely covers an entire variant and its breakpoints. NGS also struggles to resolve GC-rich regions of the genome and highly repetitive regions, such as those involved in a particular type of SV called copy number variation (CNV). As a result, NGS can miss critical SV calls. Some of these challenges can be overcome by increasing sequencing coverage, as recommended in this study, or by combining SV analysis methods, but both approaches increase costs.
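To make the read-length and coverage trade-off concrete, the sketch below computes mean coverage with the standard estimate C = N × L / G, the number of reads needed to reach a target coverage, and whether a single read can span a variant. The genome size (about 3.1 Gb), the 150 bp read length and the `min_anchor` flanking requirement are illustrative assumptions, not values from any particular platform or SV caller.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth, C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required to reach a target mean coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


def can_span(read_length: int, variant_length: int, min_anchor: int = 20) -> bool:
    """True if one read can cover the whole variant plus min_anchor bases
    of flanking sequence on each side (min_anchor is a hypothetical cutoff)."""
    return read_length >= variant_length + 2 * min_anchor


# Assumed example: ~3.1 Gb human genome, 150 bp short reads.
GENOME = 3_100_000_000
print(reads_needed(30, 150, GENOME))   # reads for 30x coverage
print(reads_needed(60, 150, GENOME))   # doubling coverage doubles the reads (and cost)
print(can_span(150, 1_000))            # a 150 bp read cannot span a 1 kb SV
```

Raising coverage adds reads linearly, but no amount of extra coverage makes an individual 150 bp read long enough to span a 1 kb variant; short-read callers must instead infer SVs indirectly from read pairs, split reads and depth changes.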

There are methods that improve short-read approaches to SV analysis, such as 10x Genomics’ linked-reads and strand-specific sequencing (Strand-seq), but they come at a higher cost. Linked-reads are short reads tagged with a unique barcode. All reads from the same DNA molecule share the same barcode, so some longer-range information can be inferred from the short reads, improving haplotype resolution, de novo assembly and structural variant detection. Strand-seq uses a modified NGS protocol to restrict sequencing to a single strand of DNA. The process, which can take up to 4 days to complete, improves haplotyping, assembly and SV analysis compared to standard NGS.
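The barcode idea behind linked-reads can be illustrated with a minimal sketch: group aligned reads by barcode and chromosome, then split each group into putative molecules wherever consecutive reads are far apart. The `Read` type, the `max_gap` of 50 kb and the `shared_barcodes` helper are hypothetical simplifications for illustration, not 10x Genomics' actual algorithm. In real data each barcode typically covers several molecules, so a real caller would also model background barcode sharing.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # linked-read barcode
    chrom: str    # chromosome the read aligned to
    pos: int      # alignment position (bp)


def infer_molecules(reads, max_gap=50_000):
    """Group reads by (barcode, chrom) and split into putative molecules
    wherever consecutive reads are more than max_gap bp apart.
    Returns tuples of (barcode, chrom, start, end, read_count)."""
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)
    molecules = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Large gap: close the current molecule and start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = p, 0
            count += 1
            prev = p
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


def shared_barcodes(molecules, chrom_a, chrom_b):
    """Barcodes seen on both chromosomes; an excess of these between two
    distant loci is one possible signal of a rearrangement."""
    on_a = {m[0] for m in molecules if m[1] == chrom_a}
    on_b = {m[0] for m in molecules if m[1] == chrom_b}
    return on_a & on_b
```

Using a simple distance gap keeps the sketch short; it shows how short reads sharing a barcode can be stitched into long-range information that no single read carries.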

Another high-throughput sequencing approach to SV analysis is Hi-C, a chromosome conformation capture (3C) technique. In Hi-C, cells are fixed with formaldehyde, which preserves chromatin contact points by covalently cross-linking DNA and proteins. After restriction digestion, ligation of the cross-linked fragments and DNA extraction, the ligation products that capture the contact points are sequenced. Mapping these contacts genome-wide reveals long-range interactions, and unexpected contact patterns, such as strong contacts between regions on different chromosomes, can point to rearrangements like translocations. Like linked-reads and Strand-seq, Hi-C significantly increases the cost of SV analysis.
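How a Hi-C contact map can hint at a rearrangement is shown in the minimal sketch below. Read pairs are binned into a sparse contact matrix, and inter-chromosomal bin pairs with many contacts are flagged. The 1 Mb `bin_size` and the raw `min_count` threshold are hypothetical; real Hi-C SV callers normalize for coverage and for the decay of contact frequency with genomic distance before calling anything.

```python
from collections import Counter


def contact_matrix(pairs, bin_size=1_000_000):
    """Build a sparse contact matrix from Hi-C read pairs.
    pairs: iterable of ((chrom1, pos1), (chrom2, pos2)).
    Keys are order-independent pairs of (chrom, bin_index)."""
    counts = Counter()
    for (c1, p1), (c2, p2) in pairs:
        a = (c1, p1 // bin_size)
        b = (c2, p2 // bin_size)
        counts[tuple(sorted((a, b)))] += 1
    return counts


def candidate_translocations(counts, min_count):
    """Inter-chromosomal bin pairs whose raw contact count reaches
    min_count (a hypothetical, unnormalized threshold)."""
    return sorted(
        key for key, n in counts.items()
        if key[0][0] != key[1][0] and n >= min_count
    )
```

Most contacts fall within a chromosome, so a cluster of contacts linking two chromosomes stands out against the sparse inter-chromosomal background.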

Advances in long-read sequencing (LRS), also called third generation sequencing (3GS), are promising. LRS platforms generate reads that are kilobases in length, with Oxford Nanopore Technologies platforms reportedly generating reads of up to 2 Mb. This is a marked improvement over NGS short reads and could solve some of the challenges short reads pose, particularly when assaying repetitive regions. However, the current generation of LRS platforms is subject to significantly higher error rates and costs than NGS.
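Why read length matters for repetitive regions can be estimated with simple arithmetic. Assuming reads start uniformly across the genome, a read of length L spans a repeat of length R (plus anchoring sequence on both sides) only if it starts within a window of L − R − 2a + 1 positions, so the expected number of spanning reads is roughly C × (L − R − 2a + 1) / L at coverage C. The coverage values, read lengths and `min_anchor` below are illustrative assumptions.

```python
def expected_spanning_reads(coverage: float, read_length: int,
                            repeat_length: int, min_anchor: int = 20) -> float:
    """Approximate number of reads that fully span a repeat plus
    min_anchor bp of unique flank on each side, assuming uniformly
    distributed read start positions (min_anchor is hypothetical)."""
    span = repeat_length + 2 * min_anchor
    if read_length < span:
        return 0.0  # no single read is long enough
    return coverage * (read_length - span + 1) / read_length


# Assumed scenarios for a 5 kb repeat:
print(expected_spanning_reads(30, 150, 5_000))     # 30x short reads
print(expected_spanning_reads(10, 20_000, 5_000))  # 10x long reads
```

Even at three times the coverage, 150 bp reads never span the 5 kb repeat, while 20 kb reads at 10x are expected to span it several times. This is the property that makes long reads attractive despite their higher error rates.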

In summary, sequencing-based methods of SV analysis offer better resolution than cytogenetic approaches, but their high cost and error rates mean a gap still exists. Click here to learn how whole genome mapping (WGM) offers the promise of closing the current gap between cytogenetics and genomics in SV analysis.

Subscribe to our blog to stay up to date on Hitachi High Technologies America’s work in SV.