Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats

Authors
 Javadzadeh, Sara  ;  Adamson, Aaron  ;  Park, Jonghun  ;  Jo, Se-Young  ;  Ding, Yuan-Chun  ;  Bakhtiari, Mehrdad  ;  Bansal, Vikas  ;  Neuhausen, Susan L.  ;  Bafna, Vineet 
Citation
 PLOS COMPUTATIONAL BIOLOGY, Vol.21(4), 2025-04 
Article Number
 e1012885 
Journal Title
PLOS COMPUTATIONAL BIOLOGY
ISSN
 1553-734X 
Issue Date
2025-04
MeSH
Base Composition ; Genome, Human* ; Genotype ; Humans ; Minisatellite Repeats* ; Phenotype ; Polymorphism, Genetic ; Whole Genome Sequencing* / methods
Keywords
Gene Encoding ; Genome ; Complex Phenotype ; Genetic Variation ; Genotyping ; Highly Accurate ; Illumina ; Large Regions ; Repetitive Structure ; Tandem Repeats ; Variable Number ; Whole Genome Sequencing ; Phytochemistry ; Article ; Controlled Study ; Diagnosis ; GC Rich Sequence ; Gene ; Genotype ; Human ; Illumina Sequencing ; Pangenome ; Phenotype ; Variable Number Of Tandem Repeat ; Bioinformatics ; DNA Sequencing ; Genetics ; High Throughput Sequencing ; Human Genome ; Procedures ; Computational Biology ; Genome, Human ; Genotyping Techniques ; High-throughput Nucleotide Sequencing ; Humans ; Minisatellite Repeats ; Sequence Analysis, DNA
Abstract
Variable Number Tandem Repeats (VNTRs) refer to repeating motifs of size greater than five bp. VNTRs are an important source of genetic variation, and have been associated with multiple Mendelian and complex phenotypes. However, the highly repetitive structures require reads to span the region for accurate genotyping. Pacific Biosciences HiFi sequencing spans large regions and is highly accurate but relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs, and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads only spanned 46% of the G-VNTRs and 71% of P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization by designing custom probes for 9,999 VNTRs and sequenced 8 samples using HiFi and Illumina sequencing, followed by adVNTR genotyping. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach, only 4,091 (41%) G-VNTRs and only 4 (8%) of P-VNTRs were spanned with at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (<15), 67% were located within GC-rich regions (>60%). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) of P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases.
Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.
Files in This Item:
88434.pdf
DOI
10.1371/journal.pcbi.1012885
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/208649