Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats

DC Field Value Language
dc.contributor.authorJavadzadeh, Sara-
dc.contributor.authorAdamson, Aaron-
dc.contributor.authorPark, Jonghun-
dc.contributor.authorJo, Se-Young-
dc.contributor.authorDing, Yuan-Chun-
dc.contributor.authorBakhtiari, Mehrdad-
dc.contributor.authorBansal, Vikas-
dc.contributor.authorNeuhausen, Susan L.-
dc.contributor.authorBafna, Vineet-
dc.date.accessioned2025-11-11T07:51:26Z-
dc.date.available2025-11-11T07:51:26Z-
dc.date.created2025-08-05-
dc.date.issued2025-04-
dc.identifier.issn1553-734X-
dc.identifier.urihttps://ir.ymlib.yonsei.ac.kr/handle/22282913/208649-
dc.description.abstractVariable Number Tandem Repeats (VNTRs) are repeating motifs longer than five bp. VNTRs are an important source of genetic variation and have been associated with multiple Mendelian and complex phenotypes. However, their highly repetitive structure requires reads to span the entire region for accurate genotyping. Pacific Biosciences HiFi sequencing spans large regions and is highly accurate but relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads spanned only 46% of the G-VNTRs and 71% of the P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization by designing custom probes for 9,999 VNTRs and sequenced 8 samples using HiFi and Illumina sequencing, followed by adVNTR genotyping. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach, only 4,091 (41%) G-VNTRs and only 4 (8%) P-VNTRs were spanned with at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had a higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (<15 spanning reads), 67% were located within GC-rich regions (>60% GC). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) of P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases. Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.-
dc.languageEnglish-
dc.publisherPublic Library of Science-
dc.relation.isPartOfPLOS COMPUTATIONAL BIOLOGY-
dc.subject.MESHBase Composition-
dc.subject.MESHGenome, Human*-
dc.subject.MESHGenotype-
dc.subject.MESHHumans-
dc.subject.MESHMinisatellite Repeats*-
dc.subject.MESHPhenotype-
dc.subject.MESHPolymorphism, Genetic-
dc.subject.MESHWhole Genome Sequencing* / methods-
dc.titleAnalysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats-
dc.typeArticle-
dc.contributor.googleauthorJavadzadeh, Sara-
dc.contributor.googleauthorAdamson, Aaron-
dc.contributor.googleauthorPark, Jonghun-
dc.contributor.googleauthorJo, Se-Young-
dc.contributor.googleauthorDing, Yuan-Chun-
dc.contributor.googleauthorBakhtiari, Mehrdad-
dc.contributor.googleauthorBansal, Vikas-
dc.contributor.googleauthorNeuhausen, Susan L.-
dc.contributor.googleauthorBafna, Vineet-
dc.identifier.doi10.1371/journal.pcbi.1012885-
dc.relation.journalcodeJ02537-
dc.identifier.eissn1553-7358-
dc.identifier.pmid40193344-
dc.subject.keywordGene Encoding-
dc.subject.keywordGenome-
dc.subject.keywordComplex Phenotype-
dc.subject.keywordGenetic Variation-
dc.subject.keywordGenotyping-
dc.subject.keywordHighly Accurate-
dc.subject.keywordIllumina-
dc.subject.keywordLarge Regions-
dc.subject.keywordRepetitive Structure-
dc.subject.keywordTandem Repeats-
dc.subject.keywordVariable Number-
dc.subject.keywordWhole Genome Sequencing-
dc.subject.keywordPhytochemistry-
dc.subject.keywordArticle-
dc.subject.keywordControlled Study-
dc.subject.keywordDiagnosis-
dc.subject.keywordGc Rich Sequence-
dc.subject.keywordGene-
dc.subject.keywordGenotype-
dc.subject.keywordHuman-
dc.subject.keywordIllumina Sequencing-
dc.subject.keywordPangenome-
dc.subject.keywordPhenotype-
dc.subject.keywordVariable Number Of Tandem Repeat-
dc.subject.keywordBioinformatics-
dc.subject.keywordDna Sequencing-
dc.subject.keywordGenetics-
dc.subject.keywordHigh Throughput Sequencing-
dc.subject.keywordHuman Genome-
dc.subject.keywordProcedures-
dc.subject.keywordComputational Biology-
dc.subject.keywordGenome, Human-
dc.subject.keywordGenotyping Techniques-
dc.subject.keywordHigh-throughput Nucleotide Sequencing-
dc.subject.keywordHumans-
dc.subject.keywordMinisatellite Repeats-
dc.subject.keywordSequence Analysis, Dna-
dc.contributor.affiliatedAuthorJo, Se-Young-
dc.identifier.scopusid2-s2.0-105002113859-
dc.identifier.wosid001462309200006-
dc.citation.volume21-
dc.citation.number4-
dc.identifier.bibliographicCitationPLOS COMPUTATIONAL BIOLOGY, Vol.21(4), 2025-04-
dc.identifier.rimsid88434-
dc.type.rimsART-
dc.description.journalClass1-
dc.subject.keywordAuthorGene Encoding-
dc.subject.keywordAuthorGenome-
dc.subject.keywordAuthorComplex Phenotype-
dc.subject.keywordAuthorGenetic Variation-
dc.subject.keywordAuthorGenotyping-
dc.subject.keywordAuthorHighly Accurate-
dc.subject.keywordAuthorIllumina-
dc.subject.keywordAuthorLarge Regions-
dc.subject.keywordAuthorRepetitive Structure-
dc.subject.keywordAuthorTandem Repeats-
dc.subject.keywordAuthorVariable Number-
dc.subject.keywordAuthorWhole Genome Sequencing-
dc.subject.keywordAuthorPhytochemistry-
dc.subject.keywordAuthorArticle-
dc.subject.keywordAuthorControlled Study-
dc.subject.keywordAuthorDiagnosis-
dc.subject.keywordAuthorGc Rich Sequence-
dc.subject.keywordAuthorGene-
dc.subject.keywordAuthorGenotype-
dc.subject.keywordAuthorHuman-
dc.subject.keywordAuthorIllumina Sequencing-
dc.subject.keywordAuthorPangenome-
dc.subject.keywordAuthorPhenotype-
dc.subject.keywordAuthorVariable Number Of Tandem Repeat-
dc.subject.keywordAuthorBioinformatics-
dc.subject.keywordAuthorDna Sequencing-
dc.subject.keywordAuthorGenetics-
dc.subject.keywordAuthorHigh Throughput Sequencing-
dc.subject.keywordAuthorHuman Genome-
dc.subject.keywordAuthorProcedures-
dc.subject.keywordAuthorComputational Biology-
dc.subject.keywordAuthorGenome, Human-
dc.subject.keywordAuthorGenotyping Techniques-
dc.subject.keywordAuthorHigh-throughput Nucleotide Sequencing-
dc.subject.keywordAuthorHumans-
dc.subject.keywordAuthorMinisatellite Repeats-
dc.subject.keywordAuthorSequence Analysis, Dna-
dc.subject.keywordPlusSEROTONIN TRANSPORTER GENE-
dc.subject.keywordPlusVNTR POLYMORPHISM-
dc.subject.keywordPlusEXPANSION-
dc.subject.keywordPlusPROMOTER-
dc.subject.keywordPlusMUTATIONS-
dc.subject.keywordPlusDYSTROPHY-
dc.subject.keywordPlusDISORDER-
dc.subject.keywordPlusCAPTURE-
dc.subject.keywordPlusLINKAGE-
dc.subject.keywordPlusREGION-
dc.type.docTypeArticle-
dc.description.isOpenAccessY-
dc.description.journalRegisteredClassscie-
dc.description.journalRegisteredClassscopus-
dc.relation.journalWebOfScienceCategoryBiochemical Research Methods-
dc.relation.journalWebOfScienceCategoryMathematical & Computational Biology-
dc.relation.journalResearchAreaBiochemistry & Molecular Biology-
dc.relation.journalResearchAreaMathematical & Computational Biology-
dc.identifier.articlenoe1012885-
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.