Cataloging coding sequence variations in human genome databases

Authors
 Hong-Hee Won  ;  Hee-Jin Kim  ;  Kyung-A Lee  ;  Jong-Won Kim 
Citation
 PLOS ONE, Vol.3(10) : e3575, 2008 
Journal Title
PLOS ONE
Issue Date
2008
MeSH
Amino Acid Substitution/genetics ; Computational Biology/methods ; Databases, Genetic*/classification ; Gene Frequency ; Genetic Variation* ; Genome, Human* ; Humans ; Mutation ; Open Reading Frames/genetics* ; Polymorphism, Single Nucleotide ; Sequence Analysis, DNA ; Software
Abstract
BACKGROUND: With the recent growth of information on sequence variations in the human genome, predictions regarding the functional effects and relevance to disease phenotypes of coding sequence variations are becoming increasingly important. The aims of this study were to catalog protein-coding sequence variations (CVs) occurring in genetic variation databases and to use bioinformatic programs to analyze CVs. In addition, we aimed to provide insight into the functionality of the reference databases.

METHODOLOGY AND FINDINGS: To catalog CVs on a genome-wide scale with regard to protein function and disease, we investigated three representative databases: the Human Gene Mutation Database (HGMD), the Single Nucleotide Polymorphisms database (dbSNP), and the Haplotype Map (HapMap). Using these three databases, we analyzed CVs at the protein function level with bioinformatic programs. We proposed a combinatorial approach using the Support Vector Machine (SVM) to increase the performance of the prediction programs. By cataloging the coding sequence variations using these databases, we found that 4.36% of CVs from HGMD are concurrently registered in dbSNP (8.11% of CVs from dbSNP are concurrent in HGMD). The pattern of substitutions and the functional consequences predicted by three bioinformatic programs differed significantly among concurrent CVs, CVs occurring solely in HGMD, and CVs occurring solely in dbSNP. The experimental results showed that the proposed SVM combination noticeably outperformed the individual prediction programs.

CONCLUSIONS: This is the first study to compare human sequence variations in HGMD, dbSNP and HapMap at the genome-wide level. We found that a significant proportion of CVs in HGMD and dbSNP overlap, and we emphasize the need to use caution when interpreting the phenotypic relevance of these concurrent CVs. Combining bioinformatic programs can be helpful in predicting the functional consequences of CVs because it improved the performance of functional predictions.
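
Note on the overlap figures: the abstract reports two different percentages (4.36% and 8.11%) for the same set of concurrent CVs because each percentage uses a different database as its denominator. The sketch below illustrates that arithmetic only. The function name, the variable names, and the toy identifiers are assumptions made for illustration; they are not the authors' code or data, and the sets are assumed to be non-empty.

```python
def overlap_percentages(a, b):
    """Return (percent of a also found in b, percent of b also found in a).

    Both inputs are assumed to be non-empty collections of variant identifiers.
    """
    a, b = set(a), set(b)
    shared = a & b  # variants registered in both collections
    return 100 * len(shared) / len(a), 100 * len(shared) / len(b)


# Toy identifiers (hypothetical), not real HGMD or dbSNP entries.
hgmd_like = {"v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"}
dbsnp_like = {"v1", "v2", "x1", "x2"}

pct_a, pct_b = overlap_percentages(hgmd_like, dbsnp_like)
print(f"{pct_a:.1f}% of the first set is in the second; "
      f"{pct_b:.1f}% of the second set is in the first")
```

With these toy sets the two shared variants make up a quarter of the larger set but half of the smaller one, which is the same asymmetry the abstract reports at genome scale.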
Files in This Item:
T200801527.pdf
DOI
10.1371/journal.pone.0003575
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Laboratory Medicine (진단검사의학교실) > 1. Journal Papers
Yonsei Authors
Lee, Kyung A(이경아) https://orcid.org/0000-0001-5320-6705
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/107850