
Predicting DNA-binding Sites of Proteins from Amino Acid Sequence.

Authors:
  • Yan Changhui
  • Terribilini Michael
  • Wu Feihong
  • Jernigan Robert L
  • Dobbs Drena
  • Honavar Vasant

From: Department of Computer Science, Utah State University, Logan, Utah 84341, USA. cyan@cc.usu.edu

BMC Bioinformatics

  • Publish Date: 2006
  • ISSN: 1471-2105
  • Volume: 7
  • Issue:
  • Pages: 262
  • Medium: Internet
  • Language: English
  • Citation (JAMA): Yan Changhui, Terribilini Michael, Wu Feihong, et al. Predicting DNA-binding Sites of Proteins from Amino Acid Sequence. BMC Bioinformatics 2006;7:262

Abstract

BACKGROUND: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.

RESULTS: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.

CONCLUSION: Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
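
The windowed input described in the abstract (the target residue plus 4 neighbors on each side) and the column entropy feature can be illustrated with a short Python sketch. This is not the authors' implementation. The padding symbol, the add-one smoothing, the class name WindowNaiveBayes, and the '1'/'0' label strings are all assumptions made for illustration, and the abstract does not say how the entropy value is fed to the classifier, so the sketch only computes it.

```python
import math
from collections import Counter, defaultdict

WINDOW = 4  # neighbors on each side of the target residue (from the abstract)
PAD = "-"   # assumed padding symbol for positions beyond the sequence ends


def windows(sequence, half_width=WINDOW):
    """Yield one window of 2*half_width + 1 residues per sequence position."""
    padded = PAD * half_width + sequence + PAD * half_width
    for i in range(len(sequence)):
        yield padded[i:i + 2 * half_width + 1]


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column given as a string."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())


class WindowNaiveBayes:
    """Naive Bayes over window positions with add-one smoothing (assumed)."""

    def __init__(self, alphabet="ACDEFGHIKLMNPQRSTVWY" + PAD):
        self.alphabet = alphabet
        self.class_counts = Counter()
        # counts[label][position][residue] = number of training windows
        self.counts = defaultdict(lambda: defaultdict(Counter))

    def fit(self, sequences, labels):
        """labels: one string per sequence, '1' = binding, '0' = non-binding."""
        for seq, lab in zip(sequences, labels):
            for win, y in zip(windows(seq), lab):
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.counts[y][pos][aa] += 1
        return self

    def log_score(self, win, y):
        """Log prior plus the sum of per-position log likelihoods for class y."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[y] / total)
        for pos, aa in enumerate(win):
            num = self.counts[y][pos][aa] + 1
            den = self.class_counts[y] + len(self.alphabet)
            score += math.log(num / den)
        return score

    def predict(self, sequence):
        """Return one predicted label character per residue."""
        return "".join(
            max(self.class_counts, key=lambda y: self.log_score(win, y))
            for win in windows(sequence)
        )
```

For example, WindowNaiveBayes().fit(train_sequences, train_labels).predict(query_sequence) returns a label string of the same length as query_sequence, where train_sequences, train_labels, and query_sequence are placeholder names. Leave-one-out cross-validation as described above would repeat this fit with each of the 171 proteins held out in turn.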

Mesh Headings (Keywords): Algorithms, Bayes Theorem, Binding Sites, Computational Biology, Databases, Protein, Entropy, Genetic Techniques, Humans, Models, Molecular, ROC Curve, Reproducibility of Results, Sensitivity and Specificity, Sequence Analysis, DNA, Sequence Analysis, Protein, Software


PubMed Unique Identifier (PMID): 16712732


This abstract is part of PubMed, a service of the U.S. National Library of Medicine. PubMed includes more than 17 million citations from MEDLINE and other life science journals for biomedical articles. See Copyright and Disclaimers.


The data herein was last updated on July 8th, 2008 and may not reflect the most current and accurate data available from NLM.

