Improved Benchmarks for Computational Motif Discovery.

Authors:
  • Sandve Geir Kjetil
  • Abul Osman
  • Walseng Vegard
  • Drabløs Finn

From: Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway. sandve@ntnu.no

BMC bioinformatics

  • Publish Date: 2007
  • ISSN: 1471-2105
  • Volume: 8
  • Pages: 193
  • Medium: Internet
  • Language: English
  • Citation (JAMA): Sandve Geir Kjetil, Abul Osman, Walseng Vegard, et al. Improved Benchmarks for Computational Motif Discovery. BMC Bioinformatics 2007;8:193

Abstract

BACKGROUND: An important step in annotation of sequenced genomes is the identification of transcription factor binding sites. More than a hundred different computational methods have been proposed, and it is difficult to make an informed choice. Therefore, robust assessment of motif discovery methods becomes important, both for validation of existing tools and for identification of promising directions for future research.

RESULTS: We use a machine learning perspective to analyze collections of transcription factors with known binding sites. Algorithms are presented for finding position weight matrices (PWMs), IUPAC-type motifs and mismatch motifs with optimal discrimination of binding sites from remaining sequence. We show that for many data sets in a recently proposed benchmark suite for motif discovery, none of the common motif models can accurately discriminate the binding sites from remaining sequence. This may obscure the distinction between the potential performance of the motif discovery tool itself versus the intrinsic complexity of the problem we are trying to solve. Synthetic data sets may avoid this problem, but we show on some previously proposed benchmarks that there may be a strong bias towards a presupposed motif model. We also propose a new approach to benchmark data set construction. This approach is based on collections of binding site fragments that are ranked according to the optimal level of discrimination achieved with our algorithms. This allows us to select subsets with specific properties. We present one benchmark suite with data sets that allow good discrimination between positive and negative instances with the common motif models. These data sets are suitable for evaluating algorithms for motif discovery that rely on these models. We present another benchmark suite where PWM, IUPAC and mismatch motif models are not able to discriminate reliably between positive and negative instances. This suite could be used for evaluating more powerful motif models.

CONCLUSION: Our improved benchmark suites have been designed to differentiate between the performance of motif discovery algorithms and the power of motif models. We provide a web server where users can download our benchmark suites, submit predictions and visualize scores on the benchmarks.
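The three motif models named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' optimization algorithm; it only shows how a candidate site might be tested against each model. The function names, the uniform background of 0.25 and the toy matrix are assumptions made for illustration.

```python
import math

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_match(motif, site):
    """IUPAC model: every base of the site must be allowed by the motif code."""
    if len(motif) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(motif, site))


def mismatch_match(consensus, site, max_mismatches):
    """Mismatch model: the site may differ from the consensus in a few positions."""
    if len(consensus) != len(site):
        return False
    mismatches = sum(c != b for c, b in zip(consensus, site))
    return mismatches <= max_mismatches


def pwm_score(pwm, site, background=0.25):
    """PWM model: sum of per-position log-odds against a uniform background.

    `pwm` is a list with one dict per position, mapping base -> probability.
    All probabilities are assumed to be non-zero (e.g. after pseudocounts).
    """
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, site))


# Hypothetical three-position matrix, used only to exercise pwm_score.
toy_column = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
toy_pwm = [toy_column, toy_column, toy_column]

print(iupac_match("TATAWA", "TATAAA"))        # W allows A or T
print(mismatch_match("TATAAA", "TATACA", 1))  # one mismatch tolerated
print(pwm_score(toy_pwm, "AAA"))
```

In a discrimination setting such as the one the abstract describes, a model of this kind would be applied to every window of a sequence, and the windows it accepts would be compared with the annotated binding sites.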

MeSH Headings (Keywords): Algorithms; Amino Acid Motifs; Base Sequence; Benchmarking; Binding Sites; Chromosome Mapping; Molecular Sequence Data; Protein Binding; Sequence Alignment; Sequence Analysis, DNA; Software Validation; Transcription Factors


PubMed Unique Identifier (PMID): 17559676


This abstract is part of PubMed, a service of the U.S. National Library of Medicine. PubMed includes more than 17 million citations from MEDLINE and other life science journals for biomedical articles.

The data herein was last updated on July 8th, 2008 and may not reflect the most current and accurate data available from NLM.

