CASIMIR Database Description Framework Criteria

The explosion of biological data and the concomitant proliferation of databases in recent years leave biologists and bioinformaticians with significant work to do to discover the data sources that best fit their needs, and the best way to access and use them. Although databases are increasingly rich in quality and the uptake of syntactic and semantic standards for interoperability is accelerating rapidly, users often find it difficult to discover which databases are most useful to them and which support the standards and interfaces they need. In response to this problem, we propose a formal framework that captures key technical data about a database and makes this information available to potential users through publication on web sites and in the literature. We suggest that widespread uptake of this simple standard will support on-line resource discovery, and the uptake and full utilisation of those resources.

Quality and Consistency

Description: Refers to the curatorial process undertaken by the particular database or resource

 — No explicit process for assuring consistency
 — Process for assuring consistency with manual curation
 — Process for assuring consistency, automatic curation only

Currency


Description: Refers to how frequently a database or resource is updated

 — Closed legacy database
 — Updates or versions more than once a month
 — Updates or versions more than once a year

Accessibility


Description: Refers to whether the resource can be queried programmatically, whether its data can be downloaded, or whether only web-based queries are possible

 — Access via browser and database reports or database dumps
 — Access via browser and programmatic access (well defined API, SQL access or web services)
 — Access via browser only
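
To make the distinction concrete, the sketch below shows what programmatic access through SQL might look like, in contrast to reading reports in a browser. It uses Python's built-in sqlite3 module, with an in-memory database standing in for a resource's SQL interface. The table name `gene`, its columns and its rows are invented for illustration and do not describe any real resource.

```python
import sqlite3


def build_toy_database() -> sqlite3.Connection:
    """Create an in-memory database with a small, invented gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
    conn.executemany(
        "INSERT INTO gene (symbol, chromosome) VALUES (?, ?)",
        [("geneA", "1"), ("geneB", "2"), ("geneC", "1")],  # hypothetical rows
    )
    conn.commit()
    return conn


def genes_on_chromosome(conn: sqlite3.Connection, chromosome: str) -> list:
    """Return gene symbols on one chromosome, sorted alphabetically."""
    rows = conn.execute(
        "SELECT symbol FROM gene WHERE chromosome = ? ORDER BY symbol",
        (chromosome,),  # parameterised query, no string concatenation
    )
    return [symbol for (symbol,) in rows]


conn = build_toy_database()
print(genes_on_chromosome(conn, "1"))
```

A script like this can be rerun unchanged whenever the data are updated, which is not possible when a resource offers browser access only.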

Output formats


Description: Refers to whether the resource provides data output in standard formats such as FASTA, XML or SBML

 — HTML or similar to browser and rich standard file formats e.g. XML, SBML (Systems Biology Markup Language)
 — HTML or similar to browser and sparse standard file formats e.g. FASTA
 — HTML or similar to browser only
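
The difference between a sparse and a rich format can be illustrated with a short sketch. A FASTA record carries only a free-text header and a sequence, whereas an XML record can label each field explicitly. The example below parses one record of each kind using only the Python standard library. The sequence, the record identifier and the XML element names are invented for illustration and do not follow any published schema.

```python
import xml.etree.ElementTree as ET

# Toy data, invented for this example.
FASTA_TEXT = """>seq1 example protein fragment
MKTAYIAK
QRQISFVK
"""

XML_TEXT = """<entry id="seq1">
  <description>example protein fragment</description>
  <organism>Mus musculus</organism>
  <sequence>MKTAYIAKQRQISFVK</sequence>
</entry>"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_entry(text):
    """Return the labelled fields of one XML entry as a dictionary."""
    root = ET.fromstring(text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    fields["id"] = root.get("id")
    return fields


print(parse_fasta(FASTA_TEXT))
print(parse_entry(XML_TEXT))
```

With the FASTA record, anything beyond the sequence has to be recovered by interpreting the header text. With the XML record, each field can be read by name.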

Technical documentation

Description: Refers to whether the resource provides detailed technical documentation

 — Written text
 — Written text and formal structured description and tutorials or demonstrations on how to use them
 — Written text and formal structured description, e.g. automatically generated API docs (JavaDoc), DDL (Data Description Language), DTD (Document Type Definition) or UML (Unified Modelling Language)

Data representation standards

Description: Refers to the creation and/or use of structured vocabularies and/or minimum information standards by the resource

 — Data coded by local formalism only
 — General use of both recognised vocabularies or ontologies, and minimal information standards (MIBBI)
 — Some data coded by a recognised controlled vocabulary, ontology or use of minimal information standards (MIBBI)
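
One simple way to gauge how much of a resource's data is coded by a recognised vocabulary is to check annotation values against the vocabulary's term list. The sketch below does this for a handful of values. The vocabulary, the `EX:` identifiers and the annotations are all invented for illustration; a real check would load terms from the published vocabulary or ontology. The criteria above give no numeric threshold separating "some" from "general" use, so the sketch reports counts only.

```python
# Hypothetical controlled vocabulary: term identifier -> label.
CONTROLLED_TERMS = {
    "EX:0001": "liver",
    "EX:0002": "kidney",
    "EX:0003": "heart",
}


def split_by_vocabulary(annotations, vocabulary):
    """Separate values that are vocabulary term IDs from local free text."""
    coded, local = [], []
    for value in annotations:
        if value in vocabulary:
            coded.append(value)
        else:
            local.append(value)
    return coded, local


# Invented annotations: two controlled terms and one local description.
annotations = ["EX:0001", "hepatic tissue", "EX:0003"]
coded, local = split_by_vocabulary(annotations, CONTROLLED_TERMS)
print(f"{len(coded)} of {len(annotations)} annotations use a controlled term")
print("local values:", local)
```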

Data structure standards

Description: Refers to whether the resource structures its data using a formal model such as an XML schema or a recognised standard data model such as FuGE

 — Data structured with formal model e.g. an XML schema
 — Data structured with local model only
 — Use of recognised standard model e.g. FuGE

User support

Description: Refers to the user support methods that a resource offers to its user community

 — User documentation and Email/web form help desk function
 — User documentation as well as a personal contact help desk function/training
 — User documentation only

Version Control

Description: Refers to whether the resource provides previous versions of the data or of the application itself, and whether entities are tracked between versions

 — No provision
 — Previous version of database available and tracking of entities between versions
 — Previous version of database available but no tracking of entities between versions
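
Taken together, the criteria above could be published as a small machine-readable record alongside a resource. The following is a minimal sketch of that idea, not a format defined by the framework. It assumes one field per criterion, shows only three criteria for brevity, and takes its option strings from the lists above. The field names, the `validate_description` helper and the resource name `ExampleDB` are hypothetical, and no ranking among the options is implied.

```python
import json

# Allowed options per criterion, taken from the lists above.
# The dictionary keys are illustrative field names, not part of the framework.
ALLOWED = {
    "quality_and_consistency": {
        "No explicit process for assuring consistency",
        "Process for assuring consistency, automatic curation only",
        "Process for assuring consistency with manual curation",
    },
    "user_support": {
        "User documentation only",
        "User documentation and Email/web form help desk function",
        "User documentation as well as a personal contact help desk function/training",
    },
    "version_control": {
        "No provision",
        "Previous version of database available but no tracking of entities between versions",
        "Previous version of database available and tracking of entities between versions",
    },
}


def validate_description(description):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for criterion, options in ALLOWED.items():
        value = description.get(criterion)
        if value is None:
            problems.append(f"missing criterion: {criterion}")
        elif value not in options:
            problems.append(f"unrecognised value for {criterion}: {value!r}")
    return problems


# Hypothetical description of an invented resource.
example = {
    "resource": "ExampleDB",
    "quality_and_consistency": "Process for assuring consistency with manual curation",
    "user_support": "User documentation only",
    "version_control": "No provision",
}

print(validate_description(example))
print(json.dumps(example, indent=2))
```

Restricting each field to a fixed set of options keeps descriptions comparable across resources, which is what a registry or search tool would need in order to filter on them.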