Instituto Nacional de Bioinformática

INB operations are supported by installations and customizations of common databases, as well as by databases developed by INB groups. These include Firestar/FireDB, which labels binding sites in protein structures; EcID, a collection of experimental and predicted protein interactions in bacteria; and MODEL, a very large database of molecular dynamics trajectories with associated analysis tools.

FireDB
FireDB is a database of annotated functionally important residues in protein structures.

Tags:

  • Databases and Data Integration
  • Biological Databases
  • Protein Function and Structure Analysis
  • Prediction of Functionally Important Residues
  • Prediction of Ligand Binding

FireDB is a database of PDB structures and their associated ligands. FireDB also contains the largest set of reliably annotated functionally important residues.
The whole PDB was clustered at 97% sequence identity, and every chain was mapped onto a consensus sequence for its cluster, so that important positions can be mapped and binding sites collapsed onto the consensus sequence. Comparing binding sites within a cluster of sequences gives an idea of the flexibility of those regions and of their capacity to bind different ligand analogs.
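
The collapsing step described above can be illustrated with a minimal sketch. It is not FireDB's implementation: the data layout, the function name `collapse_binding_sites`, and the chain identifiers and positions are all invented for illustration. It assumes a per-chain mapping from chain positions to consensus positions is already available.

```python
from collections import defaultdict

def collapse_binding_sites(chain_to_consensus, binding_sites):
    """Map per-chain binding residues onto consensus positions.

    chain_to_consensus: {chain_id: {chain_position: consensus_position}}
    binding_sites:      {chain_id: {ligand_name: [chain_positions]}}
    Returns {consensus_position: set of ligand names seen at that position}.
    """
    collapsed = defaultdict(set)
    for chain_id, sites in binding_sites.items():
        mapping = chain_to_consensus[chain_id]
        for ligand, positions in sites.items():
            for pos in positions:
                # Residues without a consensus counterpart are skipped.
                if pos in mapping:
                    collapsed[mapping[pos]].add(ligand)
    return dict(collapsed)

# Toy data (hypothetical): two chains assumed to belong to one cluster.
chain_to_consensus = {
    "chainA": {10: 12, 11: 13, 45: 47},
    "chainB": {8: 12, 9: 13, 43: 47, 44: 48},
}
binding_sites = {
    "chainA": {"ATP": [10, 11, 45]},
    "chainB": {"ADP": [8, 9, 44]},
}

collapsed = collapse_binding_sites(chain_to_consensus, binding_sites)
for position in sorted(collapsed):
    print(position, sorted(collapsed[position]))
```

Consensus positions bound by several ligand analogs (here 12 and 13) stand out from positions seen with only one ligand, which is the kind of comparison the cluster view is meant to support.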

References:
http://www.oxfordjournals.org/nar/database/summary/986
http://nar.oxfordjournals.org/content/35/suppl_1/D219
E. coli Interaction Database (EcID)
The EcID database provides a common framework for exploring the sizable amount of protein interaction-related data available for Escherichia coli.

Tags:

  • Databases and Data Integration
  • Biological Databases
  • Data Integration
  • Protein-protein interaction

The EcID database (E. coli Interaction Database) provides a common framework for exploring the sizable amount of protein interaction-related data available for Escherichia coli. EcID integrates information on functional interactions extracted from the following sources: EcoCyc [http://ecocyc.org/] (metabolic pathways, protein complexes and regulatory information), KEGG [http://www.genome.jp/kegg/] (metabolic pathways) and MINT [http://mint.bio.uniroma2.it/mint/] (protein interactions). It also contains information on protein complexes from high-throughput pull-down experiments carried out in E. coli, and potential interactions extracted directly from the literature using the web services associated with the iHOP [http://www.ihop-net.org/UniPub/iHOP/] text-mining system. Additionally, EcID incorporates results from two protein interaction prediction methods based on genomic information (Phylogenetic Profiles and Gene Neighborhoods) and three methods based on analysis of the potential co-evolution of the corresponding protein families (Mirror Tree, In Silico 2 Hybrid and Context Mirror). EcID assigns each predicted pair a confidence score that reflects the reliability of the functional interaction between the two proteins.
CellBase
A comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources.

Tags:

  • Databases and Data Integration
  • Biological Databases
  • Data Integration

In recent years, advances in high-throughput technologies have produced unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread across many small databases, that standards frequently differ among repositories, and that some databases are no longer supported or contain information that is too specific and unconnected. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information programmatically. CellBase addresses the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.

References:
http://nar.oxfordjournals.org/content/40/W1/W609

ArchDB
ArchDB is a compilation of structural classifications of loops extracted from known protein structures.

Tags:

  • Databases and Data Integration
  • Data Integration
  • Protein Function and Structure Analysis
  • Protein Structure Prediction
  • Protein Structural Analysis

ArchDB is a compilation of structural classifications of loops extracted from known protein structures [http://www.rcsb.org/pdb/]. Loops in ArchDB have been classified using an improved version of the original ArchType program published in 1997 by Oliva et al. (Oliva B, Bates PA, Querol E, Aviles FX, Sternberg MJ. An automated classification of the structure of protein loops. J Mol Biol. 1997 Mar 7;266(4):814-30.) [http://www.ncbi.nlm.nih.gov/pubmed/9102471?dopt=Abstract]. Conformational clusters and consensus sequences have been derived by computational analysis of loops from the SCOP 40 database. Loops have been classified into five types (alpha-alpha, beta-beta links, beta-beta hairpins, alpha-beta and beta-alpha) according to the secondary structures they connect.
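
The five loop types can be illustrated with a small sketch that labels a loop by its flanking secondary structures. This is only the top-level labelling, not the ArchType clustering itself. The one-letter codes (`H` for helix, `E` for strand) and the `is_hairpin` flag are assumptions: the text above does not say how hairpins are told apart from links, so that decision is left to the caller.

```python
def loop_type(before, after, is_hairpin=False):
    """Label a loop by the secondary structures on either side of it.

    before, after: 'H' (helix) or 'E' (strand); these codes are assumed.
    is_hairpin:    only consulted for strand-to-strand loops; how this is
                   determined is outside the scope of this sketch.
    """
    if before == "H" and after == "H":
        return "alpha-alpha"
    if before == "H" and after == "E":
        return "alpha-beta"
    if before == "E" and after == "H":
        return "beta-alpha"
    if before == "E" and after == "E":
        return "beta-beta hairpin" if is_hairpin else "beta-beta link"
    raise ValueError(f"unsupported flanking structures: {before!r}, {after!r}")

print(loop_type("H", "E"))
print(loop_type("E", "E", is_hairpin=True))
```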
IntOGen
IntOGen (Integrative OncoGenomics) is a resource that integrates multidimensional data for the identification of genes involved in cancer development.

Tags:

  • Databases and Data Integration
  • Biological Databases
  • Database Retrieval and Visualization
  • Data Integration
  • Clinical and Biomedical Applications
  • Biomedical Applications
  • Ontologies
  • Ontology Viewer

Integrative OncoGenomics (IntOGen), a discovery tool for cancer researchers, is a resource that integrates multidimensional oncogenomics data for the identification of genes and groups of genes (biological modules) involved in cancer development.
CentrosomeDB
A database of human centrosomal proteins.

Active research on the biology of the centrosome during the past decades has allowed the identification and characterization of many centrosomal proteins. Unfortunately, the accumulated data are still dispersed among heterogeneous sources of information. Here we present CentrosomeDB, which aims to compile and integrate information related to the human centrosome. We have compiled a set of 383 likely human centrosomal genes and recorded the associated supporting evidence. CentrosomeDB offers several perspectives from which to study the human centrosome, including evolution, function, and structure. The database contains information on orthology relationships with other species, including fungi, nematodes, arthropods, urochordates and vertebrates. Predictions of the domain organization of CentrosomeDB proteins are graphically represented in different sections of the database, including sets of alternative protein isoforms, interacting proteins, groups of orthologs, and the homologs identified with BLAST. CentrosomeDB also contains information related to function, gene-disease associations, SNPs and the 3D structure of proteins. Apart from important differences in the coverage of the set of centrosomal genes, our database differs from similar initiatives in the way information is treated and analyzed.

References:
http://www.oxfordjournals.org/nar/database/summary/1267

http://nar.oxfordjournals.org/content/37/suppl_1/D175

ABS
Experimentally verified orthologous transcription factor binding sites.

ABS (Annotated Binding Sites) is a public database of experimentally verified orthologous transcription factor binding sites (TFBSs). Annotations have been collected from the literature and are manually curated. For each gene, TFBSs conserved in orthologous sequences from at least two different species must be available. Promoter sequences, as well as the original GenBank or RefSeq entries, are also supplied in case of future identification conflicts. The final transcription start site (TSS) annotation has been refined using the dbTSS database. Up to this release, the 500 bp upstream of the annotated TSS have always been extracted to form the collection of gene promoter sequences from human, mouse, rat and chicken.

For each of the 650 annotated regulatory sites, the position, the motif and the sequence in which the site is present are available in a very simple format. Cross-references to EntrezGene, PubMed and RefSeq are also provided for each annotation. Apart from the experimental promoter annotations, predictions by popular collections of weight matrices are provided for each promoter sequence. Global and local alignments and graphical dotplots are also available. ABS is oriented to the study of regulatory regions in the context of pattern discovery programs. ABS therefore provides two applications to aid the automatic training of such programs: CONSTRUCTOR and EVALUATOR.

CONSTRUCTOR automatically generates artificial benchmarks by planting motifs in random sequences. The user can customize the content of the background sequence, the number of motifs that are planted, the subset of the real sites that can be used, the density of motifs on each sequence, and the length and the number of the sequences.
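
The idea of planting motifs in random sequences can be sketched as follows. This is an illustration under stated assumptions, not CONSTRUCTOR's code: the function name `plant_motifs`, its parameters, and the example motifs are hypothetical, and the only background option modelled here is a set of per-nucleotide weights.

```python
import random

def plant_motifs(motifs, n_sequences, length, sites_per_sequence,
                 background=None, seed=0, alphabet="ACGT"):
    """Build an artificial benchmark by planting motifs in random sequences.

    background: optional per-letter weights for the random background
                (None means uniform). Returns a list of
                (sequence, [(start, motif), ...]) pairs.
    """
    rng = random.Random(seed)  # seeded so the benchmark is reproducible
    benchmark = []
    for _ in range(n_sequences):
        seq = rng.choices(alphabet, weights=background, k=length)
        planted, occupied = [], set()
        for _ in range(sites_per_sequence):
            motif = rng.choice(motifs)
            # Retry a bounded number of times to find a non-overlapping slot.
            for _attempt in range(100):
                start = rng.randrange(0, length - len(motif) + 1)
                span = set(range(start, start + len(motif)))
                if not span & occupied:
                    seq[start:start + len(motif)] = motif
                    occupied |= span
                    planted.append((start, motif))
                    break
        benchmark.append(("".join(seq), sorted(planted)))
    return benchmark

benchmark = plant_motifs(["TATAAA", "GGGCGG"], n_sequences=3, length=60,
                         sites_per_sequence=2, seed=1)
for sequence, planted in benchmark:
    print(sequence, planted)
```

Keeping the planted positions alongside each sequence gives the ground truth against which a pattern discovery program's predictions can later be scored.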

EVALUATOR uses standard accuracy measures to assess the correctness of the predictions supplied by the user against the real sites, which the user also submits. The output is a table with the accuracy at both the nucleotide and the site level. The formal definitions of the values are always included to facilitate interpretation.
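
A minimal sketch of nucleotide-level and site-level scoring is given below. Which measures EVALUATOR reports is not stated here, so the choice of sensitivity and positive predictive value is an assumption, as is the rule that a site counts as matched when it overlaps a site of the other set by at least one nucleotide. Sites are hypothetical half-open `(start, end)` intervals on a single sequence.

```python
def covered(sites):
    """Nucleotide positions covered by half-open (start, end) intervals."""
    return {p for start, end in sites for p in range(start, end)}

def nucleotide_level(real, predicted):
    """Sensitivity and positive predictive value counted per nucleotide."""
    real_nt, pred_nt = covered(real), covered(predicted)
    tp = len(real_nt & pred_nt)
    fn = len(real_nt - pred_nt)
    fp = len(pred_nt - real_nt)
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sn, ppv

def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def site_level(real, predicted):
    """Sensitivity and positive predictive value counted per site."""
    found = sum(any(overlaps(r, p) for p in predicted) for r in real)
    correct = sum(any(overlaps(p, r) for r in real) for p in predicted)
    sn = found / len(real) if real else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sn, ppv

# Toy data (invented): one prediction partly overlaps a real site, one misses.
real = [(10, 16), (40, 46)]
predicted = [(12, 18), (70, 76)]
n_sn, n_ppv = nucleotide_level(real, predicted)
s_sn, s_ppv = site_level(real, predicted)
print(f"nucleotide level: Sn={n_sn:.2f} PPV={n_ppv:.2f}")
print(f"site level:       Sn={s_sn:.2f} PPV={s_ppv:.2f}")
```

In this toy case 4 of the 12 real nucleotides are predicted, so both nucleotide-level values are 1/3, while one of two sites is matched in each direction, giving 0.5 at the site level.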

References:
http://www.oxfordjournals.org/nar/database/summary/795
http://nar.oxfordjournals.org/content/34/suppl_1/D63

HCAD
Human chromosome aberration database.

HCAD was designed to facilitate the identification of potential breakpoint genes (see Figure). This is a difficult task even though the complete human genome is now known, because of the sheer number of genes per cytoband.

The HCAD system is based on the hypothesis that genes directly affected by recurrent breakage events will be mentioned more often in abstracts about the corresponding breakpoint, even if direct proof of this association has not yet been described. The statistical analysis in HCAD thus provides probabilities that a gene is relevant to a certain breakpoint (literature evidence). False positive associations can be eliminated by cross-checking with genomic data.

Moreover, HCAD provides direct access to the original sentences that associate a certain gene with a breakpoint. This way the expert can easily confirm a proposed candidate gene on the basis of the original publication.

References:
http://www.oxfordjournals.org/nar/database/summary/683