Instituto Nacional de Bioinformática


The INB sustains important bioinformatics resources for the national and international life-science research community by co-funding their development and maintenance in the groups that originally created them. The resources listed here are our core resources, covering key fields such as functional genomics, transcriptomics with RNA-Seq, genotyping, genomic medicine, and molecular dynamics simulations.

APPRIS
APPRIS determines the principal functional splice isoforms for human genes.


  • Genome Analysis
  • Isoform Prediction
  • Databases and Data Integration
  • Biological Databases
  • Sequence Analysis and Function Prediction
  • Function Prediction
  • Sequence Analysis
  • Protein Function and Structure Analysis
  • Prediction of Functionally Important Residues
  • Web Services



APPRIS is a system that deploys a range of computational methods to add value to the annotation of the human genome. APPRIS also selects one CDS for each gene as the principal isoform. APPRIS defines the principal variant by combining protein structural and functional information with conservation information from related species.

Alternative splicing generates different gene products. Designating one of the isoforms as the "principal" functional isoform is important in order to predict the changes in function, structure or localisation brought about by alternative splicing.

The APPRIS server is being used in the context of the scale-up of the ENCODE project (Nature. 2012 Sep 6;489(7414):57-74) to annotate the human genome (20,700 protein-coding genes and 84,408 distinct alternative transcripts from Gencode7/Ensembl62). The main data set combines the GENCODE manual annotation, which draws on evidence from various sources and research groups, with the Ensembl automatic annotation pipelines to achieve an accurate and complete annotation of the human genome.


APPRIS automates a range of computational methods used to define principal functional variants based on the Principal Variant Pipeline (protein structural information, functionally important residues, conservation of exonic structure and evidence of non-neutral evolution).

The methods of the pipeline are the following:

  1. Functionally important residues (firestar)

  2. Protein structural information (Matador-3D)

  3. Neutral evolution of exons (INERTIA)

  4. Conservation against vertebrates (CORSAIR)

  5. Presence of protein domains (SPADE)

  6. Conservation of exonic structure (CExonic)

  7. Prediction of signal peptide and sub-cellular location (CRASH)

  8. Determination of the number of trans-membrane helices (THUMP)
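The selection logic can be sketched in a toy form. This is not the actual APPRIS implementation (its method-specific rules are more elaborate); the transcript IDs and scores below are invented:

```python
# Toy sketch, NOT the real APPRIS code: combine per-method scores for each
# transcript of a gene and flag the highest-scoring one as principal.
def pick_principal(scores_by_transcript):
    """scores_by_transcript: {transcript_id: {method_name: score}}."""
    # Unweighted sum of method scores; APPRIS itself applies
    # method-specific rules rather than a plain sum.
    return max(scores_by_transcript,
               key=lambda t: sum(scores_by_transcript[t].values()))

example = {
    "ENST-A": {"firestar": 5, "Matador-3D": 2, "CORSAIR": 1},
    "ENST-B": {"firestar": 3, "Matador-3D": 1, "CORSAIR": 1},
}
print(pick_principal(example))  # -> ENST-A
```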

Access to the system

APPRIS has been designed to be portable, modular and flexible. It can be integrated into other bioinformatics systems and accessed from distributed systems in the form of RESTful web services. These services retrieve useful information about genes and transcripts, together with the results of the APPRIS methods.
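A minimal sketch of composing such a RESTful query follows; the host, path and parameter names are hypothetical placeholders, not the documented APPRIS endpoints:

```python
# Hypothetical sketch of building a REST query URL for a gene annotation
# service; substitute the real base URL and parameters from the service docs.
from urllib.parse import urlencode

def build_query(base, gene_id, methods, fmt="json"):
    # methods: which pipeline analyses to retrieve results for
    params = urlencode({"methods": ",".join(methods), "format": fmt})
    return f"{base}/{gene_id}?{params}"

url = build_query("https://example.org/appris/rest", "ENSG00000099899",
                  ["firestar", "matador3d"])
# The resulting URL could then be fetched with urllib.request or wget.
```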

APPRIS also allows manipulation of the database through a comprehensive set of Application Programming Interfaces (APIs) that serve as a middle layer above the underlying database schema. The APIs encapsulate the database layout by providing efficient high-level access to the data tables, isolating applications from changes in the data layout.

A query-oriented data management system based on BioMart (Haider S et al. Nucleic Acids Res. 2009;37:W23-7) is provided for annotation-like searches of complex descriptive data, together with a simple visualization based on the UCSC genome browser (Nucleic Acids Res. 2012 Jan;40(Database issue):D918-23).


  • firestar: prediction of functionally important residues using structural templates and alignment reliability. G. López, A. Valencia, ML. Tress. Nucleic Acids Research (2007 Jun). Vol 35, Web Server issue. Pubmed.
  • Determination and validation of principal gene products. ML. Tress, JJ. Wesselink, A. Frankish, G. López, N. Goldman, A. Löytynoja, T. Massingham, F. Pardi, S. Whelan, J. Harrow, A. Valencia. Bioinformatics (Oxford, England) (2007 Nov). Vol 24, Issue 1. Pubmed.
  • firestar: advances in the prediction of functionally important residues. G. Lopez, P. Maietta, JM. Rodriguez, A. Valencia. Nucleic Acids Research (2011 Jul). Vol 39, Web Server issue, W235-W241. Pubmed.

  • firestar and FireDB
    FireDB is a database of PDB protein structures and their associated ligands and contains the largest set of reliably annotated functionally important residues. Firestar predicts functional residues in protein structures from structural templates and alignment reliability.


    • Protein Function and Structure Analysis
    • Prediction of Functionally Important Residues
    • Prediction of Ligand Binding


    As the structural databases expand and populate structural space, a great deal of interesting biological information is being generated. Much of this, such as the amino acid residues implicated in molecular interactions or catalysis, can be found at the residue level.

    FireDB and the firestar web server were developed specifically to make use of this data in order to predict biologically important residues in protein sequences. FireDB is a database of annotated catalytic residues and ligand-binding residues culled from the protein structures deposited in the Protein Data Bank. firestar uses the functional information in FireDB to make predictions of ligand-binding residues and catalytic residues.


    FireDB is a database of functional information relating to proteins of known structure. It contains the most comprehensive and detailed repository of known functionally important residues, bringing together both ligand-binding and catalytic residues in one site. The platform integrates biologically relevant data filtered from the close atomic contacts in Protein Data Bank crystal structures with reliably annotated catalytic residues from the Catalytic Site Atlas.

    The whole PDB was clustered at 97% sequence identity and every chain mapped onto a consensus sequence from the cluster, so that important positions can be mapped and binding sites collapsed into the consensus sequence. Comparison of binding sites within a cluster of sequences gives an idea of the flexibility of those regions and the capability they have to bind different ligand analogs.
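    The mapping idea can be sketched as follows; the chains, residue positions and chain-to-consensus alignment are invented for illustration, not taken from FireDB:

```python
# Illustrative sketch (not FireDB's actual code): map binding-site residue
# positions from individual PDB chains onto their cluster's consensus
# sequence, so that sites from homologous chains can be collapsed and
# compared. `chain_to_consensus` stands in for a real alignment mapping.
def collapse_sites(binding_sites, chain_to_consensus):
    """binding_sites: {chain: set of residue positions}
    chain_to_consensus: {chain: {chain_pos: consensus_pos}}
    Returns {consensus_pos: number of chains binding there}."""
    counts = {}
    for chain, positions in binding_sites.items():
        mapping = chain_to_consensus[chain]
        for pos in positions:
            cpos = mapping.get(pos)
            if cpos is not None:
                counts[cpos] = counts.get(cpos, 0) + 1
    return counts

sites = {"1abcA": {10, 12}, "2xyzB": {11, 13}}
align = {"1abcA": {10: 100, 12: 102}, "2xyzB": {11: 100, 13: 103}}
print(collapse_sites(sites, align))  # counts per consensus position
```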

    The interface allows users to make queries by protein, ligand or keyword:

    • Uniprot: primary accession numbers of the form Q9WXS1, P07024…

    • PDB chain: a four-letter PDB code combined with the chain identifier, when one exists

    • Keyword: searches in PDB headers and the UniProt description

    Relevant biologically important residues are displayed in a simple, easy-to-read manner that allows users to assess binding-site similarity across homologous proteins. Binding-site residue variations can also be viewed with molecular visualization tools.


    firestar is an expert system for predicting catalytic and ligand-binding residues from information extracted from remotely related protein structures. The server provides a method for extrapolating from the large inventory of functionally important residues organized in the FireDB database and adds information about the local conservation of potential binding residues. Predictions are based on local sequence conservation and matches to the biologically relevant small-molecule ligand-binding residues and annotated catalytic residues from the Catalytic Site Atlas.

    One feature of firestar is that it can be used to evaluate the biological relevance of small molecule ligands present in PDB structures. With the server it is easy to discern whether small molecule binding is conserved in homologous structures.

    In its latest release prediction coverage has been greatly improved with the extension of the FireDB database and the addition of alignments generated by HHsearch. Ligands in FireDB are now classified for biological relevance. Many of the changes have been motivated by the critical assessment of techniques for protein structure prediction (CASP) ligand-binding prediction experiment, which provided us with a framework to test the performance of firestar.

    firestar has been tested during the CASP7, CASP8 and CASP9 ligand-binding prediction experiments. The CASP experiments are the best testing ground for web servers, although results from the CASP ligand-binding prediction experiment should be interpreted with care: each CASP is a snapshot of the predictive capacity of servers and human groups over a limited time period and a limited set of targets. Nevertheless, the results from the three CASP experiments form a body of evidence suggesting that firestar is a state-of-the-art ligand-binding predictor.

    Access to the system

    The interface allows users to make queries by protein sequence or structure. The user can access pairwise and multiple alignments with structures that have relevant functionally important binding sites. The results are presented in a series of easy-to-read displays that allow users to compare binding-residue conservation across homologous proteins. The binding-site residues can also be viewed with molecular visualization tools.

    firestar has been implemented as a REST web service for easy integration into other systems and workflows. For example, it plays an important role in the APPRIS pipeline for annotating splice variants as part of the ENCODE project. Using the "wget" command, one can directly query the web server and retrieve all the information provided in a summary page in an easy-to-parse format.
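    A sketch of consuming such output, assuming a simple tab-separated position/residue/score layout (the server's real summary format may differ):

```python
# Sketch only: parse a firestar-style per-residue summary. The actual
# field layout of the server's easy-to-parse output may differ; this
# assumes hypothetical "position<TAB>residue<TAB>score" lines.
def parse_summary(text, min_score=0.5):
    predicted = []
    for line in text.strip().splitlines():
        if line.startswith("#"):
            continue  # skip comment/header lines
        pos, res, score = line.split("\t")
        if float(score) >= min_score:
            predicted.append((int(pos), res))
    return predicted

demo = "# pos\tres\tscore\n45\tH\t0.92\n46\tD\t0.31\n71\tS\t0.88\n"
print(parse_summary(demo))  # -> [(45, 'H'), (71, 'S')]
```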


  • firestar: advances in the prediction of functionally important residues. G. Lopez, P. Maietta, JM. Rodriguez, A. Valencia. Nucleic Acids Research (2011 Jul). Vol 39, Web Server issue, W235-W241. Pubmed.
  • firestar: prediction of functionally important residues using structural templates and alignment reliability. G. López, A. Valencia, ML. Tress. Nucleic Acids Research (2007 Jun). Vol 35, Web Server issue. Pubmed.
  • FireDB: a database of functionally important residues from proteins of known structure. G. Lopez, A. Valencia, M. Tress. Nucleic Acids Research (2006 Nov). Vol 35, Database issue. Pubmed.
  • SQUARE: determining reliable regions in sequence alignments. ML. Tress, O. Graña. Bioinformatics (2004 Apr). Vol 20, 974-975. Pubmed.

  • Babelomics
    Babelomics is an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling.


    • Gene Expression Analysis
    • Transcriptome Analysis
    • Sequence Analysis and Function Prediction
    • Function Prediction
    • GO Term Predictions


    Babelomics is an integrated platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. It responds to the growing need to integrate and analyse different types of genomic data in an environment that allows easy functional interpretation of the results.

    The system includes a complete suite of methods for the analysis of gene expression data, including normalization (covering most commercial platforms), pre-processing, differential gene expression (case-control, multiclass, survival or continuous values), predictors and clustering. It also supports large-scale genotyping assays (case-control and TDT designs, with population stratification analysis and correction).

    All these genomic data analysis facilities are integrated and connected to multiple options for the functional interpretation of the experiments. Different methods of functional enrichment or gene set enrichment can be used to understand the functional basis of the experiment analysed. Many sources of biological information can be used for this purpose, including functional (GO, KEGG, Biocarta, Reactome, etc.), regulatory (Transfac, Jaspar, ORegAnno, miRNAs, etc.), text-mining and protein–protein interaction modules. Finally, a tool for the de novo functional annotation of sequences has been included in the system, providing support for the functional analysis of non-model species.
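    The core of an over-representation test of this kind can be sketched as a hypergeometric tail; Babelomics' own implementations are richer, and the numbers here are invented:

```python
# Sketch of a functional enrichment (over-representation) test of the kind
# Babelomics applies; the platform's actual methods are more elaborate.
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Invented example: 8 of 40 selected genes annotated to a GO term that
# covers 100 of 10,000 genes in the background.
p = hypergeom_tail(8, 100, 40, 10000)
print(p < 0.01)  # far more overlap than expected by chance
```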

    Babelomics, as well as the Gene Expression Pattern Analysis Suite (GEPAS), has been running uninterruptedly for more than 10 years. Currently, Babelomics averages more than 200 experiments analyzed per day, distributed among many different countries.

    The Babelomics project aims to provide the scientific community with an advanced set of methods for the integrated analysis of genomic data within the context of functional profiling, without sacrificing user-friendly and intuitive use. As the Functional Genomics node of the Spanish Institute of Bioinformatics, and as part of the Spanish Network of Cancer and the Network of Centres for Research in Rare Diseases (CIBERER), we maintain direct contact with researchers, which provides much of the feedback necessary to make Babelomics a useful tool.

    Although there are many tools for the functional profiling of high-throughput experiments, Babelomics is a widely used tool that offers a combination of features and a degree of integration that makes it unique among the available resources.

    In terms of technology, Babelomics has been re-engineered, sped up and transformed into web services. Babelomics has a new interface that allows the definition of persistent sessions and asynchronous use (a job can be left running, and the user can come back later to see the results) through a queue system. Moreover, the complete program can be installed locally, and its modules can now be independently invoked as command-line programs and integrated into analysis pipelines.


  • Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. I. Medina, J. Carbonell, L. Pulido, SC. Madeira, S. Goetz, A. Conesa, J. Tárraga, A. Pascual-Montano, R. Nogales-Cadenas, J. Santoyo, F. García, M. Marbà, D. Montaner. Nucleic Acids Research (2010 Jul). Vol 38, Web Server issue, W210-W213. Pubmed.
  • Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments. F. Al-Shahrour, J. Carbonell, P. Minguez, S. Goetz, A. Conesa, J. Tárraga, I. Medina, E. Alloza, D. Montaner, J. Dopazo. Nucleic Acids Research (2008 May). Vol 36, Web Server issue. Pubmed.
  • A function-centric approach to the biological interpretation of microarray time-series. P. Minguez, F. Al-Shahrour. Genome Informatics (2006). Vol 17, Issue 2, 56-57. Pubmed.
  • FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. F. Al-Shahrour, P. Minguez, J. Tárraga, I. Medina, E. Alloza, D. Montaner, J. Dopazo. Nucleic Acids Research (2007 May). Vol 35, Web Server issue. Pubmed.
  • BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. F. Al-Shahrour, P. Minguez, J. Tárraga, D. Montaner, E. Alloza, JM. Vaquerizas, L. Conde, C. Blaschke, J. Vera. Nucleic Acids Research (2006 Jul). Vol 34, Web Server issue, W472-W476. Pubmed.
  • BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. F. Al-Shahrour, P. Minguez, JM. Vaquerizas, L. Conde. Nucleic Acids Research (2005 Jul). Vol 33, Web Server issue, W460-W464. Pubmed.
  • From genes to functional classes in the study of biological systems. F. Al-Shahrour, L. Arbiza, H. Dopazo, J. Huerta-Cepas, P. Mínguez, D. Montaner, J. Dopazo. BMC Bioinformatics (2007 Apr). Vol 8. Pubmed.

  • aGEM
    aGEM: an integrative system for analyzing spatial-temporal gene-expression information.


    • Data Integration
    • Databases and Data Integration
    • Gene Expression Analysis
    • Spatial Gene Expression Analysis


    Different genes are expressed in various tissues at different developmental stages. Precise regulation of spatio-temporal gene expression is crucial during the development of an organism. It is essential to know the exact timing and location of gene transcripts when studying the functions of genes involved in developmental processes.

    The aim of the Visual Genomics project is to facilitate access to information on anatomical patterns of gene expression in several model organisms (such as mouse and human) to complement functional genomics studies. This has been accomplished by developing a platform that integrates gene expression data with spatial-temporal anatomic data by means of an intuitive and user-friendly viewer.

    The aGEM (anatomic Gene Expression Mapping) Platform provides information to answer three main questions. (i) Which genes are expressed in a given anatomical component? (ii) In which anatomical structures are a given gene or set of genes expressed? And (iii) is there any correlation among these findings?

    Currently there is a variety of anatomical gene expression databases, but extracting information from them can be hampered by their diversity and heterogeneity. aGEM addresses these issues by integrating six mouse gene expression resources (EMAGE, GXD, GENSAT, the Allen Brain Atlas database, EUREXPRESS and BioGPS) and three human gene expression databases (HUDSEN, Human Protein Atlas and BioGPS). Furthermore, aGEM provides new cross-analysis tools to bridge these resources.

    These gene-expression data are mostly obtained from in situ techniques, together with a broad set of image-derived annotations. Moreover, general biological information from databases such as KEGG, OMIM and MTB is also integrated. aGEM not only gives an integrated view of the databases mentioned above, but also allows the experimentalist to retrieve relevant statistical information relating gene expression, anatomical structure (space) and developmental stage (time).

    aGEM can be queried by gene and by anatomical structure. The first type of query can be carried out by using the ENSEMBL identifier, common gene symbol, MGI identifier or UniProt accession number. Querying by anatomical structure is simplified by displaying the terms from the chosen developmental stage (Carnegie Stage for human or Theiler Stage for mouse) in hierarchical trees.

    By integrating the KEGG pathways database in aGEM, the user can query using a set of genes involved in a given process. This utility is very useful as it allows gene expression differences and similarities to be compared in physiological and disease states. Regarding the disease state, information from the OMIM (human diseases) and MTB (mouse tumours) databases is displayed alongside genes in the results interface, when available.

    Output information is presented in a friendly format, allowing the user to display expression maps and correlation matrices for a gene or structure during development. An in-depth study of a specific developmental stage is also possible using heatmaps that relate gene expression to anatomical components.

    In summary, the aGEM Platform is a powerful tool in the gene expression field that eases access to information on the anatomical pattern of gene expression in human and mouse, complementing many functional genomics studies. The platform allows the integration of gene expression data with spatial-temporal anatomic data by means of an intuitive and user-friendly display.


  • aGEM: an integrative system for analyzing spatial-temporal gene-expression information. N. Jiménez-Lozano, J. Segura, JR. Macías, J. Vega, JM. Carazo. Bioinformatics (Oxford, England) (2009 Jul). Vol 25, Issue 19. Pubmed.
  • Integrating human and murine anatomical gene expression data for improved comparisons. N. Jiménez-Lozano, J. Segura, JR. Macías, J. Vega. Bioinformatics (2012 Feb). Vol 28, 397-402. Pubmed.

  • 3D Electron Microscopy Benchmark
    The 3D Electron Microscopy Benchmark web portal is used to compare algorithms for image processing in Structural Biology.


    • Visual Bioinformatics
    • Digital Image Analysis
    • Evaluations and Assessments
    • Algorithm Evaluation


    Instruct is one of the European Strategic Infrastructure projects. Its main objective is to provide an integrated infrastructure and training for structural biologists, bridging Structural Biology and Bioinformatics. Instruct is the dynamic hub of structural biology, providing an integrated infrastructure of cutting-edge technology, scientific expertise and pioneering training.

    Instruct is being established as a pan-European distributed infrastructure to facilitate access to state-of-the-art research facilities and expertise across Europe in order to support excellent science that integrates an understanding of biological structure with cellular function. Building on this, Instruct also aims to advise and work with funders and industry on the development and implementation of a co-ordinated European strategy for investments in structural biology facilities.

    The role of the INB in this project is to foster the development of new advanced methods in the area of image processing (3D-EM Benchmarking), the integration of different software packages in a workflow framework (Scipion) and the integration of 3D-EM volumes with other kinds of data (PeppeR).

    Scipion fills some gaps existing in the structural biology field such as workflow management, software integration and traceability. PeppeR provides a friendly interface for the study of hybrid models made by fitting structures at the atomic level of resolution into 3D-EM volumes. PeppeR allows the retrieval and integration of a whole set of annotations of particular interest for structural biologists, experimental scientists and even clinicians. Specifically, the objective of PeppeR is to become the default Protein Data Bank viewer for those entries with fitted structures.

    By managing specific "Challenges" that are issued periodically, the 3D-EM Benchmarking Platform constitutes an excellent evaluation system in the Structural Biology field. The initiative, as well as the challenges, has the support of the United States National Center for Macromolecular Imaging (NCMI). The goal is to foster the development of new advanced methods in the area of image processing in Structural Biology. To this end, a robust computational infrastructure capable of supporting the automatic and standardized benchmarking of image processing applications is provided.

    SNPator (SNP Analysis To Results)
    SNPator is a user-friendly web-based SNP data analysis suite that integrates the most common steps of a SNP association study.


    • Genome Analysis
    • SNP Analysis


    Single nucleotide polymorphisms (SNPs) are the most widely used marker in studies to assess associations between genetic variants and complex traits or diseases. The vast number of SNPs identified in the last few years and the development of high-throughput genotyping technologies have provided the opportunity for many research groups to undertake association studies of varying scales on a regular basis. SNP association studies have become crucial in the uncovering of genetic correlations of genomic variants with complex diseases, quantitative traits and physiological responses to drugs. SNPs are also increasingly employed to study the history of populations and the evolution of species.

    In spite of the increasing popularity of SNP studies, processing and analyzing the huge amounts of data generated by genotyping technologies is still a burdensome and time-consuming task. Hundreds of different software packages have been developed to deal with particular problems and are available on the Web. Much time and effort is required, not only to identify the most appropriate algorithms and programs for each goal, but also to install them on local computers, to learn how they work and to format input data appropriately. Within many genotyping projects, post-genotyping data management and analysis have become a bottleneck hindering the achievement of results.

    To help tackle these problems we have developed a web-based software solution called SNPator (SNP Analysis To Results). SNPator is a user-friendly web-based SNP data analysis suite that integrates, among many other algorithms, the most common steps of a SNP association study. It frees the user from the need for large computing facilities and in-depth knowledge of genetic software installation and management. Genotype data are read directly from the output files of the usual genotyping platforms. Phenotypic data on the samples can also be easily uploaded. Many different quality control and analysis procedures can be performed either by using built-in SNPator algorithms or by calling standard genetic software.

    Users can log into the application via web using a standard browser. Users have different levels of privileges and can only access their own studies. A study is a working space—shared by as many users as necessary—where a set of data and all results generated from its analysis are stored.

    Each study starts with three types of data in highly customizable tables: a set of SNPs with related genomic information, a set of samples with population or phenotypic information and a set of genotypes. SNP and sample information can be easily uploaded using several methods, including, for SNPs, automatic upload from public databases such as dbSNP or HapMap.

    Once data have been uploaded, SNPator offers many quality control and analysis possibilities. Quality control options range from the detection of contradictory genotypes to the generation of graphical reports of uploaded plates. As to analysis, the simplest way in which SNPator can be used is to generate formatted data files ready to be used by other programs.
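    As an illustration of such an export, the sketch below writes a PLINK-style PED line (a common input format for genetic software); the sample data are invented, and SNPator's actual exporters are not shown here:

```python
# Hedged sketch: build one PLINK-style PED line from genotype records.
# Field layout: family, individual, paternal ID, maternal ID, sex,
# phenotype, then one allele pair per SNP ("0 0" marks a missing genotype).
def to_ped_line(family, sample, sex, phenotype, genotypes):
    """genotypes: list of (allele1, allele2) tuples, one per SNP."""
    fields = [family, sample, "0", "0", str(sex), str(phenotype)]
    for a1, a2 in genotypes:
        fields.extend([a1, a2])
    return " ".join(fields)

line = to_ped_line("FAM1", "IND1", 1, 2, [("A", "G"), ("0", "0"), ("T", "T")])
print(line)  # -> FAM1 IND1 0 0 1 2 A G 0 0 T T
```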


  • SNP analysis to results (SNPator): a web-based environment oriented to statistical genomics analyses upon SNP data. C. Morcillo-Suarez, J. Alegre, R. Sangros, E. Gazave, R. de Cid, R. Milne, J. Amigo, A. Ferrer-Admetlla, A. Moreno-Estrada, M. Gardner, F. Casals, A. Pérez-Lezaun, D. Comas, E. Bosch, F. Calafell, J. Bertranpetit, A. Navarro. Bioinformatics (Oxford, England) (2008 May). Vol 24, Issue 14. Pubmed.

  • GRAPE: RNAseq Analysis Pipeline Environment
    GRAPE is a robust, efficient and scalable software system for the storage, organization, access and analysis of RNA-Seq data.


    • Gene Expression Analysis
    • Transcriptome Analysis
    • RNA-seq


    Next-generation sequencing technologies provide an unprecedented capacity for surveying the nucleic acid content of cells. In particular, since these techniques were applied to transcriptome sequencing, we have become increasingly aware of the large number of human genes that show alternative splice forms, and of the wide range of splice forms these genes can have, from just two splice variants to hundreds. At the same time, the accelerating rate of data production with these new technologies is moving the bottleneck in many studies from data generation to the actual analysis of the data. It is therefore important to design methods that can analyze these data quickly and efficiently.

    GRAPE aims to use the data from these experiments to determine exact transcript abundances within the cell: not only a qualitative list of the transcripts that are expressed, but the exact expression level of each transcript and alternative variant, while at the same time developing a highly automated method that can take advantage of the huge amounts of data available.

    GRAPE is a robust, efficient and scalable software system for the initial processing, storage, analysis and management of RNA-Seq data, together with an implementation of this pipeline using a workflow management system. The pipeline is designed to automate the main analyses necessary for RNA-Seq data.

    The system has three main components (see figure 1): a structured repository hosting the raw and processed data, an RNA-Seq pipeline that transparently produces transcript models and quantifications from sequence reads, and a common interface to both data and analyses.

    The pipeline starts from the raw sequencing reads, or BAM alignments, plus a genome file and an annotation of the species from which the reads originate. It includes a quality control step, a mapping step that uses the GEM mapper, and quantification of the annotated transcripts and genes using custom scripts and the Flux Capacitor program.

    This pipeline is linked to a MySQL database that keeps track of which steps have been completed and stores the information generated at each step. The database can be accessed using scripts or through a web application (figures 2-4) that allows easy browsing of the results. This web application is still under active development; however, a working prototype is already available.
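    A minimal sketch of this kind of bookkeeping, using Python's sqlite3 in place of GRAPE's MySQL database; the table and step names are invented for illustration:

```python
# Sketch of pipeline step tracking (sqlite3 stands in for MySQL here).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE steps (name TEXT PRIMARY KEY, status TEXT)")

def mark_done(name):
    # Record a completed pipeline step, replacing any earlier status.
    db.execute("INSERT OR REPLACE INTO steps VALUES (?, 'done')", (name,))

def is_done(name):
    row = db.execute("SELECT status FROM steps WHERE name=?",
                     (name,)).fetchone()
    return row is not None and row[0] == "done"

for step in ["quality_control", "mapping"]:
    mark_done(step)

print(is_done("mapping"), is_done("quantification"))  # -> True False
```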

    The system will be open source, self-contained and easily portable, so that it can be used and enhanced by researchers outside this project. GRAPE may be run locally, or it can use a queuing system such as SGE to improve speed and allow parallelization of the different steps.

    On top of the system, methods for complex querying and analysis within and across experiments can be implemented. New steps can be easily added, allowing it to be extended in order to fit specific needs.

    Realizing that the current model for the storage of publicly available biomolecular data, based on centralized repositories, may no longer be sustainable, we are designing our software system around a peer-to-peer-like network of distributed RNA-Seq resources, across which bioinformatics analyses may need to be performed transparently. We already have working prototypes of some elements of the system. We plan to develop additional ones and integrate them into a solid, professional and open platform.

    Figure 1. The GRAPE RNASeq Analysis System

    Figure 2. Grape interface detail.

    Figure 3. Grape interface detail.

    Figure 4. Grape interface detail.

    IntOGen
    IntOGen (Integrative OncoGenomics) is a discovery tool for cancer researchers that integrates multidimensional oncogenomics data for the identification of genes and biological modules involved in cancer development.


    • Databases and Data Integration
    • Biological Databases
    • Database Retrieval and Visualization
    • Biomedical Applications
    • Ontologies


    The use of high-throughput techniques has come to the fore in modern cancer research. The vast amount of oncogenomic data produced to date, together with data from new, large-scale projects such as The Cancer Genome Atlas and the International Cancer Genome Consortium, poses two new challenges:

    1. biologically relevant integration of the information coming from heterogeneous sources and

    2. an intuitive visualization system to capture changes important to tumorigenesis (driver alterations).

    IntOGen is a framework that addresses these issues by collating, organizing, analyzing and integrating data from genome-wide experiments that study several types of alterations in different human cancers. The current version contains data from experiments studying transcriptomic alterations, genomic gains and losses and somatic mutation information and the system is designed to incorporate data from new experiments and from other alteration types when available.

    There are several characteristics that make IntOGen unique:

    1. Samples are annotated manually according to the same structured vocabulary (the International Classification of Diseases for Oncology), which specifies tumor topography and morphology. In this way, specific alterations can be related to clinical annotations, and results from different experiments with the same disease classification can be combined and compared to detect shared patterns.

    2. To identify the most relevant alterations, statistical methodologies are applied based on the rationale that driver alterations are found in more samples than expected by chance, and evidence is combined across multiple studies that analyze the same tumor type.

    3. Analysis at the level of individual genes does not capture the full biological complexity; for this reason, the contribution of biological modules (for example, pathways) to cancer also needs to be analyzed. Analysis of TCGA glioblastoma expression data using the IntOGen methodology successfully identified genes and modules similar to those highlighted in the original study.

    4. As this exhaustive analysis generates huge amounts of data, IntOGen has a powerful and intuitive web interface designed to be a discovery tool for cancer research. Users can identify the genes and modules significantly altered in a cancer type, or explore the alteration pattern of a gene module of interest across many cancer types. Users can easily browse the results of individual experiments, of combinations of experiments with the same clinical annotation, or navigate from pathways to the genes in those pathways.
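The recurrence rationale in point 2 can be sketched as a simple statistical test: under a null model in which an alteration hits each sample independently at some background rate, a candidate driver should be altered in significantly more samples than chance predicts. Below is a minimal, illustrative one-sided binomial test; the counts and background rate are invented for the example, and IntOGen's actual methodology is more elaborate.

```python
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability of seeing
    at least k altered samples by chance alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: a gene mutated in 12 of 50 tumour samples,
# against an assumed background alteration rate of 5% per sample.
n_samples = 50
n_altered = 12
background_rate = 0.05

p_value = binomial_sf(n_altered, n_samples, background_rate)
# A small p-value indicates the gene is altered more often than
# expected by chance, i.e. it is a candidate driver.
```

In practice such per-gene p-values would be corrected for multiple testing across the genome before declaring any gene significant.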

    In addition to its web interface, the BioMart interface of IntOGen provides access to high-throughput data on the genomic and transcriptomic alterations taking place in different types of cancer. This interface allows complex queries and facilitates the bulk download of all the analysis results. IntOGen data can also be accessed using Gitools, a Java application for the analysis and visualization of genomics data using interactive heat maps. For example, one can perform enrichment analyses on IntOGen data with modules or gene sets from various BioMart portals to explore large-scale patterns in cancer genomics data.
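BioMart portals accept queries as an XML document describing the dataset, filters and attributes requested, which is then POSTed to the portal's martservice endpoint. The sketch below builds such a payload; the dataset, attribute and filter names are hypothetical placeholders, since the real names must be taken from the IntOGen BioMart portal itself.

```python
def biomart_query(dataset, attributes, filters=None):
    """Build the XML payload for a BioMart martservice query.

    The dataset/attribute/filter names passed in are placeholders for
    this example; a live portal lists the valid names it accepts.
    """
    filt = "".join(
        f'<Filter name="{name}" value="{value}"/>'
        for name, value in (filters or {}).items()
    )
    attrs = "".join(f'<Attribute name="{a}"/>' for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1">'
        f'<Dataset name="{dataset}" interface="default">{filt}{attrs}</Dataset>'
        "</Query>"
    )

# Hypothetical query: gene-level alteration frequencies for one tumour type.
xml = biomart_query(
    "gene_alterations",                       # placeholder dataset name
    ["gene_symbol", "alteration_frequency"],  # placeholder attributes
    filters={"topography": "C71"},            # ICD-O topography code (brain)
)
# The payload would then be sent as the "query" form field in an HTTP
# POST to the portal's martservice URL.
```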



  • IntOGen: Integration and data-mining of multidimensional oncogenomic data. G. Gundem, C. Perez-Llamas, A. Jene-Sanz, A. Kedzierska, A. Islam, J. Deu-Pons, S.J. Furney, N. Lopez-Bigas. Nature Methods (2010). Vol 7, Issue 2, 92-93. Pubmed.
  • Integrative cancer genomics (IntOGen) in Biomart. C. Perez-Llamas, G. Gundem. Database (Oxford) (2011 Sep). bar039. Pubmed.

    FlexPortal More info
    FlexPortal is an integrated platform for macromolecular flexibility studies.


    • Databases and Data Integration
    • Biological Databases
    • Protein Function and Structure Analysis
    • Protein Structure Prediction
    • Protein Mutation Analysis
    • Protein Structural Analysis
    • Molecular Dynamics Simulation
    • Clinical and Biomedical Applications
    • Biomedical Applications


    Biological function is largely based on molecular recognition. Biological macromolecules interact with each other following strict rules on the complementarity of 3D structures and interactions. The understanding of molecular recognition has traditionally been based on the analysis of static models of protein and nucleic acid 3D structures as found in the Protein Data Bank. However, molecular recognition requires precise adjustments of the structures to optimize binding, which is possible thanks to the intrinsic flexibility of biological macromolecules, but very difficult to follow using static pictures of those structures. Although some information about flexibility and induced fit can be extracted from the set of conformations available in the PDB, only theoretical methods can draw a full picture of the phenomenon.

    Molecular dynamics is the central methodology for obtaining a dynamic view of macromolecular systems. The central part of the platform is MoDEL (Molecular Dynamics Extended Library), which holds over 2000 atomistic simulations obtained under standard conditions. Information about the biological system to be analyzed is first obtained from UniProt, the PDB and other databases. Relevant details, such as the ensemble of available conformations, the existence of conformational transitions, or involvement in ligand or protein binding, are retrieved. Trajectories are either obtained from MoDEL or generated through the automated tool MDWeb and, where needed, conformational transitions are populated using coarse-grained MD. The generated trajectories are finally analyzed, and dynamic and flexibility properties are extracted.

    Access to the Server Platform (under construction):

    Tools involved:

    • MoDEL (Molecular Dynamics Extended Library) is a database holding over 1,800 trajectories ranging from 10 ns to 1μs in length, from a representative set of protein structures.

    • MDWeb & MDMoby. A comprehensive set of tools allowing automated setup and analysis of molecular dynamics trajectories. Access is available through a Web portal, but also through a series of web services developed in the BioMoby framework.

    • MorphWeb. Web portal to analyze large-scale conformational changes in proteins, through discrete molecular dynamics simulations and normal mode analysis.

    • FlexServ. Web portal for the analysis of flexibility, including the generation of coarse-grained molecular dynamics trajectories and the analysis of essential dynamics, B-factors, hinge points and flexibility correlations.
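As an illustration of the kind of per-atom flexibility measure such analyses report, B-factors can be estimated from a trajectory via the mean squared fluctuation of each atom around its average position, B = (8π²/3)⟨|Δr|²⟩. A minimal sketch on a toy trajectory, with plain Python lists standing in for real MD coordinates:

```python
from math import pi

def b_factors(trajectory):
    """Estimate crystallographic-style B-factors from an MD trajectory.

    trajectory: list of frames; each frame is a list of (x, y, z)
    atom coordinates (assumed already superposed on a reference).
    Returns one B-factor per atom: B = (8*pi**2 / 3) * <|r - <r>|^2>.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    factors = []
    for a in range(n_atoms):
        # Average position of atom a over all frames.
        mean = [sum(frame[a][d] for frame in trajectory) / n_frames
                for d in range(3)]
        # Mean squared fluctuation around that average position.
        msf = sum(
            sum((frame[a][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        factors.append((8 * pi ** 2 / 3) * msf)
    return factors

# Toy two-atom, three-frame trajectory: atom 0 is rigid, atom 1 fluctuates.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)],
]
b = b_factors(frames)  # b[0] == 0; b[1] > 0
```

Real trajectories would first be superposed on a reference structure to remove global rotation and translation before computing the fluctuations.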


  • Protein flexibility from discrete molecular dynamics simulations using quasi-physical potentials. A. Emperador, T. Meyer, M. Orozco. Proteins (2009 Oct). Vol 78, Issue 1. Pubmed.
  • FlexServ: an integrated tool for the analysis of protein flexibility. J. Camps, O. Carrillo, A. Emperador, L. Orellana, A. Hospital, M. Rueda, D. Cicin-Sain, M. D'Abramo, JL. Gelpi, M. Orozco. Bioinformatics (Oxford, England) (2009 May). Vol 25, Issue 13. Pubmed.
  • MoDEL (Molecular Dynamics Extended Library): a database of atomistic molecular dynamics trajectories. T. Meyer, M. D'Abramo, A. Hospital, M. Rueda, C. Ferrer-Costa, A. Pérez, O. Carrillo, J. Camps, C. Fenollosa, D. Repchevsky, JL. Gelpí, M. Orozco. Structure (2011 Nov). Vol 18, Issue 11, 1399-1409. Pubmed.
  • Coarse-grained representation of protein flexibility. Foundations, successes, and shortcomings. M. Orozco, L. Orellana, A. Hospital, AN. Naganathan, A. Emperador, O. Carrillo. Advances in Protein Chemistry and Structural Biology (2011). Vol 85, 183-215. Pubmed.