#!/perl/bioinfo: papers

29 de diciembre de 2011

Jornadas Bioinformáticas JBI 2012 (XI Edición)

Aprovecho esta última entrada del año para dar difusión a las Jornadas de Bioinformática JBI 2012. Como muchos sabréis, este congreso es el principal punto de encuentro anual de nuestra comunidad en la península ibérica, así que nuestro laboratorio también estará en Barcelona del 23 al 25 de Enero. El programa completo de las jornadas se puede descargar en este enlace.

http://sgu.bioinfo.cipf.es/jbi2012

Este año presentaremos parte de nuestro trabajo reciente:

"Genome-wide clustering of transcription factors by comparison of predicted protein-DNA interfaces"

donde explicamos y evaluamos la anotación de interfaces de reconocimiento de DNA en secuencias de proteínas por medio de diferentes aproximaciones como BLAST, TFmodeller, DP-Bind y DISIS.

El tema principal de las jornadas será "Arquitectura genómica, anotación y diseño", sobre el cual se discutirán los diferentes avances en la integración de los campos de la Biología, Medicina e Informática en el campo de la Genómica. Además se tratarán los siguientes temas:
- Análisis de datos de secuenciación de alto rendimiento (NGS)
- Bioinformática estructural
- Algoritmos de biología computacional y computación de alto rendimiento
- Análisis de sequencias, filogenética y evolución
- Bases de datos, herramientas y tecnologías en biología computacional
- Bioinformática en transcriptómica y proteómica
- Biología de sistemas

ENGLISH:

The XIth Spanish Symposium on Bioinformatics (JBI2012) will take place in January 23-25, 2012 in Barcelona, Spain. Co-organised by the Spanish Institut of Bioinformatics and the Portuguese Bioinformatics Network and hosted by the Barcelona Biomedical Research Park (PRBB). The full program can be downloaded from this link.

This year, the reference topic is “Genome Architecture, Annotation and Design” for which the conference will provide the opportunity to discuss the state of the art for the integration of the fields of biology, medicine and informatics. We invite you to submit your work and share your experiences in the following topics of interest including, but not limited to:

- Analysis of high throughput data (NGS)
- Structural Bioinformatics
- Algorithms for computational biology and HPC
- Sequence analysis, phylogenetics and evolution
- Databases, Tools and technologies for computational biology
- Bioinformatics in Transcriptomics and Proteomics
- System and Synthetic Biology

Our contribution to the congress:

Genome-wide clustering of transcription factors by comparison of predicted protein-DNA interfaces

Transcription Factors (TFs) play a central role in gene regulation by binding to DNA target sequences, mostly in promoter regions. However, even for the best annotated genomes, only a fraction of these critical proteins have been experimentally characterized and linked to some of their target sites. The dimension of this problem increases in multicellular organisms, which tend to have large collections of TFs, sometimes with redundant roles, that result of whole-genome duplication events and lineage-specific expansions. In this work we set to study the repertoire of Arabidopsis thaliana TFs from the perspective of their predicted interfaces, to evaluate the degree of DNA-binding redundancy at a genome scale. First, we critically compare the performance of a variety of methods that predict the interface residues of DNA-binding proteins, those responsible for specific recognition, and measure their sensitivity and specificity. Second, we apply the best predictors to the complete A.thaliana repertoire and build clusters of transcription factors with similar interfaces. Finally, we use our in-house footprintDB to benchmark to what extent TFs in the same cluster specifically bind to similar DNA sites. Our results indicate that there is substantial overlap of DNA binding specificities in most TF families. This observation supports the use of interface predictions to construct reduced representation of TF sets with common DNA binding preferences.

13 de septiembre de 2010

Jornadas Bioinformáticas JBI 2010 (X Edición), nuestro laboratorio estará allí...

Las Jornadas Bioinformáticas son la cita anual obligada para los bioinformáticos españoles. Este año se celebrará su décima edición del 27 al 29 de Octubre en Torremolinos (Málaga). La organización de las mismas corre a cargo de la Universidad de Málaga, el Instituto Nacional de Bioinformática y la Red Portuguesa de Bioinformática. Este año el tema central es "La bioinformática aplicada a la medicina personalizada", sobre el cual se discutirá la integración de los campos de la biología, medicina e informática para el desarrollo de terapias más específicas y efectivas. Sin embargo, éste no será el único tema a tratar, también se compartirán resultados y experiencias en otros campos:
- Análisis de datos en técnicas de alto rendimiento como la secuenciación de nueva generación.
- Bioinformática estructural
- Algoritmos de biología computacional y técnicas de computación de alto rendimiento
- Análisis de secuencias, filogenética y evolución
- Bases de datos, herramientas y tecnologías de biología computacional
- Bioinformática en transcriptómica y proteómica
- Biología sintética y de sistemas

IN ENGLISH:

The Xth Spanish Symposium on Bioinformatics (JBI2010) will take place in October 27-29, 2010 in Torremolinos-Málaga, Spain. Co-organised by the National Institute of Bioinformatics-Spain and the Portuguese Bioinformatics Network and hosted by the University of Malaga (Spain).

This year, the reference topic is “Bioinformatics for personalized medicine” for which the conference will provide the opportunity to discuss the state of the art for the integration of the fields of biology, medicine and informatics. We invite you to submit your work and share your experiences in the following topics of interest including, but not limited to:
- Analysis of high throughput data (NGS)
- Structural Bioinformatics
- Algorithms for computational biology and HPC
- Sequence analysis, phylogenetics and evolution
- Databases, Tools and technologies for computational biology
- Bioinformatics in Transcriptomics and Proteomics
- System and Synthetic Biology

Nuestras aportaciones

Nuestro laboratorio va a participar en las Jornadas Bioinformáticas con tres contribuciones que presentaré a continuación:

3D-footprint: a database for the structural analysis of protein–DNA complexes (paper)
The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs
101DNA: a set of tools for Protein-DNA interface analysis

3D-footprint: a database for the structural analysis of protein–DNA complexes
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein–DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.

The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs

Transcription Factors (TFs) play a key role in gene regulation by binding to DNA target sequences. While there is a vast literature describing computational methods to define patterns and match DNA regulatory motifs within genomic sequences, the prediction of DNA binding motifs (DBMs) that might be recognized by a particular TF is a relatively unexplored field. Numerous DNA-binding proteins are annotated as TFs in databases; however, for many of these orphan TFs the corresponding DBMs remain uncharacterized. Standard annotation practice transfer DBMs of well known TFs to those orphan protein sequences which can be confidently aligned to them, usually by means of local alignment tools such as BLAST, but these predictions are known to be error-prone. With the aim of improving these predictions, we test whether the knowledge of protein-DNA interface architectures and existing TF-DNA binding experimental data can be used to generate family-wise interface substitution matrices (ISUMs). An experiment with 85 Drosophila melanogaster homeobox proteins demonstrate that ISUMs: i) capture information about the correlation between the substitution of a TF interface residue and the conservation of the DBM; ii) are valuable to evaluate TFs alignments and iii) are better classifiers than generic amino-acid substitution matrices and that BLAST E-value when deciding whether two aligned homeobox proteins bind to the same DNA motif.

101DNA: a set of tools for Protein-DNA interface analysis

Analysis of protein-DNA interfaces has shown a great structural dependency. Despite the observation that related proteins tend to use the same pattern of amino acid and base contacting positions, no simple recognition code has been found. While protein contacts with the sugar-phosphate backbone of DNA provide stability and yield very little specificity information, contacts between amino acid side-chains and DNA bases (direct readout) apparently define specificity, in addition to some constrains defined by DNA sequence-dependent features, namely indirect readout.
Recent approaches have proposed bipartite graphs as an structural way of analysing interfaces from a protein-DNA-centric viewpoint. With this perspective in mind, we have developed a set of tools for the dissection and comparison of protein-DNA interfaces. Taking a protein-DNA complex file in PDB format as input, the software generates a 2D matrix that represents a bipartite graph of residue contacts obtained after applying a simple distance threshold that captures all non-covalent interactions. The generated 2D matrices allow a fast and simple visual inspection of the interface and have been successfully produced for the current non-redundant set of protein-DNA complexes in the 3D-footprint database.
As a second utility to compare 2 interfaces, the 101DNA software includes an aligment tool where a dynamic programming matrix is created with the Local Affine Gap algorithm and traced back as a finite state automata. The scores between pairs of interface amino acid residues are calculated as a function of the observed contacts with DNA nitrogen bases. This tool produces local interface alignments which are independent of the underlying protein sequence, but that faithfully represent the binding architecture. Preliminary tests show that these local alignments successfully identify binding interfaces that share striking similarity despite belonging to different protein superfamilies, and these observations support this graph-theory approach.