- Analysis of data from high-throughput techniques such as next-generation sequencing
- Structural bioinformatics
- Computational biology algorithms and high-performance computing techniques
- Sequence analysis, phylogenetics and evolution
- Databases, tools and technologies for computational biology
- Bioinformatics in transcriptomics and proteomics
- Synthetic and systems biology
IN ENGLISH:
The Xth Spanish Symposium on Bioinformatics (JBI2010) will take place on October 27-29, 2010, in Torremolinos-Málaga, Spain. It is co-organised by the National Institute of Bioinformatics-Spain and the Portuguese Bioinformatics Network, and hosted by the University of Málaga (Spain).
This year's reference topic is “Bioinformatics for personalized medicine”, and the conference will provide the opportunity to discuss the state of the art in integrating the fields of biology, medicine and informatics. We invite you to submit your work and share your experiences on topics of interest including, but not limited to:
- Analysis of high-throughput data (NGS)
- Structural bioinformatics
- Algorithms for computational biology and HPC
- Sequence analysis, phylogenetics and evolution
- Databases, tools and technologies for computational biology
- Bioinformatics in transcriptomics and proteomics
- Systems and synthetic biology
Our contributions
Our laboratory will take part in the Spanish Symposium on Bioinformatics (Jornadas Bioinformáticas) with three contributions, which I present below:
- 3D-footprint: a database for the structural analysis of protein–DNA complexes (paper)
- The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs
- 101DNA: a set of tools for Protein-DNA interface analysis
3D-footprint: a database for the structural analysis of protein–DNA complexes
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein–DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs
Transcription factors (TFs) play a key role in gene regulation by binding to DNA target sequences. While there is a vast literature describing computational methods to define patterns and match DNA regulatory motifs within genomic sequences, the prediction of the DNA-binding motifs (DBMs) that might be recognized by a particular TF is a relatively unexplored field. Numerous DNA-binding proteins are annotated as TFs in databases; however, for many of these orphan TFs the corresponding DBMs remain uncharacterized. Standard annotation practice transfers the DBMs of well-known TFs to those orphan protein sequences that can be confidently aligned to them, usually by means of local alignment tools such as BLAST, but these predictions are known to be error-prone. With the aim of improving these predictions, we test whether knowledge of protein-DNA interface architectures and existing TF-DNA binding experimental data can be used to generate family-wise interface substitution matrices (ISUMs). An experiment with 85 Drosophila melanogaster homeobox proteins demonstrates that ISUMs: (i) capture information about the correlation between the substitution of a TF interface residue and the conservation of the DBM; (ii) are valuable for evaluating TF alignments; and (iii) are better classifiers than generic amino-acid substitution matrices and the BLAST E-value when deciding whether two aligned homeobox proteins bind to the same DNA motif.
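To make the classification idea concrete, here is a minimal sketch of how a substitution matrix restricted to interface positions might be used to score two aligned TFs. It is not the method behind the abstract: the function names, the threshold and every number in the toy matrix are invented for illustration, and the interface residues are assumed to be already aligned position by position.

```python
# Toy interface substitution matrix. All values are invented for illustration;
# they are NOT taken from any real ISUM.
TOY_ISUM = {
    ("R", "R"): 3,
    ("R", "K"): 1,
    ("N", "N"): 4,
    ("N", "Q"): -2,
}


def interface_score(residues_a, residues_b, isum, default=0):
    """Sum substitution scores over aligned interface positions.

    residues_a and residues_b are equal-length sequences of one-letter
    amino-acid codes, one per interface position. Pairs missing from the
    matrix (in either order) get the `default` score.
    """
    if len(residues_a) != len(residues_b):
        raise ValueError("interface residues must be aligned one to one")
    total = 0
    for a, b in zip(residues_a, residues_b):
        # Treat the matrix as symmetric: look up (a, b), then (b, a).
        total += isum.get((a, b), isum.get((b, a), default))
    return total


def predicts_same_motif(residues_a, residues_b, isum, threshold):
    """Hypothetical classifier: same motif if the score reaches the threshold."""
    return interface_score(residues_a, residues_b, isum) >= threshold
```

With a real matrix, the threshold would have to be calibrated on TF pairs whose motifs are known; the value used in any call here is only a placeholder.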
101DNA: a set of tools for Protein-DNA interface analysis
Analysis of protein-DNA interfaces has revealed a strong structural dependency. Despite the observation that related proteins tend to use the same pattern of contacting amino-acid and base positions, no simple recognition code has been found. While protein contacts with the sugar-phosphate backbone of DNA provide stability and yield very little specificity information, contacts between amino-acid side chains and DNA bases (direct readout) apparently define specificity, in addition to some constraints imposed by DNA sequence-dependent features, known as indirect readout.
Recent approaches have proposed bipartite graphs as a structural way of analysing interfaces from a protein-DNA-centric viewpoint. With this perspective in mind, we have developed a set of tools for the dissection and comparison of protein-DNA interfaces. Taking a protein-DNA complex file in PDB format as input, the software generates a 2D matrix that represents a bipartite graph of residue contacts obtained after applying a simple distance threshold that captures all non-covalent interactions. The generated 2D matrices allow a fast and simple visual inspection of the interface and have been successfully produced for the current non-redundant set of protein-DNA complexes in the 3D-footprint database.
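The contact matrix described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the 101DNA code: the 4.5 Å cutoff is a placeholder (the abstract does not give the threshold), nucleotides are recognised only by the modern PDB residue names DA, DC, DG and DT, every atom is considered, and the function names are hypothetical.

```python
import math

# Assumption: DNA residues use the modern PDB names; older files may differ.
DNA_RESIDUES = {"DA", "DC", "DG", "DT"}


def parse_atoms(pdb_text):
    """Yield (is_dna, residue_id, xyz) for each ATOM record (fixed columns)."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = line[22:26].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield res_name in DNA_RESIDUES, (chain, res_name, res_seq), xyz


def contact_matrix(pdb_text, cutoff=4.5):
    """Return (protein_residues, nucleotides, matrix).

    matrix[r][c] is 1 when any atom of protein residue r lies within
    `cutoff` angstroms of any atom of nucleotide c, else 0. The matrix is
    the biadjacency matrix of the protein-DNA bipartite contact graph.
    """
    protein, dna = {}, {}
    for is_dna, res_id, xyz in parse_atoms(pdb_text):
        (dna if is_dna else protein).setdefault(res_id, []).append(xyz)
    rows, cols = list(protein), list(dna)
    matrix = [
        [
            int(any(math.dist(a, b) <= cutoff
                    for a in protein[r] for b in dna[c]))
            for c in cols
        ]
        for r in rows
    ]
    return rows, cols, matrix
```

Calling `contact_matrix(open("complex.pdb").read())` on a local file (the file name is a placeholder) returns the row labels, the column labels and the 0/1 matrix. Restricting the nucleotide atoms to base atoms, rather than using all atoms, would be one way to separate base contacts from backbone contacts.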
As a second utility, for comparing two interfaces, the 101DNA software includes an alignment tool in which a dynamic programming matrix is filled with the Local Affine Gap algorithm and traced back as a finite-state automaton. The scores between pairs of interface amino-acid residues are calculated as a function of the observed contacts with DNA nitrogenous bases. This tool produces local interface alignments that are independent of the underlying protein sequence but faithfully represent the binding architecture. Preliminary tests show that these local alignments successfully identify binding interfaces that share striking similarity despite belonging to different protein superfamilies, and these observations support this graph-theory approach.
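As a rough illustration of the alignment step, the sketch below implements a generic local alignment with affine gaps (three dynamic programming matrices) and a traceback that moves between three states, M (match), X (gap in the second interface) and Y (gap in the first). It is not the 101DNA implementation. Each interface residue is represented here simply as the set of base types it contacts, and both `contact_score` and the gap penalties are made-up assumptions.

```python
NEG = float("-inf")


def contact_score(a, b):
    """Toy score (an assumption): +2 per shared contacted base, -1 per unshared."""
    return 2 * len(a & b) - len(a ^ b)


def local_affine_align(s, t, gap_open=-3, gap_extend=-1, score=contact_score):
    """Local alignment with affine gaps; `score` must return integers.

    Returns (best_score, pairs), where pairs holds (i, j) index pairs and
    None marks a gap on that side.
    """
    n, m = len(s), len(t)
    M = [[0] * (m + 1) for _ in range(n + 1)]    # s[i-1] paired with t[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # s[i-1] facing a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # t[j-1] facing a gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = max(0, prev + score(s[i - 1], t[j - 1]))
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)

    # Traceback as a three-state machine, starting in M at the best cell.
    pairs, state = [], "M"
    i, j = best_pos
    while True:
        if state == "M":
            if M[i][j] == 0:  # a local alignment starts where M drops to 0
                break
            pairs.append((i - 1, j - 1))
            prev = M[i][j] - score(s[i - 1], t[j - 1])
            i, j = i - 1, j - 1
            if prev == M[i][j]:
                state = "M"
            elif prev == X[i][j]:
                state = "X"
            else:
                state = "Y"
        elif state == "X":
            pairs.append((i - 1, None))
            state = "M" if X[i][j] == M[i - 1][j] + gap_open else "X"
            i -= 1
        else:
            pairs.append((None, j - 1))
            state = "M" if Y[i][j] == M[i][j - 1] + gap_open else "Y"
            j -= 1
    pairs.reverse()
    return best, pairs
```

Because the scores depend only on contacted bases, two residues with different amino-acid identities can still align well, which is the sense in which such an alignment is independent of the protein sequence. A realistic scoring function would likely also weigh contact type and geometry; that is left out here.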