#!/perl/bioinfo: sequence

September 18, 2012

TFcompare - a tool for structural alignment of DNA binding protein complexes

I would like to introduce our lab's new bioinformatics contribution to the scientific community: TFcompare (http://floresta.eead.csic.es/tfcompare/)

TFcompare is a tool for the structural alignment of DNA motifs and protein domains from DNA-binding protein complexes in the Protein Data Bank.

The TFcompare algorithm calculates structural alignments between the three-dimensional structures of two DNA-protein complexes. Its most interesting feature, compared with other methods, is that it extracts individual protein domains and the DNA sequences they recognize, aligns them separately, and returns not only the structure superposition but also the DNA sequence superposition. In this way we can compare the affinity of single domains for different DNA sequences in DNA-protein complexes, especially transcription factors and the cis elements they recognize.

TFcompare works as follows:

TFcompare takes two PDB identifiers as input. The structures are retrieved automatically from the PDB, and the Pfam domains contacting DNA are calculated and trimmed from the original structure. All the domains from the first structure are then aligned to all the domains from the second in several steps:

The program MAMMOTH performs the structural alignment.
The produced transformation matrices are applied to the coordinates of the DNA binding sites in order to derive the equivalent cis element superpositions.
Root-mean-square deviations (RMSD) of the superposed coordinates are calculated using beta-carbon atoms (proteins) and N9 (purine) and N1 (pyrimidine) atoms (DNA).
Structural alignments are scored in terms of i) the number of identical superposed nucleotides (DNA Score 1-0) and ii) the sum of N9 and N1 atom pairs within 3.5 Å (DNA Score).
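
The last three steps can be sketched in a few lines of Python. This is only an illustration, not the TFcompare code: it assumes the superposition is available as a 3x3 rotation matrix plus a translation vector, that the reference atoms (N9 or N1) of the two DNA sites are already paired position by position, and that identical nucleotides are counted over all paired positions. The function names and the toy coordinates are invented for the example.

```python
import math

def apply_transform(coords, rotation, translation):
    """Apply p' = R*p + t to every (x, y, z) point (assumed matrix convention)."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in coords
    ]

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def dna_scores(bases_q, coords_q, bases_s, coords_s, cutoff=3.5):
    """Return (identical nucleotides, reference-atom pairs within cutoff in Å)."""
    identical = 0
    close_pairs = 0
    for bq, cq, bs, cs in zip(bases_q, coords_q, bases_s, coords_s):
        if bq == bs:
            identical += 1
        if math.dist(cq, cs) <= cutoff:
            close_pairs += 1
    return identical, close_pairs

# Toy data (not taken from any real structure): a 90-degree rotation
# about the z axis followed by a shift of 1 Å along x.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
translation = (1.0, 0.0, 0.0)
moved = apply_transform([(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], rotation, translation)

query_atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
sbjct_atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 5.0)]
scores = dna_scores("CAC", query_atoms, "CAG", sbjct_atoms)
```

In the real tool the transformation comes from the MAMMOTH protein alignment, so the DNA coordinates are moved by the protein superposition rather than fitted on their own.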

As an example, take the alignment of the 1D5Y and 1BL0 structures; both are bacterial proteins with helix-turn-helix (HTH) domains that bind DNA.

We obtain the following results:

Results are ordered by structural similarity (RMSD) of both the protein domains and the DNA. Alignments of similar structures (protein RMSD <= 5.0 Å and DNA RMSD <= 3.5 Å) are shown in green, and dissimilar ones in red. 1D5Y contains two protein chains with HTH domains contacting DNA (the trimmed domain structures are 1d5y_A1 and 1d5y_C1). 1BL0 has two HTH domains in its single chain A (1bl0_A1 and 1bl0_A2). The structural alignment results show that 1bl0_A1 superposes very well with 1d5y_A1 and 1d5y_C1 (green). When the DNA sequences recognized by these domains are aligned, they show a conserved DNA motif with three common nucleotides, ‘CAC’. However, the 1bl0_A2 superpositions (red) are not as good as the previous ones, and the ‘CAC’ motif is not preserved in the resulting DNA alignment.
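
The four comparisons in this example follow from pairing every trimmed domain of one structure with every trimmed domain of the other. A minimal sketch of that all-against-all pairing is shown here; the function name is hypothetical, and treating 1D5Y as the query and 1BL0 as the subject is an assumption made only for the illustration.

```python
from itertools import product

def all_domain_pairs(query_domains, sbjct_domains):
    """Return every (number, query, subject) combination, numbered from 1."""
    combos = product(query_domains, sbjct_domains)
    return [(n, q, s) for n, (q, s) in enumerate(combos, start=1)]

# Domain names as reported for the 1D5Y / 1BL0 example.
pairs = all_domain_pairs(["1d5y_A1", "1d5y_C1"], ["1bl0_A1", "1bl0_A2"])
for number, query, sbjct in pairs:
    print(number, query, sbjct)
```

The numbering here simply follows the enumeration order; the results page orders the pairs by RMSD instead.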

Each row contains an alignment of a pair of DNA binding domains, showing a picture of their structures before and after superposition. DNA alignment is also shown.

PDB files with the aligned structures can be downloaded by clicking on the domain names and DNA sequences. Opening them with PDB viewer software (PyMOL, for example) lets you visualize the superposition that results from the structural alignment.

Results column headers and their meaning:
Pair: Pair number
Domain_Query: PDB name, chain and domain number of the Query
Domain_Sbjct: PDB name, chain and domain number of the Sbjct
DNA_Query: DNA site recognized by the Query domain
DNA_Sbjct: DNA site recognized by the Sbjct domain
Similar: 1 if both the protein domains and the DNA sites are within the RMSD thresholds (5.0 Å and 3.5 Å, respectively)
DNA_Alignment: DNA sites structurally aligned
DNA_Aligned: Number of aligned nucleotides
DNA_Score_1-0: Number of identical nucleotides
DNA_Score: Structural alignment score
DNA_RMSD: RMSD of the structurally aligned DNA sites
PROT_RMSD: RMSD of the structurally aligned protein domains
3D_Alignment: 3D Visualization of aligned structures
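
As a rough illustration of how the Similar column and the row order relate to the two RMSD columns, the sketch below flags and sorts a few result rows. It is not the TFcompare implementation: the rows are invented values (not real output), the helper names are hypothetical, and sorting by protein RMSD first and DNA RMSD second is an assumption about how the two measures are combined.

```python
PROT_RMSD_MAX = 5.0  # Å, threshold for protein domains
DNA_RMSD_MAX = 3.5   # Å, threshold for DNA sites

def is_similar(prot_rmsd, dna_rmsd):
    """Return 1 when both RMSD values are within their thresholds, else 0."""
    return 1 if prot_rmsd <= PROT_RMSD_MAX and dna_rmsd <= DNA_RMSD_MAX else 0

def annotate_and_sort(rows):
    """Add a Similar flag to each row and order rows by (PROT_RMSD, DNA_RMSD)."""
    annotated = [
        dict(row, Similar=is_similar(row["PROT_RMSD"], row["DNA_RMSD"]))
        for row in rows
    ]
    return sorted(annotated, key=lambda r: (r["PROT_RMSD"], r["DNA_RMSD"]))

# Invented example rows, only to exercise the functions.
rows = [
    {"Pair": 1, "PROT_RMSD": 6.2, "DNA_RMSD": 4.1},
    {"Pair": 2, "PROT_RMSD": 2.0, "DNA_RMSD": 1.5},
    {"Pair": 3, "PROT_RMSD": 2.0, "DNA_RMSD": 3.9},
]
ranked = annotate_and_sort(rows)
for row in ranked:
    print(row["Pair"], row["PROT_RMSD"], row["DNA_RMSD"], row["Similar"])
```

Rows flagged with 1 correspond to the alignments shown in green on the results page, and rows flagged with 0 to those shown in red.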