9 March 2018

Growth of protein-DNA complexes in the Protein Data Bank

Hi,
while checking the update logs of our good old 3D-footprint, a database of DNA-binding protein structures updated weekly from the Protein Data Bank, I found a folder with logs going back to February 2009. The plot below shows how the number of non-redundant complexes, filtered by protein sequence identity, has doubled in just a decade:

The nr95 bundle (complexes that are non-redundant at 95% protein sequence identity) can be downloaded in PDB format from
http://maya.ccg.unam.mx/tfmodeller/get_library.cgi
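A non-redundant set like this is commonly built by greedy selection: walk the complexes in some priority order (for instance by resolution) and keep one only if its protein sequence identity to every complex kept so far is below the cutoff. The sketch below assumes pairwise identities have already been computed elsewhere (e.g. from pairwise alignments); the complex IDs and identity values are made up for illustration, and this is not necessarily how 3D-footprint builds its nr95 bundle.

```perl
use strict;
use warnings;

# Greedy redundancy filter: keep a complex only if its % identity to
# every already-kept complex is below $cutoff. Complexes are visited in
# the order given, so that order decides which representative survives.
sub nr_filter {
    my ($ids, $identity, $cutoff) = @_;
    my @kept;
    ID: for my $id (@$ids) {
        for my $k (@kept) {
            my $pair = join(':', sort($id, $k));    # key = sorted ID pair
            next ID if ($identity->{$pair} // 0) >= $cutoff;
        }
        push(@kept, $id);
    }
    return @kept;
}

# hypothetical pairwise % identities, keyed by sorted ID pair
my %identity = (
    '1abc_A:1xyz_B' => 98.5,
    '1abc_A:2def_A' => 40.0,
    '1xyz_B:2def_A' => 41.2,
);

my @nr = nr_filter([qw(1abc_A 1xyz_B 2def_A)], \%identity, 95);
print "@nr\n";
```

Here 1xyz_B is dropped because it is 98.5% identical to 1abc_A, which was visited first.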

Other related files are available at:
http://floresta.eead.csic.es/3dfootprint/download.html

cheers,
Bruno

1 March 2018

Replacing the smartmatch operator in Perl5

Hi,
after the recent announcement that Perl5 version 5.28 would remove the smartmatch operator ~~ (see here), I came across an old program where it was used, even though it has been experimental for a long time. With the help of

$ perldoc perlop

here is an example of how to replace this operator with standard core Perl5 code:

use strict;
use warnings;

my @array = qw( JASPAR footprintDB UNIPROBE );
my %hash  = ( JASPAR => 1, footprintDB => 2, UNIPROBE => 3 );

my $element = 'footprintDB';

# array context
if ($element ~~ @array){
  print "\@array contains element '$element' (smartmatch)\n";
}

if (grep { $element eq $_ } @array){
  print "\@array contains element '$element' (core Perl5)\n";
}

# hash context
if(/$element/ ~~ %hash){
  print "\%hash contains a key matching regex /$element/ (smartmatch)\n";
}

if(grep { /$element/ } keys(%hash)){
  print "\%hash contains a key matching regex /$element/ (core Perl5)\n";
}
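Two more replacements may come in handy, sketched here assuming List::Util 1.33 or later (needed for any). Unlike grep, any stops at the first match. For an exact key test, exists is the core equivalent of $element ~~ %hash, since smartmatch's Any ~~ Hash rule checks hash key existence. Finally, quoting the pattern with \Q...\E treats $element literally, which differs from the regex examples above when it contains metacharacters.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() requires List::Util >= 1.33

my @array = qw( JASPAR footprintDB UNIPROBE );
my %hash  = ( JASPAR => 1, footprintDB => 2, UNIPROBE => 3 );
my $element = 'footprintDB';

# array membership, stops at the first match
my $in_array = any { $_ eq $element } @array;

# exact key lookup, equivalent to $element ~~ %hash
my $in_hash = exists $hash{$element};

# a key matching $element taken as a literal string (\Q...\E escapes metacharacters)
my $matches = any { /\Q$element\E/ } keys %hash;

print "\@array contains '$element'\n" if $in_array;
print "\%hash has key '$element'\n"  if $in_hash;
print "\%hash has a key containing '$element'\n" if $matches;
```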

Regards,
Bruno

8 February 2018

Modelling transcription factor complexes in the terminal

Hi,
I have just updated our good old server TFmodeller, available at http://www.ccg.unam.mx/tfmodeller,
so that it uses the current collection of 95% non-redundant protein-DNA complexes extracted from the Protein Data Bank. As of Feb 7, 2018, there are 977 such complexes, which can be downloaded.
In addition, I have written a Perl client so that predictions can be requested from the terminal through a SOAP interface. It produces XML output, which should be easy to parse, in which the PDB-format coordinates of the resulting model are marked up with tags. The input is a protein sequence in FASTA format. This is the code:

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

my $URL = 'http://maya.ccg.unam.mx:8080/axis';
my $WSDL = "$URL/TFmodellerService.jws?WSDL";

my $infile = $ARGV[0] || die "# usage: $0 <FASTA file>\n";
my $result;

# slurp the whole FASTA file into a single string
open(my $fh, '<', $infile) || die "# cannot read $infile\n";
my $inFASTA = do { local $/; <$fh> };
close($fh);

my $soap = SOAP::Lite->uri($URL)
                     ->proxy($URL, timeout => 300 )
                     ->service($WSDL);

eval { $result = $soap->TFmodeller($inFASTA) };
if($@){ die $@ }
else{ print $result }
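Since the server expects a protein sequence in FASTA format, it may be worth checking the input locally before placing the SOAP request. A minimal sketch follows; the helper name check_fasta and the rule of accepting exactly one record made of the 20 standard amino acid letters plus X are assumptions for illustration, not documented requirements of TFmodeller.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (header, sequence) if $fasta holds exactly
# one protein record, dies otherwise. The accepted alphabet is an assumption.
sub check_fasta {
    my ($fasta) = @_;
    my @records = grep { /\S/ } split(/^>/m, $fasta);
    die "# expected exactly one FASTA record\n" if @records != 1;
    my ($header, @lines) = split(/\n/, $records[0]);
    my $seq = uc(join('', @lines));
    $seq =~ s/\s+//g;                   # drop stray spaces and \r
    die "# empty sequence\n" if $seq eq '';
    die "# unexpected residues in sequence\n"
        if $seq =~ /[^ACDEFGHIKLMNPQRSTVWYX]/;
    return ($header, $seq);
}

my ($name, $seq) =
    check_fasta(">query\nMILLLSKKNAEERLAAFIYNLSRR\nFAQRGFSPREFRLTMTRGDIG\n");
printf("%s: %d residues\n", $name, length($seq));
```

In the client above, a call like check_fasta($inFASTA) could go right after the file is slurped, so that malformed input fails before contacting the server.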

The original Java client can still be found here. Note that the output includes a sequence alignment of query and template, with residues contacting DNA nitrogenous bases highlighted:

HEADER model 1zrf_A 203 DNACOMPLEX resol=2.10 21 8e-46
REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
REMARK contacts ........................ ................*........***...*...
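These alignment remarks can be post-processed to list which query residues are aligned to template positions flagged as base contacts. A minimal sketch, assuming the REMARK layout shown above (query, template and contacts rows aligned column by column, with * marking a contact and - a gap); only this text block is parsed, not the XML around it, and residues are simply numbered by counting query letters from 1.

```perl
use strict;
use warnings;

# alignment block copied from the example output above
my $output = <<'END';
HEADER model 1zrf_A 203 DNACOMPLEX resol=2.10 21 8e-46
REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
REMARK contacts ........................ ................*........***...*...
END

# all three rows are assumed to start at the same column,
# which is taken from the query row
my ($prefix) = $output =~ /^(REMARK query\s+)/m;
my $col = length($prefix);
my %row;
for my $line (split(/\n/, $output)) {
    $row{$1} = substr($line, $col) if $line =~ /^REMARK (\w+)/;
}

# walk the alignment, numbering query residues and skipping query gaps
my ($qpos, @contacts) = (0);
for my $i (0 .. length($row{query}) - 1) {
    my $q = substr($row{query}, $i, 1);
    next if $q eq '-';
    $qpos++;
    push(@contacts, "$q$qpos") if substr($row{contacts}, $i, 1) eq '*';
}
print join(' ', @contacts), "\n";
```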

Bruno

6 February 2018

Structural Bioinformatics 2018 at LCG-UNAM

Hi,
from today, Tuesday 6, until Friday 9 February we will spend the mornings at the Licenciatura de Ciencias Genómicas (the undergraduate programme in Genomic Sciences) of UNAM, learning to model DNA and protein sequences as molecules that fold and carry out their function in 3D. To do so we will use the algorithms and software described in this material, updated in January 2018:

http://eead-csic-compbio.github.io/bioinformatica_estructural

Domain composition of Cas9 in complex with crRNAs, taken from https://www.ncbi.nlm.nih.gov/pubmed/29035385.

The material can also be downloaded as a PDF.
See you,
Bruno

12 January 2018

Summary of Teshome's visit to the lab

Hi,
Teshome Dagne Mulugeta visited the lab for a few months in 2017 to learn how to apply our tools and resources on transcriptional regulation to his project on Salmo salar and SalmoBase. This post is just to share a link to Teshome's own account of his time in the lab:

https://norbis.w.uib.no/learning-advanced-analysis-of-gene-regulation-in-zaragoza
Logo of the ORE element annotated in https://tinyurl.com/ybznvlhe

Cheers,
Bruno