#!/perl/bioinfo

February 8, 2018

Modelling transcription factor complexes in the terminal

Hi,
I just updated our good old server TFmodeller, available at http://www.ccg.unam.mx/tfmodeller,
so that it uses the current collection of 95% non-redundant protein-DNA complexes extracted from the Protein Data Bank. As of Feb 7, 2018, there are 977 such complexes, which can be downloaded.
In addition, I wrote a Perl client so that predictions can be requested from the terminal via a SOAP interface. It produces XML output, which should be easy to parse. The PDB-format coordinates of the resulting model are marked up with tags. The input is a peptide FASTA file. This is the code:

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

my $URL = 'http://maya.ccg.unam.mx:8080/axis';
my $WSDL = "$URL/TFmodellerService.jws?WSDL";

my $infile = $ARGV[0] || die "# usage: $0 <peptide FASTA file>\n";
my ($inFASTA,$result);
open(FASTA,'<',$infile) || die "# cannot read $infile\n";
$/ = undef; # no record separator, so the next read returns the whole file
$inFASTA = <FASTA>; # slurp
close(FASTA);

my $soap = SOAP::Lite->uri($URL)
                     ->proxy($URL, timeout => 300 )
                     ->service($WSDL);

eval { $result = $soap->TFmodeller($inFASTA) };
if($@){ die $@ }
else{ print $result }

The original Java client can still be found here. Note that the output includes a sequence alignment of query and template with residues contacting DNA nitrogen bases highlighted:

HEADER model 1zrf_A 203 DNACOMPLEX resol=2.10 21 8e-46
REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
REMARK contacts ........................ ................*........***...*...

Bruno

February 6, 2018

Structural Bioinformatics 2018 at LCG-UNAM

Hi,
from today, Tuesday the 6th, until Friday, February 9, we will spend the mornings at the Licenciatura de Ciencias Genómicas of UNAM learning how to model DNA and protein sequences as molecules that fold and carry out their function in 3D. For this we will use algorithms and software described in this material, updated in January 2018:

http://eead-csic-compbio.github.io/bioinformatica_estructural

Domain composition of Cas9 in complex with crRNAs, taken from https://www.ncbi.nlm.nih.gov/pubmed/29035385.

It can also be downloaded as a PDF,
see you,
Bruno

January 12, 2018

Summary of Teshome's visit to the lab

Hi,
Teshome Dagne Mulugeta visited the lab for a few months in 2017 to learn how to apply our tools and resources on transcriptional regulation to his project with Salmo salar and SalmoBase. This post just shares a link to Teshome's own account of his time in the lab:

https://norbis.w.uib.no/learning-advanced-analysis-of-gene-regulation-in-zaragoza

Logo of ORE element annotated in https://tinyurl.com/ybznvlhe

Cheers,
Bruno

December 27, 2017

more Perl one-liners

Hi,
before the year ends I want to share with you an excellent tutorial on Perl one-liners, those commands that let you run complex operations in a single line in the Linux terminal, the Windows command prompt or, better still, from within a MobaXterm window.
The tutorial is hosted at:

https://github.com/learnbyexample/Command-line-text-processing/blob/master/perl_the_swiss_knife.md

and it has examples as useful as these:

# 1) compute the maximum of a comma-separated list of numbers

$ echo '34,17,6' | perl -MList::Util=max -F, -lane 'print max @F'
34

# 2) validate a one-liner and expand it into a complete, more readable program

$ perl -MO=Deparse -ne 'if(!$#ARGV){$h{$_}=1; next} print if $h{$_}'
LINE: while (defined($_ = <ARGV>)) {
    unless ($#ARGV) {
        $h{$_} = 1;
        next;
    }
    print $_ if $h{$_};
}
-e syntax OK

The tutorial also has recipes for the other Linux terminal tools, such as grep, sed and many others, at

https://github.com/learnbyexample/Command-line-text-processing

Happy New Year!
Bruno

December 12, 2017

Reference sequence for a TagSeq experiment

Hi,
more and more papers are being published that use TagSeq, a low-cost version of RNAseq that specializes in sequencing as many transcripts as possible, but only a few hundred bases from their 3' end, counting from the polyA tail. A typical TagSeq library size is 500 b.

TagSeq protocol, taken from https://tinyurl.com/y9yc4u5a.

When we obtain reads of this kind and want to align them against the annotated transcripts of the reference genome, it can be useful to trim the reference sequences, with a view to later normalizations that take the original gene length into account. Here is an example in Perl:

zcat primaryTranscriptOnly.fa.gz | \
     perl -lne 'if(/^(>.*)/){$h=$1}else{$fa{$h} .= $_} END{ foreach $s (sort keys(%fa)){ print "$s\n".substr($fa{$s},-500) }}' > \
     primaryTranscriptOnly.TagSeq500b.fa

See you,
Bruno