#!/perl/bioinfo: web services

February 3, 2021

GraphQL queries to the Protein Data Bank

Hi,

during the yearly maintenance of footprintDB I found out that the Protein Data Bank has shut down its web services (WS) interface, https://www.rcsb.org/pages/webservices/rest-fetch . Reading https://data.rcsb.org/index.html#data-api I see that both REST and GraphQL queries are now supported, and I went for the latter to learn a bit.

There is documentation at https://data.rcsb.org/migration-guide.html#legacy-fetch-api on migrating the old WS queries to GraphQL. The new queries look like this:

query={
 entry(entry_id:"9ANT"){
  polymer_entities{
   rcsb_polymer_entity{
    pdbx_description
   }
   rcsb_entity_source_organism{
    scientific_name
   }
   rcsb_polymer_entity_container_identifiers{
    entry_id
    auth_asym_ids
    reference_sequence_identifiers{
     database_accession,
     database_name
    }
   }
  }
  struct{
   title
  }
  rcsb_primary_citation{
   pdbx_database_id_PubMed
  }
 }
}

They can also be converted to a URL such as https://data.rcsb.org/graphql?query={entry(entry_id:%229ANT%22){polymer_entities{rcsb_polymer_entity{pdbx_description}rcsb_entity_source_organism{scientific_name},rcsb_polymer_entity_container_identifiers{entry_id,auth_asym_ids,reference_sequence_identifiers{database_accession,database_name}}}struct{title}rcsb_primary_citation{pdbx_database_id_PubMed}}}
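To make the conversion concrete, here is a minimal Perl sketch that collapses a multi-line query to one line and percent-encodes it into the `query` parameter. The helper names (`compact_query`, `percent_encode`, `graphql_url`) and the trimmed-down query are made up for illustration. Unlike the URL above, which only escapes the quotes, this sketch escapes every character outside the RFC 3986 unreserved set, which is an assumption on my side and has not been checked against the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Endpoint as given in the post
my $ENDPOINT = 'https://data.rcsb.org/graphql';

# Collapse runs of whitespace to a single space so that neighbouring
# field names stay separated, then trim both ends.
sub compact_query {
    my ($query) = @_;
    $query =~ s/\s+/ /g;
    $query =~ s/^ | $//g;
    return $query;
}

# Escape every character that is not "unreserved" in RFC 3986.
# Assumption: the query is plain ASCII, as in the example above.
sub percent_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build the full URL for a GraphQL query string.
sub graphql_url {
    my ($query) = @_;
    return $ENDPOINT . '?query=' . percent_encode(compact_query($query));
}

# Hypothetical, trimmed-down version of the query shown above
my $query = <<'EOQ';
{
 entry(entry_id:"9ANT"){
  struct{
   title
  }
 }
}
EOQ

print graphql_url($query), "\n";
```

The printed URL is longer than the hand-written one, because spaces, braces and parentheses are all escaped, but it contains only characters that are valid in a request target.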

If you click on that URL you will see that Chrome and Firefox do not return the expected result, but this error instead:

Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986

I think this is due to a problem with the PDB's Tomcat server. However, it does work with wget, which returns JSON output that you can process with your favorite language:

wget 'https://data.rcsb.org/graphql?query={entry(entry_id:%229ANT%22){polymer_entities{rcsb_polymer_entity{pdbx_description}rcsb_entity_source_organism{scientific_name},rcsb_polymer_entity_container_identifiers{entry_id,auth_asym_ids,reference_sequence_identifiers{database_accession,database_name}}}struct{title}rcsb_primary_citation{pdbx_database_id_PubMed}}}' -O-
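As an example of processing that JSON, the following Perl sketch reads the wget output from STDIN and pulls out a few of the requested fields with the core module JSON::PP. It assumes the response mirrors the query under a top-level "data" key, as GraphQL responses normally do, and that `polymer_entities`, `rcsb_entity_source_organism` and `auth_asym_ids` come back as lists. The sub name `summarize_entry` and the script name in the usage line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

# Extract title, PubMed id and per-entity details from the JSON text.
# Assumption: the response has the shape {"data":{"entry":{...}}}.
sub summarize_entry {
    my ($json_text) = @_;
    my $response = decode_json($json_text);
    my $entry = $response->{data}{entry}
        or die "# no entry found in response\n";

    my %summary = (
        title    => $entry->{struct}{title},
        pubmed   => ($entry->{rcsb_primary_citation} || {})->{pdbx_database_id_PubMed},
        entities => [],
    );

    for my $entity (@{ $entry->{polymer_entities} || [] }) {
        my $ids = $entity->{rcsb_polymer_entity_container_identifiers} || {};
        my @organisms = map { $_->{scientific_name} }
                        @{ $entity->{rcsb_entity_source_organism} || [] };
        push @{ $summary{entities} }, {
            description => $entity->{rcsb_polymer_entity}{pdbx_description},
            organisms   => \@organisms,
            chains      => $ids->{auth_asym_ids} || [],
        };
    }
    return \%summary;
}

# Only read STDIN when called with '-' as first argument
if (@ARGV && $ARGV[0] eq '-') {
    my $json_text = do { local $/; <STDIN> };    # slurp all piped input
    my $summary = summarize_entry($json_text);
    print 'title: ',  ($summary->{title}  // 'NA'), "\n";
    print 'PubMed: ', ($summary->{pubmed} // 'NA'), "\n";
    for my $entity (@{ $summary->{entities} }) {
        print join("\t",
            $entity->{description} // 'NA',
            join(',', @{ $entity->{chains} }),
            join(',', map { $_ // 'NA' } @{ $entity->{organisms} })), "\n";
    }
}
```

With the script saved as, say, pdb_summary.pl, the wget command above can be piped into it by appending `| perl pdb_summary.pl -`.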

See you soon,

Bruno

February 8, 2018

Modelling transcription factor complexes in the terminal

Hi,
I just updated our good old server TFmodeller, available at http://www.ccg.unam.mx/tfmodeller, so that it uses the current collection of 95% non-redundant protein-DNA complexes extracted from the Protein Data Bank. As of Feb 7, 2018, there are 977 such complexes, which can be downloaded.
In addition, I wrote a Perl client so that predictions can be requested from the terminal via a SOAP interface, producing XML output which should be easy to parse. The PDB-format coordinates of the resulting model are marked up with tags. The input is a peptide FASTA file. This is the code:

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

my $URL = 'http://maya.ccg.unam.mx:8080/axis';
my $WSDL = "$URL/TFmodellerService.jws?WSDL";

my $infile = $ARGV[0] || die "# usage: $0 <peptide FASTA file>\n";
my ($inFASTA,$result);
open(FASTA,'<',$infile) || die "# cannot read $infile\n";
$/ = undef; # read the whole file in one go
$inFASTA = <FASTA>; # slurp
close(FASTA);

my $soap = SOAP::Lite->uri($URL)
                     ->proxy($URL, timeout => 300 )
                     ->service($WSDL);

eval { $result = $soap->TFmodeller($inFASTA) };
if($@){ die $@ }
else{ print $result }

The original Java client can still be found here. Note that the output includes a sequence alignment of query and template with residues contacting DNA nitrogen bases highlighted:

HEADER model 1zrf_A 203 DNACOMPLEX resol=2.10 21 8e-46
REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
REMARK contacts ........................ ................*........***...*...
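
The alignment can be turned into a list of query residues predicted to contact DNA bases. The following Perl sketch does so under two assumptions taken from the sample above: the aligned strings start at a fixed column (the 17th character of each REMARK line), and '*' marks a contact. The sub names `parse_alignment` and `contact_residues` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the aligned strings from the three kinds of REMARK lines.
# Assumption: the alignment starts at a fixed offset of 16 characters.
# Strings are concatenated in case the alignment spans several blocks.
sub parse_alignment {
    my ($text) = @_;
    my %aln = (query => '', template => '', contacts => '');
    for my $line (split /\n/, $text) {
        next unless $line =~ /^REMARK (query|template|contacts)/;
        $aln{$1} .= substr($line, 16) if length($line) > 16;
    }
    return \%aln;
}

# Return [position, residue] pairs for query residues in '*' columns.
# Positions are 1-based and count only non-gap query residues.
sub contact_residues {
    my ($aln) = @_;
    my @query    = split //, $aln->{query};
    my @contacts = split //, $aln->{contacts};
    my $pos = 0;
    my @hits;
    for my $col (0 .. $#query) {
        next if $query[$col] eq '-';    # gap in the query, no residue
        $pos++;
        push @hits, [$pos, $query[$col]]
            if defined $contacts[$col] && $contacts[$col] eq '*';
    }
    return @hits;
}

# Usage: perl contacts.pl model_output.txt
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) || die "# cannot read $ARGV[0]\n";
    my $text = do { local $/; <$fh> };
    close($fh);
    my $aln = parse_alignment($text);
    print "$_->[1]$_->[0]\n" for contact_residues($aln);
}
```

Positions are reported relative to the aligned fragment of the query, not to the full input sequence.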

Bruno

October 24, 2017

SOAP interface of footprintDB

[Deprecated, see https://github.com/eead-csic-compbio/footprintDBclient]

Hi,
this entry shows how to query footprintDB from a Perl script.
First, make sure you have the SOAP::Lite module, which you can install with: $ sudo cpan -i SOAP::Lite. The following Perl 5 code shows how to run DNA, protein and text queries, obtaining XML output in all cases.
Note that if you register you can also query your private databases (see details in the documentation). Also note that protein searches are time-consuming; if you wish to annotate a large number of proteins, it is advisable to run the BLASTP searches on your own hardware with the appropriate FASTA files, as explained in a previous post. Cheers, Bruno.

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

my $footprintDBusername = ''; # type your username if registered
my ($result,$sequence,$sequence_name,$datatype,$keyword) = ('','','','','');
my $server = SOAP::Lite
-> uri('footprintdb')
-> proxy('http://floresta.eead.csic.es/footprintdb/ws.cgi');

## sample protein sequence
$sequence_name = 'test';
$sequence = 'IYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIEN';

$result = $server->protein_query($sequence_name,$sequence,$footprintDBusername);
unless($result->fault()){
 print $result->result(); 
}else{
 print 'error: ' . join(', ',$result->faultcode(),$result->faultstring());
}

## sample regulatory motif sequence
#$sequence = 'TGTGANNN'; # possible format
#$sequence = "TGTGA\nTGTGG\nTGTAG"; # another format
#transfac format for position weight matrices can be used as heredoc
$sequence= <<EOM;
DE 1a0a_AB
01 1 93 0 2
02 0 96 0 0
03 58 33 3 2
04 8 78 6 4
05 8 5 75 8
06 1 2 47 46
07 1 2 84 9
XX
EOM

$result = $server->DNA_motif_query($sequence_name,$sequence,$footprintDBusername);
unless($result->fault()){
 print $result->result();
}else{
 print 'error: ' . join(', ',$result->faultcode(),$result->faultstring());
}

$keyword = "myb";
$datatype = "site";
$result = $server->text_query($keyword,$datatype,$footprintDBusername);
unless($result->fault()){
 print $result->result();
}else{
 print 'error: ' . join(', ',$result->faultcode(),$result->faultstring());
}
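
The comments in the script mention two motif formats: a list of aligned sites and a TRANSFAC-like matrix. As an illustration of how they relate, this Perl sketch counts bases per column of a set of aligned sites and prints them in the layout of the heredoc above. The column order A C G T is an assumption, since the post does not state it, and `sites_to_matrix` is a made-up helper. Whether the server treats raw counts from a handful of sites the same way as the matrix in the example has not been checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a count matrix in the DE / numbered rows / XX layout used above.
# Assumption: columns are ordered A C G T.
sub sites_to_matrix {
    my ($name, @sites) = @_;
    die "# need at least one site\n" unless @sites;
    my @bases = qw(A C G T);
    my $width = length $sites[0];
    for my $site (@sites) {
        die "# sites must have the same length\n" if length($site) != $width;
    }

    my $matrix = "DE $name\n";
    for my $col (0 .. $width - 1) {
        my %count = map { ($_ => 0) } @bases;
        for my $site (@sites) {
            my $base = uc substr($site, $col, 1);
            $count{$base}++ if exists $count{$base};    # N and others are skipped
        }
        $matrix .= sprintf("%02d %s\n", $col + 1, join(' ', @count{@bases}));
    }
    return $matrix . "XX\n";
}

# Sites taken from the commented-out example in the script above
print sites_to_matrix('test', split(/\n/, "TGTGA\nTGTGG\nTGTAG"));
```

The resulting string could be assigned to $sequence in place of the heredoc before calling DNA_motif_query.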