January 30, 2024

HOWTO run primers4clades with a Docker container

One of the curses of developing bioinformatics software is the long-term sustainability of the code. Inevitably, some tools reach dead ends and are abandoned. This can happen for several reasons, notably lack of funding and hardware failures.

One such example is our primers4clades, which used to live at http://floresta.eead.csic.es/primers4clades and http://maya.ccg.unam.mx/primers4clades, but is no longer available as a Web server. However, Pablo Vinuesa and I thought it would be great to keep providing it as a legacy tool, so we eventually packaged it as a Docker image, which is freely available at https://hub.docker.com/r/csicunam/primers4clades . The image is easy to run, as explained below, and avoids installing the pipeline locally, which is challenging because several of its dependencies are no longer easy to find (read more at https://github.com/eead-csic-compbio/primers4clades).

1) Docker installation

You can find instructions for installing the Docker (Server) engine on Linux at https://docs.docker.com/engine/install/#server .

On Windows, our recommended procedure is to i) install the Windows Subsystem for Linux (WSL) and then ii) install the Docker engine.

Please let us know if you manage to run the examples below with Docker Desktop for macOS.
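Whatever the platform, it may be worth confirming that the docker client is reachable before moving on. The snippet below is only a minimal sketch: the function name check_docker is ours (it is not part of Docker or primers4clades), and it only tests that the client is on the PATH, not that the Docker daemon is actually running.

```shell
# Hypothetical helper (our own name): report whether the docker client is installed.
# It does NOT verify that the Docker daemon is up.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    echo "docker found: $(command -v docker)"
  else
    echo "docker not found, please install it first"
    return 1
  fi
}

# Usage (commented out so that pasting this block has no side effects):
# check_docker && docker pull csicunam/primers4clades
```

If the client is found, docker pull csicunam/primers4clades fetches the image used in the next section.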


2) Running primers4clades on Docker

The following command-line examples show how to run primers4clades using Docker with an input FASTA file named 'bla1.fna' containing nucleotide CDS sequences. You can check the format required here:

# 2024-01-20. Running primers4clades using a Docker container

#>>> STEP1) produce clusters and display NJ-tree
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2clusters.pl -i bla1.fna 


# /primers4clades/CVS_code/marfil/Fasta2clusters.pl -i bla1.fna -c universal -d 2.21 -f 0 -m 0 -s 0 -M  -b  -e  -D 0 -S 0

# read_FASTA_sequence : skipped sequence identical to 013:
[Stenotrophomonas_maltophilia]_TEXM3A_NACIP1_4


# read_FASTA_sequence : skipped sequence identical to 013:
[Stenotrophomonas_maltophilia]_TEXM3D_NACIP1_4

# number of sequences read = 14
# number of recognised taxa = 9
# aligning translated sequences...
# computing distance matrix...
Using SPRNG -- Scalable Parallel Random Number Generator
# run_PUZZLE_DIST alpha = 0.92
# alignment stats: length = 321 %gaps = 9.81 %constant = 26.5
# distance matrix = bla1_aln.phy.dist

# mean distance = 0.34813
# max distance = 1.91051 ( 001 <=> 006 )
# min distance = 0.00000 ( 003 <=> 005 )

# NJ tree with cluster labels :


    +-004__0 [Stenotrophomonas_maltophilia_D457]_D457_...
  +-3
  ! +--007__0 [Stenotrophomonas_maltophilia_JV3]_JV3_...
  !
  !   +-009__0 [Stenotrophomonas_maltophilia_R551_3]_R551_3_...
  ! +-4
  ! ! ! +-011__0 [Stenotrophomonas_maltophilia]_TEXM2S_MKIMI4_21_...
  ! ! +-2
  ! !   +-015__0 [Stenotrophomonas_maltophilia]_TEXM3S_MKIMI4_5_...
  5-6
  ! !   +006__0 [Stenotrophomonas_maltophilia]_ESTM1A_MKCAZ1_5_...
  ! ! +-1
  ! ! ! +010__0 [Stenotrophomonas_maltophilia]_TEXM2D_MKIMI4_3_...
  ! +-7
  !   ! +013__0 [Stenotrophomonas_maltophilia]_TEXM3D_MKCAZ8_7_...
  !   ! !
  !   +-8  +003__0 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555_...
  !     !  !
  !     +-10  +005__0 [Stenotrophomonas_maltophilia_EPM1]_EPM1_...
  !        !  !
  !        +-11    +002__0 [Stenotrophomonas_maltophilia]_13637_...
  !           !  +-9
  !           +-12 +008__0 [Stenotrophomonas_maltophilia_K279a]_K279a_...
  !              !
  !              +016__0 [Stenotrophomonas_sp_]_YAU14D1_LEIMI4_7_...
  !
  +-----------------------------------------------------001__0 [Stenotrophomonas_acidaminiphila]_TEXM2D_MKIMI4_3_...

# labelled tree : bla1_aln_nj.ph
# fully labelled tree : bla1_aln_nj_labelled.ph
# text formatted tree : bla1_aln_nj_txt.graph
# fully labelled text formatted tree : bla1_aln_nj_txt_labelled.graph
# CLUSTER 0 members = 14 : .//bla1_cluster_0.fna .//bla1_cluster_0.faa
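The leaf labels printed in the tree above (004__0, 007__0, ...) are what the next step uses to delimit a cluster. As a rough sketch, the labels can be pulled out of a text tree with awk. We assume here that the tree was saved to a file with the same layout as the screen output above; the function name list_leaf_labels is made up for this example.

```shell
# Hypothetical helper: print the first NNN__C leaf label found on each line
# of a text-formatted tree. Internal nodes such as "+-3" or "+-10" carry no
# "__" and are therefore skipped.
list_leaf_labels() {
  awk 'match($0, /[0-9][0-9][0-9]__[0-9]+/) {
         print substr($0, RSTART, RLENGTH)
       }' "$1"
}

# Usage (assumes this file has the layout of the tree printed on screen):
# list_leaf_labels bla1_aln_nj_txt_labelled.graph
```

Two of the printed labels can then be joined with a comma and passed as cluster boundaries in STEP2.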




#>>> STEP2) Select a specific cluster by giving its boundary labels with -b, e.g. 003__0,016__0, and reuse the STEP1 distance matrix with -M

$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2clusters.pl -i bla1.fna -c 11 -M bla1_aln.phy.dist -b 003__0,016__0 


# /primers4clades/CVS_code/marfil/Fasta2clusters.pl -i bla1.fna -c 11 -d 2.21 -f 0 -m 0 -s 0 -M bla1_aln.phy.dist -b 003__0,016__0 -e  -D 0 -S 0

# read_FASTA_sequence : skipped sequence identical to 014:
[Stenotrophomonas_maltophilia]_TEXM3A_NACIP1_4

# read_FASTA_sequence : skipped sequence identical to 014:[Stenotrophomonas_maltophilia]_TEXM3D_MKCAZ8_7

# number of sequences read = 14
# number of recognised taxa = 9
# number of valid cluster boundaries = 2
# size of user-selected cluster = 5
# distance matrix = bla1_aln.phy.dist


# max distance = 1.91051 ( 001 <=> 006 )
# min distance = 0.00000 ( 003 <=> 005 )

# NJ tree with cluster labels :


    +-004
  +-3
  ! +--007
  !
  !   +-009
  ! +-4
  ! ! ! +-011
  ! ! +-2
  ! !   +-015
  5-6
  ! !   +006
  ! ! +-1
  ! ! ! +010
  ! +-7
  !   ! +013
  !   ! !
  !   +-8  +003__0 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555_...
  !     !  !
  !     +-10  +005__0 [Stenotrophomonas_maltophilia_EPM1]_EPM1_...
  !        !  !
  !        +-11    +002__0 [Stenotrophomonas_maltophilia]_13637_...
  !           !  +-9
  !           +-12 +008__0 [Stenotrophomonas_maltophilia_K279a]_K279a_...
  !              !
  !              +016__0 [Stenotrophomonas_sp_]_YAU14D1_LEIMI4_7_...
  !
  +-----------------------------------------------------001

# labelled tree : ./bla1_aln_nj.ph
# fully labelled tree : ./bla1_aln_nj_labelled.ph
# text formatted tree : ./bla1_aln_nj_txt.graph
# fully labelled text formatted tree : ./bla1_aln_nj_txt_labelled.graph
# CLUSTER 0 members = 5 : .//bla1_cluster_0.fna .//bla1_cluster_0.faa
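All the steps in this walkthrough repeat the same long docker run prefix, so a small wrapper function can save some typing. This is just a sketch under a few assumptions: the name p4c and the P4C_DRY_RUN variable are ours, and we swapped $UID:$GROUPS for id -u and id -g, which do the same job but also work in shells other than bash.

```shell
# Hypothetical wrapper around the docker run prefix used in this post.
# With P4C_DRY_RUN set, the command is printed instead of executed.
p4c() {
  # Prepend the fixed docker arguments to whatever script and flags were given.
  set -- docker run --rm -v "$PWD:$PWD" -w "$PWD" \
         -u "$(id -u):$(id -g)" -it csicunam/primers4clades "$@"
  if [ -n "${P4C_DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (commented out; these call docker for real):
# p4c Fasta2clusters.pl -i bla1.fna
# p4c Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -C -P 50
```

Running a step with P4C_DRY_RUN=1 in front is a cheap way to check the mount and user options before launching the container.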

#>>> STEP3) Search for primers in the selected cluster bla1_cluster_0 and evaluate amplicons with nucleotide substitution models, min quality 50 (-P 50), computing codon frequencies from the input (-C)
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -C -P 50


# /primers4clades/CVS_code/marfil/Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -e  -P 50 -M 0 -k 0 -r  -S 0 -u 0 -T 55 -o c -s 0 -B 0 -L 0 -c  -C 1 -f 0 -m  -l 0 -R 0 -D

# /primers4clades/CVS_code/marfil/Fasta2primers.pl : cwd = /home/vinuesa/tmp/test_P4C_docker

# cutting amplicons...
## table redundancy vs amplicons stats:
# Stenotrophomonas_sp_TA57 redundancy = 0.78 amplicons = 1
# input_derived_005a0a18 redundancy = 0.86 amplicons = 1
# Stenotrophomonas_maltophilia redundancy = 0.87 amplicons = 0

# evaluating primers...
# primer file.tab = bla1_cluster_0.fna_primers.list.oligo_eval.tab
# ranking pairs of primers...

# mapfile : bla1_cluster_0.fna_primers.list.oligo_eval.tab.png


## Amplicon 1 codon_usage_table = Stenotrophomonas_sp_TA57 :
CCGGCCATCTTCTTGATaayatgaagyt 5'->3' N 100 294 (aligned residues)
ccggtcacctgctggacaacatgaagct >001 002 [Stenotrophomonas_maltophilia]_13637
ccggtcacctgctggacaacatgaagct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
ccggtcacctgctggacaacatgaagct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
ccggtcacctgctggacaacatgaagct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
ccggtcacctgctggacaacatgaaact >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
....!..!..!..!..!..!.....?!.
CCGGTCACCTGCTGGACaacatgaarct codeh_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct relax_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct degen_corr bla1_cluster_0_amp1_N100

GGGCAAGTTGGGCATcraacttcttyt 5'->3' C 100 294 (aligned residues)
tggccaactgcgcgtcgaacttcttct >001 002 [Stenotrophomonas_maltophilia]_13637
tggccaactgcgcgtcgaacttcttct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
tggccaactgcgcgtcgaacttcttct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
tggccaactgcgcgtcgaacttcttct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
tggccaactgcgcgtcgaacttcttct >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
!...!.!!..!..!..!........!.
TGGCCAACTGCGCGTcgaacttcttct codeh_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct relax_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct degen_corr bla1_cluster_0_amp1_C294

# primer pair quality = 100%
# expected PCR product length (nt) = 585
# fwd: minTm = 65.6 maxTm = 67.3
# rev: minTm = 66.7 maxTm = 66.7
#--
# phylogenetic amplicon evaluation:
# subs.model = TrNI
# n_of_seqs = 5 n_alignments_sites = 613
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 60 %partially resolved quartets = 0 %non resolved quartets = 40.0
# alpha = 0
# mean aLRT = 0.46 median aLRT = 0.46
# compressed outfile = my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
# end_of_amplicon

#>>> STEP3.2) As in STEP3, get primers for the selected cluster bla1_cluster_0, but evaluate amplicons with amino acid replacement models (-M)
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2primers.pl \
  -n bla1_cluster_0.fna -p bla1_cluster_0.faa -C -P 50 -M 


# /primers4clades/CVS_code/marfil/Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -e  -P 50 -M 1 -k 0 -r  -S 0 -u 0 -T 55 -o c -s 0 -B 0 -L 0 -c  -C 1 -f 0 -m  -l 0 -R 0 -D

# /primers4clades/CVS_code/marfil/Fasta2primers.pl : cwd = /home/vinuesa/tmp/test_P4C_docker

# cutting amplicons...
## table redundancy vs amplicons stats:
# Stenotrophomonas_sp_TA57 redundancy = 0.78 amplicons = 1
# input_derived_182c455f redundancy = 0.86 amplicons = 1
# Stenotrophomonas_maltophilia redundancy = 0.87 amplicons = 0

# evaluating primers...
# primer file.tab = bla1_cluster_0.fna_primers.list.oligo_eval.tab
# ranking pairs of primers...

# mapfile : bla1_cluster_0.fna_primers.list.oligo_eval.tab.png


## Amplicon 1 codon_usage_table = Stenotrophomonas_sp_TA57 :
CCGGCCATCTTCTTGATaayatgaagyt 5'->3' N 100 294 (aligned residues)
ccggtcacctgctggacaacatgaagct >001 002 [Stenotrophomonas_maltophilia]_13637
ccggtcacctgctggacaacatgaagct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
ccggtcacctgctggacaacatgaagct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
ccggtcacctgctggacaacatgaagct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
ccggtcacctgctggacaacatgaaact >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
....!..!..!..!..!..!.....?!.
CCGGTCACCTGCTGGACaacatgaarct codeh_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct relax_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct degen_corr bla1_cluster_0_amp1_N100

GGGCAAGTTGGGCATcraacttcttyt 5'->3' C 100 294 (aligned residues)
tggccaactgcgcgtcgaacttcttct >001 002 [Stenotrophomonas_maltophilia]_13637
tggccaactgcgcgtcgaacttcttct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
tggccaactgcgcgtcgaacttcttct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
tggccaactgcgcgtcgaacttcttct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
tggccaactgcgcgtcgaacttcttct >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
!...!.!!..!..!..!........!.
TGGCCAACTGCGCGTcgaacttcttct codeh_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct relax_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct degen_corr bla1_cluster_0_amp1_C294

# primer pair quality = 100%
# expected PCR product length (nt) = 585
# fwd: minTm = 65.6 maxTm = 67.3
# rev: minTm = 66.7 maxTm = 66.7
#--

ProtTest launched!

Please, watch out the content of file "my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57_aln.phy.prottest" for information on the
progress of the analysis and for possible error messages
Using SPRNG -- Scalable Parallel Random Number Generator
# phylogenetic amplicon evaluation:
# subs.model = WAG
# n_of_seqs = 5 n_alignments_sites = 205
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 60 %partially resolved quartets = 0 %non resolved quartets = 40.0
# alpha = 0
# mean aLRT = 0.46 median aLRT = 0.46
# compressed outfile = my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
# end_of_amplicon



## Amplicon 2 codon_usage_table = input_derived_182c455f :
GCCAGCTGGCTGcarccvatggc 5'->3' N 55 243 (aligned residues)
gcgtcctggctgcagccgatggc >001 002 [Stenotrophomonas_maltophilia]_13637
gcgtcctggctgcagccgatggc >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
gcgtcctggctgcagccgatggc >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
gcgtcctggctgcagccgatggc >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
gcgtcctggctgcagccgatggc >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
..!!!.........!..!.....
GCGTCCTGGCTGcagccgatggc codeh_corr bla1_cluster_0_amp1_N55
gcgtcctggctgcagccgatggc relax_corr bla1_cluster_0_amp1_N55
gcgtcctggctgcagccgatggc degen_corr bla1_cluster_0_amp1_N55

GCGAAGCTGCGCTtrtartcytcga 5'->3' C 55 243 (aligned residues)
gcgaagctgcgcttgtagtcctcga >001 002 [Stenotrophomonas_maltophilia]_13637
gcgaagctgcgcttgtagtcctcga >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
gcaaagctgcgcttgtagtcctcga >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
gcgaagctgcgcttgtagtcctcga >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
gcgaagctgcgcttgtagtcctcga >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
..?...........!..!..!....
GCGAAGCTGCGCTtgtagtcctcga codeh_corr bla1_cluster_0_amp1_C243
gcraagctgcgcttgtagtcctcga relax_corr bla1_cluster_0_amp1_C243
gcraagctgcgcttgtagtcctcga degen_corr bla1_cluster_0_amp1_C243

# primer pair quality = 100%
# expected PCR product length (nt) = 567
# fwd: minTm = 74.4 maxTm = 74.4
# rev: minTm = 67.5 maxTm = 67.5
#--

ProtTest launched!

Please, watch out the content of file "my_bla1_cluster_0_aln_1__input_derived_182c455f_aln.phy.prottest" for information on the
progress of the analysis and for possible error messages
Using SPRNG -- Scalable Parallel Random Number Generator
# phylogenetic amplicon evaluation:
# subs.model = WAG
# n_of_seqs = 5 n_alignments_sites = 198
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 0 %partially resolved quartets = 0 %non resolved quartets = 100.0
# alpha = 0
# mean aLRT = 0.00 median aLRT = 0.00
# compressed outfile = my_bla1_cluster_0_aln_1__input_derived_182c455f.tgz
# end_of_amplicon




#################################################################
## overall quality stats:
# good/bad pairs ratio = 2/2
# individual quality checks stats:


## abbreviations:
# quality     = quality estimation [best = 100, worst = 0] %
# crosspot    = potential of cross-hybridization [0-1]
# relaxdeg    = relaxed (3' extended segment) degeneracy
# fulldeg     = full primer degeneracy
# minTm       = minimum Tm for the pool of relaxed primers
# maxTm       = maximum Tm for the pool of relaxed primers
# hpinpot     = potential of primer hairpin [0-1]
# selfpot     = potential of primer self-priming [0-1]
# aLRT        = approximate likelihood-ratio test [0-1],
#               overall confidence on trees built using this amplicon
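The per-amplicon reports above are long, so it can help to condense them into one line each. The sketch below assumes that the screen output of STEP3 or STEP3.2 was saved to a text file; the function name summarize_amplicons is ours, and the parsing relies only on the line prefixes visible in the output above.

```shell
# Hypothetical helper: one tab-separated line per amplicon with
# amplicon number, codon usage table, pair quality, product length, mean aLRT.
summarize_amplicons() {
  # Strip carriage returns first, in case the log came from a terminal session.
  tr -d '\r' < "$1" | awk '
    /^## Amplicon /           { id = $3; table = $6 }
    /^# primer pair quality/  { quality = $NF }
    /^# expected PCR product/ { len = $NF }
    /^# mean aLRT/            { alrt = $5 }
    /^# end_of_amplicon/      { printf "%s\t%s\t%s\t%s\t%s\n", id, table, quality, len, alrt }
  '
}

# Usage (step3.log is a made-up file name):
# summarize_amplicons step3.log
```

The full evaluation of each primer is still in the .tab file reported by the pipeline; this summary only reflects what was printed on screen.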

#>>> STEP4) Analyze output files, including TAR.GZ bundles and PNG figures
$ ls

bla1_aln.faa                                        my_bla1_cluster_0_aln_1__input_derived_005a0a18.tgz
bla1_aln_nj_labelled.ph                             my_bla1_cluster_0_aln_1__input_derived_182c455f_aln.aa_amp
bla1_aln_nj.ph                                      my_bla1_cluster_0_aln_1__input_derived_182c455f.tgz
bla1_aln_nj_txt.graph                               my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57_aln.aa_amp
bla1_aln_nj_txt_labelled.graph                      my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
bla1_aln.phy                                        my_bla1_cluster_0_aln.faa
bla1_aln.phy.dist                                   my_bla1_cluster_0_aln.fna
bla1_cluster_0.faa                                  my_bla1_cluster_0_aln__input_derived_005a0a18.codon.use__codehops.out
bla1_cluster_0.fna                                  my_bla1_cluster_0_aln__input_derived_182c455f.codon.use__codehops.out
bla1_cluster_0.fna_primers.list.oligo_eval.tab      my_bla1_cluster_0_aln__Stenotrophomonas_maltophilia.codon.use__codehops.out
bla1_cluster_0.fna_primers.list.oligo_eval.tab.png  my_bla1_cluster_0_aln__Stenotrophomonas_sp_TA57.codon.use__codehops.out
bla1.faa                                            my_bla1_cluster_0.faa
bla1.fna                                            my_bla1_cluster_0.f
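To peek inside the TAR.GZ bundles without unpacking them, something along these lines should do. It is a minimal sketch: the function name list_bundles is ours, and we make no claim about what each bundle contains.

```shell
# Hypothetical helper: list the contents of every .tgz bundle in a directory
# (defaults to the current one).
list_bundles() {
  dir=${1:-.}
  for bundle in "$dir"/*.tgz; do
    # If there are no bundles the glob stays unexpanded, so skip it.
    [ -e "$bundle" ] || continue
    echo "== $bundle"
    tar -tzf "$bundle"
  done
}

# Usage:
# list_bundles .
```

A bundle of interest can then be unpacked into its own folder with tar -xzf and the -C option.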



January 8, 2024

footprintDB added new TFs, cis elements and binding interfaces

Over the last couple of weeks I have carried out the annual update of footprintDB, our database of transcription factors (TFs) with annotated cis elements and binding interfaces.

This involved three consecutive steps:

1) Updated 3d-footprint (completed 21/12/2023). The collection of protein-DNA complexes from the Protein Data Bank was updated, which will help annotate more interface residues in TFs.

2) Updated the EEADannot collection with plant motifs and sites manually curated by us from papers published recently. You can see the sources at https://github.com/eead-csic-compbio/EEADannot .

3) Added JASPAR 2024 data and predicted interface residues for all included TFs.

 

You can check the current contents of each collection at https://footprintdb.eead.csic.es/index.php?databases ; an overall summary is shown here:

 


The data can be downloaded in FASTA and TRANSFAC formats at https://footprintdb.eead.csic.es/download and the motifs have also been updated in https://github.com/rsa-tools/motif_databases , which feeds RSAT servers (see the plants server here). Note that some collections were left out due to licensing limitations.

Bruno