#!/perl/bioinfo

30 January 2024

HOWTO run primers4clades with a Docker container

One of the curses of developing bioinformatics software is the long-term sustainability of the code. Inevitably, some tools reach dead ends and are abandoned. This can happen for several reasons, notably lack of funding and hardware failures.

One such example is our primers4clades, which used to live at http://floresta.eead.csic.es/primers4clades and http://maya.ccg.unam.mx/primers4clades, but is no longer available as a Web server. However, Pablo Vinuesa and I thought it would be great to keep providing it as a legacy tool, and eventually packaged it as a Docker image, which is freely available at https://hub.docker.com/r/csicunam/primers4clades . The image is easy to run, as explained below, and avoids installing the pipeline locally, which is challenging because several of its dependencies are no longer easy to find (read more at https://github.com/eead-csic-compbio/primers4clades).

1) Docker installation

You can find instructions for installing the Docker (Server) engine on Linux at https://docs.docker.com/engine/install/#server .

On Windows, our recommended procedure is to i) install the Windows Subsystem for Linux (WSL) and then ii) the Docker engine:

The installation of WSL is simple, as you can fetch it from the Microsoft Store for free.
The installation of the Docker engine within WSL is explained at https://dev.to/felipecrs/simply-run-docker-on-wsl2-3o8 and https://docs.docker.com/engine/install/ubuntu .

Please let us know if you manage to run the examples below with Docker Desktop for macOS.
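
Once Docker is installed, it can help to confirm that the client is reachable and to fetch the image once before running anything. The sketch below is only an illustration: 'have_docker' and 'fetch_p4c_image' are hypothetical helper names, not part of primers4clades, and the only things assumed are the standard 'docker' client and the image name given above.

```shell
# Hypothetical helpers, not part of primers4clades.

# Reports whether a 'docker' client is on the PATH; returns 0 or 1.
have_docker() {
    if command -v docker >/dev/null 2>&1; then
        echo "docker client found"
    else
        echo "docker client not found, see section 1" >&2
        return 1
    fi
}

# Fetches the image once so that later runs start quickly.
# The pull is only attempted when the client is present.
fetch_p4c_image() {
    have_docker && docker pull csicunam/primers4clades
}
```

After pasting these definitions into a Linux or WSL shell, calling fetch_p4c_image should download the image, which should then appear in the output of 'docker images'.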

2) Running primers4clades on Docker

The following command-line examples show how to run primers4clades using Docker with an input FASTA file named 'bla1.fna' containing nucleotide CDS sequences. Check the primers4clades documentation for the required input format.
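
Before launching the pipeline, a quick look at the input file can catch a malformed FASTA early. The sketch below uses two hypothetical helpers that are not part of primers4clades: one counts records and the other lists sequences whose length is not a multiple of 3. The second check rests on our assumption that complete CDS sequences should contain whole codons; the pipeline itself may be more tolerant.

```shell
# Hypothetical helpers, not part of primers4clades.

# Counts FASTA records, i.e. lines starting with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Lists header and length of every record whose sequence length
# is not a multiple of 3 (assumption: complete CDS = whole codons).
list_non_codon_lengths() {
    awk '
        /^>/ { if (name != "" && len % 3) print name, len
               name = $0; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (name != "" && len % 3) print name, len }
    ' "$1"
}
```

For example, 'count_fasta_records bla1.fna' prints the number of records in the input file, and 'list_non_codon_lengths bla1.fna' prints nothing when every sequence length is divisible by 3.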

# 2024-01-20. Running primers4clades using a Docker container

#>>> STEP1) produce clusters and display NJ-tree
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2clusters.pl -i bla1.fna

# /primers4clades/CVS_code/marfil/Fasta2clusters.pl -i bla1.fna -c universal -d 2.21 -f 0 -m 0 -s 0 -M -b -e -D 0 -S 0

# read_FASTA_sequence : skipped sequence identical to 013: [Stenotrophomonas_maltophilia]_TEXM3A_NACIP1_4

# read_FASTA_sequence : skipped sequence identical to 013:[Stenotrophomonas_maltophilia]_TEXM3D_NACIP1_4

# number of sequences read = 14
# number of recognised taxa = 9
# aligning translated sequences...
# computing distance matrix...
Using SPRNG -- Scalable Parallel Random Number Generator
# run_PUZZLE_DIST alpha = 0.92
# alignment stats: length = 321 %gaps = 9.81 %constant = 26.5
# distance matrix = bla1_aln.phy.dist

# mean distance = 0.34813
# max distance = 1.91051 ( 001 <=> 006 )
# min distance = 0.00000 ( 003 <=> 005 )

# NJ tree with cluster labels :

    +-004__0 [Stenotrophomonas_maltophilia_D457]_D457_...
+-3
! +--007__0 [Stenotrophomonas_maltophilia_JV3]_JV3_...
!
!   +-009__0 [Stenotrophomonas_maltophilia_R551_3]_R551_3_...
! +-4
! ! ! +-011__0 [Stenotrophomonas_maltophilia]_TEXM2S_MKIMI4_21_...
! ! +-2
! !   +-015__0 [Stenotrophomonas_maltophilia]_TEXM3S_MKIMI4_5_...
5-6
! !   +006__0 [Stenotrophomonas_maltophilia]_ESTM1A_MKCAZ1_5_...
! ! +-1
! ! ! +010__0 [Stenotrophomonas_maltophilia]_TEXM2D_MKIMI4_3_...
! +-7
!   ! +013__0 [Stenotrophomonas_maltophilia]_TEXM3D_MKCAZ8_7_...
!   ! !
!   +-8 +003__0 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555_...
!     ! !
!     +-10 +005__0 [Stenotrophomonas_maltophilia_EPM1]_EPM1_...
!        ! !
!        +-11    +002__0 [Stenotrophomonas_maltophilia]_13637_...
!           ! +-9
!           +-12 +008__0 [Stenotrophomonas_maltophilia_K279a]_K279a_...
!              !
!              +016__0 [Stenotrophomonas_sp_]_YAU14D1_LEIMI4_7_...
!
+-----------------------------------------------------001__0 [Stenotrophomonas_acidaminiphila]_TEXM2D_MKIMI4_3_...

# labelled tree : bla1_aln_nj.ph
# fully labelled tree : bla1_aln_nj_labelled.ph
# text formatted tree : bla1_aln_nj_txt.graph
# fully labelled text formatted tree : bla1_aln_nj_txt_labelled.graph
# CLUSTER 0 members = 14 : .//bla1_cluster_0.fna .//bla1_cluster_0.faa

#>>> STEP2) Select a specific cluster, e.g. 003__0,016__0
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2clusters.pl -i bla1.fna -c 11 -M bla1_aln.phy.dist -b 003__0,016__0

# /primers4clades/CVS_code/marfil/Fasta2clusters.pl -i bla1.fna -c 11 -d 2.21 -f 0 -m 0 -s 0 -M bla1_aln.phy.dist -b 003__0,016__0 -e -D 0 -S 0

# read_FASTA_sequence : skipped sequence identical to 014:[Stenotrophomonas_maltophilia]_TEXM3A_NACIP1_4

# read_FASTA_sequence : skipped sequence identical to 014:[Stenotrophomonas_maltophilia]_TEXM3D_MKCAZ8_7

# number of sequences read = 14
# number of recognised taxa = 9
# number of valid cluster boundaries = 2
# size of user-selected cluster = 5
# distance matrix = bla1_aln.phy.dist

# max distance = 1.91051 ( 001 <=> 006 )
# min distance = 0.00000 ( 003 <=> 005 )

# NJ tree with cluster labels :

    +-004
+-3
! +--007
!
!   +-009
! +-4
! ! ! +-011
! ! +-2
! !   +-015
5-6
! !   +006
! ! +-1
! ! ! +010
! +-7
!   ! +013
!   ! !
!   +-8 +003__0 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555_...
!     ! !
!     +-10 +005__0 [Stenotrophomonas_maltophilia_EPM1]_EPM1_...
!        ! !
!        +-11    +002__0 [Stenotrophomonas_maltophilia]_13637_...
!           ! +-9
!           +-12 +008__0 [Stenotrophomonas_maltophilia_K279a]_K279a_...
!              !
!              +016__0 [Stenotrophomonas_sp_]_YAU14D1_LEIMI4_7_...
!
+-----------------------------------------------------001

# labelled tree : ./bla1_aln_nj.ph
# fully labelled tree : ./bla1_aln_nj_labelled.ph
# text formatted tree : ./bla1_aln_nj_txt.graph
# fully labelled text formatted tree : ./bla1_aln_nj_txt_labelled.graph
# CLUSTER 0 members = 5 : .//bla1_cluster_0.fna .//bla1_cluster_0.faa

#>>> STEP3) Search for primers on selected cluster bla1_cluster_0 and evaluate amplicon with nucleotide substitution models, min quality 50, compute codon frequencies from input (-C)
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -C -P 50

# /primers4clades/CVS_code/marfil/Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -e -P 50 -M 0 -k 0 -r -S 0 -u 0 -T 55 -o c -s 0 -B 0 -L 0 -c -C 1 -f 0 -m -l 0 -R 0 -D

# /primers4clades/CVS_code/marfil/Fasta2primers.pl : cwd = /home/vinuesa/tmp/test_P4C_docker

# cutting amplicons...
## table redundancy vs amplicons stats:
# Stenotrophomonas_sp_TA57 redundancy = 0.78 amplicons = 1
# input_derived_005a0a18 redundancy = 0.86 amplicons = 1
# Stenotrophomonas_maltophilia redundancy = 0.87 amplicons = 0

# evaluating primers...
# primer file.tab = bla1_cluster_0.fna_primers.list.oligo_eval.tab
# ranking pairs of primers...

# mapfile : bla1_cluster_0.fna_primers.list.oligo_eval.tab.png

## Amplicon 1 codon_usage_table = Stenotrophomonas_sp_TA57 :
CCGGCCATCTTCTTGATaayatgaagyt 5'->3' N 100 294 (aligned residues)
ccggtcacctgctggacaacatgaagct >001 002 [Stenotrophomonas_maltophilia]_13637
ccggtcacctgctggacaacatgaagct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
ccggtcacctgctggacaacatgaagct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
ccggtcacctgctggacaacatgaagct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
ccggtcacctgctggacaacatgaaact >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
....!..!..!..!..!..!.....?!.
CCGGTCACCTGCTGGACaacatgaarct codeh_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct relax_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct degen_corr bla1_cluster_0_amp1_N100

GGGCAAGTTGGGCATcraacttcttyt 5'->3' C 100 294 (aligned residues)
tggccaactgcgcgtcgaacttcttct >001 002 [Stenotrophomonas_maltophilia]_13637
tggccaactgcgcgtcgaacttcttct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
tggccaactgcgcgtcgaacttcttct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
tggccaactgcgcgtcgaacttcttct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
tggccaactgcgcgtcgaacttcttct >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
!...!.!!..!..!..!........!.
TGGCCAACTGCGCGTcgaacttcttct codeh_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct relax_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct degen_corr bla1_cluster_0_amp1_C294

# primer pair quality = 100%
# expected PCR product length (nt) = 585
# fwd: minTm = 65.6 maxTm = 67.3
# rev: minTm = 66.7 maxTm = 66.7
#--
# phylogenetic amplicon evaluation:
# subs.model = TrNI
# n_of_seqs = 5 n_alignments_sites = 613
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 60 %partially resolved quartets = 0 %non resolved quartets = 40.0
# alpha = 0
# mean aLRT = 0.46 median aLRT = 0.46
# compressed outfile = my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
# end_of_amplicon

#>>> STEP3.2) As in STEP3, get primers on selected cluster bla1_cluster_0, but evaluate amplicon with amino acid replacement models
$ docker run --rm -v "$PWD:$PWD" -w "$PWD" -u $UID:$GROUPS -it csicunam/primers4clades Fasta2primers.pl \
-n bla1_cluster_0.fna -p bla1_cluster_0.faa -C -P 50 -M

# /primers4clades/CVS_code/marfil/Fasta2primers.pl -n bla1_cluster_0.fna -p bla1_cluster_0.faa -e -P 50 -M 1 -k 0 -r -S 0 -u 0 -T 55 -o c -s 0 -B 0 -L 0 -c -C 1 -f 0 -m -l 0 -R 0 -D

# /primers4clades/CVS_code/marfil/Fasta2primers.pl : cwd = /home/vinuesa/tmp/test_P4C_docker

# cutting amplicons...
## table redundancy vs amplicons stats:
# Stenotrophomonas_sp_TA57 redundancy = 0.78 amplicons = 1
# input_derived_182c455f redundancy = 0.86 amplicons = 1
# Stenotrophomonas_maltophilia redundancy = 0.87 amplicons = 0

# evaluating primers...
# primer file.tab = bla1_cluster_0.fna_primers.list.oligo_eval.tab
# ranking pairs of primers...

# mapfile : bla1_cluster_0.fna_primers.list.oligo_eval.tab.png

## Amplicon 1 codon_usage_table = Stenotrophomonas_sp_TA57 :
CCGGCCATCTTCTTGATaayatgaagyt 5'->3' N 100 294 (aligned residues)
ccggtcacctgctggacaacatgaagct >001 002 [Stenotrophomonas_maltophilia]_13637
ccggtcacctgctggacaacatgaagct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
ccggtcacctgctggacaacatgaagct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
ccggtcacctgctggacaacatgaagct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
ccggtcacctgctggacaacatgaaact >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
....!..!..!..!..!..!.....?!.
CCGGTCACCTGCTGGACaacatgaarct codeh_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct relax_corr bla1_cluster_0_amp1_N100
ccggtcacctgctggacaacatgaarct degen_corr bla1_cluster_0_amp1_N100

GGGCAAGTTGGGCATcraacttcttyt 5'->3' C 100 294 (aligned residues)
tggccaactgcgcgtcgaacttcttct >001 002 [Stenotrophomonas_maltophilia]_13637
tggccaactgcgcgtcgaacttcttct >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
tggccaactgcgcgtcgaacttcttct >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
tggccaactgcgcgtcgaacttcttct >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
tggccaactgcgcgtcgaacttcttct >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
!...!.!!..!..!..!........!.
TGGCCAACTGCGCGTcgaacttcttct codeh_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct relax_corr bla1_cluster_0_amp1_C294
tggccaactgcgcgtcgaacttcttct degen_corr bla1_cluster_0_amp1_C294

# primer pair quality = 100%
# expected PCR product length (nt) = 585
# fwd: minTm = 65.6 maxTm = 67.3
# rev: minTm = 66.7 maxTm = 66.7
#--

ProtTest launched!

Please, watch out the content of file "my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57_aln.phy.prottest" for information on the
progress of the analysis and for possible error messages
Using SPRNG -- Scalable Parallel Random Number Generator
# phylogenetic amplicon evaluation:
# subs.model = WAG
# n_of_seqs = 5 n_alignments_sites = 205
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 60 %partially resolved quartets = 0 %non resolved quartets = 40.0
# alpha = 0
# mean aLRT = 0.46 median aLRT = 0.46
# compressed outfile = my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
# end_of_amplicon

## Amplicon 2 codon_usage_table = input_derived_182c455f :
GCCAGCTGGCTGcarccvatggc 5'->3' N 55 243 (aligned residues)
gcgtcctggctgcagccgatggc >001 002 [Stenotrophomonas_maltophilia]_13637
gcgtcctggctgcagccgatggc >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
gcgtcctggctgcagccgatggc >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
gcgtcctggctgcagccgatggc >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
gcgtcctggctgcagccgatggc >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
..!!!.........!..!.....
GCGTCCTGGCTGcagccgatggc codeh_corr bla1_cluster_0_amp1_N55
gcgtcctggctgcagccgatggc relax_corr bla1_cluster_0_amp1_N55
gcgtcctggctgcagccgatggc degen_corr bla1_cluster_0_amp1_N55

GCGAAGCTGCGCTtrtartcytcga 5'->3' C 55 243 (aligned residues)
gcgaagctgcgcttgtagtcctcga >001 002 [Stenotrophomonas_maltophilia]_13637
gcgaagctgcgcttgtagtcctcga >002 003 [Stenotrophomonas_maltophilia_Ab55555]_Ab55555
gcaaagctgcgcttgtagtcctcga >003 005 [Stenotrophomonas_maltophilia_EPM1]_EPM1
gcgaagctgcgcttgtagtcctcga >004 008 [Stenotrophomonas_maltophilia_K279a]_K279a
gcgaagctgcgcttgtagtcctcga >005 016 [Stenotrophomonas_sp.]_YAU14D1_LEIMI4_7
..?...........!..!..!....
GCGAAGCTGCGCTtgtagtcctcga codeh_corr bla1_cluster_0_amp1_C243
gcraagctgcgcttgtagtcctcga relax_corr bla1_cluster_0_amp1_C243
gcraagctgcgcttgtagtcctcga degen_corr bla1_cluster_0_amp1_C243

# primer pair quality = 100%
# expected PCR product length (nt) = 567
# fwd: minTm = 74.4 maxTm = 74.4
# rev: minTm = 67.5 maxTm = 67.5
#--

ProtTest launched!

Please, watch out the content of file "my_bla1_cluster_0_aln_1__input_derived_182c455f_aln.phy.prottest" for information on the
progress of the analysis and for possible error messages
Using SPRNG -- Scalable Parallel Random Number Generator
# phylogenetic amplicon evaluation:
# subs.model = WAG
# n_of_seqs = 5 n_alignments_sites = 198
# n_of_seqs_with_composition_bias = 0
# %fully resolved quartets = 0 %partially resolved quartets = 0 %non resolved quartets = 100.0
# alpha = 0
# mean aLRT = 0.00 median aLRT = 0.00
# compressed outfile = my_bla1_cluster_0_aln_1__input_derived_182c455f.tgz
# end_of_amplicon

#################################################################
## overall quality stats:
# good/bad pairs ratio = 2/2
# individual quality checks stats:

## abbreviations:
# quality     = quality estimation [best = 100, worst = 0] %
# crosspot    = potential of cross-hybridization [0-1]
# relaxdeg    = relaxed (3' extended segment) degeneracy
# fulldeg     = full primer degeneracy
# minTm       = minimum Tm for the pool of relaxed primers
# maxTm       = maximum Tm for the pool of relaxed primers
# hpinpot     = potential of primer hairpin [0-1]
# selfpot     = potential of primer self-priming [0-1]
# aLRT        = approximate likelihood-ratio test [0-1],
#               overall confidence on trees built using this amplicon

#>>> STEP4) Analyze output files, including TAR.GZ bundles and PNG figures
$ ls

bla1_aln.faa my_bla1_cluster_0_aln_1__input_derived_005a0a18.tgz
bla1_aln_nj_labelled.ph my_bla1_cluster_0_aln_1__input_derived_182c455f_aln.aa_amp
bla1_aln_nj.ph my_bla1_cluster_0_aln_1__input_derived_182c455f.tgz
bla1_aln_nj_txt.graph my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57_aln.aa_amp
bla1_aln_nj_txt_labelled.graph my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz
bla1_aln.phy my_bla1_cluster_0_aln.faa
bla1_aln.phy.dist my_bla1_cluster_0_aln.fna
bla1_cluster_0.faa my_bla1_cluster_0_aln__input_derived_005a0a18.codon.use__codehops.out
bla1_cluster_0.fna my_bla1_cluster_0_aln__input_derived_182c455f.codon.use__codehops.out
bla1_cluster_0.fna_primers.list.oligo_eval.tab my_bla1_cluster_0_aln__Stenotrophomonas_maltophilia.codon.use__codehops.out
bla1_cluster_0.fna_primers.list.oligo_eval.tab.png my_bla1_cluster_0_aln__Stenotrophomonas_sp_TA57.codon.use__codehops.out
bla1.faa                                            my_bla1_cluster_0.faa
bla1.fna                                            my_bla1_cluster_0.f
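
The long docker invocation repeated in each step above can be wrapped in a small shell function, and the TAR.GZ bundles from STEP4 can be inspected with tar. This is a sketch under stated assumptions: 'p4c', 'list_bundle' and the P4C_DOCKER variable are hypothetical names of ours, and '$(id -u):$(id -g)' is swapped in as an assumed portable equivalent of the '$UID:$GROUPS' used in the session.

```shell
# Hypothetical wrappers, not part of primers4clades.
P4C_IMAGE="csicunam/primers4clades"

# Runs any pipeline script inside the container, mounting the current
# directory as in the steps above.
# Setting P4C_DOCKER=echo (our own convention) previews the arguments
# instead of running docker.
p4c() {
    ${P4C_DOCKER:-docker} run --rm \
        -v "$PWD:$PWD" -w "$PWD" \
        -u "$(id -u):$(id -g)" \
        -it "$P4C_IMAGE" "$@"
}

# Lists the files packed in one of the .tgz bundles.
list_bundle() {
    tar -tzf "$1"
}
```

With these definitions, STEP1 becomes 'p4c Fasta2clusters.pl -i bla1.fna', and 'list_bundle my_bla1_cluster_0_aln_1__Stenotrophomonas_sp_TA57.tgz' shows what a bundle contains without unpacking it. Note that the -it flags, kept from the original commands, expect an interactive terminal.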

8 January 2024

footprintDB added new TFs, cis elements and binding interfaces

Over the last couple of weeks I have carried out the annual update of footprintDB, our database of transcription factors (TFs) with annotated cis elements and binding interfaces.

This involved three consecutive steps:

1) Updated 3d-footprint (completed 21/12/2023). This means that the collection of protein-DNA complexes from the Protein Data Bank was updated, which will help annotate more interface residues in TFs.

2) Updated the EEADannot collection with plant motifs and sites manually curated by us from papers published recently. You can see the sources at https://github.com/eead-csic-compbio/EEADannot .

3) Added JASPAR 2024 data and predicted interface residues for all included TFs.

You can check the current contents of each collection at https://footprintdb.eead.csic.es/index.php?databases . An overall summary is shown here:

The data can be downloaded in FASTA and TRANSFAC formats at https://footprintdb.eead.csic.es/download and the motifs have also been updated in https://github.com/rsa-tools/motif_databases , which feeds the RSAT servers, including the plants server. Note that some collections were left out due to licensing limitations.

Bruno

28 November 2023

ChatGPT: Academic Friend or Cheating Liar?

A few days ago we were visited by Ivan Molineris, a colleague from the Università degli Studi di Torino, who came to tell us about his recent experience using ChatGPT in academia.

The full talk is available on video.

The abstract of the talk, produced with GPT-4, read:

"In the changing world of artificial intelligence (AI) we have sophisticated language models at our fingertips. They make vast knowledge available, answer quickly, write software and interact with humans in the most natural way: by talking. But what does this mean for science?

This seminar explores the dual nature of ChatGPT as a valuable academic tool and as a potentially dishonest resource. On the one hand, ChatGPT is a dynamic research assistant that offers knowledge on many topics, provides explanations and helps solve problems. For many it is an innovative way to navigate areas in which we have no experience, improving our understanding of complex topics. On the other hand, these capabilities are easy to abuse. Now that students and researchers can write, answer questions and create new content so easily, where do we draw the line between plagiarism and legitimate work?

In this seminar we will explore the ethical implications, the safeguards and the changing relationship between AI and academia. Through examples we will try to decide whether ChatGPT is a friendly tool that pushes past conventional limits or whether, instead, it lies and compromises the integrity of our research."

In this post I summarize Ivan's main messages.

Don't trust a chatbot.

Ask something related to a scientific problem. Ask it for statements that are falsifiable, that you can check.
Then ask it for references that support the information it gives you.
Check and read the references to validate them (GPT-4 seems to be much better than GPT-3.5 in this respect).

The size of the training corpus matters. Note the number of parameters of these large language models (LLMs):

GPT-4: 1.76E12 (30 USD/month, free with limitations in Bing Chat)
GPT-3.5: 175E9
Bard: 137E9 (but it hallucinates less than GPT-3.5 according to Ivan)
Llama 2: 70E9

A chatbot is a tool for all applications. Compared with traditional applications such as Google Translate, it has the advantage that you can ask it about its answers, refine them and ask it to write them in a particular style.
What do chatbots excel at? Natural language. They are great for asking things that would take us a long time to produce but little time to check.
LLMs have biases and ideology. For example, a recent experiment with ChatGPT showed that it favours taxes on airlines.
GPT-4 can write code in R, Python and JavaScript (I have also tried Perl), but:

Quality is better the smaller the problem, so split your tasks up first.
That the code runs does not mean that the results or the suggested parameters are correct in every scenario.
You must understand the code it proposes; in fact, you can ask it to explain it to you.

GPT-4 can do calculations (better than GPT-3), but you must ask it to carry out the operations step by step (chain of thought).

To finish, here are a few curiosities:

The now classic paper "Attention Is All You Need"
A fluid mechanics book written by Javier Blasco with the help of ChatGPT.
The repository for Ivan's ideas for an LLM

See you later,

Bruno