Showing posts with the label structural bioinformatics. Show all posts

February 26, 2024

How to model proteins with ColabFold on your local GPU

Hi,

today I will explain how I set up ColabFold to run on local hardware, specifically on a machine running Ubuntu 20.04 with a Xeon Silver 4210R (Cascade Lake) CPU and an NVIDIA RTX 3090 graphics card. You can read more about AlphaFold and ColabFold here or in this video.

1) I needed to update CUDA, specifically to version 11.8, which I did as explained here:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt update
sudo apt install cuda-toolkit-11-8

2) After rebooting, I updated the $PATH and $LD_LIBRARY_PATH environment variables by adding these lines to my .bashrc file:

export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
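
A quick sanity check after reopening the terminal is to confirm that the nvcc now found on $PATH belongs to release 11.8. The sketch below assumes the usual banner printed by nvcc --version, which includes a line like "Cuda compilation tools, release 11.8, V11.8.89". cuda_release is a hypothetical helper name, not part of CUDA:

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print only
# the toolkit release (for example 11.8). Prints nothing if no release is found.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage, once /usr/local/cuda/bin is on PATH:
#   nvcc --version | cuda_release
```

If this prints something other than 11.8, an older toolkit is probably still ahead of /usr/local/cuda/bin on $PATH.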

3) I followed the instructions for Linux at https://github.com/YoshitakaMo/localcolabfold?tab=readme-ov-file#for-linux . In my case the installation took a few minutes and added 15 GB to the hard drive.

4) I checked that everything works with a FASTA file that contains several sequences, saving the results in the multi/ folder:

colabfold_batch test.multi.faa multi/
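
Since each query in the file triggers its own request to the remote MSA server, it can help to know how many sequences a FASTA file holds before launching a batch. A minimal sketch, where count_fasta is a hypothetical helper name:

```shell
# Hypothetical helper: count the records in a FASTA file by counting header
# lines, i.e. lines that start with ">".
count_fasta() {
    grep -c '^>' "$1"
}

# Usage:
#   count_fasta test.multi.faa
```

For the run described here the log reports 29 queries (Query 1/29), which should match this count.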


Here is a summary of the results I obtained:

  • By default colabfold_batch connects to https://api.colabfold.com to search for similar sequences and build multiple sequence alignments (MSAs) in a FASTA-like format called a3m. That part of the work is therefore not done locally, and you should use it in moderation. If you want to know which version of the ColabFold sequence databases you are using, check https://github.com/sokrypton/ColabFold/wiki/MSA-Server-Database-History
  • The first sequences I used to build models in PDB format had between 114 and 162 residues and took a couple of minutes each. Here is the log:
    2024-02-26 13:05:56,639 Running colabfold 1.5.5 (d36504fad856a0e1df511c5b0434957707030319)
    2024-02-26 13:05:56,862 Running on GPU
    2024-02-26 13:05:57,354 Found 5 citations for tools or databases
    2024-02-26 13:05:57,355 Query 1/29: test1 (length 114)
    2024-02-26 13:05:58,348 Sleeping for 6s. Reason: PENDING
    2024-02-26 13:06:05,308 Sleeping for 10s. Reason: RUNNING
    2024-02-26 13:06:30,822 Padding length to 124
    2024-02-26 13:06:58,791 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=67.9 pTM=0.31
    2024-02-26 13:07:00,321 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=68.8 pTM=0.329 tol=9.09
    2024-02-26 13:07:01,845 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=69.7 pTM=0.358 tol=2.28
    2024-02-26 13:07:03,373 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=69.8 pTM=0.367 tol=3.04
    2024-02-26 13:07:03,374 alphafold2_ptm_model_1_seed_000 took 32.6s (3 recycles)
    2024-02-26 13:07:04,871 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=71.2 pTM=0.308
    2024-02-26 13:07:06,323 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=71.6 pTM=0.346 tol=2.14
    2024-02-26 13:07:07,848 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=71.7 pTM=0.358 tol=2.38
    2024-02-26 13:07:09,345 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=71.8 pTM=0.365 tol=1.31
    2024-02-26 13:07:09,346 alphafold2_ptm_model_2_seed_000 took 5.9s (3 recycles)
    2024-02-26 13:07:10,984 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=68.1 pTM=0.298
    2024-02-26 13:07:12,529 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=68.6 pTM=0.34 tol=4.11
    2024-02-26 13:07:13,992 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=69.2 pTM=0.36 tol=2.49
    2024-02-26 13:07:15,484 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=68.8 pTM=0.367 tol=1.67
    2024-02-26 13:07:15,485 alphafold2_ptm_model_3_seed_000 took 6.1s (3 recycles)
    2024-02-26 13:07:16,987 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=66.1 pTM=0.289
    2024-02-26 13:07:18,435 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=66.8 pTM=0.283 tol=5.61
    2024-02-26 13:07:19,933 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=67.7 pTM=0.298 tol=1.03
    2024-02-26 13:07:21,444 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=67.9 pTM=0.318 tol=2.04
    2024-02-26 13:07:21,445 alphafold2_ptm_model_4_seed_000 took 5.9s (3 recycles)
    2024-02-26 13:07:22,931 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=66.8 pTM=0.322
    2024-02-26 13:07:24,403 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=68.2 pTM=0.345 tol=9.46
    2024-02-26 13:07:25,860 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=68.8 pTM=0.354 tol=2.3
    2024-02-26 13:07:27,342 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=69.4 pTM=0.358 tol=1.58
    2024-02-26 13:07:27,342 alphafold2_ptm_model_5_seed_000 took 5.9s (3 recycles)
    2024-02-26 13:07:27,369 reranking models by 'plddt' metric
    2024-02-26 13:07:27,369 rank_001_alphafold2_ptm_model_2_seed_000 pLDDT=71.8 pTM=0.365
    2024-02-26 13:07:27,369 rank_002_alphafold2_ptm_model_1_seed_000 pLDDT=69.8 pTM=0.367
    2024-02-26 13:07:27,370 rank_003_alphafold2_ptm_model_5_seed_000 pLDDT=69.4 pTM=0.358
    2024-02-26 13:07:27,370 rank_004_alphafold2_ptm_model_3_seed_000 pLDDT=68.8 pTM=0.367
    2024-02-26 13:07:27,370 rank_005_alphafold2_ptm_model_4_seed_000 pLDDT=67.9 pTM=0.318
    2024-02-26 13:07:28,679 Query 2/29: test2 (length 120)
    2024-02-26 13:07:29,695 Sleeping for 9s. Reason: PENDING
    2024-02-26 13:07:39,667 Sleeping for 9s. Reason: PENDING
    2024-02-26 13:07:49,628 Sleeping for 6s. Reason: PENDING
    2024-02-26 13:07:56,610 Sleeping for 6s. Reason: PENDING
    2024-02-26 13:08:03,608 Sleeping for 5s. Reason: PENDING
    2024-02-26 13:08:09,564 Sleeping for 6s. Reason: PENDING
    2024-02-26 13:08:16,534 Sleeping for 7s. Reason: PENDING
    2024-02-26 13:08:24,518 Sleeping for 5s. Reason: PENDING
    2024-02-26 13:08:30,471 Sleeping for 7s. Reason: PENDING
    2024-02-26 13:08:38,498 Sleeping for 5s. Reason: PENDING
    2024-02-26 13:08:44,459 Sleeping for 6s. Reason: PENDING
    2024-02-26 13:08:51,412 Sleeping for 9s. Reason: PENDING
    2024-02-26 13:09:01,412 Sleeping for 9s. Reason: PENDING
    2024-02-26 13:09:11,370 Sleeping for 8s. Reason: PENDING
    2024-02-26 13:09:20,337 Sleeping for 8s. Reason: PENDING
    2024-02-26 13:09:29,316 Sleeping for 6s. Reason: RUNNING
    2024-02-26 13:09:39,703 Padding length to 124
    2024-02-26 13:09:41,194 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=73.9 pTM=0.55
    2024-02-26 13:09:42,664 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=73.8 pTM=0.549 tol=3.08
    2024-02-26 13:09:44,110 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=73.6 pTM=0.549 tol=1.59
    2024-02-26 13:09:45,593 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=74.4 pTM=0.555 tol=1.67
    2024-02-26 13:09:45,593 alphafold2_ptm_model_1_seed_000 took 5.9s (3 recycles)
    2024-02-26 13:09:47,073 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=76.7 pTM=0.565
    2024-02-26 13:09:48,523 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=77.1 pTM=0.57 tol=0.571
    2024-02-26 13:09:49,977 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=76.7 pTM=0.569 tol=0.958
    2024-02-26 13:09:51,421 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=76.9 pTM=0.572 tol=0.881
    2024-02-26 13:09:51,421 alphafold2_ptm_model_2_seed_000 took 5.8s (3 recycles)
    2024-02-26 13:09:52,877 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=75.6 pTM=0.542
    2024-02-26 13:09:54,315 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=75.9 pTM=0.548 tol=1.52
    2024-02-26 13:09:55,763 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=75.9 pTM=0.552 tol=1.69
    2024-02-26 13:09:57,218 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=75.8 pTM=0.555 tol=0.883
    2024-02-26 13:09:57,219 alphafold2_ptm_model_3_seed_000 took 5.8s (3 recycles)
    2024-02-26 13:09:58,705 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=73.9 pTM=0.56
    2024-02-26 13:10:00,177 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=75.1 pTM=0.57 tol=2.2
    2024-02-26 13:10:01,620 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=75.4 pTM=0.571 tol=1.78
    2024-02-26 13:10:03,076 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=75.7 pTM=0.575 tol=2.04
    2024-02-26 13:10:03,077 alphafold2_ptm_model_4_seed_000 took 5.8s (3 recycles)
    2024-02-26 13:10:04,572 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=75.2 pTM=0.573
    2024-02-26 13:10:06,026 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=76.2 pTM=0.585 tol=2.12
    2024-02-26 13:10:07,498 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=76.2 pTM=0.587 tol=1.44
    2024-02-26 13:10:08,958 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=76.6 pTM=0.589 tol=1.21
    2024-02-26 13:10:08,959 alphafold2_ptm_model_5_seed_000 took 5.9s (3 recycles)
    2024-02-26 13:10:08,986 reranking models by 'plddt' metric
    2024-02-26 13:10:08,987 rank_001_alphafold2_ptm_model_2_seed_000 pLDDT=76.9 pTM=0.572
    2024-02-26 13:10:08,987 rank_002_alphafold2_ptm_model_5_seed_000 pLDDT=76.6 pTM=0.589
    2024-02-26 13:10:08,987 rank_003_alphafold2_ptm_model_3_seed_000 pLDDT=75.8 pTM=0.555
    2024-02-26 13:10:08,987 rank_004_alphafold2_ptm_model_4_seed_000 pLDDT=75.7 pTM=0.575
    2024-02-26 13:10:08,987 rank_005_alphafold2_ptm_model_1_seed_000 pLDDT=74.4 pTM=0.555
    2024-02-26 13:10:10,274 Query 3/29: test3 (length 162)
    2024-02-26 13:10:11,241 Sleeping for 8s. Reason: PENDING
    2024-02-26 13:10:20,230 Sleeping for 10s. Reason: PENDING
    2024-02-26 13:10:31,195 Sleeping for 5s. Reason: RUNNING
    2024-02-26 13:10:37,194 Sleeping for 6s. Reason: RUNNING
    2024-02-26 13:10:44,153 Sleeping for 9s. Reason: RUNNING
    2024-02-26 13:10:54,142 Sleeping for 10s. Reason: RUNNING
    2024-02-26 13:11:05,109 Sleeping for 8s. Reason: RUNNING
    2024-02-26 13:11:14,082 Sleeping for 6s. Reason: RUNNING
    2024-02-26 13:11:21,030 Sleeping for 8s. Reason: RUNNING
    2024-02-26 13:11:30,005 Sleeping for 9s. Reason: RUNNING
    2024-02-26 13:11:39,984 Sleeping for 7s. Reason: RUNNING
    2024-02-26 13:11:47,941 Sleeping for 10s. Reason: RUNNING
    2024-02-26 13:11:58,903 Sleeping for 9s. Reason: RUNNING
    2024-02-26 13:12:08,881 Sleeping for 5s. Reason: RUNNING
    2024-02-26 13:12:14,891 Sleeping for 9s. Reason: RUNNING
    2024-02-26 13:12:32,470 Padding length to 172
    2024-02-26 13:13:00,100 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=62.9 pTM=0.433
    2024-02-26 13:13:02,186 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=63.4 pTM=0.433 tol=8.27
    2024-02-26 13:13:04,282 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=64.1 pTM=0.431 tol=8.02
    2024-02-26 13:13:06,403 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=63.8 pTM=0.427 tol=8.51
    2024-02-26 13:13:06,404 alphafold2_ptm_model_1_seed_000 took 33.9s (3 recycles)
    2024-02-26 13:13:08,535 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=60.2 pTM=0.417
    2024-02-26 13:13:10,637 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=61 pTM=0.423 tol=6.09
    2024-02-26 13:13:12,742 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=61.4 pTM=0.428 tol=3.33
    2024-02-26 13:13:14,846 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=61.2 pTM=0.425 tol=1.8
    2024-02-26 13:13:14,846 alphafold2_ptm_model_2_seed_000 took 8.4s (3 recycles)
    2024-02-26 13:13:16,979 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=62 pTM=0.425
    2024-02-26 13:13:19,099 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=62.3 pTM=0.43 tol=7.21
    2024-02-26 13:13:21,197 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=61.9 pTM=0.426 tol=4.32
    2024-02-26 13:13:23,303 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=62.1 pTM=0.427 tol=5.17
    2024-02-26 13:13:23,304 alphafold2_ptm_model_3_seed_000 took 8.4s (3 recycles)
    2024-02-26 13:13:25,461 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=60.5 pTM=0.418
    2024-02-26 13:13:27,552 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=60.8 pTM=0.417 tol=9.52
    2024-02-26 13:13:29,658 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=60.3 pTM=0.41 tol=9.23
    2024-02-26 13:13:31,749 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=60.5 pTM=0.411 tol=6.08
    2024-02-26 13:13:31,750 alphafold2_ptm_model_4_seed_000 took 8.4s (3 recycles)
    2024-02-26 13:13:33,905 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=59.9 pTM=0.416
    2024-02-26 13:13:36,038 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=60.1 pTM=0.415 tol=9.96
    2024-02-26 13:13:38,154 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=59.7 pTM=0.409 tol=3.89
    2024-02-26 13:13:40,252 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=59.4 pTM=0.415 tol=11.4
    2024-02-26 13:13:40,253 alphafold2_ptm_model_5_seed_000 took 8.5s (3 recycles)
    2024-02-26 13:13:40,294 reranking models by 'plddt' metric
    2024-02-26 13:13:40,294 rank_001_alphafold2_ptm_model_1_seed_000 pLDDT=63.8 pTM=0.427
    2024-02-26 13:13:40,294 rank_002_alphafold2_ptm_model_3_seed_000 pLDDT=62.1 pTM=0.427
    2024-02-26 13:13:40,294 rank_003_alphafold2_ptm_model_2_seed_000 pLDDT=61.2 pTM=0.425
    2024-02-26 13:13:40,295 rank_004_alphafold2_ptm_model_4_seed_000 pLDDT=60.5 pTM=0.411
    2024-02-26 13:13:40,295 rank_005_alphafold2_ptm_model_5_seed_000 pLDDT=59.4 pTM=0.415
  • As you can see, the script itself waits when the remote server is busy.
  • For each query sequence you also get figures.
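
A log like the one above gets long quickly with 29 queries, so a per-query summary can be handy. The sketch below assumes the log is available as a plain text file with the same line layout as shown above (date, time, then the message). summarize_log and the path multi/log.txt in the usage line are assumptions. For each query it prints the name, the length, the sum of the announced sleep times while waiting for the MSA server, and the top-ranked model with its pLDDT and pTM:

```shell
# Hypothetical helper: summarise a colabfold_batch log, one tab-separated
# line per query: name, length, seconds slept, best model, pLDDT, pTM.
summarize_log() {
    awk '
        # "Query 1/29: test1 (length 114)" starts a new query
        $3 == "Query" {
            name = $5
            len = $7; sub(/\)$/, "", len)
            waited = 0
        }
        # "Sleeping for 6s. Reason: PENDING" adds to the waiting time
        $3 == "Sleeping" {
            s = $5; sub(/s\.?$/, "", s)
            waited += s
        }
        # "rank_001_... pLDDT=71.8 pTM=0.365" is the best model after reranking
        $3 ~ /^rank_001_/ {
            model = $3; sub(/^rank_001_/, "", model)
            plddt = $4; sub(/^pLDDT=/, "", plddt)
            ptm = $5; sub(/^pTM=/, "", ptm)
            printf "%s\t%s\t%d\t%s\t%s\t%s\n", name, len, waited, model, plddt, ptm
        }
    ' "$1"
}

# Usage:
#   summarize_log multi/log.txt
```

The waiting time is the sum of the announced sleeps, not wall-clock time. For the first query above it adds up to 16 s, and the best model is alphafold2_ptm_model_2_seed_000 with pLDDT 71.8.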




See you soon,

Bruno

PS When you finish installing, you should have something like this in your $HOME/.bashrc file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH="$HOME/colabfold/colabfold-conda/bin:$PATH"

When you are not going to use ColabFold, comment out these lines to use the system perl and python.
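
An alternative to commenting those lines in and out, sketched here as an assumption rather than something taken from the installation instructions, is to leave the ColabFold line out of .bashrc and prepend its directory only for the commands that need it. with_colabfold and COLABFOLD_BIN are hypothetical names, and the directory is the one from the .bashrc line above:

```shell
# Directory holding the ColabFold executables (same path as in .bashrc above);
# set COLABFOLD_BIN beforehand if your install lives elsewhere.
COLABFOLD_BIN="${COLABFOLD_BIN:-$HOME/colabfold/colabfold-conda/bin}"

# Hypothetical wrapper: run one command with the ColabFold directory first in
# PATH. The subshell keeps the change from leaking into the current session,
# so the system perl and python stay the default everywhere else.
with_colabfold() {
    (
        PATH="$COLABFOLD_BIN:$PATH"
        export PATH
        "$@"
    )
}

# Usage:
#   with_colabfold colabfold_batch test.multi.faa multi/
```

The trade-off is one extra word per command in exchange for never having to edit .bashrc again.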

PS2 Colleagues at the University of Liverpool tell me that they have run 800-900 MSAs per day at https://api.colabfold.com without any problems.

PS3 If your GPU is not recognized, see the possible solutions at https://github.com/YoshitakaMo/localcolabfold/issues/210

PS4: You can hold the CUDA version you installed, so that apt does not upgrade it, with something like:

 sudo apt-mark hold cuda-toolkit-11-8


June 8, 2023

Janet Thornton, the mother of structural bioinformatics, retires

Hi, yesterday I listened by videoconference to part of the last talk Janet Thornton gave at the EMBL-European Bioinformatics Institute before retiring.

Woman standing at podium
source: EMBL

Janet has undoubtedly been one of the mothers of bioinformatics, especially in the area of structural bioinformatics. For example, she has her own amino acid substitution model for building phylogenies (JTT). You can see her enormous influence on the literature at EuropePMC, or in the words that Alfonso Valencia or Roland Dunbrack have dedicated to her. Her long list of disciples includes, for example, David Jones (the J in the JTT model, part of the AlphaFold team and an examiner of my thesis), Christine Orengo and Nick Luscombe, all of them authors I have cited countless times.

I met her in person at a conference in Brazil, ISMB2006, where I was fortunate to have a face-to-face meeting in which she gave me advice and encouragement for my budding career in science. Four years ago we met again in the EMBL-EBI cafeteria and, recalling that conversation, she said something like "it hasn't gone so badly for you, has it?".

In her talk she recalled observations that she and her group had made over the last few decades about the list of amino acids that are important for explaining enzyme catalysis. That led her to review the results of the last few years of work, led by Antonio Ribeiro, which has focused on systematizing the rules and on objectively measuring the similarity between enzyme mechanisms, gaining predictive power along the way (see for example https://europepmc.org/article/MED/36659981 and https://europepmc.org/article/PPR/PPR540240). She ended this part of the talk, the last one I could follow, by saying that the hardest thing about retiring was no longer being able to plan new experiments and studies for everything that remains to be learned. I think that curiosity is what drives many of us, and I have nothing more to add,

see you soon,

Bruno


January 12, 2023

Protein Data Bank reaches the 200K milestone

Hi,

with the PAG30 conference about to start, we begin the year with good news:


Here is the official announcement from the Protein Data Bank:

Date: Wed, 11 Jan 2023 09:55:41 -0500
Subject: pdb-l: PDB Reaches a New Milestone: 200,000+ Entries

With this week's update, the PDB archive contains a record 200,069 
entries. The archive passed 150,000 structures in 2019 and 
100,000 structures in 2014. 

Established in 1971, this central, public archive has reached this 
critical milestone thanks to the efforts of structural biologists 
throughout the world who contribute their experimentally-determined 
protein and nucleic acid structure data.

wwPDB data centers support online access to three-dimensional structures 
of biological macromolecules that help researchers understand many 
facets of biomedicine, agriculture, and ecology, from protein synthesis 
to health and disease to biological energy. Many milestones have been 
reached since the archive released the 100,000th structure in 2014. PDB 
data have been seminal in understanding SARS-CoV-2, and provided the 
foundation for the development of AI/ML techniques for predicting 
protein structure. The 50th anniversary of the PDB was celebrated 
throughout 2021 <https://www.wwpdb.org/pdb50>.

Today, the archive is quite large, containing more than 3,000,000 files 
related to these PDB entries that require more than 1086 Gbytes of 
storage. PDB structures contain more than 1.8 billion non-hydrogen atoms.


Function follows form
In the 1950s, scientists had their first direct look at the structures 
of proteins and DNA at the atomic level. Determination of these early 
three-dimensional structures by X-ray crystallography ushered in a new 
era in biology-one driven by the intimate link between form and 
biological function. As the value of archiving and sharing these data 
were quickly recognized by the scientific community, the Protein Data 
Bank (PDB) was established as the first open access digital resource in 
all of biology by an international collaboration in 1971 with data 
centers located in the US and the UK.

Among the first structures deposited in the PDB were those of myoglobin 
and hemoglobin, two oxygen-binding molecules whose structures were 
elucidated by Chemistry Nobel Laureates John Kendrew and Max Perutz. 
With this week's regular update, the PDB welcomes 266 new structures 
into the archive. These structures join others vital to drug discovery, 
bioinformatics and education.

The PDB is growing rapidly, increasing in size ~13% since 2011. In 2022, 
an average of 275 new structures were released to the scientific 
community each week. The resource is accessed hundreds of millions of 
times annually by researchers, students, and educators intent on 
exploring how different proteins are related to one another, to clarify 
fundamental biological mechanisms and discover new medicines.

Twenty Years of Collaboration
Since its inception, the PDB has been a community-driven enterprise, 
evolving into a mission critical international resource for biological 
research. The wwPDB partnership was established in July 2003 with PDBe, 
PDBj, and RCSB PDB. Today, the collaboration includes partners BMRB 
(joined in 2006) and EMDB (2021).

The wwPDB ensures that these valuable PDB data are securely stored, 
expertly managed, and made freely available for the benefit of 
scientists and educators around the globe. wwPDB data centers work 
closely with community experts to define deposition and annotation 
policies, resolve data representation issues, and implement community 
validation standards. In addition, the wwPDB works to raise the profile 
of structural biology with increasingly broad audiences.

Each structure submitted to the archive is carefully curated by wwPDB 
staff before release. New depositions are checked and enhanced with 
value-added annotations and linked with other important biological data 
to ensure that PDB structures are discoverable and interpretable by 
users with a wide range of backgrounds and interests.

wwPDB eagerly awaits the next 100,000 structures and the invaluable 
knowledge these new data will bring.

See you soon,

Bruno

November 16, 2022

Algoritmos en Bioinformática Estructural v2022

Hi,

after a break of almost 4 years, I have just updated the course Algoritmos en Bioinformática Estructural (Algorithms in Structural Bioinformatics), which I had been maintaining since 2008 for my former students of the Licenciatura en Ciencias Genómicas (undergraduate program in Genomic Sciences) at UNAM in Cuernavaca.

You can find v2022 at:

http://eead-csic-compbio.github.io/bioinformatica_estructural


 

Figure. Comparison of AlphaFold2 and OpenFold predictions for structure 7KDX:B. Figure taken from https://github.com/aqlaboratory/openfold.


Main new features:

See you soon,

Bruno