Hola,
hoy explicaré cómo he configurado ColabFold para ejecutarlo en hardware local, en concreto en una máquina con Ubuntu 20.04 que tiene una CPU Xeon CascadeLake Silver 4210R y una tarjeta gráfica NVIDIA RTX 3090. Puedes leer más sobre AlphaFold y ColabFold aquí o en este vídeo.
1) Necesité actualizar cuda, en concreto con la versión 11.8, algo que hice como se explica aquí:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt update
sudo apt install cuda-toolkit-11-8
2) Tras reinicar, actualicé la variable de ambiente $PATH añadiendo estas líneas a mi fichero .bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
3) Seguí las instrucciones para Linux en https://github.com/YoshitakaMo/localcolabfold?tab=readme-ov-file#for-linux . En mi caso tardó unos pocos minutos y sumó 15G al disco duro.
4) Probé que todo funciona con un fichero FASTA qee contiene varias secuencias, guardando los resultados en la carpeta multi/ :
colabfold_batch test.multi.faa multi/Ahora resumo los resultados que obtuve:
- Por defecto colabfold_batch se conecta a https://api.colabfold.com para hacer búsquedas de secuencias similares y construir alineamientos múltiples (MSA) en un formato similar a FASTA que se llama a3m. Por tanto esa parte del trabajo no se hace localmente y tendrás que usarla con medida. Si quieres saber qué versión de las bases de datos de secuencias de ColabFold estás usando puedes consultar https://github.com/sokrypton/ColabFold/wiki/MSA-Server-Database-History
- Las primeras secuencias que usé para construir modelos en formato PDB tenían entre 114 y 162 resíduos y tardaban un par de minutos, pego aquí el log:
- 2024-02-26 13:05:56,639 Running colabfold 1.5.5 (d36504fad856a0e1df511c5b0434957707030319)
2024-02-26 13:05:56,862 Running on GPU
2024-02-26 13:05:57,354 Found 5 citations for tools or databases
2024-02-26 13:05:57,355 Query 1/29: test1 (length 114)
2024-02-26 13:05:58,348 Sleeping for 6s. Reason: PENDING
2024-02-26 13:06:05,308 Sleeping for 10s. Reason: RUNNING
2024-02-26 13:06:30,822 Padding length to 124
2024-02-26 13:06:58,791 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=67.9 pTM=0.31
2024-02-26 13:07:00,321 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=68.8 pTM=0.329 tol=9.09
2024-02-26 13:07:01,845 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=69.7 pTM=0.358 tol=2.28
2024-02-26 13:07:03,373 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=69.8 pTM=0.367 tol=3.04
2024-02-26 13:07:03,374 alphafold2_ptm_model_1_seed_000 took 32.6s (3 recycles)
2024-02-26 13:07:04,871 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=71.2 pTM=0.308
2024-02-26 13:07:06,323 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=71.6 pTM=0.346 tol=2.14
2024-02-26 13:07:07,848 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=71.7 pTM=0.358 tol=2.38
2024-02-26 13:07:09,345 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=71.8 pTM=0.365 tol=1.31
2024-02-26 13:07:09,346 alphafold2_ptm_model_2_seed_000 took 5.9s (3 recycles)
2024-02-26 13:07:10,984 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=68.1 pTM=0.298
2024-02-26 13:07:12,529 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=68.6 pTM=0.34 tol=4.11
2024-02-26 13:07:13,992 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=69.2 pTM=0.36 tol=2.49
2024-02-26 13:07:15,484 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=68.8 pTM=0.367 tol=1.67
2024-02-26 13:07:15,485 alphafold2_ptm_model_3_seed_000 took 6.1s (3 recycles)
2024-02-26 13:07:16,987 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=66.1 pTM=0.289
2024-02-26 13:07:18,435 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=66.8 pTM=0.283 tol=5.61
2024-02-26 13:07:19,933 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=67.7 pTM=0.298 tol=1.03
2024-02-26 13:07:21,444 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=67.9 pTM=0.318 tol=2.04
2024-02-26 13:07:21,445 alphafold2_ptm_model_4_seed_000 took 5.9s (3 recycles)
2024-02-26 13:07:22,931 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=66.8 pTM=0.322
2024-02-26 13:07:24,403 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=68.2 pTM=0.345 tol=9.46
2024-02-26 13:07:25,860 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=68.8 pTM=0.354 tol=2.3
2024-02-26 13:07:27,342 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=69.4 pTM=0.358 tol=1.58
2024-02-26 13:07:27,342 alphafold2_ptm_model_5_seed_000 took 5.9s (3 recycles)
2024-02-26 13:07:27,369 reranking models by 'plddt' metric
2024-02-26 13:07:27,369 rank_001_alphafold2_ptm_model_2_seed_000 pLDDT=71.8 pTM=0.365
2024-02-26 13:07:27,369 rank_002_alphafold2_ptm_model_1_seed_000 pLDDT=69.8 pTM=0.367
2024-02-26 13:07:27,370 rank_003_alphafold2_ptm_model_5_seed_000 pLDDT=69.4 pTM=0.358
2024-02-26 13:07:27,370 rank_004_alphafold2_ptm_model_3_seed_000 pLDDT=68.8 pTM=0.367
2024-02-26 13:07:27,370 rank_005_alphafold2_ptm_model_4_seed_000 pLDDT=67.9 pTM=0.318
2024-02-26 13:07:28,679 Query 2/29: test2 (length 120)
2024-02-26 13:07:29,695 Sleeping for 9s. Reason: PENDING
2024-02-26 13:07:39,667 Sleeping for 9s. Reason: PENDING
2024-02-26 13:07:49,628 Sleeping for 6s. Reason: PENDING
2024-02-26 13:07:56,610 Sleeping for 6s. Reason: PENDING
2024-02-26 13:08:03,608 Sleeping for 5s. Reason: PENDING
2024-02-26 13:08:09,564 Sleeping for 6s. Reason: PENDING
2024-02-26 13:08:16,534 Sleeping for 7s. Reason: PENDING
2024-02-26 13:08:24,518 Sleeping for 5s. Reason: PENDING
2024-02-26 13:08:30,471 Sleeping for 7s. Reason: PENDING
2024-02-26 13:08:38,498 Sleeping for 5s. Reason: PENDING
2024-02-26 13:08:44,459 Sleeping for 6s. Reason: PENDING
2024-02-26 13:08:51,412 Sleeping for 9s. Reason: PENDING
2024-02-26 13:09:01,412 Sleeping for 9s. Reason: PENDING
2024-02-26 13:09:11,370 Sleeping for 8s. Reason: PENDING
2024-02-26 13:09:20,337 Sleeping for 8s. Reason: PENDING
2024-02-26 13:09:29,316 Sleeping for 6s. Reason: RUNNING
2024-02-26 13:09:39,703 Padding length to 124
2024-02-26 13:09:41,194 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=73.9 pTM=0.55
2024-02-26 13:09:42,664 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=73.8 pTM=0.549 tol=3.08
2024-02-26 13:09:44,110 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=73.6 pTM=0.549 tol=1.59
2024-02-26 13:09:45,593 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=74.4 pTM=0.555 tol=1.67
2024-02-26 13:09:45,593 alphafold2_ptm_model_1_seed_000 took 5.9s (3 recycles)
2024-02-26 13:09:47,073 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=76.7 pTM=0.565
2024-02-26 13:09:48,523 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=77.1 pTM=0.57 tol=0.571
2024-02-26 13:09:49,977 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=76.7 pTM=0.569 tol=0.958
2024-02-26 13:09:51,421 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=76.9 pTM=0.572 tol=0.881
2024-02-26 13:09:51,421 alphafold2_ptm_model_2_seed_000 took 5.8s (3 recycles)
2024-02-26 13:09:52,877 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=75.6 pTM=0.542
2024-02-26 13:09:54,315 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=75.9 pTM=0.548 tol=1.52
2024-02-26 13:09:55,763 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=75.9 pTM=0.552 tol=1.69
2024-02-26 13:09:57,218 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=75.8 pTM=0.555 tol=0.883
2024-02-26 13:09:57,219 alphafold2_ptm_model_3_seed_000 took 5.8s (3 recycles)
2024-02-26 13:09:58,705 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=73.9 pTM=0.56
2024-02-26 13:10:00,177 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=75.1 pTM=0.57 tol=2.2
2024-02-26 13:10:01,620 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=75.4 pTM=0.571 tol=1.78
2024-02-26 13:10:03,076 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=75.7 pTM=0.575 tol=2.04
2024-02-26 13:10:03,077 alphafold2_ptm_model_4_seed_000 took 5.8s (3 recycles)
2024-02-26 13:10:04,572 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=75.2 pTM=0.573
2024-02-26 13:10:06,026 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=76.2 pTM=0.585 tol=2.12
2024-02-26 13:10:07,498 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=76.2 pTM=0.587 tol=1.44
2024-02-26 13:10:08,958 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=76.6 pTM=0.589 tol=1.21
2024-02-26 13:10:08,959 alphafold2_ptm_model_5_seed_000 took 5.9s (3 recycles)
2024-02-26 13:10:08,986 reranking models by 'plddt' metric
2024-02-26 13:10:08,987 rank_001_alphafold2_ptm_model_2_seed_000 pLDDT=76.9 pTM=0.572
2024-02-26 13:10:08,987 rank_002_alphafold2_ptm_model_5_seed_000 pLDDT=76.6 pTM=0.589
2024-02-26 13:10:08,987 rank_003_alphafold2_ptm_model_3_seed_000 pLDDT=75.8 pTM=0.555
2024-02-26 13:10:08,987 rank_004_alphafold2_ptm_model_4_seed_000 pLDDT=75.7 pTM=0.575
2024-02-26 13:10:08,987 rank_005_alphafold2_ptm_model_1_seed_000 pLDDT=74.4 pTM=0.555
2024-02-26 13:10:10,274 Query 3/29: test3 (length 162)
2024-02-26 13:10:11,241 Sleeping for 8s. Reason: PENDING
2024-02-26 13:10:20,230 Sleeping for 10s. Reason: PENDING
2024-02-26 13:10:31,195 Sleeping for 5s. Reason: RUNNING
2024-02-26 13:10:37,194 Sleeping for 6s. Reason: RUNNING
2024-02-26 13:10:44,153 Sleeping for 9s. Reason: RUNNING
2024-02-26 13:10:54,142 Sleeping for 10s. Reason: RUNNING
2024-02-26 13:11:05,109 Sleeping for 8s. Reason: RUNNING
2024-02-26 13:11:14,082 Sleeping for 6s. Reason: RUNNING
2024-02-26 13:11:21,030 Sleeping for 8s. Reason: RUNNING
2024-02-26 13:11:30,005 Sleeping for 9s. Reason: RUNNING
2024-02-26 13:11:39,984 Sleeping for 7s. Reason: RUNNING
2024-02-26 13:11:47,941 Sleeping for 10s. Reason: RUNNING
2024-02-26 13:11:58,903 Sleeping for 9s. Reason: RUNNING
2024-02-26 13:12:08,881 Sleeping for 5s. Reason: RUNNING
2024-02-26 13:12:14,891 Sleeping for 9s. Reason: RUNNING
2024-02-26 13:12:32,470 Padding length to 172
2024-02-26 13:13:00,100 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=62.9 pTM=0.433
2024-02-26 13:13:02,186 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=63.4 pTM=0.433 tol=8.27
2024-02-26 13:13:04,282 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=64.1 pTM=0.431 tol=8.02
2024-02-26 13:13:06,403 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=63.8 pTM=0.427 tol=8.51
2024-02-26 13:13:06,404 alphafold2_ptm_model_1_seed_000 took 33.9s (3 recycles)
2024-02-26 13:13:08,535 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=60.2 pTM=0.417
2024-02-26 13:13:10,637 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=61 pTM=0.423 tol=6.09
2024-02-26 13:13:12,742 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=61.4 pTM=0.428 tol=3.33
2024-02-26 13:13:14,846 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=61.2 pTM=0.425 tol=1.8
2024-02-26 13:13:14,846 alphafold2_ptm_model_2_seed_000 took 8.4s (3 recycles)
2024-02-26 13:13:16,979 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=62 pTM=0.425
2024-02-26 13:13:19,099 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=62.3 pTM=0.43 tol=7.21
2024-02-26 13:13:21,197 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=61.9 pTM=0.426 tol=4.32
2024-02-26 13:13:23,303 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=62.1 pTM=0.427 tol=5.17
2024-02-26 13:13:23,304 alphafold2_ptm_model_3_seed_000 took 8.4s (3 recycles)
2024-02-26 13:13:25,461 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=60.5 pTM=0.418
2024-02-26 13:13:27,552 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=60.8 pTM=0.417 tol=9.52
2024-02-26 13:13:29,658 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=60.3 pTM=0.41 tol=9.23
2024-02-26 13:13:31,749 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=60.5 pTM=0.411 tol=6.08
2024-02-26 13:13:31,750 alphafold2_ptm_model_4_seed_000 took 8.4s (3 recycles)
2024-02-26 13:13:33,905 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=59.9 pTM=0.416
2024-02-26 13:13:36,038 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=60.1 pTM=0.415 tol=9.96
2024-02-26 13:13:38,154 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=59.7 pTM=0.409 tol=3.89
2024-02-26 13:13:40,252 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=59.4 pTM=0.415 tol=11.4
2024-02-26 13:13:40,253 alphafold2_ptm_model_5_seed_000 took 8.5s (3 recycles)
2024-02-26 13:13:40,294 reranking models by 'plddt' metric
2024-02-26 13:13:40,294 rank_001_alphafold2_ptm_model_1_seed_000 pLDDT=63.8 pTM=0.427
2024-02-26 13:13:40,294 rank_002_alphafold2_ptm_model_3_seed_000 pLDDT=62.1 pTM=0.427
2024-02-26 13:13:40,294 rank_003_alphafold2_ptm_model_2_seed_000 pLDDT=61.2 pTM=0.425
2024-02-26 13:13:40,295 rank_004_alphafold2_ptm_model_4_seed_000 pLDDT=60.5 pTM=0.411
2024-02-26 13:13:40,295 rank_005_alphafold2_ptm_model_5_seed_000 pLDDT=59.4 pTM=0.415 - Como ves el propio script espera cuando el servidor remoto está ocupado.
- Para cada secuencia problema obtienes figuras como éstas:
Hasta pronto,
Bruno
PD Cuando acabes de instalar deberías tener algo similar en tu fichero $HOME/.bashrc:
Cuando no vayas a usar colabfold comenta estas líneas para usar perl y python del sistema
PD2 Me comentan colegas de ULiverpool que haciendo 800-900 MSAs al día en https://api.colabfold.com no han tenido problemas
PD3 Si no reconoce tu GPU mira posibles soluciones en https://github.com/YoshitakaMo/localcolabfold/issues/210
PD4: Puedes bloquear la versión de CUDA que hayas instalado con algo como:
sudo apt-mark hold cuda-toolkit-11-8