#!/perl/bioinfo: get

Mostrando entradas con la etiqueta get_homologues. Mostrar todas las entradas

16 de mayo de 2022

Cómo portar tu software a bioconda, ejemplo paso a paso con GET_HOMOLOGUES

Hola,

hace poco logramos portar el software GET_HOMOLOGUES a la plataforma bioconda (ver artículo), el canal especializado en bioinformática del gestor de paquetes conda. En esta entrada voy a resumir cómo lo hicimos por si sirve de ayuda a otros programadores desprevenidos. La documentación en línea que he consultado por el camino incluye estos enlaces, me pareció que estaba todo mejor documentado en conda, el proyecto principal, que en bioconda:

Para enterarte de cómo es el proceso de creación de una receta de principio a fin: https://bioconda.github.io/contributor/workflow.html
La guía con los pasos esenciales: https://bioconda.github.io/contributor/guidelines.html
Cómo formatear tu fichero meta.yaml: https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html
Solamente si necesitas variables de ambiente: https://docs.conda.io/projects/conda-build/en/latest/user-guide/environment-variables.html

Los pasos que seguí fueron:

Análisis de dependencias del software GET_HOMOLOGUES. Como ya lo teníamos en GitHub, Travis y Dockerhub lo teníamos bastante claro, pero aun así tuvimos que explorar las recetas ya existentes en bioconda (https://github.com/bioconda/bioconda-recipes/tree/master/recipes) para aprovechar los binarios disponibles. Tras esta tarea descubrimos que las recetas blast, hmmer, diamond, mcl y phylip resolvían casi todas nuestras necesidades, pero faltaba el software COGtriangles, necesario para ejecutar $ get_homologues.pl -G
Por recomendación de mi colega Borja Latorre, y siguiendo la filosofía de bioconda, lo lógico era crear una nueva receta independiente para esta dependencia, así que eso decidimos hacer.
Hacer un fork del repositorio bioconda (https://github.com/bioconda/bioconda-recipes) con tu usuario de GitHub
Clonar localmente el nuevo repositorio
$ cd bioconda-recipes/recipes
Crea una nueva rama como se explica en la documentación y selecciónala
$ mkdir cogtriangles
Los siguientes pasos consistieron en preparar la receta cogtriangles, lo cual implicó editar dos ficheros de texto con ayuda de la documentación y cotilleando otras recetas en el repositorio:

meta.yaml : define los metadatos de la receta, incluyendo el origen del código fuente, su suma SHA256 para comprobar que la descarga es correcta, su licencia y publicaciones, las dependencias para construir esta receta (las recetas make, el compilador de C++ y perl) y, lo más importante, la batería de tests para comprobar que la receta funciona.
build.sh : define cómo se construye el paquete, colocando los binarios compilados de COGtriangles en la carpeta $PREFIX/bin

$ cd .. && mkdir get_homologues && cd get_homologues
Edito dos ficheros similares para esta segunda receta:

meta.yaml :

en la sección build puedes declarar variables de ambiente
verás que la sección requirements->run importa la receta cogtriangles
los tests se ejecutan invocando directamente dos comandos sin indicar un camino/path

build.sh : copia a la carpeta $PREFIX/bin los scripts y dependencias de GET_HOMOLOGUES para que los tests funcionen tanto en local como en modo mulled test (mira el punto 11).
$ cd ../..

Tras instalar miniconda y Docker deberás configurar los canales de bioconda (gracias José María! Si ya tenías conda instalado igual no los necesitáis; una alternativa a conda es mamba)

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda create -c base -n bioconda
$ conda install -c bioconda bioconda-utils

Ahora prueba las nuevas recetas en tu máquina local

$ conda activate bioconda
$ bioconda-utils lint --packages get_homologues
$ bioconda-utils build --docker --mulled-test --packages get_homologues

Si hay problemas de permisos de escritura en Docker puedes resolverlos con

$ sudo chmod 666 /var/run/docker.sock

Si te aparence un error como FileNotFoundError: [Errno 2] No such file or directory: 'miniconda3/envs/bioconda/conda-bld/conda_build_config_0_-e_conda_build_config.yaml' puedes resolverlo con (gracias José María!):

$ mkdir ~/miniconda3/envs/bioconda/conda-bld

Si los paquetes de las respectivas recetas se construyen correctamente es momento de

Registrar todos los cambios con git add, git commit, git push
Abrir un Pull Request (PR) y documentar tus recetas. Puedes ver por ejemplo el nuestro en https://github.com/bioconda/bioconda-recipes/pull/34595 . Cada cambio que hagas, por ejemplo por nuevos git push, lanza de manera automática los tests que definiste en tus metadatos.
Escribir en https://gitter.im/bioconda/lobby para pedir por favor que alguien revise el PR, y si todo va bien, una tu rama al repositorio oficial

Prueba a instalar el paquete desde la nube, posiblemente desde otra máquina, con algo como

$ conda activate bioconda
$ conda create -n get_homologues -c conda-forge -c bioconda get_homologues
$ conda activate get_homologues

Espero que esta explicación os ayude,

un saludo,

Bruno

1 de diciembre de 2017

Docker image of GET_HOMOLOGUES + GET_PHYLOMARKERS

Dear all,
Pablo Vinuesa and me we have recently built a Docker image of GET_HOMOLOGUES bundled with a new pipeline, meant to be used downstream, called GET_PHYLOMARKERS. This software will be described in detail in a forthcoming publication. The image, and instructions on how to run it, are available at https://hub.docker.com/r/csicunam/get_homologues :

By packing them in a ready-to-use, cross-platform image, users avoid installation glitches, usually related to several extremely useful R packages required by the second pipeline. Please test it and give us feedback if possible,
cheers,
Bruno and Pablo

Note: link to docker hub updated 29Dic2017

9 de marzo de 2017

Tutorial: pan-genome analysis with GET_HOMOLOGUES

Hi,
a new tutorial on the analysis pan-genomes using GET_HOMOLOGUES and GET_HOMOLOGUES-EST is now available. After a short introduction, where the main concepts are illustrated, the remaining sections cover the installation and typical operations required to analyze and annotate genomes and transcriptomes from a pan-genome perspective, in which individuals or species contribute genetic material to a pool.

The examples include both bacterial sequences in GenBank format and plant transcripts. This tutorial has been created for a two-day workshop to be held at BIOS (Manizales, Colombia) next week, with title "From genomes to pangenomes: understanding variation among individuals and species":

The tutorial can be found at: http://digital.csic.es/handle/10261/146411

Code, sample datasets and documentation are available at:
https://github.com/eead-csic-compbio/get_homologues

Suggestions and error reports are welcome,
Bruno

14 de abril de 2015

HOWTO install Grid Engine on multi-core Linux box to run GET_HOMOLOGUES

Hi,
I post this in English so that all users of GET_HOMOLOGUES can benefit. It is inspired on a previous tutorial posted in scidom and the expertise of David Ramírez from our provider SIE.

The aim of this entry is to demonstrate how to set up a Grid Engine queue manager on your local multi-core Linux box or server so that you can get the most of that computing power during your GET_HOMOLOGUES jobs. We will install Grid Engine on its default path, /opt/, which requires superuser permissions. As I'll do this un Ubuntu, I will do 'sudo su' to temporarily get superuser privileges; this should be replaces simply with 'su' in other Linux flavours. Otherwise you can install it elsewhere if you don't have admin rights.

1) Visit http://gridscheduler.sourceforge.net , create a new user and download the latest 64bit binary to /opt:

$ cd /opt
$ sudo su
$ useradd sgeadmin
$ wget -c http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz $ tar xvfz ge2011.11.tar.gz
$ chown -R sgeadmin ge2011.11/
$ chgrp -R sgeadmin ge2011.11/

$ ln -s ge2011.11 sge

2) Set relevant environment variables in /etc/bash.bashrc [system-wide, can also be named /etc/basrhc] or alternatively in ~/.bashrc for a given user:

export arch=x86_64
export SGE_ROOT=/opt/sge
export PATH=$PATH:"$SGE_ROOT/bin/linux-x64"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SGE_ROOT/lib"

And make this changes live:

$ source /etc/bash.bashrc

3) Set your host name to anything but localhost by editing /etc/hosts so that the first line is something like this (localhost or 127.0.x.x IP addresses are not valid):

172.1.1.1 yourhost

4) Install Grid Engine server with all defaults except cluster name, which usually will be 'yourhost':
$ ./install_qmaster

5) Install Grid Engine client with all defaults:
$ ./install_execd

6) Optionally configure default all.q queue:
$ qconf -mq all.q

7) Add your host to list of admitted hosts:
$ qconf -as yourhost
$ exit

You should now be done. A test will confirm this. Please open a new terminal and move to your GET_HOMOLOGUES installation folder:

$ cd get_homologues-x86-20150306
$ ./get_homologues.pl -d sample_buch_fasta -m cluster

If the jobs finishes successfully then you are indeed done. I hope this helps,
Bruno

10 de enero de 2014

New release of GET_HOMOLOGUES

Dear users,
today we are announcing a new release of the GET_HOMOLOGUES package (20140110) that includes an updated version of compare_clusters.pl, which now performs as explained in the manual, including option -T for creating a parsimony pangenomic tree. The download links to get this code are as usual:

http://www.eead.csic.es/compbio/soft/gethoms.php

and

http://maya.ccg.unam.mx/soft/gethoms.php

Please let us know of any other bugs that you might encounter, we appreciate your comments,
many thanks,

Bruno Contreras and Pablo Vinuesa