21 November 2013

GET_HOMOLOGUES for pan-genome analysis

In the latest issue of Applied and Environmental Microbiology, my colleague Pablo Vinuesa and I published a paper describing the GET_HOMOLOGUES software. Its abstract reads:
GET_HOMOLOGUES is an open source software package that builds upon popular orthology-calling approaches making highly customizable and detailed pan-genome analyses of microorganisms accessible to non-bioinformaticians. It can cluster homologous gene families using the bidirectional best-hit, COGtriangles or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domain-composition of proteins using the HMMER3 package, by imposing desired pair-wise alignment coverage cut-offs or by selecting only syntenic genes. Resulting homologous gene families can be made even more robust by computing consensus clusters from those generated by any combination of the clustering algorithms and filtering criteria. Auxiliary scripts make the construction, interrogation and graphical display of core and pan-genome sets easy to perform. Exponential and binomial mixture models can be fitted to the data to estimate theoretical core and pan-genome sizes, and high quality graphics generated. Furthermore, pan-genome trees can be easily computed and basic comparative genomics performed to identify lineage-specific genes or gene family expansions. The software is designed to take advantage of modern multiprocessor personal computers as well as computer clusters to parallelize time-consuming tasks. To demonstrate some of these capabilities, we survey a set of 50 Streptococcus genomes annotated in the Orthologous Matrix Browser as a benchmark case.
The software can be downloaded from http://www.eead.csic.es/compbio/soft/gethoms.php and also from http://maya.ccg.unam.mx/soft/gethoms.php . It is written mostly in Perl, with some parts in R.
The program's manual describes usage examples in detail and is available at http://www.eead.csic.es/compbio/soft/manual.pdf .
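
For readers unfamiliar with the terms in the abstract, here is a minimal Python sketch of what "core" and "pan-genome" sets mean once gene families have been clustered. It is not code from GET_HOMOLOGUES (which is written in Perl and R) and does not reproduce its algorithms or file formats. The function names `core_and_pan` and `growth_curve`, the toy family names and the genome labels are all hypothetical, chosen only for illustration. The sketch assumes each gene family is already summarised as the set of genomes in which it occurs.

```python
from typing import Dict, List, Set, Tuple


def core_and_pan(clusters: Dict[str, Set[str]],
                 genomes: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) family identifiers for the given genomes.

    core: families present in every one of the genomes
    pan:  families present in at least one of the genomes
    """
    genome_set = set(genomes)
    pan = {fam for fam, members in clusters.items() if members & genome_set}
    core = {fam for fam, members in clusters.items() if genome_set <= members}
    return core, pan


def growth_curve(clusters: Dict[str, Set[str]],
                 order: List[str]) -> List[Tuple[int, int, int]]:
    """Core and pan sizes as genomes are added one at a time in `order`."""
    sizes = []
    for n in range(1, len(order) + 1):
        core, pan = core_and_pan(clusters, order[:n])
        sizes.append((n, len(core), len(pan)))
    return sizes


# Toy data (hypothetical): four gene families across three genomes A, B, C.
clusters = {
    "fam1": {"A", "B", "C"},
    "fam2": {"A", "B"},
    "fam3": {"C"},
    "fam4": {"A", "B", "C"},
}

curve = growth_curve(clusters, ["A", "B", "C"])
for n, core_size, pan_size in curve:
    print(f"{n} genomes: core={core_size} pan={pan_size}")
```

As genomes are added, the core set can only shrink or stay the same and the pan set can only grow or stay the same. The result depends on the order in which genomes are added, so in practice such curves are usually computed over many random orders before any model is fitted to them.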

This software package was designed for studying the pan- and core-genomes of groups of microorganisms, which is the main focus of Pablo's group, and it can generate figures such as these:

