#!/perl/bioinfo

November 28, 2019

How to install LaTeX2HTML

Hi,
today I want to share how to install the latex2html converter. It is a veteran Perl-based tool that has passed through several hands, but it has been very useful to me. The problem is that, after so many authors and so many years, the version you can install on Ubuntu 19.04 no longer worked well for me. Fortunately, the code is available at https://github.com/latex2html/latex2html

In my case, the solution was to check that the pdftocairo binary was available and then install it like this:

git clone https://github.com/latex2html/latex2html.git

cd latex2html

export p2c=`which pdftocairo`

./configure --with-pdftocairo=$p2c
make
sudo make install
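The `which pdftocairo` check can also be scripted. Below is a minimal Python sketch, assuming Python 3.8+ is available. It looks up pdftocairo on the PATH and only prints the same configure command, without running it. The helper name `configure_command` is hypothetical.

```python
import shlex
import shutil


def configure_command(pdftocairo_path):
    """Build the ./configure call used above, or fail if pdftocairo is missing."""
    if pdftocairo_path is None:
        raise FileNotFoundError("pdftocairo not found on PATH (on Ubuntu it ships with poppler-utils)")
    return ["./configure", f"--with-pdftocairo={pdftocairo_path}"]


if __name__ == "__main__":
    # shutil.which plays the role of the shell's `which pdftocairo`
    path = shutil.which("pdftocairo")
    if path is None:
        print("pdftocairo not found; install poppler-utils first")
    else:
        print(shlex.join(configure_command(path)))
```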

Although these days I almost always write code documentation with R Markdown (see examples at https://github.com/eead-csic-compbio/barley-agroclimatic-association), I still use latex2html when I work with LaTeX, for instance for the GET_HOMOLOGUES manuals.
Best,
Bruno

November 11, 2019

9 years of #!/perl/bioinfo

Hi,

yesterday I realized that this blog has been running for more than 9 years, so I did a quick analysis of who reads the 258 published articles and where they are. Below are some Google Analytics (GA) data, starting with the visitor chart:

Users between 31/05/2010 and 11/11/2019 (GA)

In total we have had 67,191 users, who viewed 111,806 pages (263,069 according to Blogger) over 86,288 sessions, spending about a minute per session.
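Two ratios follow from the totals above. They are simple arithmetic on the quoted GA and Blogger numbers. The source does not explain why Blogger reports more than twice as many pageviews as GA, so that ratio is shown without interpretation.

```python
# Totals quoted above (Google Analytics unless noted)
users = 67_191
sessions = 86_288
pageviews_ga = 111_806
pageviews_blogger = 263_069  # Blogger's own counter

pages_per_session = pageviews_ga / sessions
sessions_per_user = sessions / users
blogger_to_ga = pageviews_blogger / pageviews_ga

print(f"pages/session: {pages_per_session:.2f}")    # 1.30
print(f"sessions/user: {sessions_per_user:.2f}")    # 1.28
print(f"Blogger/GA pageviews: {blogger_to_ga:.2f}")  # 2.35
```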

Pageviews by country since 2010, according to Blogger.

Visitors' countries in the last 30 days (GA).

Most read articles

1.	Some useful R commands for science and research	Jun 15, 2010	15,860
2.	Python course for biologists - Lesson 1	Feb 10, 2015, 5 comments	2,611
3.	Linear regression and confidence interval	Jun 7, 2010, 3 comments	2,155
4.	Finding an element in a Perl array	Nov 8, 2010, 2 comments	2,152
5.	Generating all possible combinations	Aug 19, 2010, 9 comments	2,016
6.	Tutorial for writing academic papers and theses	Jun 13, 2010	1,795
7.	Amplicon sequencing and high-throughput genotyping	Sep 14, 2015, 1 comment	1,384
8.	Substitution matrices and sequence alignment	Jul 4, 2012	1,179
9.	DIAMOND as alternative to BLASTP	Dec 19, 2016, 1 comment	1,094
10.	Regular expression for the O-fucosyltransferase family	Feb 13, 2016, 1 comment	1,043

Browser language

1.	es-es	5,936	37.67%
2.	en-us	3,326	21.11%
3.	es	2,111	13.40%
4.	es-419	1,461	9.27%
5.	es-mx	439	2.79%
6.	es-us	404	2.56%
7.	pt-br	364	2.31%
8.	en-gb	356	2.26%
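If you group the browser-language codes by base language (the part before the hyphen), you get a quick view of the Spanish-speaking share. This is a minimal Python sketch. It assumes all percentages refer to the same GA total. Only the eight listed rows are included, and together they cover 91.37% of that total.

```python
from collections import defaultdict

# Browser-language rows as listed above: (code, count, percent)
ROWS = [
    ("es-es", 5936, 37.67),
    ("en-us", 3326, 21.11),
    ("es", 2111, 13.40),
    ("es-419", 1461, 9.27),
    ("es-mx", 439, 2.79),
    ("es-us", 404, 2.56),
    ("pt-br", 364, 2.31),
    ("en-gb", 356, 2.26),
]


def share_by_language(rows):
    """Sum the percentages of regional variants (es-es, es-mx, ...) under their base code."""
    totals = defaultdict(float)
    for code, _count, pct in rows:
        totals[code.split("-")[0]] += pct
    return {lang: round(pct, 2) for lang, pct in totals.items()}


print(share_by_language(ROWS))  # {'es': 65.69, 'en': 23.37, 'pt': 2.31}
```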

Country

1.	Spain	5,008	31.87%
2.	Mexico	2,345	14.92%
3.	United States	1,640	10.44%
4.	Colombia	1,178	7.50%
5.	Chile	843	5.36%
6.	Argentina	714	4.54%
7.	Peru	638	4.06%
8.	Brazil	525	3.34%
9.	Ecuador	307	1.95%
10.	United Kingdom	258	1.64%

City

1.	(not set)	1,628	10.09%
2.	Madrid	1,113	6.90%
3.	Mexico City	684	4.24%
4.	Chicago	635	3.94%
5.	Santiago	576	3.57%
6.	Bogota	535	3.32%
7.	Barcelona	520	3.22%
8.	Valencia	300	1.86%
9.	Ashburn	286	1.77%
10.	Buenos Aires	250	1.55%

Operating system

1.	Windows	9,508	60.59%
2.	Linux	1,973	12.57%
3.	Android	1,755	11.18%
4.	Macintosh	1,527	9.73%
5.	iOS	618	3.94%

See you soon,
Bruno

November 1, 2019

How to identify pseudogenes in plants

Hi,
today I'm sharing an article published a few months ago by Jianbo Xie and colleagues, in which they explain their strategy for annotating pseudogenes in plant genomes.

You can read the full definition of pseudogene on Wikipedia. Here is my summary:

A pseudogene is a DNA segment derived from another gene that has lost at least part of its original function, either in expression or in protein coding, through the accumulation of mutations. Pseudogenes arise from imprecise recombination, duplication or retrotransposition. Early in their life cycle, they can in principle be confused with neogenes.

Xie et al.'s algorithm for identifying pseudogenes in masked intergenic regions, where no genes are expected, is summarized in this diagram:

http://www.plantcell.org/content/31/3/563.long

As you can see, it uses exonerate and tfasty (the more accurate of the two) to align known protein sequences against genomic DNA. It then uses BLASTP and OrthoMCL to group the pseudogenes into families, and finally MCScanX to locate pairs of pseudogenes that may be related by duplication.
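Pseudogenes are usually recognized by disablements, such as premature stop codons or frameshifts, in the genomic region aligned to a known protein. The code below is only a toy illustration of that idea and is not Xie et al.'s pipeline. It assumes you already have the aligned genomic segment in the correct reading frame, as an exonerate or tfasty alignment would provide. It then reports in-frame stops before the last codon and flags a length that is not a multiple of 3. The function name `disablements` is hypothetical.

```python
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def disablements(segment):
    """Return (premature stop codon indices, frameshift flag) for an in-frame segment."""
    segment = segment.upper()
    usable = len(segment) - len(segment) % 3
    codons = [segment[i:i + 3] for i in range(0, usable, 3)]
    protein = [CODON_TABLE.get(c, "X") for c in codons]  # X for codons with N, etc.
    # A stop anywhere but the final codon interrupts the reading frame
    premature = [i for i, aa in enumerate(protein[:-1]) if aa == "*"]
    frameshift = len(segment) % 3 != 0
    return premature, frameshift


print(disablements("ATGAAATAGGGCTAA"))  # ([2], False)
```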

Best,
Bruno

October 18, 2019

Plant Genomes in a Changing Environment 2019 (III)

The zygotic transition in rice and application to self-propagating hybrid crops (Venkatesan Sundaresan, University of California, Davis, USA)

He reviews animal embryogenesis, where the zygotic nucleus is not transcribed for several cell divisions and is then gradually activated. Less is known about the maternal-to-zygotic transition in plants. Rice is a good model for this, as it takes only 30 min from pollination to fertilization. They observed that zygotic activation is quicker than in animals, with 8 cells (https://www.ncbi.nlm.nih.gov/pubmed/29112853). They have shown that an AP2 baby boom transcription factor expressed in the sperm cell triggers embryogenesis.

In the second part of the talk he presents their work on obtaining cheap hybrid rice from mutants that turn meiosis into mitosis, since farmers worldwide currently cannot afford to buy hybrid seed. These synthetic apomictic mutant plants produce seeds that maintain the parents' heterozygosity with <30% efficiency. The work is described at https://www.nature.com/articles/s41586-018-0785-8. He discusses the risks of clonal crops using bananas as an example (Gros Michel, Cavendish, …), along with their disease susceptibility.

During questions he explains that there are natural apomictic species, and that in pearl millet the responsible gene is known to be a baby boom TF.

The genetics of plant-plant interactions: from monospecific to community-wide interactions (Fabrice Roux, INRA, France)

Plants do not grow in isolation. They usually compete for space and resources with neighboring plants of the same or other species. In fact, most pesticides used by farmers are herbicides (https://onlinelibrary.wiley.com/doi/full/10.1111/tpj.13799). They are currently studying genetic variability in two A. thaliana populations. In one of them, from France (TOU-A, n=195, 1.9M SNPs, LD 2kb, no population structure), they observe different genomic architectures underlying the response to monospecific and plurispecific interactions (https://www.biorxiv.org/content/10.1101/536953v1). Candidate genes include light-sensitive genes and receptor-like kinases.

Increasing meiotic recombination in plants (Raphael Mercier, MPI Cologne, Germany)

After screening 6K mutant lines in A. thaliana, they have discovered 3 pathways that limit crossovers. His group has published many papers on this topic: https://scholar.google.com/citations?user=BKGJoo4AAAAJ . Some of his experiments involve rescuing mutant phenotypes with human proteins, which shows how conserved these proteins are. A remarkable example is the A. thaliana BRCA2 ortholog, which is involved in controlling meiotic crossovers. His most recent work (see for instance https://www.pnas.org/content/115/10/2431.short) uses this knowledge to increase crossover frequency in plants without reducing fertility.

October 17, 2019

Plant Genomes in a Changing Environment 2019 (II)

Exploring and utilization of rice resources with broad-spectrum resistance against blast disease (Xuwei Chen, Sichuan University, China)

He speaks about their sampling effort to identify alleles in rice germplasm that confer resistance to blast disease. A GWAS on 3K sequenced rice genomes uncovered an allele from cultivar Digu with MAF=0.10. It is a SNP in the promoter of a zinc finger TF. The results are published in https://www.ncbi.nlm.nih.gov/pubmed/28666113 . He then moves on to their work on the transcription factor IPA1 (Ideal Plant Architecture), which represses unproductive tillers and enhances immune responses. Again, the selected allele (ipa1-1D) carries a mutation, in this case one that breaks a miRNA site. The results can be found in https://www.ncbi.nlm.nih.gov/pubmed/30190406
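MAF=0.10 means that the less common allele accounts for 10% of the allele copies in the panel. Here is a minimal sketch of that calculation. It assumes diploid genotypes coded as alternative-allele dosages (0, 1 or 2). In inbred rice accessions, most calls would be 0 or 2. The function name is illustrative.

```python
def minor_allele_frequency(dosages):
    """MAF from per-individual alternative-allele dosages (0, 1 or 2)."""
    if not dosages:
        raise ValueError("no genotypes")
    alt_freq = sum(dosages) / (2 * len(dosages))
    # The minor allele is whichever of the two alleles is rarer
    return min(alt_freq, 1 - alt_freq)


# 10 homozygous accessions, one of them carrying the alternative allele
print(minor_allele_frequency([0] * 9 + [2]))  # 0.1
```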

Do environmental changes induce retrotransposon expression in plants? (Flavia Mascagni, University of Pisa, Italy)

She is conducting work to determine to what extent retrotransposons (RTs) are expressed in response to environmental changes in sunflower. Upon treatment with hormones and chemicals, they observe higher expression in the leaf than in the root, with some genotypes more prone than others. Overall they found 134 differentially expressed RTs. Then they used a similar approach in poplar, again using public cDNA libraries. Some genotypes are more prone than others to show RT expression in response to treatment. In both species, of the few differentially expressed RTs, most belong to the Copia superfamily.

Functional genomics of European hazel (Corylus avellana L.) to address an emerging, destructive powdery mildew pathogen (Stuart Lucas, Sabanci University, Turkey)

In their search for resistance alleles, they have completed a genome assembly of 11 scaffolds (370 Mb) for a predicted genome size of 380 Mb. They are now annotating MLO and NLR genes. Among the MLO genes, 5 clustered copies are good candidates for disease resistance. For NLRs they are using long reads to sequence complete copies end to end, on a pool of 363 genes with little overlap across populations.

Natural genetic variation in the response of Arabidopsis to Plasmodiophora brassicae infection (William Truman, IPG PAS Poznan, Poland)

He describes this obligate protist pathogen (clubroot), which affects a wide range of Brassica crops. Some of their previous results are at http://www.plantcell.org/content/30/12/3058 . They are testing candidate resistance alleles in Arabidopsis thaliana ecotypes. Some are being studied further in Y2H assays.

Daniele Filiault, Gregor Mendel Institute, Austria

She describes her Arabidopsis thaliana experiments along a latitudinal gradient from Germany to Sweden. They measure survival and slug susceptibility and observe local adaptation: germplasm from southern latitudes does badly when planted further north. They then run GWAS and separate intra-specific from genus-specific variants.

Pathogen-informed strategies for sustainable broad-spectrum resistance in crops (Bart Thomma, University of Wageningen, The Netherlands)

He talks about how we can learn from pathogen molecules to obtain resistant crops. He shows this video of the tomato fungal pathogen Verticillium dahliae: https://vimeo.com/222178738 . He then shows haplotypes of different isolates of the pan-genome and refers to https://genome.cshlp.org/content/early/2016/07/12/gr.204974.116 and https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15168 . Each isolate has 10% lineage-specific non-core genes, and these are apparently more conserved across species of the genus than core genes. This could be due to horizontal transfer (unlikely), selection (unlikely) or reduced error replicons (Hi-C experiments suggest these regions co-localize in the nucleus, are unmethylated, are enriched in TEs, etc.). Their most recent manuscript is https://www.biorxiv.org/content/10.1101/528729v1 . They find that a single effector gene in the fungus is responsible for pathogenicity, and when it is removed, infection does not occur or progress. Conversely, when it is transformed into non-pathogenic species of the genus, they now cause disease in tomato.

He ends with another story: they have seen that the fungus produces an antimicrobial protein (VdAve1, is that an antibiotic?) that alters the plant root microbiome and ultimately facilitates infection.

Beyond single genes: receptor networks underpin plant immunity (Sophien Kamoun, The Sainsbury Laboratory, UK)

Most plants are resistant to most pathogens thanks to a very efficient immune system with pattern recognition receptors (PRRs) and NLR receptors. Pathogens secrete effectors to modulate plant defenses (https://www.ncbi.nlm.nih.gov/pubmed/23223409). Plants and pathogens coevolve, and this drives diversification. NLR diversification is much larger in plants than in mammals (human vs mouse, tomato vs coffee, 100 Myr). In fact, pathogens ultimately alter plant genomes (gene-for-gene model). He proposes moving from the single-gene paradigm to the immune network, incorporating redundancy, evolvability, robustness and epistasis (https://www.ncbi.nlm.nih.gov/pubmed/29930125).

Plant NLRs are typically made of three domains: [CC|CCR|TIR]-NB-ARC-LRR. They form resistosome complexes that integrate into the membrane (https://www.ncbi.nlm.nih.gov/pubmed/30948527). These genes cluster in the genome (https://www.pnas.org/content/114/30/8113). This would be the most ancestral network, found on chr5 of sugar beet and conserved in other plants (such as tomato?).

A fifth of monocot and dicot NLRs share a conserved N-terminal MADA motif (MADAxVSFxVxKLxxLLxxEx, https://www.biorxiv.org/content/10.1101/693291v1). The CC domain diversified and became non-functional in many cases.
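The consensus above can be turned into a regular expression by replacing each `x` with "any residue". This is only a rough screen. A strict regex misses divergent variants, and real motif searches normally use a profile model such as an HMM. The 50-residue N-terminal window is a hypothetical cutoff chosen for illustration.

```python
import re

# Consensus quoted above; x stands for any amino acid
MADA_CONSENSUS = "MADAxVSFxVxKLxxLLxxEx"
MADA_RE = re.compile(MADA_CONSENSUS.replace("x", "."))


def has_mada(protein, window=50):
    """True if the exact consensus occurs within the first `window` residues (hypothetical cutoff)."""
    return MADA_RE.search(protein.upper()[:window]) is not None


toy_nlr = "MADAAVSFAVAKLAALLAAEA" + "G" * 100  # synthetic sequence, not a real NLR
print(has_mada(toy_nlr))  # True
```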

Using data science to understand plant gene regulation (Daphne Ezer, University of York, UK)

She starts by asking how we know that our experiments are relevant in the real world. We need to correct for confounding variables and always put the data in context. For instance, for bulk RNA-seq you must synchronize plants and treatments to make sure you are comparing tissues of the same age, at the same circadian point and with the same tissue ratio. She has developed tools for these tasks, such as https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2717-5

Structure, stability and phenotypic relevance of DNA methylation in Thlaspi arvense natural populations (Dario Galanti, University of Tubingen, Germany)

He talks about his PhD project, which studies heritable methylation as a function of location of origin and how it affects phenotypes. He works with pennycress (Thlaspi arvense), using populations sampled across Europe. I'll see if we can load that genome in Ensembl.

Nanopore Direct RNA Sequencing Maps the Arabidopsis m6A Epitranscriptome (Matthew Parker, University of Dundee, UK)

He starts by enumerating the theoretical advantages of sequencing native RNA directly instead of cDNA. They are using it in several projects. The error rate is 5-8%. This is not a problem for polyAs, but it is a problem for annotating short exons and intron boundaries. In those cases they still use Illumina reads to correct the long reads.
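A back-of-the-envelope calculation shows why a 5-8% error rate hurts short exons and splice junctions. Assuming, simplistically, that errors are independent and uniformly spread along the read, a 30-nt exon read at 8% error carries about 2.4 errors on average. It would be read with no error at all only about 8% of the time. Real nanopore errors are neither independent nor uniform, so treat these as rough numbers.

```python
def expected_errors(length, error_rate):
    """Mean number of base errors in a stretch of `length` nucleotides."""
    return length * error_rate


def prob_error_free(length, error_rate):
    """Chance that the stretch is read without errors, assuming independent errors."""
    return (1 - error_rate) ** length


for length in (30, 100, 1000):
    for rate in (0.05, 0.08):
        print(f"{length:>5} nt @ {rate:.0%}: "
              f"{expected_errors(length, rate):6.1f} errors, "
              f"P(error-free)={prob_error_free(length, rate):.3f}")
```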

Improving gene regulatory network inference from ATAC-Seq data using an ensemble motif mapping approach (Marc Jones, VIB / Ghent University, Belgium)

This talk complements yesterday's talk by focusing on ATAC-Seq. They use ATAC read depth to restrict the genome regions where known motifs are scanned, in order to discover relevant cis-regulatory elements.
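The core idea of scanning motifs only inside open-chromatin regions can be sketched as follows. This is a single-motif, exact-match simplification, not the ensemble motif mapping method presented in the talk, whose details the notes don't give. Peaks are assumed to be half-open (start, end) intervals from some peak caller. CACGTG is just a palindromic example motif, so only one strand needs scanning here.

```python
import re


def motif_hits_in_peaks(genome, peaks, motif):
    """Return sorted 0-based starts of `motif` (a regex) fully contained in any peak.

    peaks: iterable of (start, end) half-open intervals, e.g. ATAC-seq peaks.
    """
    pattern = re.compile(motif)
    hits = set()  # a set avoids double counting when peaks overlap
    for start, end in peaks:
        # pos/endpos restrict the search without slicing, so m.start() stays genomic
        hits.update(m.start() for m in pattern.finditer(genome, start, end))
    return sorted(hits)


toy_genome = "AAACACGTGAAAAACACGTGAAA"
print(motif_hits_in_peaks(toy_genome, [(0, 10)], "CACGTG"))  # [3]
```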

No genome required: Finding genetic variants associated with plant phenotypes without complete genome information (Yoav Voichek, Max Planck Institute for Developmental Biology, Germany)

This talk complements yesterday's, but focuses on the biology and on comparing GWAS based on SNPs with GWAS based on k-mers. He shows a Venn diagram indicating that the two approaches largely overlap. However, some associated SNPs are not found with k-mers, and vice versa (structural variants, regions missing from the reference, etc.). He is asked how this would work with heterozygous genomes.
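Instead of calling SNPs against a reference, the k-mer approach records which k-mers are present in each accession and then tests that presence/absence pattern against the phenotype. Below is a toy Python sketch of the first step. It assumes short, error-free sequences and skips canonical (reverse-complement) k-mers, count thresholds and the statistical model a real pipeline needs. `perfectly_tracks` is a hypothetical stand-in for a proper association test.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(samples, k):
    """Map each k-mer to a 0/1 vector, one entry per sample in input order."""
    per_sample = {name: kmers(seq, k) for name, seq in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    return {km: [int(km in per_sample[name]) for name in samples] for km in all_kmers}


def perfectly_tracks(table, phenotype):
    """Toy association: k-mers whose presence pattern equals a binary phenotype."""
    return [km for km, pattern in table.items() if pattern == phenotype]


samples = {"acc1": "AAAC", "acc2": "AAAG"}  # hypothetical accessions
table = presence_absence(samples, 3)
print(table)                            # {'AAA': [1, 1], 'AAC': [1, 0], 'AAG': [0, 1]}
print(perfectly_tracks(table, [1, 0]))  # ['AAC']
```

No reference genome is involved: a k-mer from a structural variant absent in the reference would show up in the table just like any other.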