February 1, 2023

Measure the carbon footprint of algorithms


A few days ago our colleague Ana Conesa shared a slide by Roderic Guigó summarizing the carbon footprint of two gene prediction algorithms for genomes (Geneid vs Augustus):


Although some context is missing, such as the program versions or the input data, this example shows that the algorithm on the left is far more energy-efficient than the one on the right.

At https://www.green-algorithms.org you can run this kind of calculation for the algorithms you use, either with the online calculator or with a Python script for computing systems that use SLURM.
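To give a rough idea of what such calculators do, here is a minimal sketch of the formula described by the Green Algorithms project: energy = runtime × (cores × power per core × core usage + memory × power per GB) × PUE × a pragmatic scaling factor (PSF, for repeated runs), and carbon footprint = energy × carbon intensity of the local grid. This is not the site's own SLURM script; the default memory power (0.3725 W/GB) and PUE (1.67) are assumptions and should be replaced with values for your own hardware and data centre.

```python
# Sketch of a Green Algorithms-style carbon estimate.
# Default constants are assumptions; adjust them to your own hardware and data centre.

def energy_kwh(runtime_h, n_cores, core_power_w, core_usage, memory_gb,
               memory_power_w_per_gb=0.3725, pue=1.67, psf=1.0):
    """Energy in kWh = t * (n_c * P_c * u_c + n_m * P_m) * PUE * PSF / 1000."""
    # Power drawn by the cores (W), scaled by the fraction of time they are busy
    core_w = n_cores * core_power_w * core_usage
    # Power drawn by the allocated memory (W)
    memory_w = memory_gb * memory_power_w_per_gb
    # PUE accounts for data-centre overhead; PSF for repeated runs of the same job
    return runtime_h * (core_w + memory_w) * pue * psf / 1000.0


def carbon_g(energy, carbon_intensity_g_per_kwh):
    """Carbon footprint in g CO2e = energy (kWh) * carbon intensity (g CO2e/kWh)."""
    return energy * carbon_intensity_g_per_kwh


if __name__ == "__main__":
    # Hypothetical job: 10 h on 4 fully used cores (assumed 12 W each) with 16 GB RAM
    e = energy_kwh(10, 4, 12.0, 1.0, 16)
    # Assumed grid carbon intensity of 475 g CO2e/kWh
    print(f"{e:.2f} kWh, {carbon_g(e, 475):.0f} g CO2e")
```

Comparing two programs on the same input then boils down to plugging in their respective runtimes, cores and memory; the per-core power and carbon intensity matter as much as the algorithm itself.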

Best regards, and have a good week,


January 18, 2023

Notes on Plant and Animal Genomes conference #PAG30 (V)

Wednesday 18012023

Oliver Ryder, San Diego Zoo Institute for Conservation Research. He talks about conservation genomics. For instance, he explains how the genomes of extinct species can be used to estimate genetic load and effective population size. In extant species, genome sequences can be used to estimate heterozygosity, for instance (Californian condor > Andean condor). He then moves on to the technological possibilities to rescue/clone [nearly] extinct species if we are able to cryopreserve DNA and tissues. The San Diego Zoo has its own Frozen Zoo for this.

Thomas Blein, Institute of Plant Sciences Paris-Saclay. The lncRNA MARS regulates the coexpression of a gene cluster, recruits a TF to a promoter, controls a chromatin loop and epigenetically coordinates the expression of neighboring genes in response to ABA: https://pubmed.ncbi.nlm.nih.gov/35150931 . You can read there that “The enrichment of co-regulated lncRNAs in clustered metabolic genes in Arabidopsis suggests that the acquisition of novel non-coding transcriptional units may constitute an additional regulatory layer driving the evolution of biosynthetic pathways”.

Peijian Cao, China Tobacco Gene Research Center. Starts by presenting lncRNAs that encode small peptides (https://academic.oup.com/bib/article/20/5/1853/5047384). In his experiments they track the expression of lncRNAs in response to herbivores. They find that different lncRNAs are coexpressed with genes in the biosynthetic pathway that produces JA. They also have evidence that these lncRNAs bind to ribosomes and that the encoded small peptides are produced. These peptides are as stable as common proteins.

Hikmet Budak, Montana BioAg Inc. Presents work on lncRNAs in different wheats aimed at controlling insect pests.

Josephine Herbst, VIB-UGent Center for Plant Systems Biology.

Notes on Plant and Animal Genomes conference #PAG30 (IV)

Tuesday 17012023

Ian D Godwin, QAAFI, The University of Queensland. While introducing sorghum, which is used for drinks, food and chicken fodder, he recommends the book https://drunkenbotanist.com. In Australia it is grown in the driest areas of the east. They are using gene editing to improve it, which requires high-quality genomic resources such as https://www.nature.com/articles/s41477-021-00925-x . They are particularly interested in high-resolution PAV maps, as PAV is a main driver of diversity in this crop. They also use a non-reference assembly for their work, although transformation still needs to be optimized. They have selected their own promoters, which also work well in barley and maize. They are using these to optimize many traits, mostly plant and root architecture, but also starch composition, as starch is naturally embedded in a tight protein matrix that makes it hard to digest (https://onlinelibrary.wiley.com/doi/full/10.1111/pbi.13284). The plants have to be tested in the field, as roots reach the bottom of a pot in 10 days. They obtained lines with increased protein and larger grains. They have also tested them for poultry feeding, and observed that digestible, high-protein grain reduces the amount of soy-based fodder required by chickens. They further improved protein digestibility by knocking out gamma-kafirin. He mentions that a VRN1 homolog in sorghum controls root angle. In the Q&A he said they are now introgressing their edited genes into parental lines used for sorghum hybrids.

Viviane Slon, Tel Aviv University. She extracts ancient DNA from sediments. Previous work has recovered plant and animal DNA 400K years old (from permafrost). Such experiments make it possible to find first/last appearance dates in sediments, which can be correlated with past biodiversity, history, climate change and human activity. A few weeks ago researchers were able to go back 2M years in Greenland. About 90% of the successfully extracted DNA has no BLASTN hits. To improve yield they use mammalian mtDNA capture. What differentiates ancient DNA from modern DNA? It is shorter, and Cs at single-stranded ends deaminate, so they are read as T (this is actually used as a sanity check, by counting nucleotide substitutions per position). They are now able to extract hominid mtDNA from the soil even when there are no bones, as they have shown in the Denisova cave (https://www.nature.com/articles/s41586-021-03675-0). They have also managed to extract nuclear DNA in Galería de las estatuas, Atapuerca (https://www.science.org/doi/10.1126/science.abf1667) and distinguished two Neanderthal populations. What next? Her lab is now developing methods to improve field sampling, the wet lab and data analyses. With this toolbox we should be able to fill the gaps in biodiversity history, particularly for plants. With high-density sampling along sediment transects we should be able to estimate changes in allele frequencies with help from coalescent theory.
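The C→T sanity check mentioned above can be sketched in a few lines. This toy version assumes reads already paired with their reference segment, aligned without gaps and oriented 5'→3'; dedicated tools such as mapDamage do this properly from BAM alignments. Ancient DNA should show an elevated C→T fraction at the first positions of the reads, dropping further inward.

```python
from collections import Counter


def ct_frequency_by_position(pairs, max_pos=10):
    """Fraction of reference C read as T at each position from the read's 5' end.

    `pairs` is an iterable of (reference, read) strings already aligned
    without gaps and given 5'->3'; this simplification is an assumption.
    """
    c_total = Counter()   # reference Cs observed at each position
    c_to_t = Counter()    # of those, how many were read as T
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must be aligned without gaps")
        for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if i >= max_pos:
                break
            if r == "C":
                c_total[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Positions with no reference C get 0.0 rather than a division by zero
    return [c_to_t[i] / c_total[i] if c_total[i] else 0.0 for i in range(max_pos)]
```

For instance, `ct_frequency_by_position([("CCAC", "TCAC"), ("CAAC", "TAAC")], max_pos=4)` returns `[1.0, 0.0, 0.0, 0.0]`: both reference Cs at the first position were read as T, and no other C was.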

Samuel P. Hazen, University of Massachusetts Amherst. Talks about their experiments to find TFs that might control cell wall thickening in Brachypodium distachyon, such as the bZIP named SWIZ. This is one of several TFs that are thigmotropic, relocating to the nucleus and driving local expression when the plant is touched/perturbed (this depends on calmodulin and on Ca being released). Expression lasts about 1 h. They find that 7-9K genes are differentially expressed (DE) upon touching the plants, and they have also discovered a couple of DNA motifs for SWIZ using ATAC-Seq analysis. They have also done de novo discovery of motifs upstream of DE genes. Adding external GA hormone represses movement to the nucleus. There’s a preprint at https://www.biorxiv.org/content/10.1101/2021.02.03.429573v2.abstract

Melissa Bredow, Department of Plant Pathology and Microbiology, Iowa State University. Frost is still an important stress for crops despite global warming. Freeze damage starts with seed ice crystals that end up piercing cell membranes. Gradual exposure to cold induces the expression of ice-binding proteins (IBPs) that protect membranes upon freezing. She uses brachy as a model to study the protection provided by IBPs. There are seven IBPs in B. distachyon (BdIRI1-7) and none in A. thaliana. These proteins are only stable below 4ºC (disordered otherwise) and their folds differ across species. Apparently it is not just cold that matters, but also bacterial (Xanthomonas, Pseudomonas sp.) ice nucleation proteins, which favour freezing and membrane destruction. Their current model is that BdIRI proteins actually bind to bacterial nucleation proteins to inhibit their function.

Todd Blevins, Centre national de la recherche scientifique, University of Strasbourg. Studies the role of RNA polymerase IV in brachy, which silences transposons by transcribing non-coding RNAs that drive AGO-based silencing, as reviewed in https://www.annualreviews.org/doi/abs/10.1146/annurev-arplant-093020-035446. Mutants of these genes (nrpd1) have reduced leaf elongation via regulation of cell production and cell cycle exit. In addition, the mutation causes higher expression of some genes, including bZIP TFs, which are silenced in the wild type. These have differentially methylated promoters. This varies across ecotypes and depends on the presence/absence of a TE. They have screened methylated sequences with Illumina and Nanopore and found very comparable results, although ONT is superior when it comes to checking individual TEs, as Illumina reads multimap.

Birkett Clay, USDA-ARS. Talks about integrating Practical Haplotype Graphs (PHGs), built from exome data for wheat and barley, into https://breedbase.org . He uses code at https://github.com/TriticeaeToolbox/PHGv2 and imputation protocols at https://wheat.triticeaetoolbox.org/static_content/files/imputation.html. Imputation accuracy with the barley PHG is > 93% when there are more than 2,000 markers. Details and converted VCF files are available at https://files.triticeaetoolbox.org . They display the resulting PHG with JBrowse (https://triticeaetoolbox.org/jbrowse). The PHG is built on a single reference genome, so you might need to select the appropriate reference to optimize imputation (or build a mosaic reference). Creating the PHG is computationally intensive, but imputation is quite fast.
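How an imputation accuracy such as the one quoted above is computed depends on the protocol. One common approach, assumed here and not necessarily the one used in the PHG pipeline, is to mask known genotypes, impute them and report the fraction recovered exactly (genotype concordance). A minimal sketch with genotypes coded as allele dosages:

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Genotype concordance between masked true calls and imputed calls.

    Genotypes are coded as allele dosages (0, 1, 2); None marks a missing call,
    and sites missing in either list are skipped.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    compared = matched = 0
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        if truth is None or imputed is None:
            continue  # nothing to compare at this site
        compared += 1
        matched += truth == imputed  # True counts as 1
    # NaN signals that no site could be compared
    return matched / compared if compared else float("nan")
```

For example, `imputation_accuracy([0, 1, 2, 2, None], [0, 1, 1, 2, 0])` compares four sites, three of which agree, so it returns 0.75. Other studies report dosage correlation instead, which penalizes heterozygote errors differently.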

Karen A Sanguinet, Washington State University. She talks about buzz mutants that affect root biomass and root hair formation in brachy. The A. thaliana ortholog rescues the mutant phenotype. She saw that BUZZ expression responds to N availability, although primary root growth is not N-responsive. It is expressed in the root epidermis.

Kapeel Chougule, Cold Spring Harbor Laboratory. Presents efforts (PanOryza) to consistently annotate gene models in the rice pangenome. Canonical isoforms are called with TRaCE (https://academic.oup.com/bioinformatics/article/38/1/261/6326792). At Gramene they have a rice subsite and plan to build pan-gene indexes.

Andrew Olson, Cold Spring Harbor Laboratory. After a brief history of the Gramene project (2022), he presents the pangenome sites (2021-22), which currently account for the bulk of new genes being added to Gramene (maize, rice, Vitis and Sorghum). He goes on to summarize all the tasks involved in setting up and maintaining the sites, including the import of data from Ensembl Plants (https://plants.ensembl.org) and Expression Atlas, and mentions they now follow the standards agreed at https://data.nal.usda.gov/ag-data-commons-collection-development-policy.

Sushma Naithani, Dept. of Botany and Plant Pathology, Oregon State University. She presents her work on curating Plant Reactome pathways using omics datasets (https://plantreactome.gramene.org). These pathways are linked to genes in Ensembl Plants and Gramene, which in turn often link to gene expression data. The curation protocols are illustrated at https://peerj.com/articles/11052. Currently they cover 126 species and 326 pathways, which have been projected onto 39K genes.

My turn. I presented our recent work "Building pangene sets from plant genome alignments confirms presence-absence variation", from the PanOryza project. The preprint can be read at https://www.biorxiv.org/content/10.1101/2023.01.03.520531v1 and code and documentation obtained here: https://github.com/Ensembl/plant-scripts/tree/master/pangenes.


[Source: Agata]  

Jonathan Cahn, HHMI-Cold Spring Harbor Laboratory. Talks about regulatory elements in maize inferred from diverse omics datasets (e.g. ChIP-seq, H3K4me1) as part of http://www.maizecode.org, which follows ENCODE guidelines. Raw data can be downloaded, although I could not find the DNA motifs. Superenhancers are delimited by methylated areas, are enriched in H3K27ac and accumulate binding sites. The results of this project are described at https://www.frontiersin.org/articles/10.3389/fpls.2020.00289/full. Shows really nice plots made with https://cran.r-project.org/web/packages/ggalluvial

Sarah Dyer, EMBL-EBI. Summarizes the current status of the wheat pangenome at Ensembl Plants: https://plants.ensembl.org/Triticum_aestivum/Info/Strains?db=core. The main addition since I last checked is that wheat genes now have a cultivar-based Compara section, where you can see orthology to genes in other wheats of the pangenome, e.g.: https://plants.ensembl.org/Triticum_aestivum/Gene/Strain_Compara_Tree?g=TraesCS3D02G273600;r=3D:379535906-379539827

Josh Clevenger, HudsonAlpha Institute for Biotechnology. https://www.hudsonalpha.org/khufudata/plant-improvement

On Twitter I heard about a talk I missed, by Katie Jenike, where she presented Panagram, K-mer based software for alignment-free visualization & analysis of pan-genomes. There’s code (https://github.com/kjenike/panagram) and even slides at https://twitter.com/mike_schatz/status/1615440857980899328

January 17, 2023

Notes on Plant and Animal Genomes conference #PAG30 (III)

Monday 16012023


Rajeev Varshney, Murdoch University. Starts by talking about sustainable, climate-smart crops such as legumes and the resources that are changing the way we breed them: marker-assisted breeding, expression atlases, improved reference genomes, and more recently pangenomes and superpangenomes. He believes that unadapted germplasm will provide genes for future crops. They are currently exploiting all these tools, for instance to tap into the global variability of chickpea (n=3,366). Using this toolbox he and collaborators have mapped 20-50 traits in several legumes, introgressed selected alleles into elite lines and evaluated them in the field. To streamline genotyping around the world they put together a low-cost, high-throughput genotyping project that has benefited dozens of crops. Moreover, they have trained breeders in 15+ meetings across developing countries (as well as 52 MSc & PhD students). He then shows many examples of improved varieties, produced in collaboration with local breeders, that are now resistant to diseases or tolerate drought much better than checks (see letter at https://www.nature.com/articles/s41587-021-01079-z). What’s the future? Haplotype-based breeding to drive optimal ideotypes, genomic prediction, spatial transcriptomics, machine learning => fast-forward breeding (https://www.cell.com/trends/genetics/fulltext/S0168-9525(21)00226-2). He paid homage to the heroes of the Green Revolution; he’s certainly one of them.

In the poster session I discovered a poster by MA Lemay, U Laval, where he does GWAS on a soybean panel and compares the performance of K-mer based analysis (https://github.com/malemay/katcher , https://github.com/malemay/gwask) to that of GWAS with explicit SV-indels. He finds that K-mer GWAS performs better than SV-based GWAS and comparably to SNP-based GWAS. He actually used https://pubmed.ncbi.nlm.nih.gov/32284578 to perform GWAS on K-mers.
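The key idea of K-mer GWAS is to replace SNP or SV genotypes with the presence/absence of each K-mer across accessions. Below is a toy sketch of how such a presence/absence matrix might be built from reads, assuming canonical K-mers and a simple filter that keeps only K-mers present in, and absent from, at least `min_carriers` samples. It is not code from katcher/gwask or from the cited method; real pipelines use dedicated K-mer counters on whole read sets and then test each variable K-mer for association with the phenotype, typically correcting for population structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a K-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def sample_kmers(reads, k):
    """Set of canonical K-mers in a sample's reads, skipping non-ACGT bases."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                found.add(canonical(kmer))
    return found


def presence_absence(samples, k, min_carriers=1):
    """Return (sample names, {kmer: 0/1 row}) for variable K-mers only.

    `samples` maps a sample name to a list of read strings (toy input).
    """
    per_sample = {name: sample_kmers(reads, k) for name, reads in samples.items()}
    names = sorted(per_sample)
    n = len(names)
    matrix = {}
    for kmer in sorted(set().union(*per_sample.values())):
        row = [int(kmer in per_sample[name]) for name in names]
        carriers = sum(row)
        # K-mers present in every sample (or almost none) carry no association signal
        if min_carriers <= carriers <= n - min_carriers:
            matrix[kmer] = row
    return names, matrix
```

For example, with `{"s1": ["AAAA"], "s2": ["AAAC"]}` and `k=3`, the K-mer AAA is shared and dropped, while AAC (canonical form of GTT) is kept with row `[0, 1]`. Each such row then plays the role that a biallelic marker plays in a conventional GWAS.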

I also read the poster by Merrit Kaipho-Burch, Cornell University, where she summarized her experiments estimating the effect of TE insertions/deletions on gene expression in maize inbreds and hybrids. The work required correcting for kinship. She concluded that 14% of the tested genes show expression changes, but only 0.9% of TE events had consequences.

Chandler Sutherland, UC Berkeley. Presents her work on plant NLR immune receptors. In A. thaliana they found that there are two classes of NLRs, with low and high amino acid Shannon diversity (https://academic.oup.com/plcell/article/33/4/998/6119334). Can they be identified using epigenomic features? In A. thaliana leaves they find that highly variable (hv) NLRs are more expressed than non-hv NLRs (apparently in contradiction with https://www.nature.com/articles/s41467-017-02292-8), and are also less gene-body methylated. They are also closer to TEs and often cluster in the genome.
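For reference, a minimal sketch of per-position amino acid Shannon diversity, H = -Σ p_a log2 p_a, where p_a is the frequency of residue a in one column of a multiple sequence alignment. Ignoring gaps is an assumption made here, and the cited paper's exact procedure for turning per-position entropy into an hv/non-hv call is not reproduced.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, gaps ignored."""
    residues = [aa for aa in column.upper() if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def alignment_entropy(sequences, gap="-"):
    """Per-position entropy for equal-length aligned protein sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[i] for s in sequences), gap) for i in range(length)]
```

A fully conserved column scores 0 bits, while a column split evenly between two residues scores 1 bit; for example, `alignment_entropy(["MK", "MR"])` gives 0 and 1. Highly variable NLRs would show many high-entropy positions along the alignment.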

Dan Sloan talks about mutation rates in plant mitochondria, which are apparently less mutable than other replicons (mt<cp<nucleus), as a result of the action of MutS Homolog 1 (MSH1, https://www.pnas.org/doi/10.1073/pnas.2206973119). Unpublished data suggest that the mutation rate is negatively correlated with the number of copies of the mitochondrial DNA.

Michelle Stitzer, currently at Cornell University. She describes a series of experiments at the Ross-Ibarra lab at UC Davis to measure mutation rates (genic, intergenic, TEs) in maize by comparing individuals across generations, and even two B73 inbred genomes assembled 4 years apart.

Daniel Koenig, University of California-Riverside. Uses A. thaliana populations to study how variation arises through mutation. His lab looks in particular at CNV of mutator loci and takes a reference-free approach with K-mers. They use GWAS and find candidate genes that explain CNV of K-mers, as found a couple of years ago in rice (https://www.nature.com/articles/s41467-018-07974-5).

Daniela P. Quiroz, UC Davis. She talks about targeted DNA repair in rice; when repair fails, mutations arise. She found that mutation rates were lower in genomic regions marked by H3K4me1, a histone modification found in the gene bodies of actively expressed and evolutionarily conserved genes in plants. This was compared with the other methylation states of K4 (0, 2, 3). The repair mechanism involves the Tudor protein domain (https://www.ebi.ac.uk/interpro/entry/InterPro/IPR002999). This work is available as a preprint at https://www.biorxiv.org/content/10.1101/2022.05.28.493846v3.
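Comparing mutation rates between marked and unmarked regions requires normalizing mutation counts by the number of base pairs in each class, since marked regions cover only a fraction of the genome. A minimal sketch for a single sequence, assuming non-overlapping, half-open intervals (e.g. hypothetical H3K4me1 peaks) and ignoring coverage, callability and sequence context, which real analyses must account for:

```python
def rates_inside_outside(mutation_positions, intervals, genome_length):
    """Mutations per bp inside vs outside a set of marked regions.

    `intervals` are assumed to be non-overlapping, half-open (start, end) ranges
    on a single sequence of `genome_length` bp, e.g. hypothetical H3K4me1 peaks.
    """
    covered = sum(end - start for start, end in intervals)
    inside = sum(
        1 for pos in mutation_positions
        if any(start <= pos < end for start, end in intervals)
    )
    outside = len(mutation_positions) - inside
    # Normalize each count by the bp it could have fallen in
    return inside / covered, outside / (genome_length - covered)
```

With mutations at positions 1, 5, 50 and 60, one marked interval (0, 10) and a 100 bp sequence, both classes contain two mutations, but the per-bp rate is 2/10 inside versus 2/90 outside.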