#!/perl/bioinfo

January 18, 2023

Notes on Plant and Animal Genomes conference #PAG30 (IV)

Tuesday 17012023

Ian D Godwin, QAAFI, The University of Queensland. While introducing sorghum, which can be used for drinks, food and chicken fodder, he recommends the book https://drunkenbotanist.com. In Australia it is grown in the driest area in the East. They are using gene editing to improve it and require high-quality genomic resources, such as https://www.nature.com/articles/s41477-021-00925-x . They are particularly interested in high-resolution PAV maps, as PAV is a main driver of diversity in this crop. They also use a non-reference assembly for their work, although transformation still needs to be optimized. They have selected their own promoters, which also work well in barley and maize. They are using these to optimize many traits, mostly plant and root architecture, but also starch composition; starch naturally sits in a tight protein matrix that makes it indigestible (https://onlinelibrary.wiley.com/doi/full/10.1111/pbi.13284). The plants have to be tested in the field, as roots reach the bottom of a pot in 10 days. They obtained lines with increased protein and larger grains. They have also tested them for poultry feeding, and observed that digestible, high-protein grain reduces the amount of soy-based fodder required by chickens. They further improved protein digestibility by knocking out gamma-kafirin. He mentions that a VRN1 homolog in Sorghum controls root angle. In questions he says they are now introgressing their edited genes into parental lines used for Sorghum hybrids.

Viviane Slon, Tel Aviv University. She extracts ancient DNA from sediments. Previous work has extracted plant and animal DNA that is 400K yr old (permafrost). Such experiments make it possible to find first/last appearance dates in sediments, which can be correlated to past biodiversity, history, climate change and human activity. A few weeks ago researchers were able to go back 2M yr in Greenland. About 90% of the successfully extracted DNA has no BLASTN hits. To improve yield they use mammalian mtDNA capture. What differentiates ancient DNA from modern? It is shorter, and C in single-stranded ends deaminates -> T (this is actually used as a sanity check, by counting nucleotide substitutions per position). They are now able to extract hominid mtDNA from the soil even when there are no bones, as they have shown in the Denisova cave (https://www.nature.com/articles/s41586-021-03675-0). They have also managed to extract nuclear DNA in Galería de las Estatuas, Atapuerca (https://www.science.org/doi/10.1126/science.abf1667) and distinguished two Neanderthal populations. What next? Her lab is now developing methods to improve field sampling, the wet lab and data analyses. With this toolbox we should be able to fill the gaps in the biodiversity history, particularly for plants. With high-density sampling in sediment transects we should be able to estimate changes in allele frequencies with help from coalescent theory.
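
The per-position sanity check mentioned above can be illustrated with a minimal sketch. It is not the speaker's pipeline: the function name, the toy alignments and the assumption of gap-free (reference, read) pairs are all mine, and real tools work from BAM alignments instead.

```python
from collections import Counter

def c_to_t_by_position(pairs, max_pos=10):
    """Fraction of reference C bases read as T at each distance from the 5' read end.

    pairs: iterable of (reference, read) strings, equal length and gap-free
    (a simplifying assumption made for this sketch).
    """
    c_total = Counter()   # reference C seen at each position
    c_to_t = Counter()    # of those, how many are T in the read
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / c_total[p] if c_total[p] else 0.0 for p in range(max_pos)]

# Hypothetical toy data: two of three reads show C->T at the first base.
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CAGCT", "CAGCT"),
]
freqs = c_to_t_by_position(toy_pairs, max_pos=5)
```

In authentic ancient DNA the C->T fraction is expected to be highest at the read ends and to decay towards the interior, which is what such a profile lets you check.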

Samuel P. Hazen, University of Massachusetts Amherst. Talks about their experiments to find TFs that might be controlling cell wall thickening in Brachypodium distachyon, such as the bZIP named SWIZ. This is one among other TFs that are thigmotropic, relocating to the nucleus and being expressed locally when the plant is touched/perturbed (this depends on calmodulin and Ca being released). Expression lasts about 1h. They find that 7-9K genes are differentially expressed (DE) upon touching the plants, and they have also discovered a couple of DNA motifs for SWIZ using ATAC-Seq analysis. They have also done de novo discovery of motifs upstream of DE genes. Adding external GA hormone represses movement to the nucleus. There’s a preprint at https://www.biorxiv.org/content/10.1101/2021.02.03.429573v2.abstract

Melissa Bredow, Department of Plant Pathology and Microbiology, Iowa State University. Frost is still an important stress for crops despite global warming. Freeze damage starts with seed ice crystals that end up piercing cell membranes. Gradual exposure to cold induces the expression of ice-binding proteins (IBPs) that protect membranes upon freezing. She uses brachy as a model to study the protection provided by IBPs. There are seven IBPs in B. distachyon (BdIRI1-7), none in A. thaliana. These proteins are only stable < 4ºC (disordered otherwise) and their folds are different across species. Apparently it is not just cold that matters, but also bacterial (Xanthomonas, Pseudomonas sp) ice nucleation proteins that favour freezing and membrane destruction. Their current model is that BdIRI proteins actually bind to bacterial nucleation proteins to inhibit their function.

Todd Blevins, Centre national de la recherche scientifique, University of Strasbourg. Studies the role of RNA polymerase IV in brachy, which silences transposons by transcribing non-coding RNAs that drive AGO-based silencing, as reviewed in https://www.annualreviews.org/doi/abs/10.1146/annurev-arplant-093020-035446. Mutants of these genes (nrpd1) have reduced leaf elongation via regulation of cell production and via cell cycle exit. In addition, the mutation causes higher expression of some genes, including bZIP TFs, which are silenced in the wild type. These have differentially methylated promoters. This varies across ecotypes and depends on the presence/absence of a TE. They have screened methylated sequences with Illumina and Nanopore and found very comparable results, although ONT is superior when it comes to checking individual TEs, as Illumina reads multimap.

Birkett Clay, USDA-ARS. Talks about integrating into https://breedbase.org Practical Haplotype Graphs (PHG) built from exome data for wheat and barley. He uses code at https://github.com/TriticeaeToolbox/PHGv2 and imputation protocols at https://wheat.triticeaetoolbox.org/static_content/files/imputation.html. Imputation accuracy with the PHG in barley is > 93% if #markers > 2000. Details and converted VCF files are available at https://files.triticeaetoolbox.org . They display the resulting PHG with JBrowse (https://triticeaetoolbox.org/jbrowse). The PHG is built on a single reference genome, so you might need to select the appropriate reference to optimize imputation (or build a mosaic reference). Creating the PHG is computationally intensive, but the imputation is quite fast.
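
For readers wondering what an imputation accuracy figure like the one above means in practice, here is a minimal sketch. It assumes accuracy is measured as the fraction of known genotypes that the imputed calls reproduce; the talk notes do not say which metric was used, and the function name, genotype strings and toy data are hypothetical.

```python
def imputation_accuracy(truth, imputed, missing="./."):
    """Fraction of genotypes where the imputed call matches the known call.

    truth, imputed: equal-length lists of VCF-style genotype strings.
    Positions missing in either list are skipped (an assumption of this sketch).
    """
    compared = 0
    correct = 0
    for known, call in zip(truth, imputed):
        if known == missing or call == missing:
            continue
        compared += 1
        if known == call:
            correct += 1
    if compared == 0:
        raise ValueError("no genotypes available for comparison")
    return correct / compared

# Hypothetical toy genotypes for one sample at five markers.
truth = ["0/0", "1/1", "0/0", "1/1", "./."]
imputed = ["0/0", "1/1", "1/1", "1/1", "0/0"]
accuracy = imputation_accuracy(truth, imputed)
```

A typical evaluation masks a subset of high-confidence genotypes before imputation and then compares them back as done here.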

Karen A Sanguinet, Washington State University. She talks about buzz mutants that affect root biomass and hair formation in brachy. The A. thaliana ortholog rescues the mutant phenotype. She saw that BUZZ expression responds to N availability, although primary root growth is not N-responsive. It is expressed in the root epidermis.

Kapeel Chougule, Cold Spring Harbor Laboratory. Presents efforts (PanOryza) to consistently annotate gene models in the rice pangenome. Canonical isoforms are called with TRaCE (https://academic.oup.com/bioinformatics/article/38/1/261/6326792). At Gramene they have a rice subsite and plan to build pan-gene indexes.

Andrew Olson, Cold Spring Harbor Laboratory. After a little history of the Gramene project (2022), he presents the pangenome sites (2021-22), which currently account for the bulk of new genes being added to Gramene (maize, rice, Vitis and Sorghum). He goes on to summarize all the tasks involved in setting up and maintaining the sites and the import of data from Ensembl Plants (https://plants.ensembl.org) and Expression Atlas, and mentions they are now following the standards agreed at https://data.nal.usda.gov/ag-data-commons-collection-development-policy.

Sushma Naithani, Dept. of Botany and Plant Pathology, Oregon State University. She presents her work on curating plant reactome pathways using omic datasets (https://plantreactome.gramene.org). These pathways are linked to genes in Ensembl Plants and Gramene, which in turn often link to gene expression data. The curation protocols are illustrated at https://peerj.com/articles/11052. Currently they have 126 species and 326 pathways, which have been projected to 39K genes.

My turn. I presented our recent work "Building pangene sets from plant genome alignments confirms presence-absence variation", from the PanOryza project. The preprint can be read at https://www.biorxiv.org/content/10.1101/2023.01.03.520531v1 and code and documentation obtained here: https://github.com/Ensembl/plant-scripts/tree/master/pangenes.

[Image. Source: Agata]

Jonathan Cahn, HHMI-Cold Spring Harbor Laboratory. Talks about regulatory elements in maize inferred from diverse omics datasets (e.g. ChIP-seq, H3K4me1) as part of http://www.maizecode.org, which follows ENCODE guidelines. Raw data can be downloaded, although I cannot see the DNA motifs. Superenhancers are delimited by methylated areas, are enriched in H3K27ac and accumulate binding sites. The results of this project are described at https://www.frontiersin.org/articles/10.3389/fpls.2020.00289/full. Shows really nice plots made with https://cran.r-project.org/web/packages/ggalluvial

Sarah Dyer, EMBL-EBI. Summarizes the current status of the wheat pangenome at Ensembl Plants: https://plants.ensembl.org/Triticum_aestivum/Info/Strains?db=core. The main addition since I last checked is that wheat genes now have a cultivar-based Compara section, where you can see orthology to genes in other pangenome wheats, e.g. https://plants.ensembl.org/Triticum_aestivum/Gene/Strain_Compara_Tree?g=TraesCS3D02G273600;r=3D:379535906-379539827

Josh Clevenger, HudsonAlpha Institute for Biotechnology. https://www.hudsonalpha.org/khufudata/plant-improvement

On Twitter I heard about a talk I missed by Katie Jenike, where she presented Panagram, K-mer based software for alignment-free visualization & analysis of pan-genomes. There’s code (https://github.com/kjenike/panagram) and even slides at https://twitter.com/mike_schatz/status/1615440857980899328

January 17, 2023

Notes on Plant and Animal Genomes conference #PAG30 (III)

Monday 16012023

Rajeev Varshney, Murdoch University. Starts by talking about sustainable, climate-smart crops such as legumes and the resources that are changing the way we breed them: marker-assisted breeding, expression atlases, improved reference genomes, and more recently pangenomes and superpangenomes. He believes that unadapted germplasm will provide genes for future crops. They are currently exploiting all these tools, for instance to tap into the global variability of chickpea (n=3,366). Using the toolbox, he and collaborators have mapped 20-50 traits in several legumes, introgressed selected alleles into elite lines and evaluated them in the field. To streamline genotyping around the world they put together a low-cost, high-throughput genotyping project that has benefited dozens of crops. Moreover, they have trained breeders in 15+ meetings around developing countries (52 MSc & PhD students as well). He then goes on to show many examples of improved varieties produced in collaboration with local breeders that are now resistant to diseases or tolerate drought much better than checks (see letter at https://www.nature.com/articles/s41587-021-01079-z). What’s the future? Haplotype-based breeding to drive optimal ideotypes, genomic prediction, spatial transcriptomics, machine learning => fast-forward breeding (https://www.cell.com/trends/genetics/fulltext/S0168-9525(21)00226-2). He paid homage to green revolution heroes; he’s certainly one of them.

In the poster session I discovered a poster by MA Lemay, U Laval, where he does GWAS on a soybean panel and compares the performance of K-mer based analysis (https://github.com/malemay/katcher , https://github.com/malemay/gwask) to that of GWAS with explicit SV-indels. He finds that K-mer GWAS performs better than SV-based GWAS and is comparable to SNP-based GWAS. He actually used https://pubmed.ncbi.nlm.nih.gov/32284578 to perform GWAS on K-mers.
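
As a rough illustration of the kind of input a K-mer based association test works on, the sketch below builds a presence/absence table of canonical K-mers across samples. It is not the code from the poster or the cited tools: the function names, the tiny k and the toy sequences are assumptions, and real analyses count K-mers in sequencing reads with dedicated software.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def canonical_kmers(seq, k):
    """Set of canonical K-mers (lexicographic min of a K-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers

def presence_absence(samples, k):
    """Map each K-mer to a 0/1 vector, one entry per sample in insertion order."""
    per_sample = {name: canonical_kmers(seq, k) for name, seq in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    return {kmer: [int(kmer in per_sample[name]) for name in samples]
            for kmer in all_kmers}

# Hypothetical toy sequences standing in for two accessions.
toy = {"line1": "ACGTAC", "line2": "ACGGAC"}
table = presence_absence(toy, k=3)
```

Each row of such a table can then be tested against a phenotype much like a biallelic marker, with no need for a reference genome.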

Also read the poster of Merrit Kaipho-Burch, Cornell University, where she summarized her experiments for the estimation of the effect of TE insertions/deletions on gene expression of maize inbreds and hybrids. The work required correcting for kinship. She concluded that 14% of the tested genes show expression changes, but only 0.9% of TE events had consequences.

Chandler Sutherland, UC Berkeley. Presents her work on plant NLR immune receptors. In A. thaliana they found that there are two classes of NLRs, with low and high amino acid Shannon diversity (https://academic.oup.com/plcell/article/33/4/998/6119334). Can they be identified using epigenomic features? In A. thaliana leaf they find that highly variable (hv) NLRs are more expressed than non-hv (apparently in contradiction with https://www.nature.com/articles/s41467-017-02292-8), and are also less gene-body methylated. They are also closer to TEs and often cluster in the genome.
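
The amino acid Shannon diversity mentioned above can be sketched per alignment column as follows. This is a minimal illustration that assumes entropy in bits (log base 2) and gaps being ignored; the function names and the toy alignment are hypothetical, and the cited paper should be consulted for the exact procedure and thresholds.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def alignment_entropy(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned protein sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]

# Hypothetical toy alignment: column 1 is invariant, column 3 is fully variable.
toy_alignment = ["MKLA", "MKVA", "MRIA", "MRFA"]
entropies = alignment_entropy(toy_alignment)
```

Receptors with many high-entropy columns would fall in the highly variable class, while those with uniformly low entropy would be considered conserved.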

Dan Sloan talks about mutation rate in plant mitochondria, which apparently are less mutable than other replicons (mt<cp<nucleus), as a result of the action of MutS Homolog 1 (MSH1, https://www.pnas.org/doi/10.1073/pnas.2206973119). Unpublished data suggest that mutation rate is negatively correlated with the number of copies of the mitochondrial DNA.

Michelle Stitzer, currently at Cornell University. She describes a series of experiments at the Ross-Ibarra lab at UC Davis to measure mutation rates (genic, intergenic, TEs) in maize by comparing individuals across generations and even two inbred B73 genomes assembled 4 years apart.

Daniel Koenig, University of California-Riverside. Uses A. thaliana populations to study how variation arises through mutation. His lab looks in particular at CNV of mutator loci and takes a reference-free approach with K-mers. They use GWAS and find candidate genes that explain CNV of K-mers, as found a couple of years ago in rice (https://www.nature.com/articles/s41467-018-07974-5).

Daniela P. Quiroz, UC Davis. She talks about targeted DNA repair in rice; when repair fails, mutations arise. She found that mutation rates were lower in genomic regions marked by H3K4me1, a histone modification found in the gene bodies of actively expressed and evolutionarily conserved genes in plants. This was compared to the other methylation states of K4 (0, 2, 3). The repair mechanism involves the Tudor protein domain (https://www.ebi.ac.uk/interpro/entry/InterPro/IPR002999). This work is published in https://www.biorxiv.org/content/10.1101/2022.05.28.493846v3.

January 16, 2023

Notes on Plant and Animal Genomes conference #PAG30 (II)

Sunday 15012023

Mario Stanke, Institute for Mathematics and Computer Science, University of Greifswald. He reviews existing tools to identify coding and non-coding genes based on dn/dS and sequence composition. These help discriminate non-coding sequences. In contrast, the recent ClaMSA is a differentiable TensorFlow model which can be trained on any objective criterion, as opposed to PhyloCSF (likelihood) or codeml (omega). It was a student assignment project. Code and documentation can be found at https://github.com/Gaius-Augustus/clamsa. In their benchmark with vertebrate and fly exon codon alignments it makes fewer errors than PhyloCSF and codeml. It can be used to scan genomic regions to discover protein-coding frames.

Mihaela Pertea, Department of Biomedical Engineering, Johns Hopkins University. She talks about recent work that evolves StringTie (https://ccb.jhu.edu/software/stringtie) and integrates both short (rarely span more than 1 exon) and long (high error rate) RNA-seq reads to assemble transcripts. Long reads on their own can create very complex splice graphs, which impaired StringTie v1. She mentions the work of Lima et al 2020 to justify why simply correcting errors in long reads is not a good idea, as many isoforms are lost (https://academic.oup.com/bib/article/21/4/1164/5512144). StringTie2 can successfully use both types of reads and can handle noisy long reads. In her tests hybrid data produce much better transcripts than either input alone and do not require correcting long reads (not worth it).

Tomáš Brůna, DOE Joint Genome Institute. Presents GeneMark-ETP, software for protein-coding gene annotation. He shows benchmarks with C. elegans, A. thaliana and D. melanogaster and then with more complex genomes. The latest version performs better than BRAKER, particularly in genomes with heterogeneous GC regions, such as mouse. It takes 1-3 days to run.

Lars Gabriel, Institute for Mathematics and Computer Science, University of Greifswald. Presents BRAKER3 (https://github.com/Gaius-Augustus/BRAKER) for the annotation of eukaryotic genomes from short RNA-seq reads and protein sequences. This latest version uses HISAT2, StringTie, GeneMark-ETP and AUGUSTUS among other tools. The only plant tested is A. thaliana. They also use TSEBRA to select plausible isoforms. In their benchmarks they report results at the exon, gene and transcript level. It takes almost 2x the time of BRAKER2 to run. They haven’t used it with long reads yet. Related note: in questions somebody says that their IsoSeq libraries contain a lot of retained introns.

I missed Roderic Guigo's talk, but there's this tweet: https://twitter.com/Campbell_JD_PhD/status/1614682387648221184

Stephane Rombauts, VIB-UGent Center for Plant Systems Biology. Talks about their pangenome implementation for genomes of the same species or genus. Reviews gene-centric vs sequence-based graphs vs object-based sequence feature pangenomes. They go for the last option, resembling Sandra Smit’s approach but using vaticle (https://vaticle.com) instead of Neo4j as DB engine.

Robert J. Henry, Queensland Alliance for Agriculture and Food Innovation, University of Queensland. Talks about their current program on sequencing wild relatives of crops such as macadamias, bananas, pigeon pea, coffee, mango, etc, and cereals (wild rice, sorghum genus). These are mostly untapped species that could eventually be domesticated if needed. He mentions that rice domestication involved only two mothers (plastids) but happened many times. In questions he says that it takes only 1 yr to domesticate these plants. The main problem is seed shattering. Many of the cereal species have larger grains than currently cultivated plants.

Ilga Mercedes Porth, Laval University. She talks about poplar improvement in Canada. They have a panel of 1K genotyped individuals. 2/3 of the variants have MAF<0.05. They have found 8K gained stop codons in 6K genes. Heterosis is used in breeding as well (to accelerate growth, for instance) and is suspected to be related to structural variants. They have looked at local adaptation using SNPs, SVs and more recently ploidy. It seems that in polyploids certain transposon families are over-expressed compared to diploids (https://nph.onlinelibrary.wiley.com/doi/full/10.1002/ppp3.10297). As happens in other species, triploids seem to accumulate in areas prone to drought. They have also found that certain multigene families are partially responsible for adaptation (https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.14836).
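
To make the MAF<0.05 figure concrete, here is a minimal sketch of how minor allele frequency can be computed and rare variants flagged. It assumes diploid individuals coded as alternative-allele dosages (0, 1, 2), which would not hold for the triploids mentioned above; the function name, variant names and toy genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from alt-allele dosages (0, 1, 2) of diploid individuals; None marks missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid individual
    return min(alt_freq, 1 - alt_freq)

# Hypothetical toy panel of 12 individuals and three variants.
variants = {
    "snp_a": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_b": [0, 1, 2, 1, 0, None, 1, 2, 0, 1, 1, 0],
    "snp_c": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1],
}
maf = {name: minor_allele_frequency(g) for name, g in variants.items()}
rare = [name for name, freq in maf.items() if freq < 0.05]
```

Note that snp_c is rare even though its alternative allele is nearly fixed, because MAF always refers to the less common of the two alleles.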

Min Zhang, University of California Irvine. She talks about network reconstruction out of cis-eQTLs. She explains this has been done in the literature by building linear models, but these can be unfeasible at the genome scale because of the large number of parameters required. I was too tired to follow the algebra; she shows examples on simulated and yeast data.

Nils Stein, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). Talks about accessing the secondary genepool of barley, which includes Hordeum bulbosum, which is not a host for most barley pathogens. Complements a talk by Martin Mascher yesterday. Shows previous results of exome capture of introgression lines, where regions that accumulate polymorphic read mappings delineate introgressed H. bulbosum segments, usually telomeric. Now they are using PacBio HiFi reads to resolve introgressions by assembling phased haplotypes and building pangenome graphs of particular loci, such as a locus for virus resistance.