October 17, 2019

Plant Genomes in a Changing Environment 2019 (II)


Exploring and utilization of rice resources with broad-spectrum resistance against blast disease (Xuwei Chen, Sichuan University, China)
He speaks about their sampling effort to identify alleles in rice germplasm that confer resistance to blast disease. A GWAS survey of 3K sequenced rice genomes discovered an allele in cultivar Digu with MAF=0.10. It is a SNP in the promoter of a zinc finger TF. The results are published in https://www.ncbi.nlm.nih.gov/pubmed/28666113 . He then moves on to their work on the transcription factor IPA1 (Ideal Plant Architecture), which represses unproductive tillers and enhances immune responses. Again, the selected allele (ipa1-1D) carries a mutation that breaks a miRNA site. The results can be found in https://www.ncbi.nlm.nih.gov/pubmed/30190406

Do environmental changes induce retrotransposon expression in plants? (Flavia Mascagni, University of Pisa, Italy)
She is conducting work to determine to what extent retrotransposons (RTs) are expressed in response to environmental changes in sunflower. Upon treatment with hormones and chemicals, they observe higher expression in the leaf than in the root, with some genotypes more prone than others. Overall they found 134 differentially expressed RTs. Then they used a similar approach in poplar, again using public cDNA libraries. Some genotypes are more prone than others to show RT expression in response to treatment. In both species, of the few differentially expressed RTs, most belong to the Copia superfamily.

Functional genomics of European hazel (Corylus avellana L.) to address an emerging, destructive powdery mildew pathogen (Stuart Lucas, Sabanci University, Turkey)
In their search for alleles conferring resistance they have completed a genome assembly yielding 11 scaffolds (370Mb) for a predicted size of 380Mb. They are now annotating MLO and NLR genes. As for MLO genes, 5 clustered copies are good candidates for disease resistance. For NLR genes they are using long reads to sequence end-to-end copies, on a pool of 363 genes with little overlap across populations.

Natural genetic variation in the response of Arabidopsis to Plasmodiophora brassicae infection (William Truman, IPG PAS Poznan, Poland)
He describes this obligate pathogenic protist, the cause of clubroot, which affects a wide range of Brassica crops. Some of their previous results are at http://www.plantcell.org/content/30/12/3058 . They are testing candidate resistance alleles in Arabidopsis thaliana ecotypes. Some are being further studied in Y2H assays.

Daniele Filiault, Gregor Mendel Institute, Austria
She describes her Arabidopsis thaliana experiments along a latitudinal gradient from Germany to Sweden. They measure survival and slug susceptibility and observe local adaptation, with germplasm from southern latitudes doing badly when planted up North. They then do GWAS and separate intra-specific and genus-specific variants.

Pathogen-informed strategies for sustainable broad-spectrum resistance in crops (Bart Thomma, University of Wageningen, The Netherlands)
He talks about how we can learn from pathogen molecules to obtain resistant crops. He shows this video of the tomato fungal pathogen Verticillium dahliae: https://vimeo.com/222178738 . He then shows haplotypes of different isolates of the pan-genome and refers to https://genome.cshlp.org/content/early/2016/07/12/gr.204974.116 and https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15168 . Each isolate has 10% lineage-specific non-core genes and they are apparently more conserved across species of the genus than core genes. This could be due to horizontal transfer (unlikely), selection (unlikely) or reduced error replicons (Hi-C experiments suggest they are co-localized in the nucleus, unmethylated, enriched in TEs, etc). Their most recent manuscript is https://www.biorxiv.org/content/10.1101/528729v1 . They find that a single effector gene in the fungus is responsible for pathogenicity, and when it is removed infection does not occur/progress. Conversely, when it is transformed into non-pathogenic species of the genus, those species now cause disease in tomato.
He ends with another story, where they have seen that the fungus produces an antimicrobial protein (VdAve1, is that an antibiotic?) that alters the plant root microbiome and ultimately facilitates infection.


Beyond single genes: receptor networks underpin plant immunity (Sophien Kamoun, The Sainsbury Laboratory, UK)
Most plants are resistant to most pathogens; they have a very efficient immune system with pattern recognition receptors (PRR) and NLR receptors. Pathogens secrete effectors to modulate plant defenses (https://www.ncbi.nlm.nih.gov/pubmed/23223409). Together, plants and pathogens coevolve and drive diversification. The NLR diversification is much larger in plants than in mammals (human vs mouse, tomato vs coffee, 100Myr). In fact, ultimately, pathogens alter plant genomes (gene-for-gene model). He proposes to move from the single gene paradigm to the immune network, incorporating redundancy, evolvability, robustness and epistasis (https://www.ncbi.nlm.nih.gov/pubmed/29930125).

Plant NLRs are typically made of three domains: [CC|CCR|TIR]NB-ARC-LRR. They form resistosome complexes that integrate in the membrane (https://www.ncbi.nlm.nih.gov/pubmed/30948527). These genes cluster in the genome (https://www.pnas.org/content/114/30/8113). This would be the most ancestral network, found in chr5 of sugar beet and conserved in other plants (such as tomato?).

A fifth of monocot/dicot NLR N-termini share a conserved MADA motif (MADAxVSFxVxKLxxLLxxEx, https://www.biorxiv.org/content/10.1101/693291v1). The CC domain diversified and became non-functional in many cases.
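To make the consensus above concrete, here is a minimal Python sketch that turns the quoted motif string into a regular expression and looks for it near the N-terminus of a protein. It assumes that each x stands for any residue; the 30-residue window, the function names and the toy sequences are hypothetical, and the detection method used in the preprint may well differ.

```python
import re

# Consensus as quoted in the notes; 'x' is ASSUMED to mean "any residue".
MADA_CONSENSUS = "MADAxVSFxVxKLxxLLxxEx"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus string into a compiled regex ('x' becomes '.')."""
    return re.compile("".join("." if c == "x" else c for c in consensus))


def has_n_terminal_motif(protein: str, pattern: re.Pattern, window: int = 30) -> bool:
    """Return True if the pattern occurs within the first `window` residues.

    The window size is an illustrative assumption, not a published value.
    """
    return pattern.search(protein[:window]) is not None


MADA_RE = consensus_to_regex(MADA_CONSENSUS)

# Hypothetical toy sequences, made up for illustration only.
toy_hit = "MADAAVSFLVGKLGELLAEEYKLLRGV"
toy_miss = "MAEAAVSALVGKLGELLAEEYKLLRGV"

print(has_n_terminal_motif(toy_hit, MADA_RE))
print(has_n_terminal_motif(toy_miss, MADA_RE))
```

An exact regex like this is stricter than a profile-based search, so it would probably miss degenerate variants of the motif.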

Using data science to understand plant gene regulation (Daphne Ezer, University of York, UK)
She starts by asking how do we know that our experiments are relevant in the real world? We need to correct for confounding variables and always put the data in its context, right? For instance, for bulk RNA-seq you must sync plants/treatments to make sure you are comparing tissues of the same age, same circadian point and tissue ratio. She has developed tools for these tasks, such as https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2717-5

Structure, stability and phenotypic relevance of DNA methylation in Thlaspi arvense natural populations (Dario Galanti, University of Tubingen, Germany)
He talks about his PhD project, which is concerned with heritable methylation as a function of location of origin, and how that affects phenotypes. He is working with pennycress (Thlaspi arvense), with populations sampled across Europe. I'll see if we can load that genome in Ensembl.

Nanopore Direct RNA Sequencing Maps the Arabidopsis m6A Epitranscriptome (Matthew Parker, University of Dundee, UK)
He starts by enumerating the theoretical advantages of sequencing native RNA directly, instead of sequencing cDNA. They are using it in several projects. The error rate is 5-8% which is not a problem for polyAs, but it is for short exon annotation and intron boundaries. In those cases they still use Illumina to correct the long reads.

Improving gene regulatory network inference from ATAC-Seq data using an ensemble motif mapping approach (Marc Jones, VIB / Ghent University, Belgium)
This talk complements yesterday’s talk by focusing on ATAC-Seq. They use ATAC read depth to restrict the genome regions where known motifs are scanned, in order to discover relevant cis regulation.
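The idea of restricting motif scanning to accessible regions can be sketched in a few lines of Python. This is only an illustration: the mean-depth criterion, the threshold and the toy coordinates are assumptions, and the speaker's pipeline may rely on peak calls or other criteria instead.

```python
from typing import List, Tuple

# (start, end, motif_name) with 0-based, half-open coordinates.
Hit = Tuple[int, int, str]


def filter_hits_by_accessibility(
    hits: List[Hit], depth: List[int], min_mean_depth: float
) -> List[Hit]:
    """Keep motif hits whose mean ATAC read depth reaches the threshold.

    `depth` holds one read-depth value per base of the scanned sequence.
    """
    kept = []
    for start, end, name in hits:
        window = depth[start:end]
        if window and sum(window) / len(window) >= min_mean_depth:
            kept.append((start, end, name))
    return kept


# Hypothetical per-base depth and motif hits, for illustration only.
depth = [0, 0, 1, 8, 9, 10, 9, 2, 0, 0]
hits = [(0, 3, "motifA"), (3, 7, "motifB"), (6, 10, "motifC")]

open_hits = filter_hits_by_accessibility(hits, depth, min_mean_depth=5.0)
print(open_hits)
```

Only the hit sitting on the well-covered stretch survives, which is the intended effect: fewer candidate sites, hopefully with fewer false positives.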

No genome required: Finding genetic variants associated with plant phenotypes without complete genome information (Yoav Voichek, Max Planck Institute for Developmental Biology, Germany)
This talk complements yesterday’s but with a focus on the biology and the comparison between GWAS based on SNPs and k-mers. He shows a Venn plot to show that both approaches have a large intersection. However, there are some associated SNPs not found with k-mers and also the converse (structural variants, regions missing in the reference, etc). He is asked how this would work with heterozygous genomes.

October 16, 2019

Plant Genomes in a Changing Environment 2019 (I)

Hi, these are my notes on the talks I attended on the first day of the Plant Genomes in a Changing Environment 2019 conference.

Jump to day 2 or day 3.


Unlocking the polyploidy potential of wheat through genomics (Cristobal Uauy, John Innes Centre, UK)
Phenotypes of agricultural importance are complex, with continuous gradients instead of discrete states, and differences are often difficult to tell from noise. This only gets worse with polyploids, where QTL effects are subtler than in diploids. He explains that Arabidopsis thaliana is as far from wheat as platypus is from human. He then talks about the rich wheat genomic resources which are bringing people to work in this species perhaps for the first time. All these resources are documented at http://www.wheat-training.com

He then moves on to describe a particular example of combining these tools, using https://bioconductor.org/packages/release/bioc/html/GENIE3.html to predict target genes of wheat TFs using around 900 RNAseq experiments. They see no evidence of TFs preferring targets from the same (A,B,D) subgenome, even though the D subgenome has evolved independently for thousands of years with respect to A & B. He’s asked how difficult it is to map & genotype specific subgenomes; he says with Polymarker you can enrich on subgenome-specific regions. They use https://pachterlab.github.io/kallisto to quantify transcripts and have validated in chr-deletion lines that the transcripts from those missing chromosomes are not expressed.

ENSEMBL plants – Visualizing the Wheat Genome in Ensembl Plants (Guy Naamati, EMBL-EBI, UK)
He starts with a quick tour of http://plants.ensembl.org/Triticum_aestivum. He then explains the gene trees for wheat produced with https://www.ensembl.org/info/genome/compara . Then he moves on to the wheat variant collections, the TILLING mutants and the KASP markers. Finally he also mentions our preparative work with a dozen wheat cultivar assemblies from http://www.10wheatgenomes.com . He concludes by showing off the Ensembl Outreach team, who do Ensembl training around the world. He is asked why gene models keep changing across releases and whether it is possible to know the most abundant isoform (canonical?). He’s also asked how the 10+ varieties are going to be loaded. Another question is how low sequence identity orthologues are managed.

Expression atlas - Submission, archival and visualisation of plant sequencing data (Nancy George, EMBL-EBI, UK)
She guides us through a submission from start to end: i) annotate metadata with https://www.ebi.ac.uk/fg/annotare, ii) import expression data with https://www.ebi.ac.uk/arrayexpress and https://www.ebi.ac.uk/gxa (min 3 replicates, biological question, reference in Ensembl). These steps eventually result in baseline expression reports such as http://plants.ensembl.org/Triticum_aestivum/Gene/ExpressionAtlas?g=TraesCS3D02G273600;r=3D:379535906-379539827
She then moves on to say how the plant community are still not embracing the single-cell sequencing efforts due to technical challenges.

Benchmarking and development of an ensemble motif mapping approach to improve gene regulatory network inference (Marc Jones, VIB / Ghent University, Belgium)
He introduces TF binding motifs and how they can be used to scan genome sequences to predict binding sites. They compared different motif aligners, including MOODS, cluster-buster, FIMO and matrix-scan. They observe that FIMO sites tend to match more often with those from other tools. They then compared site predictions to ChIPseq read depth, in order to compute precision and recall. FIMO comes out best in terms of precision and worst on recall. When they look at the first 7000 sites, their 4 tested aligners behave similarly. Eventually they combined FIMO and cluster-buster, as they report many sites missed by the others. The full set of results is described at http://www.plantphysiol.org/content/181/2/412
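As a reminder of how the two metrics trade off, and why taking the union of two tools can help recall, here is a small Python sketch. It simplifies the comparison to sets of site identifiers: the "supported" set stands in for sites backed by ChIPseq signal, and all identifiers and tool outputs are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[int], supported: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted sites against supported sites."""
    true_pos = len(predicted & supported)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(supported) if supported else 0.0
    return precision, recall


# Hypothetical site identifiers, for illustration only.
chip_supported = {1, 2, 3, 4}
tool_a = {1, 2}      # strict tool: few predictions, all supported
tool_b = {3, 5}      # a second tool reporting different sites
ensemble = tool_a | tool_b

print(precision_recall(tool_a, chip_supported))
print(precision_recall(ensemble, chip_supported))
```

In this toy case the strict tool reaches precision 1.0 but recall 0.5, while the union trades some precision (0.75) for better recall (0.75).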

No genome required: Finding genetic variants associated with plant phenotypes without complete genome information (Yoav Voichek, Max Planck Institute for Developmental Biology, Germany)
He talks about doing GWAS analysis with k-mer distributions instead of mapping to a reference genome. They start with a PAV table of 31-mers across genotypes. That table can be used to characterize a pan-genome after removing low depth k-mers, as they did with 1000 A. thaliana genome sequence sets. From that they have developed a GWAS pipeline for k-mers which accounts for population structure. They assign genomic context to k-mers by i) mapping to the reference genome, ii) LD and iii) assembling reads containing the k-mers and then mapping. The code will be released soon in https://github.com/voichek/kmers-gwas
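A presence/absence (PAV) table of k-mers is easy to picture with a toy Python sketch. This is not the code from the repository above: it uses k=3 instead of 31, made-up reads, and skips the steps a real pipeline would need, such as canonical (reverse-complement aware) k-mers and the low-depth filter mentioned in the talk.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pav_table(samples: Dict[str, List[str]], k: int) -> Dict[str, Dict[str, int]]:
    """Build a k-mer x sample table with 1 = present and 0 = absent."""
    per_sample = {
        name: set().union(*(kmers(read, k) for read in reads))
        for name, reads in samples.items()
    }
    all_kmers = sorted(set().union(*per_sample.values()))
    return {
        km: {name: int(km in found) for name, found in per_sample.items()}
        for km in all_kmers
    }


# Hypothetical accessions and reads, for illustration only.
samples = {"acc1": ["ACGTA"], "acc2": ["ACGAA"]}
table = pav_table(samples, k=3)

for km, row in table.items():
    print(km, row)
```

Each row of such a table can then be tested for association with a phenotype, much like a biallelic marker, which is what makes the approach reference-free.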

The 4th dimension of Gene Regulatory Networks: TIME (Gloria M Coruzzi, New York University, USA) 
She talks about the time dimension in regulatory networks using a diagram from https://europepmc.org/articles/PMC4558309. She proposes we should be handling TFs binding to DNA just like enzymes, with enzymatic kinetics. She tells 3 stories on A. thaliana.


The Just-in-TIME approach allowed them to study genes expressed in response to N as a function of time, with enriched cis elements and GO terms that you would have missed if analyzed in bulk https://www.pnas.org/content/115/25/6494.short. They apply ML to identify the TFs binding to those cis elements using time series gene expression, and they validated the predictions with 7 TF perturbations that affect 2K targets.

Hit-and-Run is another approach to study transient TF binding, controlled by adding dexamethasone (developed by José Álvarez et al, soon in Nat Comms). She stresses that binding is a poor predictor of regulation, as most binding does not affect expression, and conversely in many cases they can’t catch by ChIPseq binding events that they know happen. She also shows results of TFs binding to the 3’UTR. In order to catch those transient-binding TFs they used a new protocol called DamID. It turns out that most transient events are very early in the N response, while the stable binders tend to be late responding. She does not know whether transient sites are bound with less affinity, but she notes they are enriched in neighboring sites from other TFs.

Finally, they performed network walking to connect transient TFs to their targets in A. thaliana, which they published at https://www.nature.com/articles/s41467-019-09522-1. It is called net walking because they walk from primary TFs, then to secondary regulated TFs and finally to indirect targets. They are now developing a method called OutPredict to introduce priors in their network inference.
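The walk described above (primary TFs, then secondary regulated TFs, then indirect targets) can be sketched as a two-step traversal of a regulatory graph. This Python sketch is only a simplification of the idea as noted here, not the published method, and all gene names and edges are hypothetical.

```python
from typing import Dict, Set, Tuple


def walk_network(
    edges: Dict[str, Set[str]], primary_tfs: Set[str], all_tfs: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Walk from primary TFs to secondary TFs and on to indirect targets.

    `edges` maps each regulator to the set of genes it regulates.
    Returns (secondary_tfs, direct_targets, indirect_targets).
    """
    regulated = set().union(*(edges.get(tf, set()) for tf in primary_tfs))
    secondary_tfs = (regulated & all_tfs) - primary_tfs
    direct_targets = regulated - all_tfs
    downstream = set().union(*(edges.get(tf, set()) for tf in secondary_tfs))
    indirect_targets = downstream - all_tfs - direct_targets
    return secondary_tfs, direct_targets, indirect_targets


# Hypothetical toy network, for illustration only.
edges = {
    "TF1": {"TF2", "geneA"},
    "TF2": {"geneB", "geneC"},
    "TF3": {"geneD"},
}
all_tfs = {"TF1", "TF2", "TF3"}

secondary, direct, indirect = walk_network(edges, {"TF1"}, all_tfs)
print(secondary, direct, indirect)
```

Here TF3 and its target are never reached, because nothing connects them to the primary TF chosen as the starting point.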
  
Genetic and genomic studies of climate adaptation and genotype-by-environment interaction in switchgrass (Panicum virgatum, Tom Juenger, University of Texas at Austin, USA)
He talks about the evolutionary genetics of plant adaptation, citing https://www.ncbi.nlm.nih.gov/pubmed/21550682 . His system is the C4, perennial, polyploid, wind-pollinated P. virgatum, related to http://plants.ensembl.org/Panicum_hallii_fil2.

They have resequenced 950 individuals at 45x to map against a V5 PacBio assembly, yielding 46M SNPs. They belong to 4 populations. Their experiment sites span 24.3 degrees of latitude across 16 locations. They have published several articles, such as https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100855 . They have been able to assign % of genetic variance to climate (such as mean temp of driest quarter) and geography and find SNPs associated with them. They conclude climate has been a stronger driver of adaptation than genetic isolation, and they observe widespread QTL x E interactions for local adaptation.