25 October 2018

Plant Genomes in a Changing Environment (II)

Now for the second day.



Etienne Bucher, INRA, France
I missed the beginning of the talk but still got the main message: you can control the efficiency of retrotransposon mobilization in plants by exposing them to heat (stress) and to drugs that inhibit RNA pol II, which has a key role in transposon defense (RNA-directed DNA methylation). The key paper is https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1265-4. They are using controlled drug treatments (α-amanitin and zebularine) to create new variants and to select them in the field with rice and soybean. He has set up a company called epibreed to carry out this kind of experiment, but he insisted the approach can be used for free for research purposes.

Holger Puchta, Karlsruhe Institute of Technology, Germany
He gives a nice overview of double-strand breaks in plant genomes, and then moves on to CRISPR-Cas9 systems, where they initially got 15% (heritable mutation) efficiencies in Arabidopsis thaliana. Now, using S. aureus Cas9, they achieve 90% efficiencies. They have tried several approaches for in planta gene targeting (initial idea summarized in http://www.pnas.org/content/early/2012/04/19/1202191109) and are improving their efficiency so that they can use it to routinely knock out genes in A. thaliana (http://www.pnas.org/content/113/26/7266.short). He explains that by combining double-strand breaks it is possible to induce recombination in centromeric regions, where meiotic recombination is extremely unlikely. In A. thaliana, out of 200 ds-breaks you get about 10 cross-over events. He is funded by the ERC.

Sophie Harrington, John Innes Centre, UK
She talks about TILLING to study wheat senescence. They build EMS TILLING populations and sequence captured exons. She shows a nice figure of Ensembl Plants, where this kind of data is readily available to users. She then introduces NAC transcription factors and in particular the NAM factors related to senescence. They use tetraploid wheat to study NAM-A1, because it's single-copy there. By phenotyping EMS populations they see that a particular amino acid substitution induces a significant delay in senescence in the field in two environments. Based on yeast two-hybrid (Y2H) assays, they believe these mutations impair NAC dimerization. She mentions a paper describing the NAC family in wheat (https://www.ncbi.nlm.nih.gov/pubmed/28698232). They used chromosome sorting to isolate a chromosome harboring a region with a clear allele frequency shift linked to senescence; they are now working on sequencing that region. She gets several questions regarding dominant mutants in wheat, and how the dominant nature relates to the number of copies of the mutated regions.

Youssef Belkhadir, GMI Vienna, Austria
He talks about the molecular logic and emergent properties in receptor-receptor interaction networks around plant signaling. There are 400 receptor kinases (RKs) in Arabidopsis thaliana. They have diverse extracellular domains (ECDs). He shows nice cartoons of large & short Leu-rich ECDs docked together with a ligand and triggering intracellular phosphorylation, and presents their approach to screening LRR domains at high throughput, as published in https://www.nature.com/articles/nature25184. They did confirmation Y2H experiments and found an agreement of 57% for high-confidence short-to-long LRR interaction predictions. By using network dissection, including PageRank, they find that short LRR proteins are more frequently central nodes than long LRR proteins.
He also shows data from an A. thaliana diversity panel (about 600 lines) used for large-scale root phenotyping assays of plants treated with brassinosteroids. Subsequent GWAS analyses suggest several LRR genes to explain the differences observed.
He mentions that the BAK1 receptor is 100% conserved at the amino acid level in over 1K A. thaliana lines. He mentions that absence genotypes of particular LRR genes were confirmed by PCR against the suspected genome. They didn't do the actual annotation; instead this was done by the group of Magnus Nordborg.
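To make the PageRank idea mentioned in this talk more concrete, here is a minimal sketch in plain Python. It is not the authors' analysis: the toy network and the node names (short1, longA, ...) are made up for illustration, and the damping factor of 0.85 is just the usual default.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as (a, b) pairs."""
    # Build an adjacency map; every node that appears has at least one neighbor
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its current score equally among its own edges
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy network: one short-LRR protein contacting four long-LRR ones
toy_edges = [
    ("short1", "longA"),
    ("short1", "longB"),
    ("short1", "longC"),
    ("short1", "longD"),
    ("longA", "longB"),
]
scores = pagerank(toy_edges)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}\t{score:.3f}")
```

In this toy example the node with most partners ends up with the highest score, which is the sense in which a protein can be called a central node of the network.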

Anne Osbourn, John Innes Centre, UK
She talks about antimicrobial compounds (such as avenacin) synthesized in the roots of Avena plants. The responsible pathway is actually composed of several neighboring genes which are all under concerted expression, with a root-specific promoter (http://www.pnas.org/content/111/23/8679). They have a contig of this 720 kb region of the genome and they believe this cluster is conserved neither in Brachypodium nor in wheat.
She mentions that many metabolic gene clusters have been reported in both monocots and dicots, that no horizontal gene transfer from microbes has been demonstrated and that their genomic co-localization is probably linked to their regulation and epigenomics (https://www.ncbi.nlm.nih.gov/pubmed/26895889). They have developed transient expression systems to test these metabolic clusters, both natural and synthetic, in Nicotiana leaves and in some cases obtained gram-scale triterpene production (https://www.ncbi.nlm.nih.gov/pubmed/28687337).
She then describes the thalianol pathway in A. thaliana, which was the first operon-like cluster they ever predicted, and other later examples, such as http://www.pnas.org/content/114/29/E6005. She also shows data on rhizosphere composition changes in mutants of these pathways. They have developed a tool for predicting metabolic clusters: http://plantismash.secondarymetabolites.org

Matteo Dell'Acqua, Scuola Superiore Sant'Anna, Italy
He talks about the identification of candidate genes for maize leaf development using tools such as GWAS, eQTL and precision phenotyping. He emphasizes the need to integrate approaches due to the observation that most alleles have small effects, with only a few major effect genes whatever the complex trait under study. He shows correlations among gene expression values and leaf traits, as well as GWAS-derived SNPs associated to the same traits.
He also shows that for eQTLs, the majority of expression levels analyzed are associated with remote cis & trans locations (matrix of expressed gene position vs eQTL position, cis are on the diagonal). They focus on cis SNPs found for several traits, and find several genes encoding vacuole pumps. He mentions the challenge of pericentromeric regions with high linkage disequilibrium, which produce artificial segments with consecutive eQTLs. They also use WGCNA and compute correlations between modules and phenotypes, finding that some have positive correlations while others are actually negative.
He concludes by summarizing that RNAseq data are very valuable to do eQTL analyses and to produce markers.

Ming-Jung Liu, Academia Sinica, Taiwan
She starts by saying that Academia Sinica is currently recruiting and moves on to talk about regulatory divergence in wound-responsive gene expression between domesticated (lycopersicum) and wild (pennellii) Solanum species. She spends some time discussing the tradeoff between growth and wound stress tolerance in wild species. They identified putative cis regulatory elements enriched in clusters of genes related to wound responses, which correspond to G-box and W-box elements, and are enriched in upstream regions immediately before TSS positions. They then check whether these cis elements are conserved between both species and find that most are conserved but a good fraction are actually non-conserved, unique to each species (http://www.plantcell.org/content/early/2018/05/09/tpc.18.00194).

Sally Aitken, University of British Columbia, Canada
She talks about climate adaptation in conifers, which are currently experiencing drought and massive death in British Columbia. She talks about the increasing frequency of extreme climate events, on top of the warming trends. (Tree) seed and breeding zones based on local populations no longer match genotypes with climates. Mutation rates in trees are low per year but high per generation. They have estimated that climate is changing at a speed of 70 km/yr, while paleobiological evidence suggests trees have historically migrated at 0.1 km/yr. She describes their AdapTree project, which is designed to manage this issue in W Canada with assisted gene flow. They have not seen population variability in drought/heat response, only in cold hardiness. As they don't have access to good assemblies they used exome capture and SNP arrays to do Genome to Environment Association with bayenv2 and standard GWAS. She explains that the population structure of conifers actually correlates with climate gradients, so that by removing population structure you actually miss potentially bona fide adaptation loci. So they decided not to remove population structure and instead took only SNPs in excess of the background distribution of SNPs per gene (http://science.sciencemag.org/content/353/6306/1431). They found 47 candidate genes common to pine and spruce populations, and later work was done to find correlating haplotypes, instead of individual SNPs, to be used as markers (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1545-7).

I missed “Reinforcing plant evolutionary genomics using ancient DNA” by Hernan Burbano (MPI Tübingen, Germany) and “A major QTL for grain weight in wheat is associated with increased grain length and cell size” by Jemima Brinton (John Innes Centre, UK).

Esther van der Knaap, University of Georgia, USA
She talks about their work on the mechanisms underlying morphological diversity in tomato, which is largely explained by four gene families, including Ovate and the OFP family members. OFPs have been shown to interact with TFs, to act as repressors and to affect cellular localization of other proteins. They observed that OFP20 interacts with a series of proteins in Y2H assays, refined the list further by making Cas9 knockout mutants, and found that the pear/round shape is related to patterns of cell division in the fruit. She mentions a collaboration with Toni Monforte (UPV, Spain) where they found another OFP family member responsible for melon fruit shape.

Benjamin Brachi, INRA, France
He talks about natural variation of leaf secondary metabolites, and the underlying genetics, in European white oaks (Quercus robur). They have a reference genome and a genetic map made from trees planted in 1999. They do mass spectrometry on leaf extracts, cluster the compounds/pseudomolecules observed and estimate their replicability and heritability. He then explains a study of 9 populations of Quercus petraea from around France, where they see that population provenance explains only a very small part of the metabolites analyzed, and a fraction of those actually have bimodal/binary/PAV patterns: they are either produced or not at all. I think he believes the latter have a genetic explanation, while the rest probably respond largely to the environment.

Andrew Gloss, University of Chicago, USA
Andy talks about plant genotype × herbivore genotype interactions using 288 ecotypes of Arabidopsis thaliana, with the goal of discovering the genetic architecture of resistance to herbivory. The chosen herbivore is a fly related to Drosophila. They measure leaf damage and perform multi-trait GWAS, classifying SNPs as common genetic SNPs and SNPs with effects that depend on the plant population studied. He then focuses on the gene PBSL, which underlies clinal variation in size from N to S Europe.

Sarah Schiessl Weidenweber, Justus Liebig University Giessen, Germany
She talks about miRNA signaling under drought stress in winter lines of allopolyploid Brassica napus. How does drought affect flowering? It delays flowering and reduces yield. Their hypothesis is that the flowering network senses drought stress by means of RNAi. They put their plants in containers to get more realistic soil drying than in pots, sampled tissue and finally did WGCNA analyses, first with RNAseq to define modules and then with small RNAs, looking for those correlated with the modules defined earlier. Now they are studying the expression of the candidate small RNAs in PCR experiments and they have observed high variation across genotypes.

Adrien Sicard, SLU Uppsala, Sweden
He talks about the convergent evolution of flower morphology after the transition to selfing in the genus Capsella. He introduces the selfing syndrome, a case of repeated morphological evolution in plants, which tend to reduce petal size by reducing the number of petal cells; they also see this after Principal Component Analysis of transcriptomes of selfing and non-selfing species. They have a strong QTL for petal size in a population of two selfing species. When the candidate gene is mutated, probably in the promoter, they see pleiotropic effects.

Plant Genomes in a Changing Environment (I)

Hi,  
the first meeting on "Plant Genomes in a Changing Environment" kicked off today at the Wellcome Genome Campus in Hinxton, UK. It is exciting to be here and find out this is probably the first ever plant genome meeting in an otherwise world-famous genomics venue.

 
I will post here my notes on the talks I attend.


Caroline Dean, John Innes Centre, UK
She presents the different flowering habits of Arabidopsis thaliana accessions (rapid cycling, winter facultative & obligate winter-annual) and takes us through the current knowledge of the quantitative nature of winter recording in the FLC locus, a MADS repressor of flowering which is the target of a polycomb-mediated epigenetic switch. In addition, she summarizes the mutually exclusive non-coding FLC transcripts found to be cold induced, such as COOLAIR (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234544, https://www.nature.com/articles/ncomms13031). After flowering, the epigenome state of FLC is restored by a demethylase. COOLAIR is actually an RNA molecule with a Brassicaceae-conserved secondary structure, which is substantially affected by a single SNP affecting splicing. She says that this ncRNA folds and stays in place, blocking physical access to that locus. She adds that this mechanism is conserved in humans and Brassicaceae, and would expect the same in monocots.
By the way, COOLAIR non-coding transcripts seem to be annotated in Ensembl Plants: https://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?g=AT5G10140;r=5:3173382-3179448;t=AT5G10140.2;db=core

The FLC locus accumulates H3K27me3 histones with exposure to cold, setting up a bistable state of inducing/repressing chromatin modifications. This balance spreads across tissues and cell populations, including the root tip. This memory is sustained by the chromatin itself in cis (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450441).
She then presents the RY cis elements in intron 1 of the FLC locus which is repressed by VAL1 (https://www.ncbi.nlm.nih.gov/pubmed/27471304) to trigger polycomb nucleation (http://floresta.eead.csic.es/footprintdb/index.php?tf=ea4a1835a3360403cd07b75528829572).
When they looked at 80 world-wide populations they found distinct FLC haplotypes, which compared to each other in a common background explain a linear vernalization requirement.
She claims that in A. thaliana vernal days are actually afternoons with temperatures < 15 °C (https://www.nature.com/articles/s41467-018-03065-7).  


Doreen Ware, USDA and Cold Spring Harbor, USA
She talks about a maize pangenome browser currently under development. She explains that growers require a platform that would allow easy knowledge transfer from some plants to others, so that it can be used in breeding. She talks about CNV genes with agronomical impact, such as transporters providing Al tolerance (http://www.pnas.org/content/110/13/5241). She shows GRAMENE neighborhood conservation display modes based on Ensembl Compara data.




Then she describes their current efforts PacBio-assembling 26 maize NAM parents, with SMRTlink assembly performed in the cloud (DNAnexus) and sped up 360x. The resulting assemblies are robust, with N50 > 34Mb.
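As a reminder of what an N50 figure like the one above means, here is a minimal sketch of how it is usually computed from a list of contig lengths. The function name and the example lengths are made up for illustration; they are not data from the talk.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    add up to at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (in Mb) of a tiny assembly
example_lengths = [10, 8, 5, 3, 2]
print(n50(example_lengths))
```

Note that N50 only summarizes contiguity; it says nothing about whether the contigs are correctly assembled.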
She finishes with a quick overview of transcriptome profiling for heterosis-inspired work, with the aim of phasing isoforms, which is important for reconstructing heterozygous loci (https://www.nature.com/articles/ncomms11708).

Eric Schranz, Wageningen University, The Netherlands
He talks about conservation and divergence in relative gene order of plant and animal genomes using network-based synteny analysis. He explains genome territories and why gene context matters with multiple examples of Hox genes and body layout plans. He claims that we have a genomic hairball problem when looking at synteny, and that networks with edges~synteny can simplify the problem, allowing PAV and homeologues to be integrated easily (https://www.sciencedirect.com/science/article/pii/S1369526616302230).
He also explains phylogenetic profiling and how they used it to find MADS box genes which are syntenic in all angiosperms but not in particular groups such as crucifers or monocots (http://www.plantcell.org/content/early/2017/06/05/tpc.17.00312).
He also explains that they're doing a mammal vs plant synteny analysis. Overall, mammal genomes are syntenic, while plant genomes are not. This work is under review at PNAS. They do find family-specific conserved syntenic blocks and a few angiosperm-conserved genes, related to photosynthesis and the clock.

John Vogel, University of California, Berkeley, USA
John talks about the pan-genome of Brachypodium distachyon and its implications for polyploid genome evolution. He describes the main findings of the Gordon et al paper (https://www.nature.com/articles/s41467-017-02292-8). He mentions that there is currently no way of displaying the pangenome efficiently in Phytozome, and he looks forward to the new developments of Gramene.
He then introduces B. stacei and the resulting B. hybridum. He shows the high synteny between B. hybridum subgenomes and the diploid parental species, as well as the SNP-based tree suggesting at least two hybridization events. Then he shows k-mer plots suggesting that D-cytotype B. hybridum (older) lines have a unique k-mer composition.
He then moves to the analysis of foundation effects in the hybrids, but shows that the hybridum + parental pangenome is not significantly different to the individual parental pangenomes. Finally, he shows dNdS plots to show that both subgenomes are still under selection.
M Morgante comments that this data is probably not compatible with an epigenetic shock post-hybridization.

Jae Young Choi, New York University, USA
Jae could not attend and was replaced by an unnamed researcher from the group. She starts by explaining that, besides transposable elements (https://www.ncbi.nlm.nih.gov/pubmed/25917896), tandem repeats are important drivers and markers for plant diversity. The talk is actually about natural variation in telomere repeats, which essentially are a major plant satellite, and their correlation with flowering time. They work with 100-mers of Oryza species, which include telomeres. In fact they see that O. sativa indica has significantly larger telomeres than ssp. japonica, and that correlates negatively with days to flowering.
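I did not catch how they actually quantify telomeres, so the following is only a rough sketch of the general idea: counting the fraction of reads (or 100-mers) that carry several tandem copies of the canonical plant telomere unit TTTAGGG, on either strand. The threshold of 4 copies and the function names are my own assumptions.

```python
import re

TELOMERE_UNIT = "TTTAGGG"  # canonical plant (Arabidopsis-type) telomere repeat


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_telomeric(read, min_copies=4):
    """True if the read has at least min_copies tandem units on either strand."""
    for unit in (TELOMERE_UNIT, reverse_complement(TELOMERE_UNIT)):
        # Builds a pattern such as (?:TTTAGGG){4,}
        if re.search(f"(?:{unit}){{{min_copies},}}", read):
            return True
    return False


def telomeric_fraction(reads, min_copies=4):
    """Fraction of reads classified as telomeric; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_telomeric(r, min_copies) for r in reads) / len(reads)


# Made-up reads: one telomeric, one not
toy_reads = ["ACGT" + "TTTAGGG" * 5, "ACGTACGTACGTACGT"]
print(telomeric_fraction(toy_reads))
```

Comparing such fractions between accessions sequenced to similar depth would give a crude proxy for relative telomere length, nothing more.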

Gabriele Magris, University of Udine, Italy
Gabriele gave a very nice and comprehensive talk on the characterisation of the pan-genome of Vitis vinifera using NGS, with a special focus on collinear genes that have gained or lost a neighboring transposable element (TE) affecting their expression. My battery died and, unfortunately, I could not take proper notes. However, I recall that he showed nice results on the methylation state of the regions where TEs insert and the preference of TE families for particular genomic territories, such as LINE elements for introns. I asked him how to efficiently annotate TEs in genomes and he referred me to the work of Wicker (https://www.nature.com/articles/nrg2165-c2).

 

19 October 2018

"Modern Statistics for Modern Biology" (book)

Hi,
this is my first post written from EMBL-EBI, and in it I just want to share an open-access book called Modern Statistics for Modern Biology, written by Susan Holmes and Wolfgang Huber, which can be read at https://www.huber.embl.de/msmb


It is written in plain prose and describes approaches for tackling the real problems of biology in general, including those we routinely describe in this blog. Besides explaining the fundamentals, the text has many examples and complete solutions in the R language. In fact, the source code of all chapters can be downloaded from http://web.stanford.edu/class/bios221/book/Rfiles

Regards,
Bruno

25 September 2018

SequenceServer: nice local blast

Hi,
today I wanted to let you know about a tool that we discovered recently that has been very useful for us. Its name is SequenceServer (http://www.sequenceserver.com). It is simply a wrapper to let your collaborators run NCBI BLAST searches on your local sequence databases.
All you need is a copy of the NCBI folder with some BLAST+ release and a Linux distribution. Here we had ncbi-blast-2.7.1+ already in place but had to install the application, which is written in Ruby, as recommended by the authors:

$ sudo gem install sequenceserver

Due to dependency problems during installation I could not manage to install it on CentOS 5, but it was easy on CentOS release 7.5 (sudo yum install ruby-devel). Once this is done, and the appropriate port is open on the host, all that remains is to let the application know where the sequence databases are. You can do that with these commands:

$ sequenceserver -d /path/to/dbs  # add new databases
$ sequenceserver -l               # list installed dbs
$ sequenceserver &                # launch web application 
 

Now you are ready to go. Your users need to type the URL:port of your host in their browser and they can now run their searches. This is the way it looks in our server:


Cheers, Bruno

PS I will be moving to the EMBL-EBI so there might be a break in this blog, but please keep in touch