Unlocking the polyploidy potential of wheat
through genomics (Cristobal Uauy, John Innes Centre, UK)
Phenotypes of agricultural importance are complex, with continuous gradients
instead of discrete states, and differences are often difficult to tell from
noise. This only gets worse in polyploids, where QTL effects are subtler than
in diploids.
He explains that Arabidopsis thaliana
is as far from wheat as platypus is from human. He then talks about the rich wheat
genomic resources, which are bringing people to work on this species perhaps for
the first time. All these resources are documented at http://www.wheat-training.com.
He then
moves on to describe a particular example of combining these tools, using https://bioconductor.org/packages/release/bioc/html/GENIE3.html
to predict target genes of wheat TFs from around 900 RNAseq experiments. They
see no evidence of TFs preferring targets from the same (A, B, D) subgenome, even
though the D subgenome has evolved independently of A & B for thousands of
years. He’s asked
how difficult it is to map & genotype specific subgenomes; he says that with
Polymarker you can enrich for subgenome-specific regions. They use https://pachterlab.github.io/kallisto to quantify transcripts and have validated in
chr-deletion lines that the transcripts from the missing chromosomes are not
expressed.
ENSEMBL plants – Visualizing the Wheat Genome
in Ensembl Plants (Guy Naamati, EMBL-EBI, UK)
He starts
with a quick tour of http://plants.ensembl.org/Triticum_aestivum. He then explains the gene trees
for wheat produced with https://www.ensembl.org/info/genome/compara . Then he moves on to the wheat
variant collections, the TILLING mutants and the KASP markers. Finally he
also mentions our preparative work with a dozen wheat cultivar assemblies from http://www.10wheatgenomes.com . He concludes by showing off the Ensembl
Outreach team, who do Ensembl training around the world. He is asked
why gene models keep changing across releases and whether it is possible to know the most
abundant isoform (canonical?). He’s also asked how the 10+ varieties are going to be loaded. Another
question is how orthologues with low sequence identity are handled.
Expression atlas - Submission, archival and
visualisation of plant sequencing data (Nancy George, EMBL-EBI, UK)
She guides
us through a submission from start to end: i) annotate metadata with https://www.ebi.ac.uk/fg/annotare, ii) import expression data with https://www.ebi.ac.uk/arrayexpress and https://www.ebi.ac.uk/gxa (min 3 replicates, biological
question, reference in Ensembl). These steps eventually result in baseline
expression reports such as http://plants.ensembl.org/Triticum_aestivum/Gene/ExpressionAtlas?g=TraesCS3D02G273600;r=3D:379535906-379539827
She then
moves on to say that the plant community is still not embracing single-cell
sequencing, due to technical challenges.
Benchmarking and development of an ensemble
motif mapping approach to improve gene regulatory network inference (Marc
Jones, VIB / Ghent University, Belgium)
He
introduces TF binding motifs and how they can be used to scan genome sequences
to predict binding sites. They compared different motif aligners, including MOODS,
cluster-buster, FIMO and matrix-scan. They observe that FIMO sites tend to match
more often with those from other tools. They then compared site predictions to
ChIPseq read depth, in order to compute precision and recall. FIMO comes out best in
terms of precision and worst in recall. When they look at the first 7000 sites,
the 4 tested aligners behave similarly. Eventually they combined FIMO and
cluster-buster, as these report many sites missed by the others. The full set of
results is described at http://www.plantphysiol.org/content/181/2/412
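To make the scan-then-benchmark idea concrete, here is a minimal sketch, not the tools compared in the talk. It scores every window of a sequence with a log-odds position weight matrix and then computes precision and recall against a set of reference positions. The motif, the uniform background, the threshold and the "supported" positions (standing in for ChIPseq-backed sites) are all made-up assumptions.

```python
import math

BACKGROUND = 0.25  # assumption: uniform base composition


def column(consensus, p=0.85):
    """Hypothetical motif column: probability p for the consensus base."""
    rest = (1.0 - p) / 3
    return {b: (p if b == consensus else rest) for b in "ACGT"}


# Toy 4-bp motif with consensus ACGT (illustrative only).
PFM = [column(b) for b in "ACGT"]


def log_odds(pfm):
    """Convert base frequencies to log2-odds scores against the background."""
    return [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in pfm]


def scan(seq, pwm, threshold):
    """Return start positions of windows scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append(start)
    return hits


def precision_recall(predicted, supported):
    """Precision and recall of predicted sites against reference sites."""
    predicted, supported = set(predicted), set(supported)
    tp = len(predicted & supported)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(supported) if supported else 0.0
    return precision, recall


pwm = log_odds(PFM)
hits = scan("TTACGTGGACGTAA", pwm, threshold=5.0)
precision, recall = precision_recall(hits, supported={2, 11})
print(hits, precision, recall)
```

The sketch only handles A, C, G and T on one strand. Real scanners also score the reverse strand and usually choose thresholds from p-values rather than a fixed score.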
No genome required: Finding genetic variants
associated with plant phenotypes without complete genome information (Yoav
Voichek, Max Planck Institute for Developmental Biology, Germany)
He talks
about doing GWAS analysis with k-mer distributions instead of mapping to a
reference genome. They start with a PAV table of 31-mers across genotypes. That
table can be used to characterize a pan-genome after removing low-depth k-mers,
as they did with 1000 A. thaliana
genome sequence sets. From that they have developed a GWAS pipeline for k-mers
which accounts for population structure. They assign genomic context to k-mers
by i) mapping to the ref genome, ii) LD and iii) assembling reads containing the
k-mers and then mapping. The code will be released soon at https://github.com/voichek/kmers-gwas
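As an illustration of what a k-mer PAV table looks like, here is a small sketch. It is not the authors' pipeline. It counts canonical k-mers in the reads of each genotype, drops k-mers seen fewer than `min_count` times, and records presence (1) or absence (0) per genotype. The function names, the `min_count` cutoff and the toy reads are assumptions for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads of one genotype."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def pav_table(reads_by_genotype, k=31, min_count=2):
    """Build a presence/absence table: k-mer -> list of 0/1, one per genotype."""
    present = {}
    for genotype, reads in reads_by_genotype.items():
        counts = count_kmers(reads, k)
        # Low-depth k-mers are treated as likely sequencing errors and dropped.
        present[genotype] = {kmer for kmer, n in counts.items() if n >= min_count}
    genotypes = sorted(present)
    all_kmers = sorted(set().union(*present.values()))
    table = {kmer: [int(kmer in present[g]) for g in genotypes] for kmer in all_kmers}
    return table, genotypes


# Toy data with k=3 so the example stays readable (the talk used 31-mers).
toy_reads = {"g1": ["ACGTA", "ACGTA"], "g2": ["ACGTA"]}
table, genotypes = pav_table(toy_reads, k=3, min_count=2)
print(genotypes, table)
```

Each row of such a table can then be treated like a biallelic marker in an association test, which is where population structure correction would come in.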
She talks about the time dimension in regulatory networks, using a diagram from https://europepmc.org/articles/PMC4558309. She proposes that we should handle TF binding to DNA just like enzymes, with enzymatic kinetics. She tells 3 stories on A. thaliana.
The Just-in-TIME
approach allowed them to study genes expressed in response to N as a function of
time, with enriched cis elements and GO terms that would have been missed if
analyzed in bulk https://www.pnas.org/content/115/25/6494.short. They apply ML to identify the TFs
binding to those cis elements using time-series gene expression, and they
validated the predictions with 7 TF perturbations that affect 2K targets.
Hit-and-Run is another approach to study
transient TF binding, controlled by adding dexamethasone (developed by José
Álvarez et al, soon in Nat Comms). She stresses that binding is a poor predictor
of regulation: most binding does not affect expression and, conversely, in many
cases ChIPseq fails to catch binding events that they know happen. She
also shows results of TFs binding to the 3’UTR. In order to catch those
transient-binding TFs they used a new protocol called DamID. It turns out that
most transient events occur very early in the N response, while the stable
binders tend to respond late. She does not know whether transient sites
are bound with less affinity, but she notes that they are enriched in neighboring
sites of other TFs.
Finally,
they performed network walking to connect transient TFs to their targets
in A. thaliana, which they published
at https://www.nature.com/articles/s41467-019-09522-1. It is called net walking because
they walk from primary TFs, then to secondary regulated TFs and finally to
indirect targets. They are now developing a method called OutPredict to
introduce priors in their network inference.
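The walk described above can be sketched as a two-step traversal of a regulator-to-target map. This is only a toy illustration of the idea, not the published method. The edge map, the gene names and the function name are hypothetical.

```python
def network_walk(edges, primary_tfs, all_tfs):
    """Walk from primary TFs to their direct targets, then through secondary
    TFs (direct targets that are themselves TFs) to indirect targets.

    edges: dict mapping each regulator to the set of genes it regulates.
    """
    primary_tfs = set(primary_tfs)

    # Step 1: genes directly regulated by the primary TFs.
    direct = set()
    for tf in primary_tfs:
        direct |= edges.get(tf, set())

    # Step 2: direct targets that are TFs become secondary TFs.
    secondary_tfs = (direct & set(all_tfs)) - primary_tfs

    # Step 3: targets reached only through the secondary TFs.
    indirect = set()
    for tf in secondary_tfs:
        indirect |= edges.get(tf, set())
    indirect -= direct | primary_tfs | secondary_tfs

    return direct, secondary_tfs, indirect


# Hypothetical toy network: TF1 regulates TF2 and geneA; TF2 regulates geneA and geneB.
edges = {"TF1": {"TF2", "geneA"}, "TF2": {"geneA", "geneB"}}
direct, secondary, indirect = network_walk(edges, primary_tfs={"TF1"}, all_tfs={"TF1", "TF2"})
print(direct, secondary, indirect)
```

Genes already reached directly are excluded from the indirect set, so each target is reported at the first step where it is reached.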
Genetic and genomic studies of climate
adaptation and genotype-by-environment interaction in switchgrass (Panicum
virgatum) (Tom Juenger, University of Texas at Austin, USA)
He talks about
the evolutionary genetics of plant adaptation, citing https://www.ncbi.nlm.nih.gov/pubmed/21550682 . His system is the C4, perennial,
polyploid, wind-pollinated P. virgatum,
related to http://plants.ensembl.org/Panicum_hallii_fil2.
They have
resequenced 950 individuals at 45x to map against a V5 PacBio assembly, yielding
46M SNPs. The individuals belong to 4 populations. Their experiment sites span 24.3
degrees of latitude across 16 locations. They have published several articles,
such as https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100855 . They have been able to assign a %
of genetic variance to climate (such as mean temp of driest quarter) and
geography, and to find SNPs associated with them. They conclude that climate has been a
stronger driver of adaptation than genetic isolation, and they observe widespread
QTL x E interactions for local adaptation.