20 March 2018

notes on EUCARPIA Cereals meeting 2018 (II)


Tuesday 20th March 2018

John FOULKES (U Nottingham, UK & CIMMYT collaborators)
Talks about genetic diversity and resource efficiency in wheat. Genetic gains in yield potential over the last decades have been 0.5-1% per year, but they are now slowing down. Biomass accounts for a large part of that potential, while harvest index (HI, http://plantsinaction.science.uq.edu.au/content/641-harvest-index) is following an inverse trend. Therefore, we are not taking full advantage of the genetic gains. Data from field assays indicate that the 2nd internode competes with spike growth. He reckons that HI values of 0.6 are achievable.
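To see why HI matters: HI is conventionally the ratio of grain yield to above-ground biomass, so at constant biomass any HI increase translates proportionally into grain. The Perl sketch below uses hypothetical numbers (20 t/ha biomass, current HI 0.5), not data from the talk.

```perl
use strict;
use warnings;

# Harvest index: fraction of above-ground biomass allocated to grain.
sub harvest_index {
    my ($grain_yield, $biomass) = @_;
    die "biomass must be positive\n" unless $biomass > 0;
    return $grain_yield / $biomass;
}

# Grain yield implied by a given biomass and HI.
sub grain_yield {
    my ($biomass, $hi) = @_;
    return $biomass * $hi;
}

# hypothetical numbers (t/ha), not data from the talk
my $biomass = 20;
my $hi_now  = harvest_index(10, $biomass);
my $gain    = grain_yield($biomass, 0.6) / grain_yield($biomass, $hi_now) - 1;
printf "HI now: %.2f; yield gain at HI 0.6 with the same biomass: %.0f%%\n",
       $hi_now, 100 * $gain;
```

With these made-up figures, moving from HI 0.5 to 0.6 without any extra biomass would mean 20% more grain.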
Then he moves on to N fertilization and the WISP project (http://www.wheatisp.org/Consortium/WISP.php), and shows some results on diversity in N uptake, as well as on photosynthetic efficiency, published last year by Gaju et al., https://www.sciencedirect.com/science/article/pii/S0378429016301022.
Finally, he summarizes experiments on root phenomics (shovelomics, i.e. excavating the top 20 cm of the root crown, see http://plantscience.psu.edu/research/labs/roots/methods/field/shovelomics) aimed at redesigning root architecture. Using a Rialto x Savannah population, they found a great deal of variation in the root traits they were tracking, as well as a correlation between root angle and root length under rain-fed conditions. They plan to set up a TILLING experiment to validate candidate genes. He shows nice images of cut roots of irrigated and non-irrigated plants, which they are using to train machine learning algorithms for image analysis.

 
Bruno CONTRERAS-MOREIRA (EEAD-CSIC, Spain)
That’s me. I talked about a series of experiments with pooled barleys, designed to test, through a natural selection approach, the agronomic advantage conferred by the flowering control gene PpdH2 in winter barleys. I took the chance to present PpdH2 as an accessory gene in the pan-genome context, using the terminology of our papers (barley, and A. thaliana & Brachypodium distachyon). I got questions from Frank Ordon and Simon Griffiths: i) how many seeds were used to build the pools, and ii) whether plant competition could explain the results.

Gaëtan TOUZY (Arvalis, France)
He talks about his project on “Improving Water Use Efficiency in Bread Wheat by Multi-trait multi-Environment Genome-Wide Association Studies”.

Eric OBER (NIAB, UK)
The title of his talk is “Implementing large-scale field phenotyping in genomic selection to accelerate wheat breeding”, which reports results of project GplusE (http://gtr.rcuk.ac.uk/projects?ref=BB%2FL022141%2F1). They combine visual scoring with hyperspectral camera imaging from drones and manned flights. They build Bayesian networks with a few traits and yield. They use data from a few years to predict phenotype (yield) with some success, but accuracy varies among sites and years. Predicting average performance is more reliable than predicting the best or worst performing genotypes. The most informative traits are hyperspectral data, developmental traits and late measurements.
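Prediction accuracy in genomic selection is commonly reported as the Pearson correlation between predicted and observed values; the talk did not say which metric was used, so that is an assumption here. A minimal Perl sketch, with made-up predicted and observed yields for a single site-year:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson correlation between two equally long lists of numbers,
# a common way of reporting genomic prediction accuracy.
sub pearson {
    my ($x, $y) = @_;
    my $n = @$x;
    die "lists must have the same length >= 2\n" unless $n == @$y && $n >= 2;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($dx, $dy) = ($x->[$i] - $mean_x, $y->[$i] - $mean_y);
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return $sxy / sqrt($sxx * $syy);
}

# hypothetical predicted vs observed yields (t/ha) for one site-year
my @predicted = (8.1, 7.4, 9.0, 6.8, 8.6);
my @observed  = (8.4, 7.0, 8.8, 7.1, 9.1);
printf "prediction accuracy r = %.3f\n", pearson(\@predicted, \@observed);
```

Computing r separately for each site and year is one way to see the variation in accuracy mentioned above.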

Kerstin NEUMANN (IPK, Germany)
She talks about barley phenotyping at IPK, particularly about using high-throughput image analysis to study stress-adaptive and constitutive biomass QTLs in cereals. She shows data and results for spring barleys (published at https://www.ncbi.nlm.nih.gov/labs/articles/28797222) and winter wheats.

Ulrike LOHWASSER (IPK, Germany)
Her talk is about “Searching for Frost Tolerance in Wheat – A genome wide association study”. Frost tolerance is a complex trait, involving winter survival, desiccation, anoxia, ice encasement and even disease resistance. Heritability for frost tolerance is low in most locations when field trials are carried out in cold winters, but it is high in controlled conditions.
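The contrast between field and controlled conditions is easier to follow with one common textbook definition of entry-mean broad-sense heritability in mind (not necessarily the estimator used in the talk), where σ²_G is the genotypic variance, σ²_ε the residual variance and r the number of replicates:

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{\varepsilon}/r}
```

Anything that inflates the residual variance, such as uneven or unpredictable frost exposure in the field, lowers H², whereas controlled freezing tests tend to shrink σ²_ε.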

Heribert HIRT (KAUST, Saudi Arabia)
He talks about beneficial microbial endophytes that enhance abiotic stress tolerance and yield. He works mostly with Arabidopsis and is interested in plants living in deserts, as part of project DARWIN 21 (http://www.darwin21.net). They run trials with Arabidopsis, but also with wheat, barley and alfalfa, and confirm beneficial effects under stress but not in normal conditions. In A. thaliana they actually observe that inoculation changes the stress response of the plant. They also did experiments to mimic microbial inoculation by adding external chemicals to A. thaliana. He shows data for one of their endophytes, Enterobacter sp. SA187, which was found in both monocot and dicot plants. They have no evidence of crop-specific strains, because they looked for generalists.

Ewen MULLINS (Teagasc, Ireland)
He talks about the impact of Septoria tritici blotch (STB) disease in wheat (-30% in the last couple of years). They are carrying out intensive field phenotyping to support breeding of resistant lines. They ran fungicide-free trials in Ireland and the UK, scoring plants visually, and concluded that different wheat genotypes have different latency periods. However, the disease eventually progresses in all of them (https://onlinelibrary.wiley.com/doi/abs/10.1111/ppa.12780), so a reasonable breeding target might be to extend the latency period further.


Yvan MOËNNE-LOCCOZ (U Lyon, France)
Talks about interactions of plant-beneficial rhizosphere bacteria with cereals. Can breeding benefit from microbiome-based approaches? Does the plant genotype matter? Have modern cultivars lost their microbial partners? They have performed 16S rRNA surveys of the rhizosphere of different crops and of wild plant species such as teosinte. Their data suggest that some modern lines conserve the ability to interact with inoculated bacteria while others do not, perhaps because this ability was counter-selected. They have used Pseudomonas kilonensis F113 to test root colonization in a panel of wheat cultivars and see that modern cultivars are less colonized than old cultivars or landraces. He also mentions Azospirillum brasilense Sp245, which stimulates root growth by producing the hormone IAA. His publications on these topics are listed at https://scholar.google.fr/citations?hl=fr&user=rF48UsAAAAAJ.


Laetitia WILLOCQUET (INRA, France)
She talks about phenotyping methods for quantitative host plant resistance using simulation modelling and ROC curves, reporting results published at https://www.sciencedirect.com/science/article/pii/S1360138517300237. Phenotyping is now the bottleneck for breeding resistance. Nonetheless, data were produced to feed models of infection and resistance, estimate parameters and make predictions. Details on these simulation models can be found in this document published in 2014: https://goo.gl/VQpqK7
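ROC curves summarise how well a score (for instance a disease severity, simulated or measured) separates two classes (for instance resistant vs susceptible) across all possible thresholds. A minimal Perl sketch, assuming binary labels and hypothetical scores; the area under the curve is computed with the rank-based (Mann-Whitney) identity, which may differ from how the cited work summarises its curves.

```perl
use strict;
use warnings;

# Area under the ROC curve via the Mann-Whitney identity:
# AUC = P(score of a random positive > score of a random negative),
# counting ties as 1/2.
sub roc_auc {
    my ($scores, $labels) = @_;   # labels: 1 = positive, 0 = negative
    my (@pos, @neg);
    for my $i (0 .. $#$scores) {
        if ($labels->[$i]) { push @pos, $scores->[$i] }
        else               { push @neg, $scores->[$i] }
    }
    die "need both classes\n" unless @pos && @neg;
    my $wins = 0;
    for my $p (@pos) {
        for my $n (@neg) {
            $wins += $p > $n ? 1 : $p == $n ? 0.5 : 0;
        }
    }
    return $wins / (@pos * @neg);
}

# ROC points [FPR, TPR], sweeping the threshold from high to low scores
sub roc_points {
    my ($scores, $labels) = @_;
    my $P = grep { $_ } @$labels;
    my $N = @$labels - $P;
    my @points;
    for my $t (sort { $b <=> $a } @$scores) {
        my ($tp, $fp) = (0, 0);
        for my $i (0 .. $#$scores) {
            next unless $scores->[$i] >= $t;
            $labels->[$i] ? $tp++ : $fp++;
        }
        push @points, [ $fp / $N, $tp / $P ];
    }
    return @points;
}

# hypothetical scores for 2 positives and 2 negatives
my @scores = (0.9, 0.4, 0.8, 0.3);
my @labels = (1, 1, 0, 0);
printf "AUC = %.2f\n", roc_auc(\@scores, \@labels);   # prints "AUC = 0.75"
```

An AUC of 1 means the score ranks every positive above every negative, while 0.5 is no better than chance.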


Hermann BUERSTMAYR (BOKU, Austria)
Genomics assisted improvement of Fusarium head blight resistance in bread wheat, durum wheat and triticale


Javier SANCHEZ-MARTIN (U Zürich, Switzerland)
His talk is about performing GWAS to reveal race-specific resistance genes to powdery mildew in wheats from the WHEALBI project (http://www.whealbi.eu). As in related talks, he discusses how exome capture platforms present mapping problems when aligning genes absent from the reference genome. He published part of these results in 2016: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1082-1



Pierre-Antoine PRECIGOUT (U Paris-Saclay, France)
Pierre presents numerical epidemiological models of foliar fungal pathogens in wheat, which are described in detail at https://www.ncbi.nlm.nih.gov/pubmed/28453406. They model the latent period in order to predict potential evolutionary directions.

19 March 2018

notes on EUCARPIA Cereals meeting 2018 (I)


Monday, 19th March 2018 (program at https://symposium.inra.fr/eucarpia-cereal2018)

Intro: Gilles Charmet (INRA-UCA), remembers Patrick Schweizer
Intro EUCARPIA: Andreas Börner (IPK), EUCARPIA Cereals section conference

Raphaël Dumoin (Bayer Crop Science) Wheat Innovation Strategy at Bayer
There is a need to breed wheat for both high- and low-productivity areas around the world. In each area, there is a gap between current productivity and potential yield. They expect that the wheat seed market will soon be as large as corn’s [due to the correlation between acreage and seed value for corn, soybean, cotton and canola]. BCS now has breeding stations in North America, the EU and Australia, where they develop pure lines and hybrids and look for yield-improving traits. The elements that explain higher yield in wheat are yield stability and abiotic stress tolerance, while maintaining quality. They use heterotic pools for breeding hybrids. They work with both spring and winter wheats and take 7 years to develop a new variety with marker-assisted breeding. They also work with targeted genome optimization/Cas9 editing, which can be done in 1-2 years but faces regulatory hurdles in the EU. They actively engage in collaborations with public R&D organizations and private companies around the world.

Andreas Graner (IPK) Ex situ germplasm collections
There is an increasing demand for crops and a need for sustainability (Steffen, Science 2015). Innovation in plant breeding requires integrating genomics, metabolomics and phenomics. The breeding methodology is now genomic selection, which increases the explained variance by including minor QTLs. Both doubled haploids and transformation/Cas9 are key enabling technologies. He emphasizes the importance of surveying and exploiting the available genetic resources. He mentions that the German federal ex situ genebank currently holds 27K wheat and 23K Hordeum accessions, with seed multiplication done on average every 20-30 years (https://www.nature.com/articles/srep05231). These experiments have allowed estimating heritabilities of 0.89-0.95 and, after careful data curation, are now the basis for GWAS with very large populations. At IPK they are taking advantage of a recently built large phenomics facility to quantitatively characterize traits such as lipid content at large scale (see the 2014 paper on Avena lipids). They have genotyped 23K barleys with GBS, observing that genetic diversity mirrors geographic origin. He mentions FAIR principles for data management and APIs. They have recently released the BRIDGE barley IPK DB (https://t.co/fLPAkkF7nY). He argues that the Nagoya Protocol on Access and Benefit Sharing (https://www.cbd.int/abs) runs against Open Access, as it will restrict, for instance, dissemination of phenotypic data from collections.

Davide Guerra (CREA, Italy)
Presents the WHEALBI collection of 512 barley accessions from 73 countries, including both cultivars and landraces. These were exome-captured and sequenced, yielding 403 validated samples with 64M called variants, which were used to allocate the barleys to 6 geography-based subpopulations. A series of common garden experiments were carried out at several latitudes and under several irrigation regimes. He shows preliminary results of multi-environment GWAS and discusses a few confirmed candidate genes they have found, including VRNH1, PpdH1 and HvCEN. He then goes into some depth on his results on Copy Number Variation (CNV) at the CBF locus, the frost tolerance experiments carried out to characterize the discovered alleles, and the planned PCR experiments to survey that particular genomic locus.

Ernesto Igartua (EEAD-CSIC, Spain)
Presents the Spanish Barley Core Collection (SBCC, http://www.eead.csic.es/barley/index.php) and explains that Spanish landraces actually comprise 4 subpopulations. These SBCC barleys have been used in the CLIMBAR FACCE-JPI project to analyze their association with agro-climatic variables. He first presents the genetic differentiation of the 4 subpopulations (XtX, diversity) and then shows a table of linkage disequilibrium. First, cold tolerance and water balance turn out to be the main variables explaining genetic diversity. Second, GWAS with both Bayenv2 and LFMM confirm the CBF locus (a positive control) and unveil a candidate amino-oxidase associated with cold/heat responses.

Marco Maccaferri (U Bologna, Italy); Luigi CATTIVELLI (CREA, Italy)
Marco presents the genome assembly of durum wheat cv Svevo (http://www.tasaco.com/Seed.aspx?cesit=44) and a tetraploid diversity panel of 1.9K lines. He estimates that average LD drops below 0.2 at inter-SNP distances between 400 Kb and 1.9 Mb, depending on the population.
Luigi talks more about the genome project (https://www.interomics.eu/durum-wheat-genome), assembled with NRGene software: 90% of the genome is contained in 2K scaffolds, and 95% of the scaffolds are mapped and anchored. Another team had already used the same protocol to sequence wild emmer accession Zavitan, representing the wild progenitor of tetraploid wheats (http://science.sciencemag.org/content/357/6346/93). The comparison suggests that there is a lot of CNV, concentrated at the ends of chromosome arms. In addition, they found 600 loss-of-function genes in durum compared to Zavitan, caused by gained stop codons or by frameshifts due to indels whose length is not a multiple of 3 (length % 3 > 0). These must have occurred within the last 10K years.
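LD between two biallelic SNPs is often summarised as r² (the notes do not say which LD measure was used, so this is an assumption). With allele frequencies p_A, p_a, p_B, p_b and haplotype frequency p_AB:

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A\,p_a\,p_B\,p_b}
```

As a worked example with hypothetical frequencies, p_A = p_B = 0.5 and p_AB = 0.35 give D = 0.35 - 0.25 = 0.10 and r² = 0.01 / 0.0625 = 0.16, just below the 0.2 value quoted above.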
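The frameshift criterion is simple modular arithmetic: a coding indel shifts the reading frame when its net length change is not a multiple of 3. A minimal Perl sketch, assuming indels are given as VCF-style REF/ALT allele strings; the example alleles are made up.

```perl
use strict;
use warnings;

# Classify a coding indel from VCF-style REF and ALT alleles.
# A net length change not divisible by 3 shifts the reading frame.
sub indel_effect {
    my ($ref, $alt) = @_;
    my $delta = length($alt) - length($ref);
    return 'not an indel' if $delta == 0;
    return abs($delta) % 3 ? 'frameshift' : 'in-frame';
}

# hypothetical alleles: +1 bp insertion, 3 bp deletion, +4 bp insertion
for my $pair (['A', 'AT'], ['ACGT', 'A'], ['G', 'GTTTA']) {
    my ($ref, $alt) = @$pair;
    printf "%-5s -> %-6s %s\n", $ref, $alt, indel_effect($ref, $alt);
}
```

Only the deletion of 3 bp keeps the frame; the other two would be flagged as frameshifts.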

Helmy M YOUSSEF (IPK, Germany)
Talks about natural diversity of inflorescence architecture in Hordeum vulgare, reporting results published at https://www.nature.com/articles/ng.3717. He explains what two- and six-rowed barleys are and also describes labile and intermedium spikes. They discovered and described gene Vrs2, which affects spike architecture.

Constance LAVERGNE (U Nottingham, UK)
Talks about introducing Aegilops sharonensis cytoplasm into common wheat and producing addition/translocation lines, which are often male-sterile. She shows seed pictures from different generations, as well as GISH preparations of introgression and translocation lines.


 
Scott Allen JACKSON (U Georgia, USA)
Talks about legume genomes (10 reference genomes currently available). While annual soybeans originated in China, there are a few perennial relatives in Australia. Phaseolus is more ancestral and is used to root phylogenetic trees. Breeding is just a series of bottlenecks, and domestication is likely the most important one. However, improvement requires genetic variation. He discusses how reference genomes, while allowing many types of diversity studies, have limitations, as they are just genomic snapshots. He argues that pan-genomes are better tools and shows the wild Glycine pan-genome, reported at https://www.nature.com/articles/nbt.2979. Having it allowed them to test for genes under selection in G. max, and they found just under seven hundred.
He then talks about transposable elements (TEs) and their role in genome evolution as sources of novel diversity. TEs have a half-life of about 2 Myr in a typical plant genome. There is no subgenome dominance effect in soybean, and there is extensive PAV. He also talks about DNA methylation (CG, CHG and CHH contexts, 3 different plant methylases) and how it affects TEs (citing https://www.nature.com/articles/nrg.2016.139). He says methylation is the preferred mechanism to silence inserted TEs in plant genomes, and explains how differentially methylated regions (DMRs) arise in a pan-genome, usually because TEs move. Most DMRs are stably inherited and behave like SNPs. He also cites a recent paper showing that post-duplication methylation diminishes as evolutionary time passes (https://onlinelibrary.wiley.com/doi/abs/10.1111/pce.13127). Non-syntenic genes tend to be C-methylated. His last statement is that a third of pan-genome genes lie in regions of low recombination, including TE-associated non-colinear genes.
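If TE decay is idealised as simple exponential loss (the talk only quoted the half-life, so the model is an assumption), the fraction of TE copies from a given insertion burst still recognisable after time t is:

```latex
\frac{N(t)}{N_0} = \left(\tfrac{1}{2}\right)^{t / T_{1/2}}, \qquad T_{1/2} \approx 2\ \text{Myr}
```

For example, after 6 Myr (three half-lives) about (1/2)³ = 1/8, i.e. 12.5%, of those copies would remain, which is why most recognisable TEs in plant genomes are relatively young.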


Caroline JUERY (INRA GDEC, France)
Explains the histone marks of euchromatin and heterochromatin and then says she wants to check whether the wheat epigenome is partitioned according to H3K27me3, H3K36me3, H3K9ac and H3K4me3 marks (or their absence), as ascertained by ChIP-seq. She concludes that there are clearly epigenetic territories, and then looks at triads of homeologous genes to measure the effect of epigenetic marks (upstream, ATG, stop, downstream, as in figure 3 of http://www.plantcell.org/content/21/4/1053) on gene expression, not yet on protein expression.


Cécile MONAT (IPK, Germany)
She starts by defining the basics of pan-genomes and presents the http://www.10wheatgenomes.com project, which is starting to produce reference-quality assemblies of 10 wheat cultivars combining NRGene assemblies, 10x Genomics linked reads (https://community.10xgenomics.com/t5/10x-Blog/A-basic-introduction-to-linked-reads/ba-p/95), POPSEQ and Hi-C data. Cécile has a preprint describing the pan-genome of two African rice species at https://www.biorxiv.org/content/early/2018/01/09/245431.


Maria BUERSTMAYR (BOKU, Austria)
Talks about high-resolution mapping of the pericentromeric region on wheat chromosome arm 5AS harboring the Fusarium head blight resistance QTL Qfhs.ifa-5A. Because recombination near the centromere is suppressed even in large populations, they used gamma radiation to induce DNA double-strand breaks and built a radiation hybrid map with markers positioned in centiRay (cR) units.
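Distances in radiation hybrid maps come from breakage rather than recombination. A standard conversion from classic RH mapping (not specific to this talk), with θ the probability that radiation breaks the chromosome between two markers, estimated from how often they are separated in the hybrid panel, is:

```latex
D = -100\,\ln(1 - \theta)\ \text{cR}
```

For small θ, 1 cR corresponds to roughly a 1% chance of breakage between the markers at the radiation dose used; for example, θ = 0.10 gives D = -100 ln 0.9 ≈ 10.5 cR.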


Romain DE OLIVEIRA (INRA GDEC, France)
He defines CNV and then Presence/Absence Variation (PAV). He explains his reference-mapping pipeline to identify TE-related CNV in wheat. He shows that wheat accessions can be clustered in terms of the PAV of TEs. At least 15% of genes show PAV among accessions.
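A presence/absence matrix can be summarised in two simple ways: the share of genes (or TEs) that are absent from at least one accession, and a pairwise distance between accessions that could feed a clustering step. In the minimal Perl sketch below, the matrix and accession names are hypothetical, and the choice of Jaccard distance is an assumption, not necessarily what the pipeline in the talk uses.

```perl
use strict;
use warnings;

# Hypothetical PAV matrix: accession => [presence (1) / absence (0) per gene]
my %pav = (
    acc1 => [1, 1, 1, 0, 1],
    acc2 => [1, 1, 0, 0, 1],
    acc3 => [1, 0, 1, 1, 1],
);

# Fraction of genes absent from at least one accession (variable genes)
sub variable_fraction {
    my ($matrix) = @_;
    my @rows     = values %$matrix;
    my $ngenes   = @{ $rows[0] };
    my $variable = 0;
    for my $g (0 .. $ngenes - 1) {
        my $present = grep { $_->[$g] } @rows;
        $variable++ if $present < @rows;
    }
    return $variable / $ngenes;
}

# Jaccard distance between two presence/absence vectors:
# 1 - (genes present in both) / (genes present in either)
sub jaccard_distance {
    my ($x, $y) = @_;
    my ($both, $either) = (0, 0);
    for my $i (0 .. $#$x) {
        $both++   if $x->[$i] && $y->[$i];
        $either++ if $x->[$i] || $y->[$i];
    }
    return $either ? 1 - $both / $either : 0;
}

printf "variable genes: %.0f%%\n", 100 * variable_fraction(\%pav);
for my $pair (['acc1', 'acc2'], ['acc1', 'acc3'], ['acc2', 'acc3']) {
    my ($a1, $a2) = @$pair;
    printf "%s vs %s: %.2f\n", $a1, $a2, jaccard_distance($pav{$a1}, $pav{$a2});
}
```

Genes present in every accession form the core set; the rest are the variable (accessory) fraction.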

9 March 2018

growth of protein-DNA complexes in the Protein Data Bank

Hi,
while checking the update logs of our good old 3D-footprint, a database of DNA-binding protein structures updated weekly from the Protein Data Bank, I found a folder with logs going back to February 2009. The plot below shows how the number of non-redundant complexes, filtered by protein sequence identity, has doubled in just a decade:

The nr95 bundle can be downloaded in PDB format at
http://maya.ccg.unam.mx/tfmodeller/get_library.cgi

Other related files are available at:
http://floresta.eead.csic.es/3dfootprint/download.html
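The exact procedure behind the nr95 set is not described here. As a hypothetical illustration of redundancy filtering, a greedy filter keeps a sequence only if it is less than 95% identical to every sequence kept so far. For brevity this Perl sketch assumes the sequences are already aligned to equal length (real pipelines compute identity from pairwise alignments), and the toy chains are made up.

```perl
use strict;
use warnings;

# Percent identity between two pre-aligned, equal-length sequences,
# ignoring columns where either sequence has a gap ('-').
sub identity {
    my ($s1, $s2) = @_;
    die "sequences must be aligned to equal length\n"
        unless length($s1) == length($s2);
    my ($same, $cols) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($c1, $c2) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next if $c1 eq '-' || $c2 eq '-';
        $cols++;
        $same++ if $c1 eq $c2;
    }
    return $cols ? 100 * $same / $cols : 0;
}

# Greedy non-redundant filter: input order sets priority
# (e.g. sort by resolution first so the best structure is kept).
sub nr_filter {
    my ($cutoff, @seqs) = @_;
    my @kept;
    SEQ: for my $s (@seqs) {
        for my $k (@kept) {
            next SEQ if identity($s, $k) >= $cutoff;
        }
        push @kept, $s;
    }
    return @kept;
}

# hypothetical chains: the 2nd is 95% identical to the 1st
my @chains = ('MKTAYIAKQRQISFVKSHFS',
              'MKTAYIAKQRQISFVKSHFA',
              'MSTHYLAEQRKLSDLKGAFS');
print join("\n", nr_filter(95, @chains)), "\n";
```

Here the second chain is dropped as redundant with the first, while the third (50% identical) is kept.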

cheers,
Bruno

1 March 2018

replacing the smartmatch operator in Perl5

Hi,
after the recent announcement that Perl5 version 5.28 would remove the smartmatch operator ~~ (see here), I came across an old program where it was used, even though it has been experimental for a long time. With the help of

$ perldoc perlop

here is an example of how to replace this operator with core Perl code:

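use strict;
use warnings;
# ~~ is experimental: silence its warnings (category available since Perl 5.18)
no warnings 'experimental::smartmatch';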

my @array = qw( JASPAR footprintDB UNIPROBE );
my %hash  = ( JASPAR => 1, footprintDB => 2, UNIPROBE => 3 );

my $element = 'footprintDB';

# array context
if ($element ~~ @array){
  print "\@array contains element '$element' (smartmatch)\n";
}

if (grep { $element eq $_ } @array){
  print "\@array contains element '$element' (core Perl5)\n";
}

# hash context
if(/$element/ ~~ %hash){
  print "\%hash contains a key matching regex /$element/ (smartmatch)\n";
}

if(grep { /$element/ } keys(%hash)){
  print "\%hash contains a key matching regex /$element/ (core Perl5)\n";
}
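Two more replacements may come in handy (a sketch, not part of the original program). Smartmatching a plain scalar against a hash tested key existence, which core Perl writes with exists; and for arrays, any from List::Util (available from List::Util 1.33) stops at the first match, whereas grep always scans the whole list.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() requires List::Util >= 1.33

my @array = qw( JASPAR footprintDB UNIPROBE );
my %hash  = ( JASPAR => 1, footprintDB => 2, UNIPROBE => 3 );
my $element = 'footprintDB';

# $element ~~ %hash tested whether $element is a key of %hash
if (exists $hash{$element}) {
  print "\%hash contains key '$element' (core Perl5)\n";
}

# like grep, but stops at the first matching element
if (any { $element eq $_ } @array) {
  print "\@array contains element '$element' (List::Util)\n";
}
```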

Cheers,
Bruno

8 February 2018

Modelling transcription factor complexes in the terminal

Hi,
I just updated our good old server TFmodeller, available at http://www.ccg.unam.mx/tfmodeller,
so that it uses the current collection of 95% non-redundant protein-DNA complexes extracted from the Protein Data Bank. As of Feb 7, 2018, there are 977 such complexes, which can be downloaded.
In addition, I just wrote a Perl client so that predictions can be requested from the terminal via a SOAP interface, producing XML output which should be easy to parse. The PDB-format coordinates of the resulting model are marked up with tags. The input is a peptide FASTA file. This is the code:

#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;

my $URL  = 'http://maya.ccg.unam.mx:8080/axis';
my $WSDL = "$URL/TFmodellerService.jws?WSDL";

# input: a peptide FASTA file
my $infile = $ARGV[0] || die "# usage: $0 <peptide FASTA file>\n";
my ($inFASTA, $result);

open(my $fh, '<', $infile) or die "# cannot read $infile\n";
{
  local $/ = undef;   # undefined record separator: slurp the whole file
  $inFASTA = <$fh>;
}
close($fh);

my $soap = SOAP::Lite->uri($URL)
                     ->proxy($URL, timeout => 300 )
                     ->service($WSDL);

# call the remote TFmodeller method; errors raised inside eval end up in $@
eval { $result = $soap->TFmodeller($inFASTA) };
if($@){ die $@ }
else{ print $result }

The original Java client can still be found here. Note that the output includes a sequence alignment of query and template, with residues contacting DNA nitrogenous bases highlighted:

HEADER model 1zrf_A 203 DNACOMPLEX resol=2.10 21 8e-46
REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
REMARK contacts ........................ ................*........***...*...
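The REMARK lines can be post-processed to list which query residues align to template residues contacting DNA bases. A minimal Perl sketch, assuming (from the example above) that the alignment text starts right after a 16-character "REMARK label " prefix and that '*' marks a contact; other outputs may be laid out differently.

```perl
use strict;
use warnings;

# Extract query/template/contacts strings from TFmodeller REMARK lines,
# assuming the sequence text starts at offset 16 as in the example output.
sub parse_alignment {
    my @lines = @_;
    my %aln;
    for my $line (@lines) {
        next unless $line =~ /^REMARK (query|template|contacts) /;
        $aln{$1} = substr($line, 16);
    }
    return \%aln;
}

# Return [alignment column (1-based), query residue, template residue]
# for every column flagged with '*' in the contacts line.
sub contacting_residues {
    my ($aln) = @_;
    my @hits;
    for my $i (0 .. length($aln->{contacts}) - 1) {
        next unless substr($aln->{contacts}, $i, 1) eq '*';
        push @hits, [ $i + 1, substr($aln->{query}, $i, 1),
                      substr($aln->{template}, $i, 1) ];
    }
    return @hits;
}

my @remarks = (
  'REMARK query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG',
  'REMARK template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK',
  'REMARK contacts ........................ ................*........***...*...',
);
for my $hit (contacting_residues(parse_alignment(@remarks))) {
    printf "column %d: query %s / template %s\n", @$hit;
}
```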

Bruno