This last session is shared with the next meeting "Genomics-assisted breeding for boosting crop and livestock improvement".
Wednesday 10 May
https://twitter.com/CRAGENOMICA/status/1656225214785568768
Cristobal Uauy ‘Crop breeding as a DNA-assembly problem’
Pictures breeding as creating mosaic genomes from genotypes that bring in traits of interest. Refers to the work of Brinton et al (https://www.nature.com/articles/s42003-020-01413-2) to build haplotypes. SNP chips fail to tell apart pangenome entries, as they mostly target nearly identical regions; instead, haplotypes reconstruct the history of Watkins landraces accurately. They also found that landraces can improve the yield of elite cultivars such as Paragon, although with awful plant architectures.
They are also reconstructing haplotypes from raw reads in a scalable way. The goal is to find regions that are identical by state (IBS). Their window-based K-mer approach is available at https://github.com/Uauy-Lab/IBSpy; they use reads of the reference genome as a negative control. In their WGA benchmarks they find that 1 SNP in 5 kbp roughly corresponds to identical by state. 30% of their A-genome data have a divergence that you would expect for wild wheat. In fact they confirmed that T. monococcum had 1% IBS regions with hexaploid wheat, which are evidence of introgressions. They have also found introgressions of T. timopheevii. Work at U Nottingham suggests any region of the hexaploid genome is susceptible to introgression. The method works with barley and maize inbred lines.
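To make the window-based idea concrete, here is a minimal sketch of my own, not the IBSpy code. It assumes that a run of consecutive reference K-mers missing from the sample counts as one variation, and that a window is called IBS-like when its variation count stays at or below a threshold. The function names, toy sequences, K-mer size and window size are all hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all k-mers found in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def window_variations(reference, sample_kmers, k, window):
    """Count runs of reference k-mers absent from the sample, per window.

    Windows are defined over k-mer start positions. One SNP removes up to
    k consecutive k-mers, so each run of missing k-mers is counted as one
    variation (an assumption of this sketch). A run that crosses a window
    boundary is counted in both windows.
    """
    n_positions = len(reference) - k + 1
    counts = []
    for start in range(0, n_positions, window):
        runs = 0
        in_run = False
        for i in range(start, min(start + window, n_positions)):
            missing = reference[i:i + k] not in sample_kmers
            if missing and not in_run:
                runs += 1
            in_run = missing
        counts.append(runs)
    return counts


def call_ibs(counts, max_variations):
    """Flag windows whose variation count is at or below the threshold."""
    return [count <= max_variations for count in counts]


# Toy data: a 23 bp reference and two samples carrying one and two SNPs.
reference = "AAACAGATCCGGTTGCATGACTA"
one_snp = "AAACATATCCGGTTGCATGACTA"   # G>T at position 5
two_snps = "AAACAGATCCGGTAGCATCACTA"  # T>A at position 13, G>C at position 18

for name, sample in [("one_snp", one_snp), ("two_snps", two_snps)]:
    counts = window_variations(reference, kmers(sample, 4), k=4, window=10)
    print(name, counts, call_ibs(counts, max_variations=1))
```

With real data the windows would be kilobases long and the threshold would be calibrated against benchmarks such as the 1 SNP in 5 kbp figure mentioned above.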
They use affinity propagation clustering to simplify haplotypes. They obtain robust haplotypes by comparing K-mers to all pangenome assemblies. If several genotypes have the same pattern against the same reference, they probably are IBS in that region.
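The last point can be sketched as follows. This is a simplified stand-in that groups genotypes by exact match of their per-window variation counts against one reference; it is not the affinity propagation clustering they actually use. Genotype names and counts are made up for illustration.

```python
from collections import defaultdict


def group_by_pattern(counts_by_genotype, start, end):
    """Group genotypes that share the same variation counts in a region.

    counts_by_genotype maps a genotype name to its per-window variation
    counts against one reference; the region is the window slice
    [start, end). Exact matching is an assumption of this sketch; a real
    analysis would tolerate noise, for instance by clustering.
    """
    groups = defaultdict(list)
    for genotype, counts in sorted(counts_by_genotype.items()):
        groups[tuple(counts[start:end])].append(genotype)
    return sorted(groups.values())


# Hypothetical variation counts for three lines across four windows.
counts_by_genotype = {
    "line_A": [0, 12, 3, 0],
    "line_B": [0, 12, 3, 7],
    "line_C": [5, 0, 0, 7],
}

# line_A and line_B share a pattern in windows 0-2, so they are
# candidates for being IBS there; in window 3 line_B matches line_C.
print(group_by_pattern(counts_by_genotype, 0, 3))
print(group_by_pattern(counts_by_genotype, 3, 4))
```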
Up to 55% of modern cultivars can be traced back to <10 Watkins landraces. In the UK they realized the need to go back to crossing with landraces to improve yield.
Nils Stein ‘The barley pan-genome: large structural genome variation is not rare’
Starts by explaining how genomes of barley have powered applications from SNP calling up to genomic selection. He then explains how 22K barleys were analyzed by GBS and intersected with core collections to pick n=20 genotypes, including one wild barley. He shows alternative ways to depict and summarize pangenomes, mentioning that graphs seem to work well with human genomes, but current tools (e.g. pggb) don’t work well with barley. They had to mask geneless regions and build single-copy gene clusters.
They are now working towards a second version of the pangenome, adding 23 wild barleys and reaching a total of 76 genotypes, not yet T2T assemblies (talking with Sanger for that). This set seems to plateau for domesticated barleys, but not yet for wild ones.
Bruno Contreras-Moreira ‘Upgrading the gene annotation at the population level reveals the diverse pan-gene set of Asian rice’
My talk; a related preprint can be found at: https://www.biorxiv.org/content/10.1101/2023.01.03.520531v1
Code, documentation and examples at: https://github.com/Ensembl/plant-scripts
I paste here the abstract: Oryza sativa and Arabidopsis thaliana are the best sources of gene function information among plant genomes, probably for their role as model species. In rice, two independent efforts (RAP-DB and MSU - the latter no longer updated) curated the Nipponbare genes that researchers refer to in their papers. However, these two gene sets are different, and more importantly, do not include gene models found in other rice cultivars. To address these limitations, our consortium produced high quality genome assemblies for 15 cultivars representative of all cultivated Asian rice, plus the reference genome (IRGSP), which were then annotated using the same protocol and gene expression data. The goal was to produce a consistent catalog of gene models that could be used by rice breeders around the world and distributed by key resources such as Ensembl Plants, Gramene and UniProt. A software prototype was developed to find collinear genes annotated in the rice set, producing a total of 84,530 gene clusters. Of these, 26,357 were found in 15+ rices (soft-core). We then confirmed that the soft-core set i) contains 94.6% of all the BUSCO protein domains of order Poales, ii) 89% of genes of agronomic value curated by RAP-DB and iii) has the largest support from protein mass spectrometry experiments. Further inspection of collinear genes revealed a large degree of diversity in terms of gene boundaries and a significant number of missing genes and potential loss of function alleles. This work highlights the challenges of defining a consensus gene annotation for a crop when different cultivars and populations are considered.
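As a small illustration of what soft-core means here, the sketch below counts in how many genomes each gene cluster is found and tallies occupancy classes. It is not taken from the plant-scripts code; the function name, the cluster and genome identifiers and the toy thresholds are hypothetical. In the rice set there are 16 genomes (15 cultivars plus the reference) and the soft-core threshold is 15.

```python
def summarize_occupancy(clusters, n_genomes, softcore_min):
    """Tally gene clusters by occupancy (number of genomes they occur in).

    clusters maps a cluster id to the genomes in which it was found.
    In this sketch soft-core includes the strict core, and everything
    below the soft-core threshold is lumped into 'other'.
    """
    summary = {"core": 0, "soft_core": 0, "other": 0}
    for genomes in clusters.values():
        occupancy = len(set(genomes))
        if occupancy == n_genomes:
            summary["core"] += 1
        if occupancy >= softcore_min:
            summary["soft_core"] += 1
        else:
            summary["other"] += 1
    return summary


# Toy example with 4 genomes and a soft-core threshold of 3.
toy_clusters = {
    "cluster1": ["g1", "g2", "g3", "g4"],
    "cluster2": ["g1", "g2", "g3"],
    "cluster3": ["g2"],
    "cluster4": ["g1", "g4"],
}
print(summarize_occupancy(toy_clusters, n_genomes=4, softcore_min=3))

# Share of soft-core clusters using the numbers quoted in the abstract.
softcore_percent = round(100 * 26357 / 84530, 1)
print(softcore_percent)
```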
Dani Zamir ‘Epistasis time’
Epistasis is the surprise you get when you combine two genes and get something you did not expect, that is, a non-additive effect. Based on his recent tomato work with two interacting QTLs on different tomato chromosomes, where they found significant heterosis with 10% of normal water input (https://www.pnas.org/doi/abs/10.1073/pnas.2205787119), he pushed us to move to pairs or triplets of markers instead of one marker at a time.
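A minimal sketch of what testing pairs of markers could look like, assuming biallelic markers coded 0/1 (as in inbred lines) and measuring epistasis as the deviation of the double genotype from the additive expectation. This is my own toy example, not the analysis in the paper; marker names and phenotypes are invented.

```python
from itertools import combinations, product
from statistics import mean


def epistasis_deviation(records, locus_a, locus_b):
    """Deviation from additivity for two 0/1-coded loci.

    records is a list of (genotype dict, phenotype) pairs. The result is
    m11 - m10 - m01 + m00, where mXY is the mean phenotype of each
    two-locus class; it is zero under a purely additive model. All four
    classes must be present in the data.
    """
    classes = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for genotype, phenotype in records:
        classes[(genotype[locus_a], genotype[locus_b])].append(phenotype)
    m = {key: mean(values) for key, values in classes.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]


def scan_pairs(records, loci):
    """Compute the epistasis deviation for every pair of loci."""
    return {pair: epistasis_deviation(records, *pair)
            for pair in combinations(loci, 2)}


# Toy data: m1 and m2 have additive effects plus an interaction of 4
# units; m3 has no effect at all.
loci = ["m1", "m2", "m3"]
records = []
for g1, g2, g3 in product((0, 1), repeat=3):
    phenotype = 10 + 2 * g1 + 1 * g2 + 4 * g1 * g2
    records.append(({"m1": g1, "m2": g2, "m3": g3}, phenotype))

for pair, deviation in scan_pairs(records, loci).items():
    print(pair, deviation)
```

Only the (m1, m2) pair shows a non-zero deviation in this toy set. Triplets could be enumerated the same way with combinations(loci, 3), at the cost of many more tests.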
Danelle Seymour ‘Integrating scales to traverse the genotype-phenotype divide in citrus’
[Scion = graft] She presents a pangenome survey of NLR gene models across Citrus species in the context of HLB, an insect-transmitted bacterial disease. She then explains their field trials in Florida, US, over 3 years. They also do automatic phenotyping of several traits by taking 30 images per fruit.