#!/perl/bioinfo: pangenome


7 February 2024

Browsing barley pangenes

Hi,

late last year we published a paper describing GET_PANGENES, a protocol to call pangenes, which are clusters of gene models/alleles found at similar locations across genome assemblies. You can read all about it at https://doi.org/10.1186/s13059-023-03071-z. Using this approach you can produce figures like this one, where the pangene of interest is shown in green:

Genomic context of barley pangene cluster HORVU.MOREX.r3.3HG0311160 (green arrows), which corresponds to barley locus HvOS2. Figure from https://doi.org/10.1186/s13059-023-03071-z
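To make the idea of a pangene a bit more concrete, here is a toy Python sketch that groups gene models from different assemblies into clusters when their coordinates overlap. This is not the GET_PANGENES algorithm (see the paper for the actual protocol): it simply assumes that all gene models have already been projected onto one shared reference coordinate system, and the `GeneModel` class, the `cluster_by_overlap` function and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    genome: str   # assembly the gene model was annotated in
    gene_id: str
    chrom: str    # chromosome in a shared reference coordinate system (assumed)
    start: int    # 1-based, inclusive
    end: int      # inclusive


def cluster_by_overlap(models):
    """Group gene models whose intervals overlap on the same chromosome.

    Models are swept in coordinate order; a model joins the current cluster
    if it starts before the cluster's furthest end, so overlaps chain.
    """
    clusters = []
    current, current_chrom, current_end = [], None, -1
    for model in sorted(models, key=lambda m: (m.chrom, m.start, m.end)):
        if model.chrom == current_chrom and model.start <= current_end:
            current.append(model)
            current_end = max(current_end, model.end)
        else:
            if current:
                clusters.append(current)
            current, current_chrom, current_end = [model], model.chrom, model.end
    if current:
        clusters.append(current)
    return clusters


# Hypothetical gene models from three assemblies
models = [
    GeneModel("genomeA", "a1", "chr3H", 100, 900),
    GeneModel("genomeB", "b1", "chr3H", 150, 950),
    GeneModel("genomeB", "b2", "chr3H", 5000, 6000),
    GeneModel("genomeC", "c1", "chr1H", 10, 200),
]
for cluster in cluster_by_overlap(models):
    print([f"{m.genome}:{m.gene_id}" for m in cluster])
# ['genomeC:c1']
# ['genomeA:a1', 'genomeB:b1']
# ['genomeB:b2']
```

Overlaps are chained transitively here, so two models that do not overlap each other can still end up in the same cluster through a third one; a real protocol has to decide how to handle such cases.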

As we do research on barley breeding and adaptation, we thought it would be useful, for us and for others out there, to have a way of inspecting barley pangenes, for instance to check whether a gene of interest is conserved or polymorphic across the barleys sampled in the pangenome (n=20) put together by Jayakodi et al.

This is exactly what you can do, at the protein sequence level, at https://eead-csic-compbio.github.io/barley_pangenes, where you can scroll through pangenes along chromosomes, with MorexV3 positions; genes not found in MorexV3 therefore lack a position and are shown with a hash (#).

You will notice that pangenes with occupancy > 1, i.e. containing gene models found in at least two barleys, can be clicked to display a multiple protein alignment with help from the NCBI msaviewer.

There you can easily zoom in to regions of interest and print or export the alignment in FASTA, PDF or SVG format (high quality).
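The occupancy rule can be written down in a few lines. In this minimal sketch, which assumes each pangene is stored as a list of (genome, gene model) pairs, occupancy counts distinct genomes rather than gene models, so two paralogues from the same barley still give occupancy 1. The `occupancy` and `alignable_pangenes` functions and the example clusters are hypothetical, not the code behind the website.

```python
def occupancy(members):
    """Number of distinct genomes (barleys) contributing gene models to a pangene."""
    return len({genome for genome, _gene_id in members})


def alignable_pangenes(pangenes, min_occupancy=2):
    """Names of pangenes found in at least `min_occupancy` genomes, i.e. worth aligning."""
    return sorted(name for name, members in pangenes.items()
                  if occupancy(members) >= min_occupancy)


# Hypothetical pangene clusters: name -> list of (genome, gene model) pairs
pangenes = {
    "pg1": [("MorexV3", "g1"), ("genomeB", "b1"), ("genomeC", "c1")],
    "pg2": [("genomeB", "b2"), ("genomeB", "b3")],  # two models, one genome
    "pg3": [("MorexV3", "g4")],
}
for name in sorted(pangenes):
    print(name, occupancy(pangenes[name]))
print(alignable_pangenes(pangenes))  # ['pg1']
```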

Hope this can be useful to the barley genomics community,

Bruno

25 October 2018

Plant Genomes in a Changing Environment (I)

Hi,
the first meeting on "Plant Genomes in a Changing Environment" kicked off today at the Wellcome Genome Campus in Hinxton, UK. It is exciting to be here and to find out that this is probably the first ever plant genome meeting in an otherwise world-famous genomics venue.

I will post here my notes on the talks I attend.

Caroline Dean, John Innes Centre, UK

She presents the different flowering habits of Arabidopsis thaliana accessions (rapid cycling, winter facultative and obligate winter-annual) and walks us through current knowledge of the quantitative nature of how winter is recorded at the FLC locus, which encodes a MADS-box repressor of flowering and is the target of a Polycomb-mediated epigenetic switch. In addition, she summarizes the mutually exclusive non-coding FLC transcripts found to be cold induced, such as COOLAIR (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234544, https://www.nature.com/articles/ncomms13031). After flowering, the epigenetic state of FLC is restored by a demethylase. COOLAIR is actually an RNA molecule with a Brassicaceae-conserved secondary structure, which can be substantially affected by a single SNP that alters splicing. She says that this ncRNA folds and stays in place, blocking physical access to the locus. She adds that this mechanism is conserved in humans and Brassicaceae, and that she would expect the same in monocots.

By the way, COOLAIR non-coding transcripts seem to be annotated in Ensembl Plants: https://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?g=AT5G10140;r=5:3173382-3179448;t=AT5G10140.2;db=core

The FLC locus accumulates H3K27me3-marked histones upon exposure to cold, setting up a bistable state of activating/repressing chromatin modifications. This balance spreads across tissues and cell populations, including the root tip. This memory is maintained in cis by the chromatin itself (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450441).

She then presents the RY cis elements in intron 1 of FLC, through which the locus is repressed by VAL1 (https://www.ncbi.nlm.nih.gov/pubmed/27471304) to trigger Polycomb nucleation (http://floresta.eead.csic.es/footprintdb/index.php?tf=ea4a1835a3360403cd07b75528829572).

When they looked at 80 worldwide populations they found distinct FLC haplotypes which, when compared with each other in a common genetic background, explain a linear vernalization requirement.

She claims that in A. thaliana vernal days are actually afternoons with temperatures < 15 °C (https://www.nature.com/articles/s41467-018-03065-7).

Doreen Ware, USDA and Cold Spring Harbor, USA

She talks about a maize pangenome browser currently under development. She explains that growers require a platform that allows easy knowledge transfer from one plant to another, so that it can be used in breeding. She talks about CNV genes with agronomic impact, such as transporters providing Al tolerance (http://www.pnas.org/content/110/13/5241). She shows Gramene's neighborhood conservation display modes, based on Ensembl Compara data.

Then she describes their current efforts to assemble the 26 maize NAM parents with PacBio reads, with SMRT Link assembly performed in the cloud (DNAnexus) and sped up 360x. The resulting assemblies are robust, with N50 > 34 Mb.
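For readers less used to assembly statistics, N50 > 34 Mb means that at least half of all assembled bases sit in contigs (or scaffolds; the talk did not say which) of 34 Mb or longer. A minimal Python sketch of the standard calculation, with a hypothetical `n50` function and made-up lengths:

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    covered = 0
    # Walk from the longest sequence down until half the bases are covered
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half_total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54, and 10 + 9 + 8 = 27 already reaches half of it, so N50 is 8.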

She finishes with a quick overview of transcriptome profiling for heterosis-inspired work, with the aim of phasing isoforms, which is important for reconstructing heterozygous loci (https://www.nature.com/articles/ncomms11708).

Eric Schranz, Wageningen University, The Netherlands

He talks about conservation and divergence in the relative gene order of plant and animal genomes using network-based synteny analysis. He explains genome territories and why gene context matters, with multiple examples of Hox genes and body plans. He claims that we have a genomic hairball problem when looking at synteny, and that networks whose edges represent syntenic relationships can simplify the problem, allowing presence/absence variation (PAV) and homeologues to be integrated easily (https://www.sciencedirect.com/science/article/pii/S1369526616302230).
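One way to picture a synteny network is a graph whose nodes are genes and whose edges join pairs of genes found in syntenic blocks; each connected component is then a cluster of syntenic genes, and a gene lost in some species (PAV) simply contributes no node. The minimal sketch below assumes the syntenic pairs were already computed upstream; the `syntenic_clusters` function and the gene identifiers are hypothetical, and this is not the speaker's pipeline.

```python
from collections import defaultdict, deque


def syntenic_clusters(pairs):
    """Connected components of an undirected graph built from syntenic gene pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.append(gene)
            for neighbour in graph[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Hypothetical syntenic pairs among three species (spA, spB, spC)
pairs = [("spA:g1", "spB:g1"), ("spB:g1", "spC:g3"), ("spA:g2", "spB:g7")]
print(syntenic_clusters(pairs))
# [['spA:g1', 'spB:g1', 'spC:g3'], ['spA:g2', 'spB:g7']]
```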

He also explains phylogenetic profiling and how they used it to find MADS-box genes which are syntenic in all angiosperms except particular groups, such as crucifers or monocots (http://www.plantcell.org/content/early/2017/06/05/tpc.17.00312).
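A phylogenetic profile can be as simple as a presence/absence vector of each syntenic cluster across species. This sketch, with hypothetical functions and made-up data, shows how clusters kept in every sampled species except one lineage could be picked out; the species codes stand for A. thaliana, Brassica rapa, Vitis vinifera and Oryza sativa, and their presence patterns are invented for illustration.

```python
def binary_profile(present, species_order):
    """0/1 presence vector of a syntenic cluster across species."""
    return [1 if species in present else 0 for species in species_order]


def lost_in_group(profiles, all_species, group):
    """Clusters present in every species outside `group` and absent from all of `group`."""
    expected = set(all_species) - set(group)
    return sorted(name for name, present in profiles.items()
                  if set(present) == expected)


species = ["Ath", "Bra", "Vvi", "Osa"]  # hypothetical sample of angiosperms
crucifers = {"Ath", "Bra"}
# Hypothetical presence of three syntenic clusters
profiles = {
    "c1": {"Ath", "Bra", "Vvi", "Osa"},  # conserved in all species
    "c2": {"Vvi", "Osa"},                # missing from both crucifers
    "c3": {"Ath", "Vvi", "Osa"},         # missing from one crucifer only
}
for name in sorted(profiles):
    print(name, binary_profile(profiles[name], species))
print(lost_in_group(profiles, species, crucifers))  # ['c2']
```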

He also explains that they're doing a mammal vs plant synteny analysis. Overall, mammalian genomes are syntenic, while plant genomes are not. This work is under review at PNAS. They do find family-specific conserved syntenic blocks and a few angiosperm-conserved genes related to photosynthesis and the clock.

John Vogel, University of California, Berkeley, USA

John talks about the pan-genome of Brachypodium distachyon and its implications for polyploid genome evolution. He describes the main findings of the Gordon et al. paper (https://www.nature.com/articles/s41467-017-02292-8). He mentions that there is currently no way of displaying the pangenome efficiently in Phytozome, and he looks forward to new developments in Gramene.

He then introduces B. stacei and the resulting hybrid B. hybridum. He shows the high synteny between the B. hybridum subgenomes and the diploid parental species, as well as an SNP-based tree suggesting at least two hybridization events. Then he shows k-mer plots suggesting that the (older) D-cytotype B. hybridum lines have a unique k-mer composition.
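As a toy illustration of what "unique k-mer composition" means, the sketch below collects canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, so both strands count as one) from one group of sequences and removes those seen in another group. Real analyses count k-mers from sequencing reads with much larger k and dedicated tools; the functions, sequences and k=3 here are hypothetical and not the method used in the talk.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Strand-independent representative of a k-mer."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(seqs, k):
    """Set of canonical k-mers in a group of sequences, skipping non-ACGT k-mers."""
    kmers = set()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                kmers.add(canonical(kmer))
    return kmers


def unique_kmers(group, others, k):
    """Canonical k-mers present in `group` but absent from `others`."""
    return kmer_set(group, k) - kmer_set(others, k)


print(sorted(unique_kmers(["ACGTA"], ["AAAA"], 3)))  # ['ACG', 'GTA']
print(unique_kmers(["AAAC"], ["GTTT"], 3))           # set(): same k-mers, other strand
```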

He then moves to the analysis of founder effects in the hybrids, but shows that the hybridum + parental pangenome is not significantly different from the individual parental pangenomes. Finally, he shows dN/dS plots indicating that both subgenomes are still under selection.

M. Morgante comments that these data are probably not compatible with an epigenetic shock after hybridization.

Jae Young Choi, New York University, USA

Jae could not attend and was replaced by an unnamed researcher from the group. She starts by pointing out that, besides transposable elements (https://www.ncbi.nlm.nih.gov/pubmed/25917896), tandem repeats are important drivers and markers of plant diversity. The talk is actually about natural variation in telomere repeats, which are essentially a major plant satellite, and their correlation with flowering time. They work with 100-mers from Oryza species, which include telomeric repeats. In fact, they see that O. sativa ssp. indica has significantly longer telomeres than ssp. japonica, and that telomere length correlates negatively with days to flowering.
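A crude way to compare telomeric repeat content between samples is to count the fraction of short reads made mostly of telomeric repeat units. This minimal sketch assumes the plant telomeric repeat TTTAGGG (and its reverse complement CCCTAAA), hypothetical 100-bp reads and an arbitrary threshold of 10 copies per read; it is not the group's method, and the result is only a relative measure.

```python
TELOMERE_UNIT = "TTTAGGG"     # plant telomeric repeat, assumed here
TELOMERE_UNIT_RC = "CCCTAAA"  # the same repeat read from the opposite strand


def is_telomeric(read, min_copies=10):
    """True if a read carries at least `min_copies` non-overlapping repeat units."""
    read = read.upper()
    copies = max(read.count(TELOMERE_UNIT), read.count(TELOMERE_UNIT_RC))
    return copies >= min_copies


def telomeric_fraction(reads, min_copies=10):
    """Fraction of reads classified as telomeric (a relative, not absolute, measure)."""
    if not reads:
        return 0.0
    return sum(is_telomeric(r, min_copies) for r in reads) / len(reads)


# Hypothetical 100-bp reads
reads = [
    TELOMERE_UNIT * 14 + "TT",
    TELOMERE_UNIT_RC * 14 + "CC",
    "ACGT" * 25,
]
print(round(telomeric_fraction(reads), 3))  # 0.667
```

Turning such a fraction into an actual telomere length would also require the genome size and sequencing depth of each sample.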

Gabriele Magris, University of Udine, Italy

Gabriele gave a very nice and comprehensive talk on the characterisation of the Vitis vinifera pan-genome using NGS, with a special focus on collinear genes that have gained or lost a neighbor transposable element (TE) affecting their expression. My battery died and unfortunately I could not take proper notes. However, I recall that he showed nice results on the methylation state of the regions where TEs insert and on the preference of TE families for particular genomic territories, such as LINE elements for introns. I asked him how to efficiently annotate TEs in genomes and he referred me to the work of Wicker (https://www.nature.com/articles/nrg2165-c2).

25 September 2017

PhD in Brachypodium perennial species

We seek candidates for a PhD FPI contract associated with our project “Evolution of biological traits and speciation processes in the model genus Brachypodium (Poaceae) through comparative and functional genomic” (CGL2016-79790-P). The PhD thesis will investigate the origins and evolutionary changes of perenniality/annuality switches and the pangenomic diversity and phylogeography of model grass species of Brachypodium.

The work (2018-2021) will be carried out at the Higher Polytechnic School of Huesca (University of Zaragoza, Spain), with research stays at CSIC (with Bruno Contreras-Moreira @ EEAD and Pilar Hernández @ IAS) and international institutes, and participation in CSP Joint Genome Institute projects. The PhD thesis will include field and greenhouse work, genomic and transcriptomic data generation and processing, and development of computational pipelines for genomic and phylogenomic analyses of perennial and annual species of Brachypodium.

The research team has extensive experience in evolutionary genomics (www.bifi.es/bioflora), computational biology (www.eead.csic.es/compbio) and translational genomics (https://goo.gl/RSnfw3) of grasses.

Applicants should meet the requirements to apply for a Spanish PhD contract (open to citizens of the European Community and other countries; see information at https://goo.gl/5Bp6YW). Experience in plant evolutionary biology, genomics and bioinformatics will be highly valued.

Interested applicants should contact Prof. Pilar Catalan (pcatalan@unizar.es) and send a Curriculum Vitae and a brief motivation letter before October 3, 2017.

9 March 2017

Tutorial: pan-genome analysis with GET_HOMOLOGUES

Hi,
a new tutorial on the analysis of pan-genomes using GET_HOMOLOGUES and GET_HOMOLOGUES-EST is now available. After a short introduction, where the main concepts are illustrated, the remaining sections cover installation and the typical operations required to analyze and annotate genomes and transcriptomes from a pan-genome perspective, in which individuals or species contribute genetic material to a pool.
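The pool idea can be illustrated in a few lines: given gene clusters and the genomes each one is found in, the pan-genome is the set of all clusters, the core is the subset present in every genome, and the rest splits into accessory (several genomes) and unique (one genome) clusters. This is a minimal sketch with a hypothetical `partition_pangenome` function and made-up cluster and genome names; GET_HOMOLOGUES has its own scripts and categories, described in the tutorial and manual.

```python
def partition_pangenome(presence, genomes):
    """Split gene clusters by how many genomes contain them.

    presence: dict mapping cluster name -> set of genomes where it is found
              (each cluster is assumed to occur in at least one genome)
    genomes:  all genomes contributing to the pool
    """
    genomes = set(genomes)
    parts = {"pan": sorted(presence), "core": [], "accessory": [], "unique": []}
    for cluster in sorted(presence):
        found_in = presence[cluster]
        if found_in == genomes:
            parts["core"].append(cluster)
        elif len(found_in) == 1:
            parts["unique"].append(cluster)
        else:
            parts["accessory"].append(cluster)
    return parts


# Hypothetical clusters from three genomes
genomes = ["g1", "g2", "g3"]
presence = {
    "c1": {"g1", "g2", "g3"},
    "c2": {"g1", "g2"},
    "c3": {"g3"},
}
for part, clusters in partition_pangenome(presence, genomes).items():
    print(part, clusters)
```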

The examples include both bacterial sequences in GenBank format and plant transcripts. This tutorial was created for a two-day workshop to be held next week at BIOS (Manizales, Colombia), entitled "From genomes to pangenomes: understanding variation among individuals and species".

The tutorial can be found at: http://digital.csic.es/handle/10261/146411

Code, sample datasets and documentation are available at:
https://github.com/eead-csic-compbio/get_homologues

Suggestions and error reports are welcome,
Bruno