Showing posts with the label breeding. Show all posts

4 November 2020

Course on scripting with the Linux shell

Hi, Carlos Cantalapiedra and I recently put together teaching material about scripting in the Linux terminal.

The material can be found in the repository https://github.com/eead-csic-compbio/scripting_linux_shell

There are five sessions and the goal is for you to learn the basics of the Linux shell and scripting for data sciences such as genomics and plant breeding:

session | title | required time | URL
0 | Setup | prior to course | session 0
1 | Linux basics and files | 2h | session 1
2 | Processes and scripts | 2h | session 2
3 | Parsing with regular expressions | 2h | session 3
4 | Perl one-liners | 2h | session 4
5 | Advanced scripts | 2h | session 5


Figure of the standard streams, taken from https://en.wikipedia.org/wiki/Standard_streams

 

If you spot errors, please send pull requests. I hope this helps some of you out there,

Bruno 

PS: if you prefer to learn in Spanish, take a look at https://github.com/vinuesa/intro2linux

26 October 2018

Plant Genomes in a Changing Environment (III)

Hi, this is my account of the first few talks from the last day of the meeting.


Claudia Köhler, Swedish University of Agricultural Sciences, Sweden
She talks about imprinted genes which are flanked by transposable elements (TEs) in Arabidopsis thaliana. They find that RNApolIV mutants suppress triploid seed abortion. RNApolIV is known to be involved in RNA-directed DNA methylation. They found that RNApolIV is behind the biogenesis of easiRNAs from TEs, and that this correlates with decreased CHH methylation in the endosperm of triploid seeds (https://www.ncbi.nlm.nih.gov/pubmed/29335544). They therefore propose that pollen-derived easiRNAs are functional after fertilization and have a transgenerational role in assessing gamete compatibility, similar to animal piRNAs. The relevance of these results is that such mechanisms allow rapid evolution of hybridization barriers and, ultimately, speciation.

Isabel Bäurle, University of Potsdam, Germany
She talks about how Arabidopsis thaliana plants remember past stress events, particularly heat, which is one of the most fluctuating sources of stress in nature. She describes Heat Shock Factor A2 (HSFA2) and how it associates transiently with genes conferring heat memory. Target genes were observed to accumulate H3K4me3, making chromatin accessible for at least 5 days (https://www.ncbi.nlm.nih.gov/pubmed/26657708, http://www.plantcell.org/content/early/2014/04/25/tpc.114.123851). Then she moves on to BRU1/TSK/MGO3, which is orthologous to animal TONSL. It has an epigenetic role during DNA replication and is also required for heat memory, ensuring that chromatin marks are inherited during cell division (https://onlinelibrary.wiley.com/doi/abs/10.1111/pce.13365). Their long-term goal is to provide stress memory to crops at the right moment so that yield is not affected too much.

Manu Dubin, CNRS / Université de Lille, France
He explains that he is back in academia after working in industry and that he is studying how both climate of origin and breeding efforts influence DNA methylation in barley (Hordeum vulgare) and how that is linked to adaptation, inspired by previous work on climate clines in A. thaliana. They used the USDA barley core collection (inbred seeds from Mexico), with both landraces and cultivars from Europe and North America. The collection does not include any Iberian or North African barleys, which are known to contribute to the genetic diversity of the species (see for instance https://link.springer.com/article/10.1007/s11032-018-0816-z). They observe that winter barleys have slightly higher CG methylation than spring barleys, and he shows GWAS results on TE methylation. They find that for most TE families winter lines are more methylated than spring lines. He focuses briefly on BARE1 copia-like elements, associated with drought and ABA responses, with higher CNV in equatorial/short-term T (temperature) fluctuating regions. He shows a negative correlation between BARE1 CNV and yield. He shows nice boxplot-like plots that display the individual data points. He is asked to what extent the reference genome (Morex) affects his conclusions. He is also asked whether the seed source would affect his results, and to what extent his yield measurements are affected by the fact that he is planting barleys from other regions in Northern Europe.

Sorry, I missed the talks by Martin Groth (Helmholtz Zentrum München, Germany), Nick Loman (U. Birmingham, UK) and Tetsuya Higashiyama (Nagoya University, Japan).

25 October 2018

Plant Genomes in a Changing Environment (I)

Hi,
the first meeting on "Plant Genomes in a Changing Environment" kicked off today at the Wellcome Genome Campus in Hinxton, UK. It is exciting to be here and find out this is probably the first ever plant genome meeting in an otherwise world-famous genomics venue.

 
I will post here my notes on the talks I attend.


Caroline Dean, John Innes Centre, UK
She presents the different flowering habits of Arabidopsis thaliana accessions (rapid cycling, winter facultative and obligate winter-annual) and takes us through the current knowledge of how winter is recorded quantitatively at the FLC locus, which encodes a MADS-box repressor of flowering and is the target of a Polycomb-mediated epigenetic switch. In addition, she summarizes the mutually exclusive non-coding FLC transcripts found to be cold-induced, such as COOLAIR (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234544, https://www.nature.com/articles/ncomms13031). After flowering, the epigenetic state of FLC is restored by a demethylase. COOLAIR is actually an RNA molecule with a secondary structure conserved across Brassicaceae, which is substantially affected by a single SNP that alters splicing. She says that this ncRNA folds and stays in place, blocking physical access to that locus. She adds that this mechanism is conserved in humans and Brassicaceae, and that she would expect the same in monocots.
By the way, COOLAIR non-coding transcripts seem to be annotated in Ensembl Plants: https://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?g=AT5G10140;r=5:3173382-3179448;t=AT5G10140.2;db=core

The FLC locus accumulates H3K27me3 marks with exposure to cold, setting up a bistable state of inducing/repressing chromatin modifications. This balance spreads across tissues and cell populations, including the root tip. This memory is sustained in cis by the local chromatin itself (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450441).
She then presents the RY cis elements in intron 1 of the FLC locus, which are recognized by the repressor VAL1 (https://www.ncbi.nlm.nih.gov/pubmed/27471304) to trigger Polycomb nucleation (http://floresta.eead.csic.es/footprintdb/index.php?tf=ea4a1835a3360403cd07b75528829572).
When they looked at 80 worldwide populations, they found distinct FLC haplotypes which, when compared to each other in a common background, explain a linear vernalization requirement.
She claims that in A. thaliana vernal days are actually afternoons with temperatures < 15 °C (https://www.nature.com/articles/s41467-018-03065-7).  


Doreen Ware, USDA and Cold Spring Harbor, USA
She talks about a maize pangenome browser currently under development. She explains that growers require a platform that allows easy knowledge transfer from some plants to others, so that it can be used in breeding. She talks about CNV genes with agronomic impact, such as transporters providing Al tolerance (http://www.pnas.org/content/110/13/5241). She shows Gramene neighborhood conservation display modes based on Ensembl Compara data.




Then she describes their current efforts to assemble 26 maize NAM parents from PacBio reads, with SMRTlink assembly performed in the cloud (DNAnexus) and sped up 360x. The resulting assemblies are robust, with N50 > 34 Mb.
She finishes with a quick overview of transcriptome profiling for heterosis-inspired work, with the aim of phasing isoforms, which is important for reconstructing heterozygous loci (https://www.nature.com/articles/ncomms11708).
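As a side note on the N50 figure quoted above: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. Here is a minimal Python sketch of that definition. The function name and the toy contig lengths are made up for illustration and have nothing to do with the actual maize assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, until half of the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, not taken from the talk)
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # prints 70: 80 + 70 = 150 covers half of the 300 total
```

With real assemblies the lengths would be read from a FASTA index or similar, but the logic stays the same.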

Eric Schranz, Wageningen University, The Netherlands
He talks about conservation and divergence in the relative gene order of plant and animal genomes using network-based synteny analysis. He explains genome territories and why gene context matters, with multiple examples of Hox genes and body layout plans. He claims that we have a genomic hairball problem when looking at synteny, and that networks whose edges represent synteny can simplify the problem, allowing PAV and homeologues to be integrated easily (https://www.sciencedirect.com/science/article/pii/S1369526616302230).
He also explains phylogenetic profiling and how they used it to find MADS-box genes which are syntenic in all angiosperms but not in particular groups such as crucifers or monocots (http://www.plantcell.org/content/early/2017/06/05/tpc.17.00312).
He also explains that they're doing a mammal vs plant synteny analysis. Overall, mammal genomes are syntenic, while plant genomes are not. This work is under review at PNAS. They do find family-specific conserved syntenic blocks and a few angiosperm-conserved genes related to photosynthesis and the clock.

John Vogel, University of California, Berkeley, USA
John talks about the pan-genome of Brachypodium distachyon and its implications for polyploid genome evolution. He describes the main findings of the Gordon et al. paper (https://www.nature.com/articles/s41467-017-02292-8). He mentions that there is currently no way of displaying the pangenome efficiently in Phytozome, and he looks forward to the new developments in Gramene.
He then introduces B. stacei and the resulting B. hybridum. He shows the high synteny between the B. hybridum subgenomes and the diploid parental species, as well as the SNP-based tree suggesting at least two hybridization events. Then he shows k-mer plots suggesting that D-cytotype B. hybridum (older) lines have a unique k-mer composition.
He then moves to the analysis of foundation effects in the hybrids, but shows that the hybridum + parental pangenome is not significantly different from the individual parental pangenomes. Finally, he shows dN/dS plots to show that both subgenomes are still under selection.
M Morgante comments that these data are probably not compatible with an epigenetic shock post-hybridization.
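For readers unfamiliar with the k-mer comparisons mentioned in this talk, here is a minimal Python sketch of the general idea: count all substrings of length k in each sequence and report those found in one sequence but in none of the others. This is only an illustration with hypothetical function names and toy sequences. It is not the method used by the authors, and real analyses rely on dedicated k-mer counters and also consider reverse complements.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def unique_kmers(target, others, k):
    """Return the k-mers present in target but absent from all other sequences."""
    seen_elsewhere = set()
    for seq in others:
        seen_elsewhere.update(kmer_counts(seq, k))
    return set(kmer_counts(target, k)) - seen_elsewhere


# Toy sequences (hypothetical, for illustration only)
line_a = "ACGTAC"
line_b = "ACGG"
print(sorted(unique_kmers(line_a, [line_b], 2)))  # prints ['GT', 'TA']
```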

Jae Young Choi, New York University, USA
Jae could not attend and was replaced by an unnamed researcher from the group. She starts by explaining that, besides transposable elements (https://www.ncbi.nlm.nih.gov/pubmed/25917896), tandem repeats are important drivers and markers of plant diversity. The talk is actually about natural variation in telomere repeats, which essentially are a major plant satellite, and their correlation with flowering time. They work with 100-mers of Oryza species, which include telomeres. In fact they see that O. sativa indica has significantly larger telomeres than ssp. japonica, and that this correlates negatively with days to flowering.

Gabriele Magris, University of Udine, Italy
Gabriele gave a very nice and comprehensive talk on the characterisation of the pan-genome of Vitis vinifera using NGS, with a special focus on collinear genes that have gained or lost a neighboring transposable element (TE) affecting their expression. My battery died and, unfortunately, I could not take proper notes. However, I recall that he showed nice results on the methylation state of the regions where TEs insert and on the preference of TE families for particular genomic territories, such as LINE elements for introns. I asked him how to efficiently annotate TEs in genomes and he referred me to the work of Wicker (https://www.nature.com/articles/nrg2165-c2).