Hi, I hope you are all well.
In this first post of the year I just wanted to point out that BLAST+ was updated to version 2.8.1+ a couple of weeks ago because of a bug found when using the -max_target_seqs option, as reported in https://doi.org/10.1093/bioinformatics/bty833 and discussed at https://www.biostars.org/p/340129 .
In response, three NCBI authors (Madden, Busby and Ye) wrote a letter explaining that the reported bug has less impact than expected because it only affects alignments with a "very large" number of indels. They do acknowledge, however, that using -max_target_seqs with small values of M can be confusing, because sequences with equal scores are then selected according to their position in the original FASTA file. To address this, the updated version warns the user when M < 5.
The BLAST authors' detailed explanation and the changes introduced in the current version are described at https://www.ncbi.nlm.nih.gov/books/NBK131777 and https://doi.org/10.1093/bioinformatics/bty1026 .
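The tie-breaking issue is easy to picture with a toy model. The sketch below is not BLAST's code: it assumes hits are simply ranked by score with ties left in database order, and the sequence names, the scores and the `top_hits` helper are made up for illustration.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (subject id, score); both hypothetical


def top_hits(hits_in_db_order: List[Hit], max_target_seqs: int) -> List[Hit]:
    """Keep the best-scoring hits; equal scores stay in database order."""
    # sorted() is stable, so tied hits keep the order they had in the input
    ranked = sorted(hits_in_db_order, key=lambda hit: -hit[1])
    return ranked[:max_target_seqs]


# The same three sequences, stored in a different order in two databases
db_a = [("seqA", 200.0), ("seqB", 200.0), ("seqC", 150.0)]
db_b = [("seqB", 200.0), ("seqA", 200.0), ("seqC", 150.0)]

print(top_hits(db_a, 1))  # seqA wins the tie
print(top_hits(db_b, 1))  # seqB wins the tie
```

In this toy model a small M makes the reported hit depend on the order of the database, while an M large enough to hold all tied hits returns the same set of sequences either way.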
Best regards,
Bruno
Ideas and code for problems in plant genomics and computational and structural biology
2 January 2019
17 December 2018
we don't know how to fold proteins (CASP13)
Hi,
in the last post of this year, written from Hinxton, UK, I would like to talk about CASP13, the most recent edition of the community experiment on blind prediction of protein structures (which we had already mentioned here).
Between the leap in predictive power this time around and machine learning being so topical, this year CASP has been everywhere: in Science, in The Guardian and even in El País.
I will focus here on the opinions of experts who took part in CASP. But first, so that you know what I am talking about, you can see the official results at predictioncenter.org/casp13
I'll start with a figure by Torsten Schwede, which shows the jump in quality of the best predictions over the history of CASP. The fit between a model and its experimental structure is computed with the GDT_TS function.
Mohammed AlQuraishi offers another view of the same results, showing the gap between the best groups/predictors across CASP editions.
In both cases we can see an upward trend; it remains to be seen whether it holds over time or whether it instead reflects the CASP13 target sequences being easier than in other editions.
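For readers who have not met GDT_TS before, here is a minimal sketch of the score: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the model that lie within that distance of their position in the experimental structure. It assumes both structures are already superimposed and list the same residues in the same order, whereas the real GDT procedure searches over many superpositions; the coordinates are invented.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT_TS (0-100) for two pre-superimposed C-alpha traces."""
    if len(model) != len(reference) or not model:
        raise ValueError("both structures need the same, non-zero number of atoms")
    distances = [math.dist(m, r) for m, r in zip(model, reference)]
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # in Angstrom
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Invented coordinates: four residues displaced by 0.5, 1.5, 3.0 and 9.0 Angstrom
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(gdt_ts(model, reference))  # 56.25
```

A perfect model scores 100; a model with every atom more than 8 Å away scores 0.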
What has happened in recent years? Probably a combination of many things. For example, the arrival of the DeepMind team in this golden age of machine learning. It is curious, because neural networks have been applied in CASP since the nineties for secondary structure prediction; however, since 2011 we have known that for many protein families we have so many different sequences that we can predict the contacts that occur between the folded parts of the protein.
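To give a flavour of how an alignment can hint at contacts, here is a minimal sketch that scores pairs of alignment columns by mutual information. This is a deliberately simple stand-in: tools such as EVfold fit a global statistical model of the whole alignment rather than scoring pairs independently, and the four-sequence alignment and function names below are invented.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


def rank_column_pairs(alignment: List[str]) -> List[Tuple[float, int, int]]:
    """Return (score, i, j) for every pair of columns, best first."""
    length = len(alignment[0])
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    scores = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Invented toy alignment: columns 1 and 3 change together (K with E, R with D)
toy_alignment = ["AKLE", "AKLE", "ARLD", "ARLD"]
best_score, best_i, best_j = rank_column_pairs(toy_alignment)[0]
print(best_i, best_j, best_score)  # 1 3 1.0
```

Pairs of columns that co-vary are candidates for residues in contact; real alignments need thousands of diverse sequences and corrections for phylogeny and indirect couplings, which this sketch ignores.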
So we still don't know how proteins fold, but some research groups have learned to exploit the evolutionary information implicit in multiple protein alignments to work out which fold they finally adopt. Many of those groups share their source code (for example http://evfold.org/evfold-web/evfold.do); let's see whether DeepMind does so soon,
see you next year!
Bruno
Figure source (Torsten Schwede): https://www.sib.swiss/about-sib/news/10307-deep-learning-a-leap-forward-for-protein-structure-prediction
Figure source (Mohammed AlQuraishi): https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
Figure source (contact prediction from sequence variation): https://doi.org/10.1371/journal.pone.0028766
26 October 2018
Plant Genomes in a Changing Environment (III)
Hi, this is my account of the first few talks from the last day of the meeting.
Claudia Köhler,
Swedish University of Agricultural Sciences, Sweden
She talks about imprinted genes which are
flanked by transposable elements (TE) in Arabidopsis
thaliana. They find that RNApolIV mutants suppress triploid seed abortion.
RNApolIV is known to be involved in RNA-guided methylation. They found that
RNApolIV is behind the biogenesis of easiRNAs from TEs, and that correlates
with decreased CHH methylation in the endosperm of triploid seeds (https://www.ncbi.nlm.nih.gov/pubmed/29335544). So they propose that
pollen-derived easiRNAs are functional after fertilization and have a
transgenerational role in assessing gamete compatibility, similar to animal
piRNAs. The relevance of the results is that these mechanisms allow rapid
evolution of hybridization barriers and ultimately speciation.
Isabel Bäurle,
University of Potsdam, Germany
She talks about how Arabidopsis thaliana plants remember past stress events, particularly
heat, which is one of the most fluctuating stress sources in nature. She
describes Heat Shock Factor A2 (HSFA2) and how it associates transiently with
genes conferring heat memory. Target genes were observed to accumulate H3K4me3,
making chromatin accessible for at least 5 days
(https://www.ncbi.nlm.nih.gov/pubmed/26657708,
http://www.plantcell.org/content/early/2014/04/25/tpc.114.123851). Then she moves to describing
BRU1/TSK/MGO3, which is orthologous to animal TSL, which has an epigenetic role
during DNA replication and is also required for heat memory, ensuring that
chromatin marks are inherited during cell division (https://onlinelibrary.wiley.com/doi/abs/10.1111/pce.13365). Their long-term goal is to
provide stress memory to crops at the right moment so that yield is not too
affected.
Manu Dubin, CNRS / Université de Lille, France
He explains he is back in academia from
industry and that he is studying how both climate of origin and breeding efforts
influence DNA methylation in barley (Hordeum
vulgare) and how that is linked to adaptation, inspired by previous work on
climate clines in A. thaliana. They
used the USDA barley core collection (inbred seeds from Mexico), which has both landraces
and cultivars from Europe and North America but does not include any Iberian
or North African barleys, which are known to contribute to the genetic diversity of
the species (see for instance https://link.springer.com/article/10.1007/s11032-018-0816-z).
They observe that winter barleys have slightly higher CG methylation than
springs and show GWAS results on TE methylation. They find that for most TE
families winter lines are more methylated than springs. He focuses a little on
BARE1 copia-like elements, associated with drought and ABA responses, with higher
CNV in equatorial regions and regions with short-term temperature (T) fluctuations. He shows a negative correlation
between BARE1 CNV and yield. He shows nice boxplot-like plots showing
individual data. He is asked to what extent the reference genome (Morex)
affects his conclusion. He is also asked whether the seed source would affect
his results, and to what extent his yield measurements are affected by the fact
that he is planting barleys from other regions in Northern Europe.
Sorry, I missed the talks by Martin Groth (Helmholtz
Zentrum München, Germany), Nick Loman (U. Birmingham, UK) and Tetsuya
Higashiyama (Nagoya University, Japan).
25 October 2018
Plant Genomes in a Changing Environment (II)
Now for the second day.
Etienne Bucher, INRA,
France
I miss the beginning of the talk but still get
the main message: you can control the efficiency of retrotransposon
mobilization in plants by exposing them to heat (stress) and by using drugs to inhibit
RNA pol II, which has a key role in transposon defense (RNA-directed methylation).
The key paper is https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1265-4. They are using controlled drug treatments (α-amanitin and zebularine) to create
new variants and to select them in the field with rice and soybean. He has set
up a company called epibreed to carry out this kind of experiment, but he
insists the approach can be used for free for research purposes.
Holger Puchta,
Karlsruhe Institute of Technology, Germany
He takes us through a nice overview of double-strand
breaks in plant genomes, and then moves to CRISPR-Cas9 systems, where they
initially got 15% (heritable mutation) efficiencies in Arabidopsis thaliana. Now, using S. aureus Cas9, they achieve 90% efficiencies. They have tried
several approaches for in planta gene targeting (initial idea summarized
in http://www.pnas.org/content/early/2012/04/19/1202191109)
and are improving their efficiency so that they can use it to routinely knock
out genes in A. thaliana (http://www.pnas.org/content/113/26/7266.short). He discusses that by combining
double-strand breaks it is possible to induce recombination in centromeric
regions, where meiotic recombination is extremely unlikely. In A. thaliana, out of 200 ds-breaks you get
about 10 cross-over events. He is funded by ERC.
Sophie Harrington,
John Innes Centre, UK
She talks about TILLING to study wheat
senescence. They do EMS TILLING populations and sequence captured exons. She
shows a nice figure of Ensembl Plants where this kind of data is readily available
for users. She then introduces NAC transcription factors and in particular the
NAM factors related to senescence. They use tetraploid wheat to study NAM-A1,
because it is single-copy there. By phenotyping EMS populations they see that a
particular amino acid substitution induces a significant delay in senescence in
the field in two environments. Based on yeast two-hybrid (Y2H) assays, they believe these
mutations impair NAC dimerization. She mentions a paper describing the NAC
family in wheat (https://www.ncbi.nlm.nih.gov/pubmed/28698232). They used chromosome sorting to
isolate a chromosome harboring a region with a clear allele frequency shift
linked to senescence; they are working on sequencing that region. She gets
several questions regarding dominant mutants in wheat, and how the dominant
nature relates to the number of copies of the mutated regions.
Youssef Belkhadir, GMI
Vienna, Austria
He talks about the molecular logic and emergent
properties in receptor-receptor interaction networks around plant signaling.
There are 400 receptor kinases (RKs) in Arabidopsis
thaliana. They have diverse extracellular domains (ECDs). He shows nice
cartoons of large & short Leu-rich ECDs docked together with a ligand and
triggering intracellular phosphorylation and presents their approach to
screening LRR domains at high throughput, as published in https://www.nature.com/articles/nature25184. They did confirmatory Y2H
experiments and found an agreement of 57% for high-confidence short-to-long
LRR interaction predictions. By using network dissection, including PageRank,
they find that short LRR proteins are more frequently central nodes than long
LRR proteins.
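As a reminder of what a PageRank-style centrality measures, here is a minimal power-iteration sketch on a toy interaction network. It is not the analysis from the paper: the node names, the star-shaped topology and the damping factor of 0.85 are assumptions chosen only to show a hub node ending up with the highest score.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; every neighbour must also be a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # A node without partners spreads its score evenly
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical network: one short LRR protein interacting with three long ones
toy_network = {
    "short_LRR_1": ["long_LRR_1", "long_LRR_2", "long_LRR_3"],
    "long_LRR_1": ["short_LRR_1"],
    "long_LRR_2": ["short_LRR_1"],
    "long_LRR_3": ["short_LRR_1"],
}

scores = pagerank(toy_network)
print(max(scores, key=scores.get))  # short_LRR_1
```

Undirected interactions are encoded here by listing each edge in both directions.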
He also shows data from an A. thaliana diversity panel (about 600 lines) used for large-scale
root phenotyping assays of plants treated with brassinosteroids. Subsequent
GWAS analyses suggest several LRR genes to explain the differences observed.
He mentions that the BAK1 receptor is 100%
conserved at the amino acid level in over 1K A. thaliana lines. He mentions that absence genotypes of particular
LRR genes were confirmed by PCR against the suspected genome. They didn't do
the actual annotation; instead this was done in the group of Magnus Nordborg.
Anne Osbourn, John
Innes Centre, UK
She talks about antimicrobial compounds (such
as avenacin) synthesized in the roots of Avena
plants. The responsible pathway is actually composed of several neighboring genes
which are all expressed in concert, with a root-specific promoter (http://www.pnas.org/content/111/23/8679). They have a contig of this 720 kb
region of the genome and they believe this cluster is not conserved in Brachypodium or in wheat.
She mentions that many metabolic gene clusters
have been reported in both monocots and dicots, that no horizontal gene transfer
from microbes has been demonstrated and that their genomic
co-localization is probably linked to their regulation and epigenomics (https://www.ncbi.nlm.nih.gov/pubmed/26895889). They have developed transient
expression systems to test these metabolic clusters, both natural and
synthetic, in Nicotiana leaves and
in some cases obtained gram-scale triterpene production (https://www.ncbi.nlm.nih.gov/pubmed/28687337).
She then describes the thalianol pathway in A.
thaliana, which was the first operon-like cluster they ever predicted, and other
later examples, such as http://www.pnas.org/content/114/29/E6005. She also shows data on rhizosphere
composition changes in mutants of these pathways. They have developed a tool
for predicting metabolic clusters: http://plantismash.secondarymetabolites.org
Matteo Dell'Acqua,
Scuola Superiore Sant'Anna, Italy
He talks about the identification of candidate
genes for maize leaf development using tools such as GWAS, eQTL and precision
phenotyping. He emphasizes the need to integrate approaches due to the
observation that most alleles have small effects, with only a few major effect
genes whatever the complex trait under study. He shows correlations among gene
expression values and leaf traits, as well as GWAS-derived SNPs associated with
the same traits.
He also shows that for eQTLs, the majority of
expression levels analyzed are associated with remote cis & trans locations
(matrix of expressed gene position vs eQTL position, cis are on the diagonal). They
focus on cis SNPs found for several
traits, and find several genes encoding vacuole pumps. He mentions the
challenge of pericentromeric regions with high linkage disequilibrium,
which produce artificial segments with consecutive eQTLs. They also use WGCNA
and compute correlations between modules and phenotypes, finding that some have
positive correlations while others are actually negative.
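The cis/trans distinction behind that matrix can be sketched as follows: an eQTL counts as cis when the SNP sits close to the gene whose expression it is associated with, and as trans otherwise. The 1 Mb window, the `Locus` type and the coordinates are assumptions for illustration, not values from the talk.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chromosome: str
    position: int  # base pairs


def classify_eqtl(snp: Locus, gene: Locus, cis_window: int = 1_000_000) -> str:
    """Label an eQTL as 'cis' if the SNP lies within cis_window of the gene."""
    same_chromosome = snp.chromosome == gene.chromosome
    if same_chromosome and abs(snp.position - gene.position) <= cis_window:
        return "cis"
    return "trans"


# Invented coordinates
gene = Locus("chr1", 5_000_000)
print(classify_eqtl(Locus("chr1", 5_200_000), gene))   # cis
print(classify_eqtl(Locus("chr1", 90_000_000), gene))  # trans
print(classify_eqtl(Locus("chr4", 5_000_000), gene))   # trans
```

On a plot of gene position against eQTL position, the pairs labelled cis are the ones that fall on the diagonal.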
He concludes by summarizing that RNAseq data
are very valuable to do eQTL analyses and to produce markers.
Ming-Jung Liu,
Academia Sinica, Taiwan
She starts by saying that Academia Sinica is
currently recruiting and moves to talk about regulatory divergence in
wound-responsive gene expression between domesticated (lycopersicum) and wild (pennellii)
Solanum species. She spends some time discussing the tradeoff between growth
and wound stress tolerance in wild species. They identified putative cis
regulatory elements enriched in clusters of genes related to wound responses,
which correspond to G-box and W-box elements, and are enriched in upstream
regions immediately before TSS positions. They then check whether these cis
elements are conserved between both species and find that most are conserved
but a good fraction are actually non-conserved, unique to each species (http://www.plantcell.org/content/early/2018/05/09/tpc.18.00194).
Sally Aitken,
University of British Columbia, Canada
She talks about climate adaptation in conifers,
which are currently experiencing drought and massive die-off in British Columbia.
She talks about the increasing frequency of extreme climate events, added to
the warming trends. (Tree) seed and breeding zones based on local populations
no longer match genotypes with climates. Mutation rates in trees are low per
year but high per generation. They have estimated that climate is changing at
a speed of 70km/yr, while paleobiological evidence suggests trees have historically
travelled at 0.1km/yr. She describes their AdapTree project, which is designed
to manage this issue in W Canada with assisted gene flow. They have not seen
population variability in drought/heat response, only in cold hardiness. As they
don't have access to good assemblies they used exome capture and SNP arrays to
do Genome to Environment Association with bayenv2 and standard GWAS. She
explains that the population structure of conifers actually correlates with
climate gradients, so that by removing pop structure you actually miss
potentially bona fide adaptation loci. So they decided to not remove pop
structure and instead took only SNPs in excess of the background distribution
of SNPs per gene (http://science.sciencemag.org/content/353/6306/1431). They found 47 candidate genes
common to pine and spruce populations and later work was done to find correlating
haplotypes, instead of individual SNPs, to be used as markers (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1545-7).
I missed “Reinforcing plant evolutionary
genomics using ancient DNA” by Hernan Burbano (MPI Tübingen, Germany) and “A
major QTL for grain weight in wheat is associated with increased grain length
and cell size” by Jemima Brinton (John Innes Centre, UK).
Esther van der Knaap, University
of Georgia, USA
She talks about their work on the mechanisms
underlying morphological diversity in tomato, which is largely explained by
four gene families, including Ovate and the OFP family members. OFPs have been
shown to interact with TFs, to act as repressors and to affect cellular
localization of other proteins. They observed that OFP20 interacts with a series
of proteins in Y2H assays and further refined the list by making Cas9 knockout
mutants; they found that the pear/round shape is related to patterns of cell
division in the fruit. She mentions a collaboration with Toni Monforte (UPV,
Spain) where they found another OFP family member responsible for melon fruit
shape.
Benjamin Brachi, INRA,
France
He talks about natural variation of leaf
secondary metabolites, and the underlying genetics, in European white oaks (Quercus robur). They have a reference
genome and a genetic map made from trees planted in 1999. They do mass spectrometry
on leaf extracts, cluster the compounds/pseudomolecules observed and estimate
their replicability and heritability. He then explains a study of 9 populations
of Quercus petraea from around France,
where they see that population provenance explains only a very small part of the
metabolites analyzed, and a fraction of those actually have bimodal/binary/PAV patterns:
they are either produced or not at all. I think he believes the latter have a
genetic explanation, while the rest probably respond largely to the
environment.
Andrew Gloss, University
of Chicago, USA
Andy talks about plant genotype × herbivore
genotype interactions using 288 ecotypes of Arabidopsis
thaliana, with the goal of discovering the genetic architecture of resistance
to herbivory. The chosen herbivore is a fly related to Drosophila. They measure leaf damage and perform multi-trait GWAS,
classifying SNPs as common genetic SNPs and SNPs with effects that depend on
the plant population studied. He then focuses on gene PBSL, which underlies
clinal variation in size from N to S Europe.
Sarah Schiessl
Weidenweber, Justus Liebig University Giessen, Germany
She talks about miRNA signaling under drought
stress in winter lines of allopolyploid Brassica
napus. How does drought affect flowering? It delays flowering and reduces yield.
Their hypothesis is that the flowering network senses drought stress by means
of RNAi. They put their plants in containers to get realistic soil drying
compared to pots, sampled tissue and finally did WGCNA analyses first with
RNAseq to define modules and then with small RNAs looking for those correlated
with modules defined earlier. Now they are studying in PCR experiments the expression
of the candidate smallRNAs and they have observed a high variation across
genotypes.
Adrien Sicard, SLU
Uppsala, Sweden
He talks about the convergent evolution of
flower morphology after the transition to selfing in the genus Capsella. He introduces the selfing
syndrome of repeated morphological evolution in plants, which tend to reduce petal
size by reducing the number of petal cells, which they also see after Principal
Component Analysis of transcriptomes of selfing and non-selfing species. They
have a strong QTL for petal size in a population of two selfing species. When
the candidate gene is mutated, probably in the promoter, they see pleiotropic
effects.