June 11, 2024

AllHands 2024 in Uppsala (II)

Elaine Harrison starts by explaining ELIXIR's scientific programme, organised around three themes of the new scientific tier:

1.  Patrick Aloy (ES) introduces the theme "Cellular and molecular biology" and insists on going beyond the data in order to reconstruct the "full analytic journey" of each study.

2.  Robert Waterhouse (CH) introduces the theme closest to us, "Biodiversity, food security, & pathogens (BFSP)"

3.  Serena Scollen (Hub) presents "Human data and translational research" and the full cycle of human genomic data

Workshop "Single-cell Galaxy user journey" with Wendi Bacon.
https://galaxyproject.org/community/sig/singlecell
https://www.biostars.org/p/471274

No data standard yet; the closest is https://anndata.readthedocs.io (essentially a Python-friendly HDF5 derivative; R users still use data frames).

At https://usegalaxy.eu workflows can be created graphically, or by stacking up and exporting the operations you carried out on your data.

Workflows are ultimately text files; converting Galaxy workflows to Snakemake / Nextflow is easy by exporting to bash, while the other direction is difficult. They can be stored at https://workflowhub.eu
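
Since a workflow is ultimately a text file, it can be inspected with ordinary tools. A minimal sketch, assuming the exported Galaxy workflow is JSON with a `steps` mapping keyed by step index; the embedded document, its step names and its tool ids are made up for illustration, and real exports carry many more fields.

```python
import json

# Hypothetical, heavily trimmed workflow export (illustration only).
EXAMPLE_WORKFLOW = """
{
  "a_galaxy_workflow": "true",
  "name": "toy single-cell workflow",
  "steps": {
    "1": {"name": "Filter cells", "type": "tool", "tool_id": "example_filter"},
    "0": {"name": "Input dataset", "type": "data_input", "tool_id": null},
    "2": {"name": "Cluster", "type": "tool", "tool_id": "example_cluster"}
  }
}
"""


def list_steps(workflow_text):
    """Return (index, name, tool_id) tuples sorted by numeric step index."""
    workflow = json.loads(workflow_text)
    steps = workflow.get("steps", {})
    return [
        (int(index), step.get("name"), step.get("tool_id"))
        for index, step in sorted(steps.items(), key=lambda item: int(item[0]))
    ]


if __name__ == "__main__":
    for index, name, tool_id in list_steps(EXAMPLE_WORKFLOW):
        print(index, name, tool_id or "(input)")
```

Sorting by the numeric index matters because the keys are strings and a JSON object gives no ordering guarantee.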

https://github.com/galaxyproject/idc -> genomic references for Galaxy

https://biostar.galaxyproject.org/p/11944/index.html

 
"Defer dataset" allows using a public URL as input: the data are only downloaded when the job executes on a particular Galaxy node, and only the results are stored on the main server, which saves disk quota.
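
The idea behind a deferred dataset can be sketched in a few lines. This is not Galaxy's implementation, only an illustration of lazy fetching; the class `DeferredDataset`, the function `count_lines` and the injectable `fetch` parameter are hypothetical names chosen for the example.

```python
from urllib.request import urlopen


class DeferredDataset:
    """Holds only a URL until the content is actually needed."""

    def __init__(self, url, fetch=None):
        self.url = url
        # `fetch` is injectable so the sketch can be exercised without network access.
        self._fetch = fetch or self._download
        self._content = None

    @staticmethod
    def _download(url):
        with urlopen(url) as response:
            return response.read()

    @property
    def is_materialized(self):
        return self._content is not None

    def materialize(self):
        """Download on first use and cache the bytes for later calls."""
        if self._content is None:
            self._content = self._fetch(self.url)
        return self._content


def count_lines(dataset):
    """Stand-in for a tool run: the data are fetched only at this point."""
    return len(dataset.materialize().splitlines())
```

Creating the object costs nothing in storage; the download happens inside `count_lines`, which plays the role of the job running on the compute node.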

Workshop "Interop_Mini-Symposium_All_Hands_2024"
RDA = Research Data Alliance
ebi.ac.uk/metabolights : accepted standards for data deposition are still lacking.
Wei Kheng Teh talks about metadata heterogeneity of single-cell omics data
https://isa-tools.org/ -> https://simplifier.net/guide/isa-to-fhir?version=current

https://www.researchobject.org/ro-crate/ -> lightweight approach to packaging research data with their metadata, see https://doi.org/10.5281/zenodo.5146227; crates can be stored in Zenodo or GitHub. Workflow -> to get work done / Dataflow -> to publish and share data.
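
To make "packaging data with their metadata" concrete: an RO-Crate is a directory with a `ro-crate-metadata.json` file in JSON-LD. A minimal sketch, assuming RO-Crate 1.1; the function name `minimal_ro_crate` and the file names are hypothetical, and the specification asks for further properties on the root dataset (for example a licence and a publication date) that are left out here, so this is not a complete crate.

```python
import json


def minimal_ro_crate(name, description, files):
    """Build the JSON-LD content of a bare-bones ro-crate-metadata.json."""
    graph = [
        {
            # Metadata descriptor: points at the root dataset of the crate.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: the directory being packaged.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per packaged file.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


if __name__ == "__main__":
    crate = minimal_ro_crate("toy crate", "illustration only", ["data.csv"])
    print(json.dumps(crate, indent=2))
```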

ELIXIR-CZ are working on material / templates for DMPs; there is a gap there, in bridging high-level interoperability aims and actual detailed protocols.

Interoperability barriers: traditional rules, increasing data complexity, heterogeneity of data quality, and the lack of a minimal standard for cross-field data integration.


Workshop "Paving the way towards the effective use of generative AI for ELIXIR - Agenda"
Michael Hu, PI and Director of Bioinformatics at West Virginia University, talks about "Bioinformatics with ChatGPT"
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011511
https://arxiv.org/abs/2403.15274
Renat Shigapov, Data Scientist at University of Mannheim, talks about "Reviewing (meta)data and evaluate their FAIRness using ChatGPT+", but his talk requires a ChatGPT Plus license. The central idea is that for a dataset to be FAIR, both other people and software must be able to find it, and that GPT can help with that task, as long as connections to external sources are included to check the URLs it returns and avoid hallucinations. He suggests that ELIXIR should use tools such as https://github.com/UB-Mannheim/FAIR-GPT
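
Checking the URLs a model returns can be sketched with the standard library alone. This is not how FAIR-GPT works internally, just an illustration of the idea; the function names and the `resolver` parameter are hypothetical, and a successful HEAD request is only a heuristic, since some servers reject HEAD and a resolving page may still be the wrong one.

```python
import http.client
import re
from urllib.request import Request, urlopen

URL_PATTERN = re.compile(r'https?://[^\s<>()\[\]"]+')


def extract_urls(text):
    """Return the distinct URLs in `text`, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop sentence punctuation glued to the URL
        if url not in seen:
            seen.append(url)
    return seen


def url_resolves(url, timeout=10):
    """True if a HEAD request succeeds (heuristic only)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False


def check_answer(text, resolver=url_resolves):
    """Map each URL found in a model answer to whether it resolved."""
    return {url: resolver(url) for url in extract_urls(text)}
```

Passing a different `resolver` makes it possible to plug in a DOI or registry lookup instead of a plain HTTP request.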


 


Image: poster available at https://doi.org/10.7490/f1000research.1119714.1, photo by Ana Conesa https://x.com/anaconesa/status/1800828390607610058


PS June 12, 2024, day III

ELIXIR technical tier, 5 Platforms
Examples of EU projects that became services: RDMkit, WorkflowHub
M Jetten presents the RDM community
ELIXIR software registry: bio.tools > OpenEBench > BioContainers > Galaxy
https://github.com/research-software-ecosystem/content
TeSS training support system https://tess.elixir-europe.org
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007854
LS Login

AllHands 2024 in Uppsala (I)

This week I attended the annual AllHands meeting of ELIXIR, the European ESFRI for life science data, which recently turned 10. It is usually held in a different node each year, this year in Sweden, in the city of Uppsala.

Although I have been taking part in ELIXIR activities for a while, and for the past year as part of the https://inb-elixir.es node, this is my first time and it has helped me learn its own vocabulary and see a bit of how it works. I will put my notes here.

https://www.proteinatlas.org is a core resource, from tissues to cell lines and single-cell (nTPM for expression). 0% human prots are housekeeping, 15% tissue specific. Mentions https://olink.com for detecting 5k proteins in human samples. Case-control is not a good approach for finding markers; you need to consider a wide disease panel.

Vocabulary:

    Commissioned services (CoS) are funded from the ELIXIR budget.
    Communities are funded for 2 years (capital C).
    Focus Groups?
    Platforms are operational infrastructures, not computing resources
    Services (~500) are provided by nodes

 RDM = Research Data Management

Workshop "Demystifying ELIXIR: Everything you ever wanted to know and more" includes questions and answers worked out in groups, for example: the name ELIXIR was proposed by Janet Thornton and does not stand for anything

Workshop "Training MiniSymposium All Hands 2024" https://github.com/elixir-europe-training . A learning path will shortly be in place for all communities. FAIR training handbook. Course design: considerations for trainers https://f1000research.com/documents/9-1377.

Workshop "Insights into ELIXIR's Biodiversity and Plant Science Collaborations: Fostering Cross-Disciplinary Dialogue". Cyril presents the ideas of the plant community. Robert Waterhouse (CH) talks about the Biodiversity Community, which T Gabaldón is also part of; he mentions https://rdmkit.elixir-europe.org/plant_sciences and is asked about digital twins for ecological niche modelling.

S Beier talks about the new work plan and roadmap (through 2028) of the Plant Sciences Community, which continues https://f1000research.com/documents/10-145 . Mentions https://framework.frictionlessdata.io/docs/guides/validating-data.html for validating tabular data files. Phenotypic data do not fit well at EBI; instead Zenodo, GnpIS or https://recherche.data.gouv.fr/fr are used, with MIAPPE as a common language.

Yvan Le Bras talks about the French biodiversity initiative PMDB. Cites https://onlinelibrary.wiley.com/doi/10.1002/ece3.9961 . GBIF + MetaShARK. Galaxy, conda, BioContainers.

K Gruden: holobiont; there is a similar initiative at EPSO, but no effective communication with ELIXIR. NFDI4Biodiversity in Germany, in 10 years. PHENET: Daniel Wibberg. EU project coordinated from INRAE Montpellier. https://www.phenet.eu/en/about-phenet/phenet_partners
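
The kind of check a tabular validator such as the Frictionless one mentioned above performs can be illustrated without the library. The sketch below is plain Python, not the Frictionless API: the schema format (column name mapped to a Python type), the column names and the error messages are all assumptions made for illustration.

```python
import csv
import io


def validate_table(csv_text, schema):
    """Return a list of error strings; an empty list means the table passed."""
    errors = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    expected = list(schema)
    if header != expected:
        return [f"header {header} does not match expected {expected}"]
    # Data rows start at line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            errors.append(
                f"row {row_number}: expected {len(expected)} cells, got {len(row)}"
            )
            continue
        for column, cell in zip(expected, row):
            try:
                schema[column](cell)  # try to parse the cell as the declared type
            except ValueError:
                errors.append(
                    f"row {row_number}: {column}={cell!r} "
                    f"is not a valid {schema[column].__name__}"
                )
    return errors


if __name__ == "__main__":
    schema = {"plant_id": str, "height_cm": float}
    table = "plant_id,height_cm\nP1,12.5\nP2,tall\n"
    for error in validate_table(table, schema):
        print(error)
```

A real validator also covers encodings, missing values, constraints and richer types; the point here is only that the schema travels with the data and the check is automatic.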


Main auditorium of Uppsala University