Showing posts with the label RSAT.

January 8, 2024

footprintDB added new TFs, cis elements and binding interfaces

Over the last couple of weeks I have carried out the annual update of footprintDB, our database of transcription factors (TFs) with annotated cis elements and binding interfaces.

This involved three consecutive steps:

1) Updated 3d-footprint (completed 21/12/2023). This means that the collection of protein-DNA complexes from the Protein Data Bank was refreshed, which will help annotate more interface residues in TFs.

2) Updated the EEADannot collection with plant motifs and sites manually curated by us from papers published recently. You can see the sources at https://github.com/eead-csic-compbio/EEADannot .

3) Added JASPAR 2024 data and predicted interface residues for all included TFs.

 

You can check the current contents of each collection at https://footprintdb.eead.csic.es/index.php?databases . An overall summary is shown here:

 


The data can be downloaded in FASTA and TRANSFAC formats at https://footprintdb.eead.csic.es/download and the motifs have also been updated in https://github.com/rsa-tools/motif_databases , which feeds RSAT servers (see the plants server here). Note that some collections were left out due to licensing limitations.
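
Once a TRANSFAC-format file has been downloaded, a quick sanity check is to count the motifs it contains. The sketch below assumes that each record carries one AC line and ends with //, as in the usual TRANSFAC matrix layout. The temporary file and the two toy motifs are made up for illustration and are not real footprintDB entries.

```shell
# Toy TRANSFAC-style file with two made-up motifs (not real footprintDB data)
motifs=$(mktemp)
cat > "$motifs" <<'EOF'
AC  toy_motif_1
XX
ID  toy_motif_1
XX
P0      A      C      G      T
01      8      0      1      1
02      0      9      0      1
XX
//
AC  toy_motif_2
XX
ID  toy_motif_2
XX
P0      A      C      G      T
01      1      1      7      1
02      0      0      0     10
XX
//
EOF

# Count motifs: one AC (accession) line per record
n_motifs=$(grep -c '^AC' "$motifs")
echo "motifs: $n_motifs"

# List the motif identifiers (second field of the ID lines)
motif_ids=$(awk '$1 == "ID" { print $2 }' "$motifs")
echo "$motif_ids"
rm -f "$motifs"
```

With a real download you would point grep and awk at that file instead of the temporary one.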

Bruno

December 23, 2022

RSAT::Plants updated (Dec2022)

Hi, 

if you use the Plants server of the Regulatory Sequence Analysis Tools (RSAT), you might want to know that it has just been updated. Here's a short summary of the changes:

  • The updated URL is https://rsat.eead.csic.es/plants
  • It now supports HTTPS connections powered by certbot
  • It now uses the source code at https://github.com/rsa-tools/rsat-code (I have updated some documentation along the way)
  • Nine new species have been imported from Ensembl Plants: Lolium perenne, Brassica juncea, Echinochloa crusgalli, Digitaria exilis, Vigna unguiculata, Brassica rapa ro18, Corylus avellana, Ficus carica, Lactuca sativa
  • One species renamed: Physcomitrium patens
  • Three updated with a new assembly: Vitis vinifera, Triticum urartu, Helianthus annuus (sunflower)
  • This brings the total number of supported assemblies to 100; you can see their stats at https://rsat.eead.csic.es/plants/data/stats
  • Most species now correspond to release 55 of Ensembl Plants, but note that the sequence data is unchanged in many cases. This means, for instance, that Hordeum_vulgare.MorexV3_pseudomolecules_assembly.52 becomes Hordeum_vulgare.MorexV3_pseudomolecules_assembly.55, but the sequence is exactly the same.
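
To check whether two genome names refer to the same assembly and differ only in the Ensembl Plants release, one option is to strip the trailing release number before comparing. This is only a sketch, and it assumes the release is always the last dot-separated field, as in the Hordeum vulgare example:

```shell
old=Hordeum_vulgare.MorexV3_pseudomolecules_assembly.52
new=Hordeum_vulgare.MorexV3_pseudomolecules_assembly.55

# ${var%.*} removes the shortest suffix matching ".*", i.e. the release number
old_assembly=${old%.*}
new_assembly=${new%.*}
# ${var##*.} keeps only what follows the last dot
new_release=${new##*.}

if [ "$old_assembly" = "$new_assembly" ]; then
  echo "same assembly, now labelled release $new_release"
else
  echo "different assemblies"
fi
```

Note that a matching name only tells you the assembly label is unchanged; it says nothing about whether the gene annotation was revised.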

  


Have a nice break,

Bruno


May 31, 2022

footprintDB May 2022 version

Hi, we have just updated the motifs, transcription factors and sites in the footprintDB database.

The 31052022 version includes JASPAR2022 and new plant data at EEADannot (see repo), plus all protein-DNA interface residues have been recomputed after bringing 3d-footprint up to speed. 

The current contents include:

                              total   unique  metazoa  plants
Transcription Factors          9920     7462     4940    1217
DNA motifs (PSSM)             14615    12129     8093    2233
DNA Binding Sites/Sequences   46154


Sequence logo of transcription factor WRKY71, read more here.


The footprintDB motifs will be shortly synced with the RSAT servers so that they can be used there as well, see you soon,
Bruno

August 24, 2021

gene IDs in RSAT::Plants

Plant genomes at plants.rsat.eu are imported from different sources, such as Ensembl Plants, the NCBI or JGI Phytozome. You can check the actual source of your genome of interest by browsing the left menu, finding 'Genomes and genes' and clicking on the supported organisms table. As each database is different, the format of gene IDs might vary across genomes. A genome might also have several annotations, each with different gene names, so it is important to know which one is available at RSAT. Here you'll learn two ways to work out the correct gene IDs for your genome.

1. Sequence tools -> retrieve sequence

On the left menu, find 'Sequence tools' and then click on retrieve sequence. On 'Mandatory inputs' type/select the appropriate genome and click on 'all genes of this organism'. See the figure for an example:

You can then click on 'Run analysis' and if you select 'display' you'll get FASTA output, where you can see the gene IDs in the header.
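
If you save that FASTA output to a file, the gene IDs can be pulled out of the header lines on the command line. The sketch below assumes the ID is the first whitespace-delimited word after '>'. The two records and their IDs are invented for illustration, and real retrieve sequence headers may carry more fields.

```shell
# Made-up FASTA file standing in for saved retrieve sequence output
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>geneA upstream region (made-up header)
ACGTACGTAC
>geneB upstream region (made-up header)
TTGACCGGTA
EOF

# Keep header lines, drop the leading '>' and print the first word
ids=$(awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta")
echo "$ids"
rm -f "$fasta"
```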


2. Data -> Linux terminal

On the left menu, find 'Help & Contact' and then click on data. Find the 'genomes/' folder, then your genome and therein the 'genome/' folder. There should be a file named 'gene.tab'. You should copy the URL to that file and then in the terminal call wget or curl:

 
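# base URL of the RSAT::Plants data folder (no trailing slash)
dataurl=http://rsat.eead.csic.es/plants/data
genefile="$dataurl/genomes/Cannabis_sativa.cs10.GCF_900626175.2.NCBI/genome/gene.tab"

# with wget: print the first column, skipping comment lines that start with ';'
wget -O - -o /dev/null "$genefile" | cut -f 1 | grep -v "^;" | head

# or the same with curl
curl -s "$genefile" | cut -f 1 | grep -v "^;" | head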

 

You should obtain a single column with the actual gene IDs supported for that genome, which you can copy and paste directly into retrieve sequence.
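
Before pasting a long list, it can help to check which of your own IDs actually appear in that column. Here is a minimal sketch with two small made-up files, one standing in for the IDs extracted from gene.tab and one for your own list:

```shell
supported=$(mktemp)   # stands in for the first column of gene.tab
mine=$(mktemp)        # stands in for your own gene list
printf '%s\n' geneA geneB geneC > "$supported"
printf '%s\n' geneB geneX > "$mine"

# -F fixed strings, -x whole-line match, -f read patterns from a file
found=$(grep -Fxf "$supported" "$mine")
# -v inverts the match: IDs in your list that the genome does not have
missing=$(grep -vFxf "$supported" "$mine")

echo "supported: $found"
echo "not found: $missing"
rm -f "$supported" "$mine"
```

Whole-line matching (-x) matters here, since otherwise a short ID would also match any longer ID that contains it.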

Hope this helps,

Bruno