April 26, 2017

Genome annotation with footprintDB

Hi,
some of you might have heard of our footprintDB collection, which is unusual in that it annotates DNA motifs from different sources together with their cognate transcription factors (TFs) and their interface residues. It was published in 2014, is regularly updated, and is queried by users around the world, who usually perform interactive searches.

There is also a web services interface, which is quite useful but slow if you have many sequences to scan (see examples in the manual). Things get even worse if you have a complete genome or proteome, and that's exactly what Teshome Mulugeta, who's visiting the lab from Norway, needed to do.

ACE2 DNA motif, taken from http://floresta.eead.csic.es/footprintdb/index.php?motif=cb6f6b343b895dfa1c3776c99fbedda7 .

So we have made FASTA files of all transcription factors in footprintDB, together with their cognate DNA motifs, available at http://floresta.eead.csic.es/footprintdb/download . They come in three flavours (all, Metazoa and plants), and TF sequences look like this one:

>1:ACE2 [Saccharomyces cerevisiae] libs:JASPAR;CISBP; motif:vTGCTGGtym;mCCAGCa; url 
MDNVVDPWYINPSGFAKDTQDEEYVQHHDNVNPTIPPPDNYILNNENDDGLDNLLGMDYYNIDDLLTQELRDLDIPLVPSPKTGDGS
SDKKNIDRTWNLGDENNKVSHYSKKSMSSHKRGLSGTAIFGFLGHNKTLSISSLQQSILNMSKDPQPMELINELGNHNTVKNNNDDF
DHIRENDGENSYLSQVLLKQQEELRIALEKQKEVNEKLEKQLRDNQIQQEKLRKVLEEQEEVAQKLVSGATNSNSKPGSPVILKTPA
MQNGRMKDNAIIVTTNSANGGYQFPPPTLISPRMSNTSINGSPSRKYHRQRYPNKSPESNGLNLFSSNSGYLRDSELLSFSPQNYNL
NLDGLTYNDHNNTSDKNNNDKKNSTGDNIFRLFEKTSPGGLSISPRINGNSLRSPFLVGTDKSRDDRYAAGTFTPRTQLSPIHKKRE
SVVSTVSTISQLQDDTEPIHMRNTQNPTLRNANALASSSVLPPIPGSSNNTPIKNSLPQKHVFQHTPVKAPPKNGSNLAPLLNAPDL
TDHQLEIKTPIRNNSHCEVESYPQVPPVTHDIHKSPTLHSTSPLPDEIIPRTTPMKITKKPTTLPPGTIDQYVKELPDKLFECLYPN
CNKVFKRRYNIRSHIQTHLQDRPYSCDFPGCTKAFVRNHDLIRHKISHNAKKYICPCGKRFNREDALMVHRSRMICTGGKKLEHSIN
KKLTSPKKSLLDSPHDTSPVKETIARDKDGSVLMKMEEQLRDDMRKHGLLDPPPSTAAHEQNSNRTLSNETDAL

The header contains the internal accession number, the main TF name, the organism name, the source libraries, the DNA motifs (from JASPAR and CISBP in the example) and a URL where the full annotation and references are available.
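
If you want to pull those fields out of the headers programmatically, something like the following Python sketch might help. It is only an illustration based on the single header shown above: the regular expression, the function name `parse_header` and the dictionary keys are my own assumptions, not part of footprintDB, and headers in the real files may vary in ways this pattern does not cover. Whatever follows the motif field is kept untouched in `rest`.

```python
import re

# Pattern inferred from the one example header in this post (an assumption):
# >ACCESSION:NAME [ORGANISM] libs:LIB;LIB; motif:MOTIF;MOTIF; <rest>
HEADER_RE = re.compile(
    r"^>(?P<accession>\d+):(?P<name>.+?)\s+"
    r"\[(?P<organism>[^\]]*)\]\s+"
    r"libs:(?P<libs>\S*)\s+"
    r"motif:(?P<motifs>\S*)"
    r"(?:\s+(?P<rest>.*))?$"
)


def split_field(value):
    """Split a ';'-separated field, dropping the empty trailing item."""
    return [item for item in value.split(";") if item]


def parse_header(line):
    """Parse one footprintDB-style FASTA header into a dictionary."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected header format: " + line)
    return {
        "accession": match.group("accession"),
        "name": match.group("name"),
        "organism": match.group("organism"),
        "libs": split_field(match.group("libs")),
        "motifs": split_field(match.group("motifs")),
        # Anything after the motif field is returned as-is.
        "rest": match.group("rest") or "",
    }


example = (
    ">1:ACE2 [Saccharomyces cerevisiae] libs:JASPAR;CISBP; "
    "motif:vTGCTGGtym;mCCAGCa; url "
)
record = parse_header(example)
print(record["name"], record["organism"], record["libs"], record["motifs"])
```

Raising an error on any header that does not match makes it obvious when a file departs from the layout assumed here, rather than silently producing partial records.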
cheers,
Bruno
