October 8, 2013

What does an SNP look like?

About to use samtools pileup or mpileup? Would you like to take a look at the file itself, instead of going straight to BCF and the automated bcftools pipeline?

OK! This is what an SNP looks like! Look at position 701: the reference base is C, but the reads show T.

contig_100029   698     C       39      .$.....,.,,,,.,,,.,.,..,......,,,,,,.,,^S. 
contig_100029   699     A       38      .....,.,,,,.,,,.,.,..,......,,,,,,.,,. 
contig_100029   700     A       38      .....,.,,,,.,,,.,.,..,......,,,,,,.,,. 
contig_100029   701     C       39      TTTTTtTttttTtttTtTtTTtTTTTTTttttttTttT^ST       
contig_100029   702     A       40      .....,.,,,,.,,,.,.,..,......,,,,,,.,,..^S,     
contig_100029   703     G       41      .....,.,,,,.,,,.,.,..,......,,,,,,.,,.,.,       
contig_100029   704     G       42      .....,.,,,,.,,,.,.,..,......,,,,,,.,,.,.,^S.   

I have omitted the last column (read base qualities).
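If you would rather have a script point at such positions, here is a minimal sketch. It assumes well-formed pileup lines like the ones above, and the names `count_bases`, `snp_candidates` and the 0.8 threshold are my own hypothetical choices, not anything from samtools. It only counts what each read shows; it is not a real variant caller.

```python
import re


def count_bases(ref_base, bases):
    """Count the base each read shows at one pileup position."""
    # Remove read-start markers: "^" plus the one-character mapping quality.
    bases = re.sub(r"\^.", "", bases)
    # Remove read-end markers.
    bases = bases.replace("$", "")
    counts = {}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch in "+-":
            # Indel (not covered in this post): sign, length, then that many bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        i += 1
        if ch in "<>*":
            # Reference skips and deletion placeholders carry no base.
            continue
        # "." and "," mean "same as the reference"; letters are mismatches.
        base = ref_base.upper() if ch in ".," else ch.upper()
        counts[base] = counts.get(base, 0) + 1
    return counts


def snp_candidates(lines, min_fraction=0.8):
    """Yield (contig, position, ref, alt, alt_count, total) for likely SNPs.

    min_fraction is an arbitrary illustrative threshold.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # skip blank or non-pileup lines
        contig, pos, ref, _depth, bases = fields[:5]
        counts = count_bases(ref, bases)
        total = sum(counts.values())
        for base, n in counts.items():
            if base != ref.upper() and total and n / total >= min_fraction:
                yield contig, int(pos), ref, base, n, total


# Two of the lines shown above (quality column omitted).
pileup = [
    "contig_100029   700     A       38      .....,.,,,,.,,,.,.,..,......,,,,,,.,,.",
    "contig_100029   701     C       39      TTTTTtTttttTtttTtTtTTtTTTTTTttttttTttT^ST",
]
for hit in snp_candidates(pileup):
    print(hit)
```

With these two lines, only position 701 should be reported, as a C to T change.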

And what about a reference skip? Maybe you want to see a spliced alignment? Here you are!

contig_100029   516     T       43      ,,,.,,...,.,,.................,.,,,,.,,,.,^S.   
contig_100029   517     A       43      ,,,.,,...,.,,.................,.,,,,.,,,.,.     
contig_100029   518     G       43      ,,,.,,...,.,,.................,.,,,,.,,,.,.  
contig_100029   519     A       43      ,,,.,,...,.,,.................,.,,,,.,,,.,.     
contig_100029   520     G       43      ,,,.,,...,.,,.................,.,,,,.,,,.,.    
contig_100029   521     G       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>     
contig_100029   522     T       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>     
contig_100029   523     G       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>     
contig_100029   524     A       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>    
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .
contig_100029   651     C       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>    
contig_100029   652     A       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>     
contig_100029   653     G       43      <<<><<>>><><<>>>>>>>>>>>>>>>>><><<<<><<<><>     
contig_100029   654     G       44      ,,,.,,...,.,,.................,.,,,,.,,,.,.^S, 
contig_100029   655     G       44      ,,,.,,..$.,.,,............$.....,.,,,,.,,,.,., 
contig_100029   656     C       42      ,,,.,,..,.,,................,.,,,,.,,,.,.,     
contig_100029   657     G       42      ,,,.,,..,.,,................,.,,,,.,,,.,.,     

Who needs IGV when you can scroll up and down, watching how things pile up and down?
And it is memory efficient, right?
You just have to filter your region of interest with grep or awk!
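The same region filter can also be sketched in Python, one line at a time so memory use stays flat. The function name `filter_region` is hypothetical, and the sketch assumes whitespace-separated columns with the contig in column 1 and the 1-based position in column 2, as in the examples above.

```python
def filter_region(lines, contig, start, end):
    """Yield pileup lines on `contig` whose position is within [start, end]."""
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or non-pileup lines
        if fields[0] == contig and start <= int(fields[1]) <= end:
            yield line


# A few of the lines shown above (quality column omitted).
pileup = [
    "contig_100029   698     C       39      .$.....,.,,,,.,,,.,.,..,......,,,,,,.,,^S.",
    "contig_100029   699     A       38      .....,.,,,,.,,,.,.,..,......,,,,,,.,,.",
    "contig_100029   700     A       38      .....,.,,,,.,,,.,.,..,......,,,,,,.,,.",
    "contig_100029   701     C       39      TTTTTtTttttTtttTtTtTTtTTTTTTttttttTttT^ST",
]
for line in filter_region(pileup, "contig_100029", 699, 700):
    print(line)
```

Since the function accepts any iterable of lines, you can pass an open file handle instead of a list and it will stream through the pileup without loading it all.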

Do you want me to explain the columns?

1st: reference_name
2nd: position in reference
3rd: base on reference
4th: depth, or number of reads piling up over this reference position
5th: read bases. Each symbol comes from one read, with some exceptions. Let's look at it:

"." and ",": matches! "." on the forward strand, "," on the reverse.
"A", "C", "G", "T", "N" and their lowercase versions: mismatches, like the T's in the SNP above. Uppercase on the forward strand, lowercase on the reverse.
"$": end of read. It comes right after the symbol of that read's last base ("." or "," for example). So, 2 symbols.
"^": start of read. Followed by the mapping quality of the read (the MAPQ field in SAM format, encoded as one ASCII character whose code minus 33 is the MAPQ) and the symbol of that base ("." or "," for example). So, 3 symbols.
">" and "<": reference skip. That is, the read still maps, but not to these reference bases. So it is probably a spliced alignment, with part of the read aligning before the skip and part after it. ">" on the forward strand, "<" on the reverse.

6th: read base qualities (not shown, just working with HQ bases ;)
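To make the symbol rules for the 5th column concrete, here is a minimal sketch of a tokenizer that tallies them. The name `summarize_bases` and the keys of the returned dictionary are hypothetical. It assumes a well-formed column and simply steps over indels ("+" or "-" followed by a length and bases), which this post does not cover.

```python
import re


def summarize_bases(bases):
    """Tally the symbols found in the read-bases column of one pileup line."""
    summary = {
        "fwd_match": 0, "rev_match": 0, "mismatch": 0,
        "fwd_skip": 0, "rev_skip": 0,
        "read_ends": 0, "start_mapqs": [],
    }
    simple = {
        ".": "fwd_match", ",": "rev_match",
        ">": "fwd_skip", "<": "rev_skip",
        "$": "read_ends",
    }
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Start of read: the next character encodes MAPQ as ASCII code minus 33.
            summary["start_mapqs"].append(ord(bases[i + 1]) - 33)
            i += 2
        elif ch in "+-":
            # Indel (not covered here): sign, length, then that many bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if ch in simple:
                summary[simple[ch]] += 1
            elif ch.upper() in "ACGTN":
                summary["mismatch"] += 1
            i += 1
    return summary


# Position 702 from the SNP example above: one read starts here with "^S".
column = ".....,.,,,,.,,,.,.,..,......,,,,,,.,,..^S,"
print(summarize_bases(column)["start_mapqs"])  # [50], since ord("S") - 33 == 50
```

The "^" case is handled before anything else because the character after it is a quality value, not a read symbol, so it must never be counted as a match, a skip or an end of read.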

Looking for more information? Ask please!
