#!/perl/bioinfo: plot

February 26, 2025

Python tool to plot genomic data

Hi, today I just want to share a tool for preparing nice pictures from
genomic data files. It can be used straightforwardly with a few
commands and arguments, but the interesting point is that it
actually relies on an intermediate configuration file which can be
easily edited. We will go deeper into that later. Remarkably, it
is easy to use and stays manageable despite its huge number of
parameters, which are well documented and intuitive, if
overwhelming in their level of detail.

First, find here the documentation and usage guides, as well as the
reference paper.

As mentioned, it takes output files from several omics fields, ranging
from widely used types such as BED, FASTA, MAF or GTF… to others
(which I have not tested yet) such as Hi-C matrices or Epilogos
epigenetic annotations.

It is written in Python and can be easily installed using pip:

$ pip install pyGenomeTracks

All dependencies should be automatically installed. I personally
recommend having a conda environment for pyGenomeTracks.

To use it, pyGenomeTracks needs a configuration file describing the
tracks to be included in the final image. Think of it as the
“instructions” or “cooking recipe”.

This file can be built from the command line. The basic arguments are
the input files, from which the plot will take its data, and the output
tracks.ini file. This is the configuration file, and its contents are
shaped according to the types of files you provide as input.

$ make_tracks_file --trackFiles <bigwig file> <bed file> etc. -o tracks.ini

Once you have the configuration file, run the command that
generates the image.

$ pyGenomeTracks --tracks tracks.ini -o image.png

You will usually have to provide some other arguments as well. For
instance, to capture a specific region of your .bed file, pass it to
pyGenomeTracks (not to make_tracks_file) with --region:

$ pyGenomeTracks --tracks tracks.ini --region chr1:1000000-4000000 -o image.png

There are further options such as title, font size, width/height or
resolution. In summary, you can reuse a single configuration file to plot
your data as many times as you like, without re-editing parameters you
probably spent some time optimizing.
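
If you end up plotting many regions from the same configuration file, the calls can be scripted. Here is a minimal Python sketch of that idea. The helper name build_plot_command, the second region and the output file names are made up for illustration; I am assuming the --tracks, --region, -o, --dpi and --title options of pyGenomeTracks, so check them against the version you have installed. The script only builds and prints the commands, it does not run them.

```python
import shlex


def build_plot_command(tracks_ini, region, out_file, dpi=None, title=None):
    """Return a pyGenomeTracks call as a list of arguments (nothing is executed)."""
    chrom, start, end = region  # e.g. ("chr1", 1_000_000, 4_000_000)
    if start >= end:
        raise ValueError("region start must be smaller than region end")
    cmd = [
        "pyGenomeTracks",
        "--tracks", tracks_ini,
        "--region", f"{chrom}:{start}-{end}",
        "-o", out_file,
    ]
    # Optional extras (assumed flags, see the note above)
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if title is not None:
        cmd += ["--title", title]
    return cmd


# One configuration file, several regions: only the region and output name change.
# The second region is a hypothetical example.
regions = [("chr1", 1_000_000, 4_000_000), ("chr2", 500_000, 900_000)]
commands = [
    build_plot_command("tracks.ini", r, f"{r[0]}_{r[1]}_{r[2]}.png", dpi=150)
    for r in regions
]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually draw the images, each list could be handed to subprocess.run(cmd, check=True) inside the environment where pyGenomeTracks is installed.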

And, speaking of the time spent preparing plot parameters, here comes
the most interesting part. You can generate the instruction file from the
command line but, since it is ultimately a text file that stores one parameter
per line, you can also write it by hand. With the guide to all the parameters
of each plot type at hand, and thanks to its intuitive structure, you can
control a wide set of variables, colours, styles… It is even possible to stack
plots, which expands the possibilities up to the limit of your imagination.
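
To give an idea of what such a hand-written file looks like, here is a minimal Python sketch that writes one with the standard configparser module. The file names (coverage.bw, haplotypes.bed), section names, colours and heights are invented for illustration, and the option names (file, title, height, color, file_type, where) are the ones I understand pyGenomeTracks to use, so please check them against the guide for your track types. Sections are stacked in the image in the order they appear in the file.

```python
import configparser
import io


def build_tracks_ini(tracks):
    """Render (section, options) pairs as INI text, in top-to-bottom drawing order.

    Section names must be unique here: configparser would silently
    overwrite an earlier section that has the same name.
    """
    config = configparser.ConfigParser()
    for section, options in tracks:
        config[section] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical stack of tracks: an axis, a coverage track, a gap, a BED track.
tracks = [
    ("x-axis", {"where": "top"}),
    ("coverage", {"file": "coverage.bw", "title": "coverage",
                  "height": 3, "color": "#1f78b4", "file_type": "bigwig"}),
    ("spacer", {"height": 0.5}),
    ("haplotypes", {"file": "haplotypes.bed", "title": "haplotypes",
                    "height": 4, "color": "#e31a1c", "file_type": "bed"}),
]

ini_text = build_tracks_ini(tracks)
print(ini_text)  # redirect or write this text to tracks.ini
```

Keeping the recipe as data in a script like this makes it easy to try another colour or height and regenerate the file, although editing tracks.ini directly in a text editor works just as well.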

I don’t want to make this post any longer. I almost forgot: I’m Joan, a
researcher in training. While working on some tasks, I couldn’t find an
adequate tool to quickly plot some haplotypes I am working on, and then I
came across pyGenomeTracks. Have a look at one of my beautiful and
colourful images, as an example.

Hope to write here often!

October 6, 2020

Violins from several files in R

Hi,

last week I needed to make a plot showing several distributions of thousands of observations side by side. I settled on so-called violin plots, which my colleague Pablo Vinuesa had already used in this figure (panel D) a few years ago.

Violin plots summarize distributions concisely, showing the median (white dot), the interquartile range (thick black segment) and the density of points as a curve on each side. This last feature gives them an advantage over box plots.
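
For readers who want to see what goes into a violin, here is a small Python sketch that computes the three ingredients for one series using only the standard library. It is an illustration rather than what vioplot does internally: the function name violin_summary, the toy data and the bandwidth rule (a normal-reference rule of thumb) are my own assumptions.

```python
import math
import statistics


def violin_summary(values, grid_points=50, bandwidth=None):
    """Return the median, quartiles and a Gaussian kernel density estimate."""
    values = sorted(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    median = statistics.median(values)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not vioplot's default)
        bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)
    lo, hi = values[0], values[-1]
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    norm = 1 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    # Density at each grid point: sum of Gaussian bumps centred on the data
    density = [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]
    return {"median": median, "q1": q1, "q3": q3, "grid": grid, "density": density}


# Toy data standing in for one series of observations
summary = violin_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary["median"], summary["q1"], summary["q3"])
```

The median and quartiles correspond to the white dot and the thick segment, while the density values, mirrored on both sides of the axis, give the violin its outline.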

The following R code shows how I produced this horizontal plot using the R library vioplot from multiple text files with the .tsv extension, one value per line, one file per data series.

library(vioplot)

# setwd("path") # if required
filedir="./"

# collect input TSV file names (one value per line, one file per series)
repeat_files = list.files(path=filedir, pattern="\\.tsv$")
series_names = gsub("\\.tsv$", "", repeat_files)

# read each file, keeping its single column as a log10-transformed numeric vector
repeats = lapply(repeat_files, function(i){
  log10(read.table(file.path(filedir, i), header=FALSE)[[1]])
})
names(repeats) <- series_names

# widen the left margin to make room for the series names
par(mar = c(4, 11, 1, 1))

# empty canvas; low X values are truncated in this example (xlim starts at 2)
plot(NA,
  ylim = c(0.5, length(repeats)+0.5),
  xlim = c(2, max(unlist(repeats))),
  yaxt = "n",
  ylab = "",
  xlab = "log10 repeat length")
axis(2, labels = series_names, at = seq_along(repeats), las=1)

# add violins one by one (invisible() hides the list returned by lapply)
invisible(lapply(seq_along(repeats), function(x)
  vioplot(repeats[[x]], at = x, add = TRUE, box = FALSE, horizontal = TRUE)
))

See you soon,

Bruno