Tutorial
~~~~~~~~

Getting started
---------------

The first step to obtain a plot is always setting the **engine**. This is done with the ``set_engine`` function after importing the package.

.. code-block::

    import pyranges_plot as prp

    # As engine use 'plotly' or 'ply' for Plotly and 'matplotlib' or 'plt' for Matplotlib
    prp.set_engine("plotly")

Similarly, some other variables can be set prior to the plot call, like ``id_col``, ``warnings`` and ``theme``; unlike the engine, they can also be given as parameters to the :code:`plot` function.

Pyranges Plot revolves around the :code:`plot` function, whose parameters define the output and which accepts additional appearance customization options. All the parameters are explained in detail below; to illustrate how the options are used, the following figure can serve as a cheat sheet. Note that these options are not :code:`plot` parameters as such, but they can be given as ``kargs`` or pre-set, as explained later on.

.. image:: images/options_fig_wm.png

To showcase its functionalities we will load some example data included in the Pyranges Plot package. Pyranges itself provides a series of data loading options for formats such as gff, gtf and bam (take a look at `Pyranges documentation `_ to know more!).

.. code-block::

    p = prp.example_data.p1
    print(p)

.. code-block::

      index  |      Chromosome  Strand      Start      End  transcript_id    feature1    feature2
      int64  |           int64  object      int64    int64  object           object      object
    -------  ---  ------------  --------  -------  -------  ---------------  ----------  ----------
          0  |               1  +               1       11  t1               a           A
          1  |               1  +              40       60  t1               a           A
          2  |               2  -              10       25  t2               b           B
          3  |               2  -              70       80  t2               b           B
          4  |               2  +              85      100  t3               c           C
          5  |               2  +             110      115  t3               c           C
          6  |               2  +             150      180  t3               c           C
          7  |               3  +             140      152  t4               d           D
    PyRanges with 8 rows, 7 columns, and 1 index columns.
    Contains 3 chromosomes and 2 strands.

Once the setup is ready, a minimal plot can be obtained with just:

.. code-block::

    prp.plot(p)

.. image:: images/prp_rtd_01.png

The output will be an interactive plot by default, but it can also be a pdf or png file if desired (as explained later in this tutorial). The image shows an interactive Plotly plot where the intervals are displayed individually because no id column has been specified. To link the intervals, an ``id_col`` must be provided.

.. code-block::

    prp.set_id_col("transcript_id")
    prp.plot(p)

    # or alternatively
    prp.plot(p, id_col="transcript_id")

.. image:: images/prp_rtd_02.png

Playing with limits
-------------------

Since the data has only 4 genes, all of them are plotted. The function has a default limit of 25 genes, so when the data contains more it will only show the top 25 unless the ``max_shown`` parameter is specified. For example, we can set the maximum number of genes to 2. Note that a warning will appear when fewer genes are plotted than the data contains.

.. code-block::

    prp.plot(p, max_shown=2)

.. image:: images/prp_rtd_03.png

Another pyranges_plot functionality is defining the plots' coordinate limits through the ``limits`` parameter. By default, some space is left before the first and after the last plotted exon of each chromosome, but the limits can be customized. The user can change all or only some of the coordinate limits, leaving the rest as default. The limits can be provided as a dictionary, tuple or PyRanges object:

* Dictionary: the keys must match the data's chromosome names and the values can be either ``None`` or a tuple indicating the limits. When a chromosome is not specified in the dictionary, or it is assigned ``None``, the default coordinates are used.
* Tuple: sets the same limits for all plotted chromosomes.
* PyRanges object: allows the visualization of one object's genes in another object's range window.

.. code-block::

    prp.plot(p, limits={1: (None, 100), 2: (60, 200), 3: None})
    prp.plot(p, limits=(0, 300))

.. image:: images/prp_rtd_04.png

.. image:: images/prp_rtd_05.png

Coloring
--------

We can color the genes according to the strand column instead of the ID (default). For that, the ``color_col`` parameter should be used.

.. code-block::

    prp.plot(p, color_col="Strand")

.. image:: images/prp_rtd_06.png

This way the "+" strand genes appear in one color and the "-" strand genes in another. Additionally, these colors can be customized through the ``colormap`` parameter. In this case we can specify it as a dictionary in the following way:

.. code-block::

    prp.plot(
        p,
        color_col="Strand",
        colormap={"+": "green", "-": "red"}
    )

.. image:: images/prp_rtd_07.png

The ``colormap`` parameter is very versatile: it accepts dictionaries for specific coloring, Matplotlib and Plotly color objects such as colormaps (or just the string name of these objects), and lists of colors in hex or rgb. For example, we can use the Dark2 Matplotlib colormap even if the plot is based on Plotly (all dependencies must be installed):

.. code-block::

    prp.plot(p, colormap="Dark2")

.. image:: images/prp_rtd_08.png

Display options
---------------

By default the genes are displayed in a packed disposition, so they are preferentially placed one beside the other. The disposition can be changed to 'full', showing one gene under the other, by setting the ``packed`` parameter to ``False``. Also, a legend can be added by setting the ``legend`` parameter to ``True``.

.. code-block::

    prp.plot(p, packed=False, legend=True)

.. image:: images/prp_rtd_09.png

In interactive plots, information about a gene is shown when the mouse is placed over its structure. This information always includes the gene's strand if it exists, the start and end coordinates and the ID.
To add information contained in other dataframe columns to the tooltip, a string should be given to the ``tooltip`` parameter. This string must contain the desired column names within curly brackets as shown in the example. Similarly, the title of the chromosome plots can be customized by giving the desired string to the ``title_chr`` parameter, where the corresponding chromosome value of the data is referred to as ``{chrom}``. An example could be the following:

.. code-block::

    prp.plot(
        p,
        tooltip="first feature: {feature1}\nsecond feature: {feature2}",
        title_chr='Chr: {chrom}'
    )

.. image:: images/prp_rtd_10.png

Overlapping intervals, +1 PyRanges and file export
--------------------------------------------------

In some cases, the data intervals might overlap. An example could be when some intervals in the PyRanges object correspond to exons and others correspond to "GCA" appearances. For such cases, the ``thickness_col`` and ``depth_col`` parameters are implemented. Additionally, the :code:`plot` function accepts more than 1 PyRanges object given as a list, and these inputs can be identified easily in the plot by using the ``y_labels`` parameter.

For this plot, ``thickness_col`` will be used to highlight the overlapping intervals. This way some intervals will appear taller than others according to the thickness column. Note that this column can only have 2 different values, as only 2 height values are accepted.

.. code-block::

    # Store data
    p_ala = prp.example_data.p_ala
    p_cys = prp.example_data.p_cys

    print(p_ala)
    print(p_cys)

    # Plot both PyRanges using interval thickness to differentiate
    prp.plot(
        [p_ala, p_cys],
        id_col="id",
        y_labels=["pr Alanine", "pr Cysteine"],
        thickness_col="trait1",
    )

.. code-block::

      index  |      Start      End    Chromosome  id        trait1    trait2      depth
      int64  |      int64    int64         int64  object    object    object      int64
    -------  ---  -------  -------  ------------  --------  --------  --------  -------
          0  |         10       20             1  gene1     exon      gene_1          0
          1  |         50       75             1  gene1     exon      gene_1          0
          2  |         90      130             1  gene1     exon      gene_1          0
          3  |         13       16             1  gene1     aa        Ala             1
          4  |         60       63             1  gene1     aa        Ala             1
          5  |         72       75             1  gene1     aa        Ala             1
          6  |        120      123             1  gene1     aa        Ala             1
    PyRanges with 7 rows, 7 columns, and 1 index columns.
    Contains 1 chromosomes.

      index  |      Start      End    Chromosome  id        trait1    trait2      depth
      int64  |      int64    int64         int64  object    object    object      int64
    -------  ---  -------  -------  ------------  --------  --------  --------  -------
          0  |         10       20             1  gene1     exon      gene_1          0
          1  |         50       75             1  gene1     exon      gene_1          0
          2  |         90      130             1  gene1     exon      gene_1          0
          3  |         15       18             1  gene1     aa        Cys             1
          4  |         55       58             1  gene1     aa        Cys             1
          5  |         62       65             1  gene1     aa        Cys             1
          6  |        100      103             1  gene1     aa        Cys             1
          7  |        110      113             1  gene1     aa        Cys             1
    PyRanges with 8 rows, 7 columns, and 1 index columns.
    Contains 1 chromosomes.

.. image:: images/prp_rtd_11.png

Another way to highlight these overlapping regions is playing with colors and depth. This time the plot will be exported to a png file instead of being shown interactively; for that, the ``to_file`` parameter is used. Additionally, the color appearance of the plot is customized by providing the "dark" ``theme``.

.. code-block::

    # Plot both PyRanges using depth and color to differentiate
    prp.plot(
        [p_ala, p_cys],
        id_col="id",
        y_labels=["pr Alanine", "pr Cysteine"],
        depth_col="depth",
        color_col="trait2",
        # file size can be specified in px by to_file=("my_plot.png", (500,500))
        to_file="my_plot.png",
        theme="dark",
    )

.. image:: images/my_plot.png

Show transcript structure
-------------------------

Another interesting feature is showing the transcript structure, so the CDS appear as wider rectangles than UTR regions. For that, the proper information should be stored in the "Feature" column of the data. A usage example is:

.. code-block::

    pp = prp.example_data.p2
    print(pp)

    prp.plot(pp, thick_cds=True)

.. code-block::

      index  |      Chromosome  Strand      Start      End  transcript_id    feature1    feature2    Feature
      int64  |           int64  object      int64    int64  object           object      object      object
    -------  ---  ------------  --------  -------  -------  ---------------  ----------  ----------  ---------
          0  |               1  +               1       11  t1               1           A           exon
          1  |               1  +              40       60  t1               1           A           exon
          2  |               2  -              10       25  t2               1           B           CDS
          3  |               2  -              70       80  t2               1           B           CDS
        ...  |             ...  ...           ...      ...  ...              ...         ...         ...
         10  |               4  -           30500    30700  t5               2           E           CDS
         11  |               4  -           30647    30700  t5               2           E           exon
         12  |               4  +           29850    29900  t6               2           F           CDS
         13  |               4  +           29970    30000  t6               2           F           CDS
    PyRanges with 14 rows, 8 columns, and 1 index columns.
    Contains 4 chromosomes and 2 strands.

.. image:: images/prp_rtd_12.png

Reduce intron size
------------------

In order to facilitate visualization, pyranges_plot offers the option to reduce the introns that exceed a given threshold size. For that, the ``shrink`` parameter should be used. The threshold can be defined through ``shrink_threshold``, given either as ``kargs`` or by setting the default options as explained in the next section. When a float is provided as ``shrink_threshold`` it is interpreted as a fraction of the original coordinate range, while an int is interpreted as a number of base pairs.

.. code-block::

    ppp = prp.example_data.p3
    print(ppp)

    prp.plot(ppp, shrink=True)
    prp.plot(ppp, shrink=True, shrink_threshold=0.2)

.. code-block::

      index  |    Chromosome    Strand      Start      End  transcript_id
      int64  |    object        object      int64    int64  object
    -------  ---  ------------  --------  -------  -------  ---------------
          0  |    1             +              90       92  t1
          1  |    1             +              61       64  t1
          2  |    1             +             104      113  t1
          3  |    1             +             228      229  t1
        ...  |    ...           ...           ...      ...  ...
         16  |    2             -              42       46  t5
         17  |    2             -              37       40  t5
         18  |    2             +              60       70  t6
         19  |    2             +              80       90  t6
    PyRanges with 20 rows, 5 columns, and 1 index columns.
    Contains 2 chromosomes and 2 strands.

.. image:: images/prp_rtd_13.png

.. image:: images/prp_rtd_14.png

Appearance customizations
-------------------------

There are some features of the plot appearance which can also be customized, like the background color, plot border or titles. To check these customizable features and their default values, the ``print_options`` function should be used. These values can be modified for all subsequent plots through the ``set_options`` function. For a single plot, however, these features can be given as ``kargs`` to the :code:`plot` function (see ``shrink_threshold`` in the example above).

.. code-block::

    # Check the default options values
    prp.print_options()

.. code-block::

    +------------------+-------------+---------+--------------------------------------------------------------+
    | Feature          | Value       | Edited? | Description                                                  |
    +------------------+-------------+---------+--------------------------------------------------------------+
    | colormap         | popart      |         | Sequence of colors to assign to every group of intervals     |
    |                  |             |         | sharing the same “color_col” value. It can be provided as a  |
    |                  |             |         | Matplotlib colormap, a Plotly color sequence (built as       |
    |                  |             |         | lists), a string naming the previously mentioned color       |
    |                  |             |         | objects from Matplotlib and Plotly, or a dictionary with     |
    |                  |             |         | the following structure {color_column_value1: color1,        |
    |                  |             |         | color_column_value2: color2, ...}. When a specific           |
    |                  |             |         | color_col value is not specified in the dictionary it will   |
    |                  |             |         | be colored in black.                                         |
    | exon_border      | None        |         | Color of the interval's rectangle border.                    |
    | fig_bkg          | white       |         | Bakground color of the whole figure.                         |
    | grid_color       | lightgrey   |         | Color of x coordinates grid lines.                           |
    | plot_bkg         | white       |         | Background color of the plots.                               |
    | plot_border      | black       |         | Color of the line delimiting the plots.                      |
    | shrunk_bkg       | lightyellow |         | Color of the shrunk region background.                       |
    | tag_bkg          | grey        |         | Background color of the tooltip annotation for the gene in   |
    |                  |             |         | Matplotlib.                                                  |
    | title_color      | black       |         | Color of the plots' titles.                                  |
    | title_size       | 18          |         | Size of the plots' titles.                                   |
    | x_ticks          | None        |         | Int, list or dict defining the x_ticks to be displayed.      |
    |                  |             |         | When int, number of ticks to be placed on each plot. When    |
    |                  |             |         | list, it corresponds to de values used as ticks. When dict,  |
    |                  |             |         | the keys must match the Chromosome values of the data,       |
    |                  |             |         | while the values can be either int or list of int; when int  |
    |                  |             |         | it corresponds to the number of ticks to be placed; when     |
    |                  |             |         | list of int it corresponds to de values used as ticks. Note  |
    |                  |             |         | that when the tick falls within a shrunk region it will not  |
    |                  |             |         | be diplayed.                                                 |
    +------------------+-------------+---------+--------------------------------------------------------------+
    | arrow_color      | grey        |         | Color of the arrow indicating strand.                        |
    | arrow_line_width | 1           |         | Line width of the arrow lines                                |
    | arrow_size       | 0.006       |         | Float corresponding to the fraction of the plot or int       |
    |                  |             |         | corresponding to the number of positions occupied by a       |
    |                  |             |         | direction arrow.                                             |
    | exon_height      | 0.6         |         | Height of the exon rectangle in the plot.                    |
    | intron_color     | None        |         | Color of the intron lines. When None, the color of the       |
    |                  |             |         | first interval will be used.                                 |
    | text_pad         | 0.005       |         | Space where the id annotation is placed beside the           |
    |                  |             |         | interval. When text_pad is float, it represents the          |
    |                  |             |         | percentage of the plot space, while an int pad represents    |
    |                  |             |         | number of positions or base pairs.                           |
    | text_size        | 10          |         | Fontsize of the text annotation beside the intervals.        |
    | v_spacer         | 0.5         |         | Vertical distance between the intervals and plot border.     |
    +------------------+-------------+---------+--------------------------------------------------------------+
    | plotly_port      | 8050        |         | Port to run plotly app.                                      |
    | shrink_threshold | 0.01        |         | Minimum length of an intron or intergenic region in order    |
    |                  |             |         | for it to be shrunk while using the “shrink” feature. When   |
    |                  |             |         | threshold is float, it represents the fraction of the plot   |
    |                  |             |         | space, while an int threshold represents number of           |
    |                  |             |         | positions or base pairs.                                     |
    +------------------+-------------+---------+--------------------------------------------------------------+

Once you have found the feature you would like to customize, it can be modified:

.. code-block::

    # Change the default options values
    prp.set_options('plot_bkg', 'rgb(173, 216, 230)')
    prp.set_options('plot_border', '#808080')
    prp.set_options('title_color', 'magenta')

    # Make the customized plot
    prp.plot(p)

.. image:: images/prp_rtd_15.png

Now the modified values will be marked when checking the options values:

.. code-block::

    prp.print_options()

.. code-block::

    +------------------+--------------------+---------+--------------------------------------------------------------+
    | Feature          | Value              | Edited? | Description                                                  |
    +------------------+--------------------+---------+--------------------------------------------------------------+
    | colormap         | popart             |         | Sequence of colors to assign to every group of intervals     |
    |                  |                    |         | sharing the same “color_col” value. It can be provided as a  |
    |                  |                    |         | Matplotlib colormap, a Plotly color sequence (built as       |
    |                  |                    |         | lists), a string naming the previously mentioned color       |
    |                  |                    |         | objects from Matplotlib and Plotly, or a dictionary with     |
    |                  |                    |         | the following structure {color_column_value1: color1,        |
    |                  |                    |         | color_column_value2: color2, ...}. When a specific           |
    |                  |                    |         | color_col value is not specified in the dictionary it will   |
    |                  |                    |         | be colored in black.                                         |
    | exon_border      | None               |         | Color of the interval's rectangle border.                    |
    | fig_bkg          | white              |         | Bakground color of the whole figure.                         |
    | grid_color       | lightgrey          |         | Color of x coordinates grid lines.                           |
    | plot_bkg         | rgb(173, 216, 230) | *       | Background color of the plots.                               |
    | plot_border      | #808080            | *       | Color of the line delimiting the plots.                      |
    | shrunk_bkg       | lightyellow        |         | Color of the shrunk region background.                       |
    | tag_bkg          | grey               |         | Background color of the tooltip annotation for the gene in   |
    |                  |                    |         | Matplotlib.                                                  |
    | title_color      | magenta            | *       | Color of the plots' titles.                                  |
    | title_size       | 18                 |         | Size of the plots' titles.                                   |
    | x_ticks          | None               |         | Int, list or dict defining the x_ticks to be displayed.      |
    |                  |                    |         | When int, number of ticks to be placed on each plot. When    |
    |                  |                    |         | list, it corresponds to de values used as ticks. When dict,  |
    |                  |                    |         | the keys must match the Chromosome values of the data,       |
    |                  |                    |         | while the values can be either int or list of int; when int  |
    |                  |                    |         | it corresponds to the number of ticks to be placed; when     |
    |                  |                    |         | list of int it corresponds to de values used as ticks. Note  |
    |                  |                    |         | that when the tick falls within a shrunk region it will not  |
    |                  |                    |         | be diplayed.                                                 |
    +------------------+--------------------+---------+--------------------------------------------------------------+
    | arrow_color      | grey               |         | Color of the arrow indicating strand.                        |
    | arrow_line_width | 1                  |         | Line width of the arrow lines                                |
    | arrow_size       | 0.006              |         | Float corresponding to the fraction of the plot or int       |
    |                  |                    |         | corresponding to the number of positions occupied by a       |
    |                  |                    |         | direction arrow.                                             |
    | exon_height      | 0.6                |         | Height of the exon rectangle in the plot.                    |
    | intron_color     | None               |         | Color of the intron lines. When None, the color of the       |
    |                  |                    |         | first interval will be used.                                 |
    | text_pad         | 0.005              |         | Space where the id annotation is placed beside the           |
    |                  |                    |         | interval. When text_pad is float, it represents the          |
    |                  |                    |         | percentage of the plot space, while an int pad represents    |
    |                  |                    |         | number of positions or base pairs.                           |
    | text_size        | 10                 |         | Fontsize of the text annotation beside the intervals.        |
    | v_spacer         | 0.5                |         | Vertical distance between the intervals and plot border.     |
    +------------------+--------------------+---------+--------------------------------------------------------------+
    | plotly_port      | 8050               |         | Port to run plotly app.                                      |
    | shrink_threshold | 0.01               |         | Minimum length of an intron or intergenic region in order    |
    |                  |                    |         | for it to be shrunk while using the “shrink” feature. When   |
    |                  |                    |         | threshold is float, it represents the fraction of the plot   |
    |                  |                    |         | space, while an int threshold represents number of           |
    |                  |                    |         | positions or base pairs.                                     |
    +------------------+--------------------+---------+--------------------------------------------------------------+

To return to the original appearance of the plot, the ``reset_options`` function can restore all or some parameters. By default it resets all the features, but it also accepts a string to reset a single feature or a list of strings to reset a few.

.. code-block::

    prp.reset_options()  # reset all
    prp.reset_options('plot_bkg')  # reset one feature
    prp.reset_options(['plot_border', 'title_color'])  # reset a few features

PyRanges compatibility
----------------------

To add the plot function to PyRanges objects, the function ``register_plot`` has been implemented. It registers :code:`plot` as a method so that :code:`pyranges.PyRanges.plot()` calls become available. Its usage is the following:

.. code-block::

    import pyranges_plot as prp

    # Register plot function and define engine simultaneously
    prp.register_plot("matplotlib")
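
As a rough illustration of what registering means, the sketch below attaches a plotting function to a class as a method and sets the engine in the same call. It uses a stand-in ``Ranges`` class and a stand-in ``plot`` function that only returns a text summary. All names in it are hypothetical, and it is not the pyranges_plot implementation.

```python
# Stand-in sketch of the registration pattern; NOT pyranges_plot's actual code.

class Ranges:
    """Hypothetical stand-in for pyranges.PyRanges."""

    def __init__(self, intervals):
        self.intervals = intervals


_engine = None  # module-level setting, playing the role of the chosen engine


def plot(data, id_col=None):
    """Hypothetical plot function: returns a summary instead of drawing."""
    if _engine is None:
        raise RuntimeError("engine not set")
    return f"{_engine}: {len(data.intervals)} intervals, id_col={id_col}"


def register_plot(engine=None):
    """Optionally set the engine, then attach plot as a method of Ranges."""
    global _engine
    if engine is not None:
        _engine = engine
    # A function stored on the class is bound to instances on access,
    # so r.plot(...) is equivalent to plot(r, ...).
    Ranges.plot = plot


register_plot("matplotlib")
r = Ranges([(1, 11), (40, 60)])
summary = r.plot(id_col="transcript_id")
print(summary)
```

With the real package, the corresponding call after ``prp.register_plot("matplotlib")`` would presumably be ``p.plot()`` on a PyRanges object such as the example data ``p`` used earlier in this tutorial.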