.. _methods: Visualization Methods ===================== There are a variety of ways to visualize DNA sequences in two dimensions. Squiggle provides its own novel visualization method as well as implementations of various other methods. Each method captures a different aspect of a sequence, so it is highly recommended to try using multiple methods in order to get a feel for a sequence. Squiggle -------- Squiggle's DNA visualization method is based on the `UCSC .2bit format `_ and the `Qi et. al Huffman coding method `_. In essence, a DNA sequence is first converted into binary using the 2bit encoding scheme that maps T to 00, C to 01, A to 10, and G to 11. For example:: ATGC becomes:: 10001101 Then, starting at the origin, for each bit, the following vectors are layed end to end: .. raw:: html :file: figures/squiggle_vectors.html This mapping has the effect of giving each nucleotide a distinctive shape: .. raw:: html :file: figures/squiggle_bases.html This encoding method has several handy features: - Based on an open, common bioinformatics format. - No degeneracy in the encoding (an encoding can only map to one sequence and vice versa). - The overall GC-content can be inferred from at a glance based on whether the endpoint of the graph is above or below zero. - `Regions inside the gene with varying GC-content `_ can be seen as peaks and valleys. - Is limited to quadrants I and IV and is a function - The :math:`x`-axis corresponds directly with nucleotide position - Supports ambiguous nucleotides (which are displayed as horizontal lines) For an example, let's look at the human β-globin gene using the squiggle method:: $ squiggle example_seqs/human_HBB.fasta .. raw:: html :file: figures/human_hbb.html Gates ----- In `Gates's method `_, DNA sequences are converted into 2D walks in which Ts, As, Cs, and Gs are up, down, left, and right, respectively. This gives each sequence a "shape." However, there is degeneracy, meaning that a visualization is not necessarily unique. For example, ``TGAC`` is a square (up, right, down, and left), but so is ``GTCA`` (right, up, left, down). To see an example of Gate's method, we'll again look at human β-globin:: $ squiggle example_seqs/human_HBB.fasta --method=gates .. raw:: html :file: figures/human_hbb_gates.html Yau --- `Yau et. al's method `_ uses unit vectors with upward vectors indicating pyrimidine bases (C and T) and downward vectors indicating purine bases (A and G). Similar to Squiggle, this method has no degeneracy. Specifically, :math:`A\rightarrow\left(\frac{1}{2},-\frac{\sqrt{3}}{2}\right)`, :math:`T\rightarrow\left(\frac{1}{2},\frac{\sqrt{3}}{2}\right)`, :math:`G\rightarrow\left(\frac{\sqrt{3}}{2}, -\frac{1}{2}\right)`, :math:`C\rightarrow\left(\frac{\sqrt{3}}{2}, \frac{1}{2}\right)`. .. Warning:: The :math:`x`-coordinate in Yau's method is not equivalent to base position. .. raw:: html :file: figures/yau_bases.html It produces a visualization of β-globin like this:: $ squiggle example_seqs/human_HBB.fasta --method=yau .. raw:: html :file: figures/human_hbb_yau.html Yau-BP ------ Unique to Squiggle is the Yau-BP method, a slight modification of Yau's method that ensures that the :math:`x` axis is equivalent to the base position. It preserves that salient feature of the method, which is the purine/pyrimidine split. .. raw:: html :file: figures/yau-bp_bases.html Randić and Qi ------------- `Randić et al. `_ and `Qi and Qi `_'s methods are similar to `tablature `_, with a different base (or 2-mer in the case of Qi's method) assigned to each :math:`y` value. The best way visualize it is through an example. Let's look at the Randić visualization of ``GATC``: .. raw:: html :file: figures/randic_example.html Look's pretty good. However, this visualization method isn't well suited to long sequences, as we'll see when we look at β-globin:: $ squiggle example_seqs/human_HBB.fasta --method=randic .. raw:: html :file: figures/human_hbb_randic.html Qi's method produces very similar results, just with a much larger range of :math:`y` values:: $ squiggle example_seqs/human_HBB.fasta --method=qi .. raw:: html :file: figures/human_hbb_qi.html