Features
Description
The basic idea of Pygr is that all Python data can be viewed as a graph whose nodes are objects and whose edges are object relations (in Python, references from one object to another). This has a number of advantages.
Pygr provides one base class representing both sequences and sequence intervals (SeqPath), from which all sequence classes are derived (Sequence, SQLSequence, BlastSequence etc.). Full details are provided in the documentation.
Pygr provides a general model for interfacing with any kind of sequence alignment, and a scalable storage system for working with very large multiple sequence alignments such as multigenome alignments. Specifically, it lets you work with an alignment in two ways: in the traditional Row-Column model, where each row is a sequence and each column is a set of aligned letters from different sequences (we will refer to this as the RC-MSA model), and as a graph structure known as a Partial Order Alignment (which we will refer to as the PO-MSA model). This supports "traditional" alignment analysis as well as graph algorithms and even graph queries over alignments.
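To make the difference between the two views concrete, here is a minimal sketch in plain Python. It does not use Pygr's alignment classes. The function name rows_to_graph and the sample rows are hypothetical, and the graph is a simplification that uses one node per alignment column, so it is not a full partial order alignment.

```python
def rows_to_graph(rows):
    """Convert a row-column alignment into a column-adjacency graph.

    rows maps a sequence name to its gapped row ('-' marks a gap).
    The result maps each column index to {next_column: set_of_sequence_names},
    i.e. an edge for every step a sequence takes between occupied columns.
    Simplification (assumption): one node per column, regardless of letter.
    """
    graph = {}
    for name, row in rows.items():
        prev = None
        for col, letter in enumerate(row):
            if letter == "-":
                continue  # this sequence skips the column
            graph.setdefault(col, {})
            if prev is not None:
                graph[prev].setdefault(col, set()).add(name)
            prev = col
    return graph


# Hypothetical three-row alignment used only for illustration.
rows = {"s1": "AC-GT", "s2": "ACTGT", "s3": "A--GT"}
graph = rows_to_graph(rows)
```

In this toy example, column 0 has two outgoing edges: one to column 1 (followed by s1 and s2) and one directly to column 3 (followed by s3, which has gaps in between). The row view hides that branching; the graph view makes it explicit.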
The seqdb module provides a simple, consistent interface to sequence databases from a variety of storage sources such as FASTA files, BLAST databases and relational databases. Sequence databases are modeled (like other Pygr container classes) as dictionaries whose keys are sequence IDs and whose values are sequence objects. Pygr sequence objects follow the Python sequence protocol in the ways you would expect: a subinterval of a sequence object is a Python slice (s[0:10]), which returns a sequence object representing that interval; the reverse complement is -s; the length of a sequence is len(s); and the actual string of a sequence object is str(s). Pygr sequence objects work with different types of back-end storage (e.g. relational databases or BLAST databases) to access only the parts of a sequence that are requested, and only when an actual sequence string is needed.
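The protocol described above can be illustrated with a small stand-in class. This is a sketch in plain Python, not Pygr's SeqPath: the class name ToySeq, its attributes, and the in-memory string back end are all assumptions made for illustration. It shows slicing, negation, len() and str(), and it only builds a string when str() is called.

```python
# Hypothetical stand-in for illustration only; not part of Pygr.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class ToySeq:
    """An interval on a stored string; slicing and negation return new intervals."""

    def __init__(self, text, start=0, stop=None, reverse=False):
        self._text = text  # stands in for a back-end store
        self.start = start
        self.stop = len(text) if stop is None else stop
        self.reverse = reverse

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch supports slices only")
        lo, hi, step = key.indices(len(self))
        if step != 1:
            raise ValueError("this sketch supports step 1 only")
        hi = max(hi, lo)
        if self.reverse:
            # Slice coordinates count from the reverse-strand start,
            # so map them back onto forward-strand coordinates.
            return ToySeq(self._text, self.stop - hi, self.stop - lo, True)
        return ToySeq(self._text, self.start + lo, self.start + hi, False)

    def __neg__(self):
        # Reverse complement: same interval, opposite orientation.
        return ToySeq(self._text, self.start, self.stop, not self.reverse)

    def __str__(self):
        # The only place where sequence letters are actually read.
        piece = self._text[self.start:self.stop]
        if self.reverse:
            return piece.translate(_COMPLEMENT)[::-1]
        return piece


s = ToySeq("AACGT")
first_two = s[0:2]    # interval object, no string built yet
rc = -s               # reverse complement as an object
```

Because slicing and negation only adjust coordinates, a real back end (a database query, for instance) would need to be consulted only inside __str__, which is the behavior the paragraph above describes.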
The coordinator module provides a simple system for running a large collection of tasks on a set of cluster nodes. Full details are provided in the documentation.
System Requirements
OS-Independent
Installation
Unzip and untar the distribution, then run 'python setup.py install'.
Purpose
Pygr can be used to represent data as a graph structure that is easily queried. For example, finding a set of exons that satisfy the following relationship (exon 1 is either connected directly to exon 3, or connected to exon 2, which is then connected to exon 3) using a traditional SQL database schema might require a six-way (or larger) JOIN, which can make the computation time infeasible. Using Pygr, the same query can be represented as {1:{2:None, 3:None}, 2:{3:None}}. Other included modules provide convenient interfaces for working with sequence alignments, working with sequence data stored in databases, and managing large jobs on cluster nodes.
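To show how a dictionary-of-dictionaries query like the one above can be matched against a graph stored the same way, here is a minimal sketch in plain Python. It does not use Pygr's own query machinery. The function name match_query and the sample exon graph are hypothetical, and the sketch assumes that a match must contain every edge listed in the query.

```python
def match_query(data, query):
    """Yield {query_node: data_node} bindings where every query edge exists in data.

    Both graphs are dicts of dicts: graph[src][dst] is an edge src -> dst.
    Assumption: distinct query nodes must bind to distinct data nodes.
    """
    def all_nodes(graph):
        # Collect sources and edge targets, preserving first-seen order.
        nodes = []
        for src, targets in graph.items():
            for node in [src, *(targets or {})]:
                if node not in nodes:
                    nodes.append(node)
        return nodes

    q_nodes = all_nodes(query)
    d_nodes = all_nodes(data)

    def edges_hold(binding):
        # Check every query edge whose two endpoints are already bound.
        for src, targets in query.items():
            for dst in (targets or {}):
                if src in binding and dst in binding:
                    if binding[dst] not in (data.get(binding[src]) or {}):
                        return False
        return True

    def extend(i, binding):
        if i == len(q_nodes):
            yield dict(binding)
            return
        for cand in d_nodes:
            if cand in binding.values():
                continue
            binding[q_nodes[i]] = cand
            if edges_hold(binding):
                yield from extend(i + 1, binding)
            del binding[q_nodes[i]]

    yield from extend(0, {})


# Hypothetical splice graph: A -> B -> C -> D, plus a skip edge A -> C.
exons = {"A": {"B": None, "C": None}, "B": {"C": None}, "C": {"D": None}, "D": {}}
query = {1: {2: None, 3: None}, 2: {3: None}}
matches = list(match_query(exons, query))
```

With this sample graph, the only binding that satisfies all three query edges is exon A as node 1, B as node 2 and C as node 3. The backtracking search stands in for the nested JOINs that a relational schema would need.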