CodonCode Corporation offers executable versions of the programs Phred, Phrap, and Cross_match for Windows, Mac OS X, Linux, and Unix. Phred, Cross_match, Phrap, Swat, and Consed were developed by Dr. Phil Green and co-workers at the University of Washington in Seattle. CodonCode Corporation has acquired the distribution rights for Phred, Phrap, Cross_match, Swat, and Consed.
This page gives a brief description of Phred, Cross_match, and Phrap. The Phred-Phrap programs were developed for use by automated scripts, and therefore do not have a graphical user interface. For scientists who prefer to use Phred and Phrap from a graphical user interface on OS X or Windows, we offer the sequence assembly and editing software CodonCode Aligner. CodonCode Aligner makes base calling with Phred and sequence assembly with Phrap easy, and also offers functions for contig editing and mutation detection.
Phred: Better Base Calling
Phred is a base-calling program for DNA sequence traces. The program was developed by Drs. Phil Green and Brent Ewing, and is copyrighted by the University of Washington. It is widely used by the largest academic and commercial sequencing laboratories. Two major reasons why Phred is used by leading sequencers are:
- High base calling accuracy. In an initial study, Phred achieved 40-50% lower error rates than ABI software on large test data sets (Ewing, Hillier, Wendl & Green (1998), Genome Research 8: 175-185).
- Error probabilities for each base call. The highly accurate error probabilities Phred calculates for each base enable increased automation of the sequencing process, for example:
  - More accurate consensus sequences.
  - Automatic identification of areas that require "finishing" efforts.
  - Drastically lower false positive error rates in mutation detection.
  - Effective quality control immediately after sequence production.
  - Quantitative benchmarking of different sequencing methods and protocol changes.
  - Identification of repeat sequences during assembly.
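The relationship between a Phred quality value and the error probability it encodes can be sketched in a few lines of Python. This is a minimal illustration, not part of Phred itself: it assumes the standard definition Q = -10 log10(P), and the function names and the sample quality values are hypothetical.

```python
import math


def error_probability(q):
    """Estimated probability that a base call with quality value q is wrong."""
    return 10 ** (-q / 10)


def quality_value(p):
    """Quality value corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(quals):
    """Expected number of miscalled bases in a read, given per-base qualities."""
    return sum(error_probability(q) for q in quals)


# Hypothetical per-base quality values for a very short read.
read_quals = [10, 20, 30, 40]

for q in read_quals:
    print(f"Q{q}: error probability {error_probability(q):.4f}")

print(f"Expected errors in read: {expected_errors(read_quals):.4f}")
```

Summing the per-base error probabilities gives a simple quality-control figure for a read, which is one way the per-base values could support the automated checks listed above.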
Phred was developed for the Human Genome Project, where large amounts of sequence data were processed by automated scripts; therefore, Phred's processing options are set by command line parameters. For Windows and OS X users who would like to use Phred through an easy-to-use graphical user interface, we have developed the sequence analysis software CodonCode Aligner. CodonCode Aligner greatly simplifies using Phred for base calling and Phrap for sequence assembly, and also offers a number of additional functions often needed in DNA sequencing projects, for example contig alignment and editing, reference sequence alignments, and mutation detection.
For corporate users who wish to use Phred-Phrap for larger-scale projects, we offer executable versions of Phred for Mac OS X, Linux, and Unix. Our dedicated support team has extensive experience in large-scale DNA sequencing projects.
Academic users who plan to use Phred from scripts or the command line can obtain source code for Phred-Phrap free of charge directly from the authors. For academic users who prefer a graphical user interface and purchase licenses for CodonCode Aligner, use of the workstation versions of Phred and Phrap that are included with CodonCode Aligner is free of charge.
To learn more about how Phred works or about Phred quality values, visit our PHRED page.
Phrap: Better Sequence Assemblies
Phrap is a leading program for DNA sequence assembly. Phrap is routinely used in some of the largest sequencing projects in the Human Genome Sequencing Project and in the biotech industry. Some of Phrap's features include:
- Fast assemblies. Assemblies of cosmid- to BAC-sized projects with several hundred to two thousand reads typically take only minutes to complete on high-powered workstations or personal computers.
- Accurate consensus sequences. Phrap uses Phred's quality scores to determine highly accurate consensus sequences. Phrap examines all individual sequences at a given position, and generally uses the highest quality sequence to build the consensus - similar to the way scientists would correct consensus sequences during "contig editing". Compared to the simple majority rules used in older sequence assembly programs, Phrap's approach can give significantly more accurate consensus sequences, especially in regions of low coverage or regions of systematic errors like compressions.
- Consensus quality estimates. Phrap uses the quality information of individual sequences to estimate the quality of the consensus sequence. In addition, Phrap uses available information about sequencing chemistry (dye terminator or dye primer) and confirmation by "other strand" reads in estimating the consensus quality. This often allows scientists to ignore random errors, and to focus finishing efforts exclusively on regions where the data quality is insufficient. Consensus quality estimates can also be very helpful in mutation detection by DNA sequencing (see Rieder, Taylor, Tobe & Nickerson (1998), Nucleic Acids Research 26: 967-973).
- Ability to assemble very large projects. Phrap has been used routinely to assemble bacterial genomes sequenced by the "shotgun" approach, where each project contained tens of thousands of reads. Smaller bacterial genomes (2 million bases or less) could often be assembled in less than three hours.
- Improved identification and handling of repeats. Phrap uses quality scores to estimate whether discrepancies between two overlapping sequences are more likely to arise from random errors, or from different copies of a repeated sequence. For repeats with 95 to 98% identity (like human Alu sequences) and high quality sequence data, this typically yields correct assemblies.
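The contrast between a majority-rule consensus and a quality-driven consensus, and the idea of using quality scores to judge discrepancies, can be illustrated with a toy Python sketch. This is not Phrap's actual algorithm, which is considerably more involved; the function names and the example column are hypothetical, and the discrepancy check assumes independent errors and the standard relationship P = 10^(-Q/10).

```python
from collections import Counter


def majority_consensus(column):
    """Pick the most frequent base in a column of (base, quality) pairs."""
    counts = Counter(base for base, _ in column)
    return counts.most_common(1)[0][0]


def highest_quality_consensus(column):
    """Pick the base from the read with the highest quality at this position."""
    return max(column, key=lambda base_qual: base_qual[1])[0]


def chance_either_call_wrong(q1, q2):
    """Chance that at least one of two base calls is an error.

    Simplifying assumption: the two errors are independent.
    """
    p1 = 10 ** (-q1 / 10)
    p2 = 10 ** (-q2 / 10)
    return 1 - (1 - p1) * (1 - p2)


# Hypothetical alignment column: two low-quality reads say A, one high-quality read says G.
column = [("A", 8), ("A", 10), ("G", 45)]

print("Majority rule:   ", majority_consensus(column))
print("Highest quality: ", highest_quality_consensus(column))

# A mismatch between two high-quality calls is unlikely to be a random error,
# which hints at different repeat copies; between low-quality calls it is not.
print(f"Q40 vs Q40: {chance_either_call_wrong(40, 40):.5f}")
print(f"Q10 vs Q10: {chance_either_call_wrong(10, 10):.5f}")
```

In this toy column the majority rule follows the two low-quality reads, while the quality-driven choice follows the single high-quality read, which mirrors the low-coverage situation described above.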
Cross_match: Fast DNA Sequence Comparisons and Vector Screening
Cross_match is a program for fast comparisons of DNA sequences that uses the same algorithms as Phrap. For example, the comparison of several hundred thousand bases of "raw" sequence to the sequence of an entire BAC typically takes less than one minute. Within the Phred-Phrap system, Cross_match is typically used for vector screening. Other common uses of Cross_match include:
- Identification of overlaps between contig ends after assembly with Phrap or other assembly programs.
- Identification of potential repeat sequences in assemblies.
- Generation of error summaries and lists after completion of sequencing projects.
- Estimation of vector contamination in newly created libraries.