libSlice v1.20

Introduction

libSlice is a standardized implementation of quality calculations on slices of reads. It is implemented in optimized C for speed and control. libSlice currently contains functions for calculating quality classes and consensus quality values.

Algorithms

Consensus Base Calling and Quality Value Calculation

libSlice has routines for choosing the most likely base call represented by a slice, known as the consensus of the slice. In addition to the consensus, each slice has an associated consensus quality value, which measures the probability that the called consensus is accurate. The consensus quality value is the negative log of the probability that the consensus is incorrect, given the quality values of the bases in the slice. It is similar to the quality value of a base in a read.

Several statistical models for calling the consensus of a slice are available within libSlice. By default, the consensus is called according to the Conic Ambiguity Model, which computes the consensus from the cumulative quality value of each component relative to the total quality value of the entire slice. Details on all of the models are available on the Understanding Ambiguity Codes page under "Additional Information".
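The bookkeeping behind "cumulative quality value for each component in relation to the total" can be sketched as below. This shows only the per-symbol sums and their share of the slice total; it does not reproduce the decision rule of the Conic Ambiguity Model, which is documented on the Understanding Ambiguity Codes page. The type slice_base_t, the symbol indexing, and the function cumulative_qv_fractions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_SYMBOLS 5  /* assumed component order: A, C, G, T, gap */

/* Hypothetical representation of one read's contribution to a slice. */
typedef struct {
    int symbol;   /* index 0..4 into "ACGT-" */
    double qv;    /* quality value of this base */
} slice_base_t;

/* Sum the quality values per symbol and report each symbol's share of
 * the total. Returns the total quality value of the slice. If the total
 * is zero, every fraction is set to zero. */
double cumulative_qv_fractions(const slice_base_t *bases, size_t n,
                               double fraction[N_SYMBOLS])
{
    double sum[N_SYMBOLS] = {0.0};
    double total = 0.0;
    size_t i;
    int c;

    assert(fraction != NULL);

    for (i = 0; i < n; i++) {
        assert(bases[i].symbol >= 0 && bases[i].symbol < N_SYMBOLS);
        sum[bases[i].symbol] += bases[i].qv;
        total += bases[i].qv;
    }
    for (c = 0; c < N_SYMBOLS; c++)
        fraction[c] = (total > 0.0) ? sum[c] / total : 0.0;

    return total;
}
```

For a slice holding two A reads and one C read, the A component would receive the combined quality of both A reads as its share, so a symbol backed by several good reads dominates one backed by a single weak read.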

The consensus quality value is calculated using the procedure described in Churchill, G.A. and Waterman, M.S. "The accuracy of DNA sequences: Estimating sequence quality." Genomics 14, pp. 89-98 (1992). The essential step is to use Bayes' rule to calculate, for each of A, C, G, T and gap, the probability of error in calling that base the consensus, given the quality values in the slice. This yields five values, one per candidate base. Expressed as quality values, the maximum of the five is called the consensus quality value and reflects the confidence that the unambiguous consensus is correct.

Quality Class Calculation

As a guide to the quality of a slice, its quality class can be computed using the same attribute tests as TIGR Assembler or TIGR Editor. A variety of attributes are tested on the slice, and the resultant quality class is a single-number representation of the quality of those attributes. Quality classes range from 1 (highest quality) to 23 (lowest quality). Quality classes 1 through 8 are considered high-quality slices; 9 through 23 are considered low-quality slices.

The attributes considered include whether any reads conflict with the consensus, whether there are reads in both the 3' and 5' directions, whether the consensus is ambiguous, and the quality of the individual reads. The single-number quality class combines all of these attributes into an estimate of the quality of the slice. This number can then be used to mark possible sites for editing or resequencing. More information is available on the Understanding Quality Classes page under "Additional Information".
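Once quality classes are available, using them to mark sites is a simple threshold test. The sketch below assumes the quality classes for a run of slices are already stored in an integer array; it does not compute the classes themselves. The constant and function names (QC_HIGH_MAX, qc_is_high_quality, mark_low_quality_sites) are hypothetical and are not part of libSlice.

```c
#include <assert.h>
#include <stddef.h>

enum {
    QC_BEST     = 1,   /* highest quality class */
    QC_HIGH_MAX = 8,   /* classes 1..8 are high quality */
    QC_WORST    = 23   /* lowest quality class */
};

/* Returns 1 if the quality class is in the high-quality range, else 0. */
int qc_is_high_quality(int qc)
{
    assert(qc >= QC_BEST && qc <= QC_WORST);
    return qc <= QC_HIGH_MAX;
}

/* Collect the positions whose quality class is low (9..23), as candidate
 * sites for editing or resequencing. 'sites' must have room for n entries.
 * Returns the number of positions written. */
size_t mark_low_quality_sites(const int *qc, size_t n, size_t *sites)
{
    size_t i, count = 0;

    assert(sites != NULL);

    for (i = 0; i < n; i++)
        if (!qc_is_high_quality(qc[i]))
            sites[count++] = i;

    return count;
}
```

For example, given the classes 1, 9, 8, 23, 4 for five consecutive slices, the second and fourth positions (indices 1 and 3) would be flagged.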

Authors

Designer: Pawel Gajer, The Institute for Genomic Research

Developer: Michael Schatz, Center for Bioinformatics and Computational Biology, University of Maryland

Special thanks: Martin Shumway, The Institute for Genomic Research, for the original implementation of getConsQC in cutAsm.

$Date: 2005/07/29 20:40:45 $