Slice Tools

    Frequently Asked Questions

    A number of common questions and concerns have come up while moving away from how TIGR Editor (TED) and TIGR Assembler (TA) calculate the consensus and quality classes. This page is an attempt to answer the most frequently asked questions regarding this change.

    FAQs

    General Questions

    1. What is a slice?

      A slice is a one-base-wide cut of an assembly. It consists of zero or more reads stacked "vertically" at a single offset. For the purposes of a slice calculation, each read contributes a base with an associated quality value and direction (forward or reverse).
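
      As an illustration only, a slice can be modeled as a small record type. The names below (SliceBase, Slice, depth) are hypothetical and are not taken from libSlice; the use of '-' for a gap is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SliceBase:
    """One read's contribution to a slice (hypothetical type, not libSlice's)."""
    base: str      # 'A', 'C', 'G', 'T', or '-' for a gap (assumed encoding)
    quality: int   # quality value associated with this base
    forward: bool  # True for a forward read, False for a reverse read


@dataclass
class Slice:
    """A one-base-wide cut of an assembly at a given offset."""
    offset: int
    bases: List[SliceBase]

    def depth(self) -> int:
        # Number of reads stacked at this offset; may be zero.
        return len(self.bases)


example = Slice(
    offset=1042,
    bases=[
        SliceBase("A", 30, True),
        SliceBase("A", 25, False),
        SliceBase("G", 10, True),
    ],
)
empty = Slice(offset=1043, bases=[])
```

      Note that nothing in this representation refers to neighboring columns, which is what makes a slice convenient to pass around on its own.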

    2. How are slices used?

      Cloe transmits slices to SliceService/libSlice, which calculates the quality class and consensus for each slice. Slices are a convenient representation for these calculations because they provide all of the information necessary for the calculation without the burden of maintaining an alignment from the traditional "horizontal" view of an assembly.

    3. What is the relationship between Cloe, SliceService, and libSlice?

      Cloe is the front-end tool that users work with to view and edit assemblies. Cloe sends slice information across the network to SliceService for quality class calculation and consensus base calling. SliceService uses libSlice to perform the actual calculations, and then returns the results across the network to Cloe for display to the user.

      At the lowest level, the quality class and consensus base calling calculations are performed by libSlice. SliceService acts as an intermediary between Cloe's requests and libSlice, much as Cloe acts as an intermediary between user requests and the assembly data.

    Consensus Calling and Ambiguity Codes

    1. How is the consensus calculated?

      The consensus of a slice is the most likely base represented by the slice. It is calculated by choosing the base that has the greatest cumulative quality value in the slice.

      In the event of a tie or near tie, an ambiguity code is picked that represents the most likely set of bases represented by the slice.
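
      The cumulative quality comparison described above can be sketched as follows. This is a minimal illustration, not libSlice code: the function names are hypothetical, reads are reduced to (base, quality) pairs, and ties are deliberately not resolved here because they are the business of the ambiguity model.

```python
from collections import defaultdict


def cumulative_quality(slice_bases):
    """Sum the quality values contributed to each base in one slice."""
    totals = defaultdict(int)
    for base, quality in slice_bases:
        totals[base] += quality
    return dict(totals)


def most_likely_base(slice_bases):
    """Return the base with the greatest cumulative quality, or None if empty."""
    totals = cumulative_quality(slice_bases)
    if not totals:
        return None
    return max(totals, key=totals.get)


# Two modest 'A' calls outweigh one strong 'G' call: A=55, G=40.
reads = [("A", 30), ("A", 25), ("G", 40)]
totals = cumulative_quality(reads)
winner = most_likely_base(reads)
```

      In this example the single highest quality value belongs to G, but A is chosen because its summed quality is larger.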

    2. How are ambiguity codes calculated?

      There are several different models for calculating the consensus and, consequently, for calculating ambiguity codes. The model that is currently used is the Conic Ambiguity Model.

      The essence of the model is to compare the cumulative quality value of the most likely base in the slice with that of the next most likely base. If the ratio between them is no more than 2:1, the slice is ambiguous.
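
      A rough sketch of such a ratio test is shown below. It rests on a stated assumption: in keeping with the "tie or near tie" description, a slice is treated as ambiguous when the leading base does not beat the runner-up by more than 2:1. The function name and the handling of the exact 2:1 boundary are choices made for this sketch; the precise rule of the Conic Ambiguity Model is described on the Understanding Ambiguity Codes page.

```python
def is_ambiguous(totals, ratio=2.0):
    """Decide whether a slice is ambiguous from per-base cumulative quality.

    Assumption of this sketch: the slice is ambiguous unless the best base
    leads the next best by MORE than `ratio` to 1.
    """
    ranked = sorted(totals.values(), reverse=True)
    if len(ranked) < 2:
        # Zero or one distinct base: nothing to conflict with.
        return False
    best, runner_up = ranked[0], ranked[1]
    return best <= ratio * runner_up


near_tie = is_ambiguous({"A": 55, "G": 40})    # 55 <= 80
clear_call = is_ambiguous({"A": 90, "G": 10})  # 90 > 20
single_base = is_ambiguous({"A": 30})
```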

    3. Where did this method for consensus calling come from?

      The original model for calling the consensus of a slice follows a modified procedure from Churchill, G.A. and Waterman, M.S. "The accuracy of DNA sequences: Estimating sequence quality." Genomics 14, pp. 89-98 (1992).

      The details and rationale of the current model are available on Understanding Ambiguity Codes under "Additional Information".

    4. Which ambiguity codes are possible?

      The entire alphabet of IUPAC codes is possible as the consensus of a slice, including V, H, D, and B, which represent an unlikely three-way tie. In addition, extended ambiguity codes are encoded as lowercase letters.

      Uppercase letters represent the IUPAC code, and lowercase letters represent the IUPAC code plus gap, i.e., (A-) is 'a' and (AC-) is 'm'.

      A reference of all codes and their complement is available under "Additional Information".

    5. What is the significance of lower case letters?

      libSlice generates what are known as extended ambiguity codes: the set of IUPAC codes, plus one additional code for each IUPAC code to represent that code plus gap, i.e., (A-) is 'a' and (AC-) is 'm'. This treats ambiguities with gaps the same as ambiguities with other bases. Lowercase letters were chosen so that there would be an obvious mapping back to IUPAC codes.
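
      The mapping from a set of bases to an extended ambiguity code can be illustrated with a small lookup. The table is the standard IUPAC nucleotide alphabet; the function name, the use of '-' for a gap, and returning '-' for a gap-only set are assumptions of this sketch rather than documented libSlice behavior.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT",
    "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}
CODE_BY_BASES = {frozenset(bases): code for code, bases in IUPAC_CODES.items()}


def extended_code(bases):
    """Return the extended ambiguity code for a collection of bases.

    Lowercase means "IUPAC code plus gap", as described in the text.
    """
    present = set(bases)
    has_gap = "-" in present
    present.discard("-")
    if not present:
        return "-"  # gap only (assumed representation)
    code = CODE_BY_BASES[frozenset(present)]
    return code.lower() if has_gap else code


a_or_gap = extended_code("A-")      # 'a'
ac_or_gap = extended_code("AC-")    # 'm'
three_way = extended_code("ACG")    # 'V'
```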

      If you have scripts that search for ambiguities based strictly on the case of the letter, you will need to update your script to also search for the uppercase alphabet of IUPAC ambiguity codes, or search for quality class 22, which is used to indicate ambiguous slices.
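
      One way to make such a script independent of letter case is to treat every consensus character other than an unambiguous base or a gap as ambiguous. The sketch below assumes the consensus is available as a plain string with '-' for gaps, and that quality classes are available as a parallel list of integers; both function names are hypothetical.

```python
UNAMBIGUOUS = set("ACGT-")  # plain bases and gap (assumed gap character)


def ambiguous_positions(consensus):
    """Indexes of consensus characters that are ambiguity codes of either case."""
    return [i for i, ch in enumerate(consensus) if ch not in UNAMBIGUOUS]


def positions_with_class_22(quality_classes):
    """Indexes of slices flagged as ambiguous by quality class 22."""
    return [i for i, qc in enumerate(quality_classes) if qc == 22]


by_letter = ambiguous_positions("ACGmTRa-N")
by_class = positions_with_class_22([1, 22, 5, 22])
```

      Here by_letter picks up 'm', 'R', 'a', and 'N' at indexes 3, 5, 6, and 8, while the plain gap at index 7 is left alone.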

    6. Why does libSlice sometimes calculate a different consensus than TA & TED would have?

      libSlice uses an updated method for determining the consensus of a slice and fully utilizes the quality value information given. The method used by TA & TED is more sensitive to the number of reads supporting a base than to the quality values of the bases. The consensus calculated by libSlice will generally be better in the sense that it is derived using a well-defined and well-understood method.

    Quality Classes

    1. How are quality classes calculated?

      Quality classes are a single number representation of the set of attributes that a slice has. The attributes a slice has are determined by searching the slice for certain interesting features. The features considered include the quality and direction of the reads in the slice, and the value of the consensus.

      Quality classes should be used as a rough guide to the overall quality of the slice. Quality classes 1-8 are typically considered high quality, and 9-23 are considered low quality. Slices with high quality class numbers are strong candidates for review.
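
      The rough high/low split above could be applied in a review script along these lines. The function names are hypothetical, and the sketch assumes the classes are those reported by SliceService, so that every value falls in the range 1-23.

```python
def is_high_quality(quality_class):
    """True for classes 1-8, False for classes 9-23."""
    if not 1 <= quality_class <= 23:
        raise ValueError(f"unexpected quality class: {quality_class}")
    return quality_class <= 8


def review_candidates(quality_classes):
    """Indexes of slices whose quality class is in the low-quality range."""
    return [i for i, qc in enumerate(quality_classes) if not is_high_quality(qc)]


to_review = review_candidates([1, 8, 9, 22, 3, 23])
```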

    2. What do the different quality classes mean?

      In general, quality classes calculated by libSlice represent the same attributes as those calculated by TA & TED. A few refinements were made when libSlice explicitly defined the quality classes in terms of attributes. In addition, quality class 0 (reported by SliceService as 21) was added.

      A full description of the quality classes generated by libSlice is available on the Understanding Quality Classes page under "Additional Information". As a convenience, a quick reference guide showing all possible quality classes and attributes is also available from that page. Whenever there are two quality classes that have the same number, such as 23a and 23b, they will both be considered to be of the same quality class, 23 in this case.

    3. What does "ATTR_" mean in the libSlice Quality Class documentation?

      "ATTR_" is used to indicate the term as a specific attribute associated with the quality classes. For instance ATTR_Quality_Same_Strand has a very precise meaning and always implies ATTR_Quality_Strand.

      Some of the attributes in the quality classes have a "Not" prefix, which means the attribute is not present in the slice. This was necessary to precisely define class 23b, which means that no reads in the slice support the consensus (very strong evidence that the consensus has been miscalled).

    4. What happened to quality class 21?

      TA and TED used quality class 21 to denote that the slice was ambiguous with single coverage. This means that the consensus was ambiguous, but only a single read in the slice supported the consensus. This is an impossible quality class for libSlice to generate because it always takes at least 2 conflicting bases to create an ambiguity code.

      Normally, libSlice uses quality class 0 to indicate that the consensus is a gap and there are 0 or more other gaps in the slice. libSlice also uses quality class 0 to indicate that there is an internal problem calculating the quality class.

      Unfortunately, Cloe does not understand quality class 0 as indicating low quality. Therefore, whenever libSlice reports quality class 0 to the SliceService, the SliceService transforms it to quality class 21. Cloe understands this to mean that the slice is low quality and will flag it as such. Quality class 21 was chosen for this transformation because no slices would normally have quality class 21, as explained above.
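
      The remapping described above amounts to a one-line translation. The function below is a hypothetical illustration of that rule, not the SliceService implementation.

```python
def reported_quality_class(libslice_class):
    """Translate a libSlice quality class into the value sent on to Cloe.

    Class 0 is rewritten to 21 so that Cloe flags the slice as low quality;
    every other class is passed through unchanged.
    """
    return 21 if libslice_class == 0 else libslice_class


reported = [reported_quality_class(qc) for qc in [0, 1, 22, 0]]
```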

    5. Why does libSlice sometimes calculate a different quality class than TA & TED would have?

      In almost all cases, libSlice will report the same quality class as TA & TED. The rare differences that do occur are generally because the quality classes in libSlice have been precisely defined in terms of attributes. These differences should not be cause for alarm, because the exact value of a quality class is typically not of interest; what matters is whether it is a high or low quality class.


    $Date: 2005/07/29 02:55:17 $