As described on the Understanding Ambiguity Codes page, libSlice provides multiple consensus calling algorithms to choose from. The default algorithm uses the "Conic Ambiguity Model", which computes the consensus from the cumulative quality values (cqv) of the slice's components. It was created to address the limitations of the other models and to call the consensus of slices consistently with an expert panel.
The component with the largest cqv is also called the non-ambiguous consensus of the slice. For example, if A has a cqv of 60 and C has a cqv of 20, the ratio between A and C is 60:20, or 3:1. This exceeds the 2:1 cutoff, so the consensus is A, which is also the non-ambiguous consensus. If the ratio had been 60:50, C would fall within the cutoff and the consensus would be M, the IUPAC ambiguity code for A or C. If more than two components have a non-zero cqv, the algorithm iterates over them in decreasing order of cqv, expanding the ambiguity code to include each component that is within the 2:1 cutoff relative to the non-ambiguous consensus.
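The ratio rule above can be sketched in Python. This is a minimal illustration, not libSlice's implementation: the function name `conic_consensus`, the dictionary input keyed by the bases A, C, G and T, the exclusion of gaps, and the choice to treat a ratio of exactly 2:1 as ambiguous are all assumptions. Only the 2:1 cutoff and the A/C examples come from the text.

```python
# Hypothetical sketch of the 2:1 cutoff rule; not libSlice's actual code.
# Assumptions: input maps the bases A, C, G, T to their cqv; gaps are ignored;
# a ratio of exactly 2:1 counts as within the cutoff.

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def conic_consensus(cqv, cutoff=2.0):
    """Return (consensus code, non-ambiguous consensus) for one slice."""
    # Rank components with non-zero cqv from largest to smallest.
    ranked = sorted(
        ((base, qv) for base, qv in cqv.items() if qv > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    if not ranked:
        raise ValueError("slice has no component with non-zero cqv")

    # The largest component is the non-ambiguous consensus.
    best_base, best_qv = ranked[0]
    members = {best_base}
    for base, qv in ranked[1:]:
        # Components are sorted, so once one falls outside the cutoff,
        # every remaining component does too.
        if best_qv > cutoff * qv:
            break
        members.add(base)
    return IUPAC[frozenset(members)], best_base
```

With the values from the example, `conic_consensus({"A": 60, "C": 20})` returns `("A", "A")` and `conic_consensus({"A": 60, "C": 50})` returns `("M", "A")`. Stopping at the first component outside the cutoff is safe because the components are visited in decreasing order of cqv.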
The advantage of the Conic model is that it directly addresses the limitations of the other models: it is sensitive to both the quality values in the slice and the depth of coverage. High-quality agreement is rewarded more strongly than low-quality agreement, and many low-quality disagreeing bases can balance a few high-quality agreeing ones. In our testing, we found that it significantly outperforms the other models, calling the consensus in agreement with the expert panel for a much higher percentage of slices.
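To illustrate the sensitivity to both quality and depth, the sketch below assumes that a component's cqv is the plain sum of the quality values of the bases supporting it, and reuses the 2:1 cutoff to decide whether a slice is ambiguous. The function names and the (base, quality) input format are hypothetical.

```python
from collections import defaultdict


def cumulative_quality(reads):
    """Sum quality values per base; reads is an iterable of (base, qv) pairs.

    Assumption: cqv is the plain sum of the supporting bases' quality values.
    """
    totals = defaultdict(int)
    for base, qv in reads:
        totals[base] += qv
    return dict(totals)


def is_ambiguous(cqv, cutoff=2.0):
    """True if the runner-up cqv is within the cutoff ratio of the largest."""
    values = sorted((v for v in cqv.values() if v > 0), reverse=True)
    return len(values) > 1 and values[0] <= cutoff * values[1]


# Three high-quality A bases (Q30) against four low-quality C bases (Q10).
few_low = cumulative_quality([("A", 30)] * 3 + [("C", 10)] * 4)
# Same A bases, but six low-quality C bases: more depth on the C side.
many_low = cumulative_quality([("A", 30)] * 3 + [("C", 10)] * 6)
```

Here `few_low` is `{"A": 90, "C": 40}`, a ratio above 2:1, so `is_ambiguous(few_low)` is `False`. Adding two more Q10 C bases gives `{"A": 90, "C": 60}`, a 1.5:1 ratio, and `is_ambiguous(many_low)` becomes `True`: the extra low-quality depth balances the few high-quality bases.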
The model is called "Conic" because, under a geometric interpretation, a multidimensional cone defines the region of ambiguity. In this interpretation, each component's cqv is represented as a vector along its own axis, with the axes mutually perpendicular. These vectors are summed, and if the resultant vector falls within the region of ambiguity, the consensus is ambiguous. By symmetry, the region of ambiguity is defined by a single angle measured from the line of perfect ambiguity (where all components have equal cqv); rotating this angle about that line in higher-dimensional space sweeps out a cone.
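The vector picture can be made concrete. In the sketch below, which assumes the axes are just the components passed in, the resultant of the perpendicular cqv vectors is simply the list of cqv values, and its angle from the line of perfect ambiguity follows from the dot product with the all-ones direction. The function name is hypothetical and this is not libSlice's code.

```python
import math


def angle_from_perfect_ambiguity(cqv):
    """Angle in degrees between the resultant cqv vector and the diagonal.

    Each component's cqv lies on its own perpendicular axis, so the vector
    sum is just the list of cqv values. The diagonal (1, 1, ..., 1) is the
    direction of perfect ambiguity, where every component has equal cqv.
    Assumption: only the components passed in define the axes.
    """
    v = [float(x) for x in cqv.values()]
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("slice has no non-zero cqv")
    # Dot product with the unit diagonal (1, ..., 1) / sqrt(n).
    projection = sum(v) / math.sqrt(len(v))
    # Clamp to guard against rounding just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, projection / norm))
    return math.degrees(math.acos(cos_angle))
```

For two components, the angle runs from 0 degrees for equal cqv to 45 degrees when only one component has a non-zero cqv. The 60:20 slice sits about 26.57 degrees from the diagonal, and a slice at exactly 2:1 sits about 18.43 degrees from it.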
At 0 degrees, only perfectly ambiguous slices would be called with an ambiguity code; at 45 degrees, all slices would be ambiguous. Much of the tuning effort went into finding the angle that maximizes agreement with expert consensus callers. That angle (36.86 degrees) was found, coincidentally, to define a region where the ratio of cumulative quality values is 2:1. For the common case of only two components in a slice, the model defines the region of ambiguity as a triangular zone near 45 degrees.
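A worked check of the two-component case, offered as an interpretation rather than the documented definition: if the quoted angle is read as the full opening of the wedge between the 2:1 and 1:2 boundary directions, it follows directly from the dot product of those two directions.

```latex
\cos\theta = \frac{(2,1)\cdot(1,2)}{\lVert(2,1)\rVert\,\lVert(1,2)\rVert}
           = \frac{2\cdot 1 + 1\cdot 2}{\sqrt{5}\,\sqrt{5}}
           = \frac{4}{5},
\qquad
\theta = \arccos(0.8) \approx 36.87^\circ
```

Under this reading, each boundary lies about 18.43 degrees on either side of the line of perfect ambiguity.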
Conic Ambiguity Calculation
The Conic model outperformed the other ambiguity models by 20%, agreeing with the expert panel on 94 of the 100 validation slices. The 6 slices on which it disagreed with the experts were extreme borderline cases, evenly split between calling too many ambiguity codes and too few. In none of these conflicting slices did the Conic model call one unambiguous base where the experts chose a different unambiguous base as the consensus. The Consensus Calling Worksheet of 100 slices used for validation is available, as are the results. Note also that the worksheet slices were focused on borderline cases, because all of the models correctly handle the majority of slices, in which a single component dominates.
$Date: 2005/07/29 02:55:17 $