Redirect page
(Experiments and notes)
 
(57 intermediate revisions by one other user not shown)
Line 1: Line 1:
== Experiments and notes ==
+
#REDIRECT: [[Lecture 4 - Bayes Classification_OldKiwi]]
*[[Bayes Classification: Experiments and Notes_OldKiwi]]
+
 
+
You might expect the classification accuracy of a Bayes classifier to go down as the number of features goes up. In fact, a larger feature vector gives us more elements with which to describe and categorize a class, so adding features should not, in principle, hurt. The issue is that features that are highly correlated between the classes, that is, features whose distributions are nearly the same in both classes, provide almost no information. The classes are not separable along those features, so we cannot make a good decision based on them. Consequently, when we characterize classes by their features, we need to select features that are poorly correlated between the classes. The information in the feature vector is then meaningful for categorization, and the more meaningful features we have, the more accurate the system will be.
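
The argument above can be illustrated with a small calculation. The sketch below is not part of the original experiment. It assumes two Gaussian classes with equal priors, independent features, and one shared standard deviation <code>sigma</code>, and it assumes the true parameters are known. Under those assumptions the best achievable accuracy depends only on the distance between the class means. The function name <code>bayes_accuracy</code> is hypothetical.

```python
import math


def bayes_accuracy(means1, means2, sigma):
    """Ideal Bayes accuracy for two Gaussian classes (illustrative assumptions:
    equal priors, independent features, shared standard deviation sigma,
    and exactly known class means)."""
    # Distance between the class means, measured in units of sigma.
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(means1, means2))) / sigma
    # Accuracy = Phi(delta / 2), where Phi is the standard normal CDF,
    # written here with the error function.
    return 0.5 * (1.0 + math.erf(delta / (2.0 * math.sqrt(2.0))))
```

For example, <code>bayes_accuracy([1.0, 0.3], [0.0, 0.3], 1.0)</code> returns the same value as <code>bayes_accuracy([1.0], [0.0], 1.0)</code>, because the second feature has the same mean in both classes and adds nothing to the distance. Replacing it with a feature whose means differ, as in <code>bayes_accuracy([1.0, 1.0], [0.0, 0.0], 1.0)</code>, gives a higher value.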
+
 
+
The setup for the experiment is the following. We have only two classes, Class 1 and Class 2. You can test your algorithms with different prior probabilities for Class 1 and Class 2; however, if you are working with synthetic data, equal prior probabilities make the results easier to analyze. First, generate Gaussian data to train and test the system. We generated N = 10<sup>5</sup> samples. The variance of each feature was the same, and the mean of each feature in each class was varied depending on the hypothesis we wanted to test. Then, for each data set, compute the parameters of the probability model (mean and covariance matrix) and use the Bayes decision rule to categorize the data. Keep track of the accuracy of the system for each feature vector size. Our feature vectors ranged from 1 to 20 features. Our measure of accuracy was the number of samples classified correctly divided by the total number of samples for the class (N = 10<sup>5</sup>).
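
The procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the figures on this page. Several choices are assumptions made to keep it short and dependency-free: the features are drawn independently, so only the per-feature variances (the diagonal of the covariance matrix) are estimated; separate training and test sets are drawn; accuracy is reported over both classes together; and the default sample size is far below 10<sup>5</sup>. All function names are hypothetical.

```python
import math
import random


def make_class(n, means, sigma, rng):
    """Draw n samples; feature j ~ N(means[j], sigma^2), features independent."""
    return [[rng.gauss(m, sigma) for m in means] for _ in range(n)]


def fit_gaussian(samples):
    """Estimate per-feature mean and variance (diagonal covariance assumption)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    var = [sum((x[j] - mean[j]) ** 2 for x in samples) / (n - 1) for j in range(d)]
    return mean, var


def log_likelihood(x, mean, var, d):
    """Log of the Gaussian density using only the first d features."""
    return sum(
        -0.5 * math.log(2 * math.pi * var[j]) - (x[j] - mean[j]) ** 2 / (2 * var[j])
        for j in range(d)
    )


def accuracy_vs_features(means1, means2, sigma=1.0, n=2000, seed=0):
    """Return the accuracy for feature-vector sizes 1..len(means1), equal priors."""
    rng = random.Random(seed)
    train1 = make_class(n, means1, sigma, rng)
    train2 = make_class(n, means2, sigma, rng)
    test1 = make_class(n, means1, sigma, rng)
    test2 = make_class(n, means2, sigma, rng)
    m1, v1 = fit_gaussian(train1)
    m2, v2 = fit_gaussian(train2)
    results = []
    for d in range(1, len(means1) + 1):
        # Equal priors cancel, so the Bayes rule reduces to comparing likelihoods.
        correct = sum(
            log_likelihood(x, m1, v1, d) >= log_likelihood(x, m2, v2, d) for x in test1
        )
        correct += sum(
            log_likelihood(x, m2, v2, d) > log_likelihood(x, m1, v1, d) for x in test2
        )
        results.append(correct / (2 * n))
    return results
```

Calling <code>accuracy_vs_features([0.5] * 5, [-0.5] * 5)</code> returns one accuracy value per feature-vector size, from 1 to 5 features. Passing the same list of means for both classes gives the case in which the two classes have identical distributions.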
+
 
+
=== Experiments with highly correlated data ===
+
 
+
[[Image:Hcd_OldKiwi.jpg|none|500px|left|thumb|Figure 1]]
+
 
+
 
+
[[Image:Hca_OldKiwi.jpg|none|500px|left|thumb|Figure 2]]
+
 
+
As you can see in Figure 1, the classes are highly correlated. In Figure 2, you can observe that as we increase the size of the feature vector, the accuracy converges to a value close to 50%. The brute-force approach, assigning each sample arbitrarily to one of the two classes, also has an expected accuracy of 50%. Consequently, for highly correlated data, increasing the feature vector size does not help. Keep in mind that the accuracy results depend directly on the data being analyzed, so we can expect the accuracy to go up or down for different data sets. Nevertheless, we can conclude that the accuracy will be close to 50% for very similar data distributions.
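
The 50% baseline mentioned above can be checked with a short simulation. The sketch below is an illustration only. It assumes two equally sized classes and a "classifier" that ignores the features and picks one of the two labels at random. The function name <code>chance_accuracy</code> is hypothetical.

```python
import random


def chance_accuracy(n_per_class=10_000, seed=0):
    """Accuracy of guessing the class at random for two equally sized classes."""
    rng = random.Random(seed)
    labels = [1] * n_per_class + [2] * n_per_class
    # Each guess ignores the sample entirely and picks Class 1 or Class 2 with equal chance.
    guesses = [rng.choice((1, 2)) for _ in labels]
    return sum(g == y for g, y in zip(guesses, labels)) / len(labels)
```

The returned fraction fluctuates around 0.5 from seed to seed. This is the level that a Bayes classifier approaches when the two class distributions are nearly identical.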
+
 
+
=== Experiments with poorly correlated data ===
+
 
+
[[Image:Hcd_OldKiwi.jpg|none|500px|left|thumb|Figure 1]]
+
 
+
 
+
[[Image:Hca_OldKiwi.jpg|none|500px|left|thumb|Figure 2]]
+
 
+
=== Experiments with highly/poorly correlated data ===
+
 
+
[[Image:Hcd_OldKiwi.jpg|none|500px|left|thumb|Figure 1]]
+
 
+
 
+
[[Image:Hca_OldKiwi.jpg|none|500px|left|thumb|Figure 2]]
+
 
+
== Related links ==
+
[http://balthier.ecn.purdue.edu/index.php/ECE662 ECE662 Main Page]
+
 
+
[http://balthier.ecn.purdue.edu/index.php/ECE662#Class_Lecture_Notes Class Lecture Notes]
+

Latest revision as of 10:37, 17 March 2008
