Programming Assignment 4

In the first programming assignment, we used a network of TLUs to classify card indices according to rank and suit. For example, given a card described by the index 13, the network would output the suit class (Spade) and the rank class (Ace). Another way to classify data is via Bayes rule. One very simple example of a Bayesian approach to classification is the so-called "Naive Bayes Classifier". For this programming assignment, we want you to implement a Naive Bayes Classifier and use it to predict whether an individual is likely to play tennis given a set of weather conditions. (Data borrowed from Tom Mitchell).

Data

The data for this assignment consists of a set of training instances, each consisting of values for the following five features:
  1. Outlook may take one of three values: sunny, overcast, rain
  2. Temperature may take one of three values: hot, mild, cool
  3. Humidity may take one of two values: high, normal
  4. Wind may take one of two values: strong, weak
  5. Play_Tennis may take one of two values: yes, no
The 14 training instances are organized in this file such that each line represents one instance. The first item on each line corresponds to the Outlook feature, the second item corresponds to the Temperature feature and so on. The last item on each line represents the Play_Tennis feature, whose value we are regarding as the class label of the instance. You want to use these instances to form a decision rule which you can use to make predictions of Play_Tennis for any possible combination of the other four features. So the classes you want to try to distinguish are those for which someone will play tennis, and those for which someone will not play tennis. We call the first four features of an instance the input features, and the last feature the output feature.

Naive Bayes Classifier

Using Bayes rule to reason about a domain is often quite complex. Large tables of probabilities are often generated as the relationship between the various domain variables are taken into account. However, the Naive Bayes Classifier relies on the simplifying assumption that all the input features are conditionally independent of given the desired output value, which in this case is the value of Play_Tennis. Instead of performing complex calculations to find the necessary probability values, Naive Bayes uses only a simple product. (This approach is called "naive" since this conditional independence assumption is rarely actually satisfied in practice. Nevertheless, this method can work surprisingly well in some problems). To predict the class value of an instance, Naive Bayes must first calculate two sets of probabilities.
  1. Class Priors The prior probability of each class represents how frequently each class occurs in the data. For this assignment, there are two classes depending on the value of Play_Tennis. The prior for Play_Tennis = yes is estimated by counting the number of training instances with Play_Tennis = yes, and dividing it by the total number of training instances (14 in this case). Thus, the prior is a ratio of the number of instances in each class to the total number of instances. Note that the sum of all class prior probabilities should be 1.
  2. Input Conditionals The conditional probabilities reflect the relationship between values of a particular input feature and the output feature. For example, in the data for this assignment, we would need to calculate the conditional probability P[Outlook = sunny | Play_Tennis = yes]. This conditional probability may be estimated by counting the number of instances in which both Outlook = sunny and Play_Tennis = yes and dividing this by the total number of instances having Play_Tennis = yes. So from the data given, the estimate of P[Outlook = sunny | Play_Tennis = yes] is 0.22. Note that the conditionals for a given input feature and a given value of Play_Tennis will always sum to 1. Thus, P[Outlook = sunny | Play_Tennis = yes] + P[Outlook = overcast | Play_Tennis = yes] + P[Outlook = rain | Play_Tennis = yes] = 1.
After the probabilities have been calculated, Naive Bayes can be used to predict the class of a new instance by calculating which class is most likely given the input. The likelihood of each class is found by multiplying together the conditional probabilities corresponding to each input feature and then multiplying by the class prior. For example, suppose we want to predict the class of an instance with

Outlook = rain, Temperature = cool, Humidity = high, Wind = weak

Then we multiply

P[Outlook = rain | Play_Tennis = yes] * P[Temperature = cool | Play_Tennis = yes] * P[Humidity = high | Play_Tennis = yes] * P[Wind = weak | Play_Tennis = yes] * P[Play_Tennis = yes] = Liklihood_yes

and

P[Outlook = rain | Play_Tennis = no] * P[Temperature = cool | Play_Tennis = no] * P[Humidity = high | Play_Tennis = no] * P[Wind = weak | Play_Tennis = no] * P[ Play_Tennis = no] = Liklihood_no

Now simply choose the outcome whose likelihood has the highest value. Note that the values Likelihood_yes and Likelihood_no are not probabilities. They may be turned into probabilities by normalizing. For example, P_yes = Likelihood_yes / (Likelihood_yes + Likelihood_no) and P_no = Likelihood_Lno / (Likelihood_yes + Likelihood_no).

Assignment

As usual, all materials should be formatted by the a2ps program. Turn in:
  1. Your code.
  2. A printout of the class prior probabilities, P[Play_Tennis = yes] and P[Play_Tennis = no] calculated from the data given above.
  3. A printout of the input conditional probabilities calculated from the data above. For each input feature Outlook, Temperature, Humidity, Wind, you should calculate the conditional probabilities for each possible feature value and output value. For example, for the Wind feature, you must calculate P[Wind = weak | Play_Tennis = yes], P[Wind = strong | Play_Tennis = yes], P[Wind = weak | Play_Tennis = no], and P[Wind = strong | Play_Tennis = no]. Make sure you label your output so that I can understand what each probability means!
  4. After you have calculated all of the prior and conditional probabilities from the data, use Naive Bayes to predict the value of Play_Tennis from the input features (the first four features) of each instance. Turn in a printout of the predicted probability of each class for all of given data. For example, given the instance Outlook = rain, Temperature = cool, Humidity = high, Wind = weak, print P_yes and P_no. Again, make sure each probability is labeled!
  5. Answers the following questions: (1) Look at the probabilities produced by Naive Bayes and compare the results with the actual class of the instances. How well did Naive Bayes predict the class of the data? (2) Suppose we had left some of the data out when we calculated the conditional probabilities. How would this affect Naive Bayes? Do you see any potential problems?

David J. Stracuzzi
Last modified: Tue May 1 10:56:14 EDT 2001