CMPS 163: Business Analytics
Introduction
We now turn to classification. As discussed earlier, classification means predicting a categorical variable, for example, whether someone will buy your product or whether you will pass this course. Many business analytics problems can be solved with classification models. There are many such models, and we will look at one in particular: Naïve Bayes. One complication in this module is that we will use text as input to the Naïve Bayes model. Unlike the deals we used for k-means, text does not come with clear-cut features (model inputs), so we have to preprocess the text before we can apply Naïve Bayes. We also need some probability theory, which can be a little challenging to understand, but implementing all of this in Excel turns out to be relatively straightforward.
The case study we will look at involves Mandrill, which is the name of both a MailChimp application (the author works for MailChimp) and a species of monkey. We will see how Naïve Bayes can automatically tell tweets about the application apart from tweets about the monkey, which is really cool!
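To make the idea concrete before we build it in Excel, here is a minimal Python sketch of the two steps described above: turning text into word tokens, then scoring each class with Naïve Bayes. It is an illustration only, not the course's Excel implementation. The example tweets, the class labels "app" and "other", and the function names are all invented for this sketch, and it assumes add-one smoothing and a very simple tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(docs_by_class):
    """Count tokens per class. docs_by_class maps a label to a list of texts."""
    counts = {label: Counter() for label in docs_by_class}
    for label, docs in docs_by_class.items():
        for doc in docs:
            counts[label].update(tokenize(doc))
    vocab = set().union(*counts.values())  # every token seen in any class
    n_docs = sum(len(docs) for docs in docs_by_class.values())
    priors = {label: len(docs) / n_docs for label, docs in docs_by_class.items()}
    return counts, vocab, priors


def classify(text, counts, vocab, priors):
    """Return the label with the highest log-probability score."""
    scores = {}
    for label, token_counts in counts.items():
        # Add-one smoothing: every vocabulary word gets one extra count.
        total = sum(token_counts.values()) + len(vocab)
        score = math.log(priors[label])
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((token_counts[token] + 1) / total)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training tweets, for illustration only.
training = {
    "app": [
        "send transactional email with the mandrill api",
        "mandrill smtp email integration for our app",
    ],
    "other": [
        "the mandrill is a colorful monkey",
        "saw a mandrill at the zoo today",
    ],
}
model = train(training)
label = classify("a mandrill monkey at the zoo", *model)
```

Summing logarithms instead of multiplying raw probabilities avoids very small numbers, and the smoothing step keeps a word that never appeared in one class from forcing that class's probability to zero.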
Module Objectives
- Recognize that Naïve Bayes is a classification approach
- Describe how Naïve Bayes works
- Paraphrase the role of probability in Naïve Bayes
- Apply Excel functions to process text
- Implement Naïve Bayes in Excel
Learning Resources
- Module 5 Readings: First half of Chapter 3
- Module 5 Slides: First half of Chapter 3
Learning Activities
- Module 5 Assignment
For Further Study
- Read more about classification on Wikipedia
- Read more about Naïve Bayes on Wikipedia