Module 5: Classification I

CMPS 163: Business Analytics

Introduction

We now turn to classification. As discussed earlier, classification means predicting a categorical variable, for example, whether someone will buy your product or whether you will pass this course. Many business analytics problems can be solved with classification models. There are many such models, and we will look at one in particular: Naïve Bayes. One complication in this module is that we will use text as input to the Naïve Bayes model. Unlike the deals data we used for k-means, text does not come with clear-cut features (model inputs), so we have to preprocess the text before we can apply Naïve Bayes. We also need some probability theory, which can be a little challenging to understand, but implementing all of this in Excel turns out to be relatively straightforward.

The case study we will look at involves Mandrill, which is the name of both a MailChimp application (the author works for MailChimp) and a monkey. We will see how Naïve Bayes can be used to automatically tell tweets about the application apart from tweets about the monkey, which is really cool!
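
To make the idea concrete before we build it in Excel, here is a minimal sketch in Python of the same pipeline: split text into word tokens, count how often each word appears per class, and pick the class with the highest (log) probability. This is an illustration under stated assumptions, not the method from the readings: the function names (tokenize, train, classify), the add-one smoothing, and the four example tweets are all made up for this sketch and are not taken from the case study data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple stand-in for preprocessing)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labeled_texts):
    """Count tokens per class and documents per class from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Return the class with the highest log-probability score for the text."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the prior: the share of training documents in this class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for token in tokenize(text):
            # Add-one smoothing (an assumption here) avoids a zero probability for unseen words.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented example tweets, for illustration only.
training = [
    ("the mandrill api sends transactional email", "app"),
    ("using mandrill smtp for our newsletter email", "app"),
    ("a mandrill monkey at the zoo", "animal"),
    ("the mandrill is a colorful monkey", "animal"),
]

model = train(training)
print(classify("mandrill email api", *model))
print(classify("monkey at the zoo", *model))
```

Summing logarithms instead of multiplying raw probabilities is a common design choice: it keeps the numbers from becoming too small to represent when a text contains many words.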

Module Objectives

Recognize that Naïve Bayes is a classification approach
Describe how Naïve Bayes works
Paraphrase the role of probability in Naïve Bayes
Apply Excel functions to process text
Implement Naïve Bayes in Excel

Learning Resources

Module 5 Readings: First half of Chapter 3
Module 5 Slides: First half of Chapter 3

Learning Activities

Module 5 Assignment

For Further Study

Read more about classification on Wikipedia
Read more about Naïve Bayes on Wikipedia

Information Technology

Point Park University