Module 3: Cluster Analysis I

CMPS 163: Business Analytics

Introduction

Cluster analysis, or clustering, is a powerful exploratory approach for making sense of data. Its main strength is that it can find clusters, natural groupings of ‘things’, that can give a lot of insight into a business problem. For example, in the book chapter we will look at a fictional company that sends out wine deals that customers can make use of. By grouping customers based on which deals they take, it is possible to define customer segments. It will turn out, for instance, that there is a customer segment that is very interested in Pinot Noir. This insight becomes actionable through personalization: the wine deals that a customer is most likely to buy are placed at the top of future email blasts, with the aim of increasing the number of responses.

All of this is derived directly from the data, but it requires a distance function, i.e., a measure of how far apart (how dissimilar) two customers are. Examples are the Euclidean distance and the cosine distance, which is one minus the cosine similarity, among others. The distance function is used by a clustering algorithm, known as k-means, that automatically divides the customers into a set of k different groups based on the distance function.
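To make the idea of a distance function concrete, here is a minimal Python sketch of the two measures named above. The customer names and the 0/1 "took the deal" vectors are hypothetical and are not taken from the book chapter. Treating a zero vector (a customer who took no deals) as having similarity 0 is an assumption made only for this illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: a customer with no deals is
        # treated as not similar to anyone.
        return 0.0
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Turn the similarity into a distance: 0 means identical direction."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical customers: 1 = took the deal, 0 = did not (four deals).
alice = [1, 1, 0, 0]
bob = [1, 1, 0, 0]
carol = [0, 0, 1, 1]

print(euclidean_distance(alice, bob))    # identical choices
print(euclidean_distance(alice, carol))  # no deals in common
print(cosine_distance(alice, carol))     # no deals in common
```

Customers with identical choices are at distance zero under both measures, while customers with no deals in common are far apart. Which measure suits a given data set is a modeling decision.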

Module Objectives

Explain how clustering works
Distinguish between unsupervised and supervised approaches
List examples of clustering
Identify objects for clustering (e.g., students)
Identify features for clustering (e.g., number of classes attended)
Select the distance metric (e.g., Euclidean distance)
Find clusters in Excel through optimization
Interpret the discovered clusters
Explain how k-means works
Interpret Voronoi diagrams
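
As a preview of the k-means and Voronoi objectives, the following is a minimal Python sketch of the standard k-means loop, not the Excel approach used in the module. The function names, the `seed` and `iterations` parameters, and the sample points are all hypothetical choices made for this illustration. Starting centroids are picked at random from the data, which is only one of several common initializations.

```python
import math
import random

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: euclidean_distance(point, centroids[i]))

def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        new_assignments = [nearest_centroid(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # nothing moved, so the clusters are stable
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return centroids, assignments

# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Step 1 is where the Voronoi diagram comes in: assigning each point to its nearest centroid splits the plane into one region per centroid, and those regions are the Voronoi cells of the centroids.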

Learning Resources

Module 3 Readings: First half of Chapter 2
Module 3 Slides: First half of Chapter 2

Learning Activities

Module 3 Assignment

Video

Using the solver for cluster analysis:

For Further Study

Read more about cluster analysis on Wikipedia
Read more about k-means on Wikipedia
Read more about Voronoi diagrams on Wikipedia

Information Technology

Point Park University