# Module 3: Cluster Analysis I

## Introduction

Cluster analysis, or clustering, is a powerful exploratory approach for making sense of data. Its main strength is that it can find clusters, natural groupings of ‘things’, that can give a lot of insight into a business problem. For example, the book chapter looks at a fictional company that emails wine deals to its customers. By grouping customers according to which deals they take, it is possible to define customer segments; one segment, for example, turns out to be very interested in Pinot Noir. This insight becomes actionable by placing the deals a customer is most likely to buy at the top of future email blasts (personalization), with the goal of increasing the number of responses.

All of this is derived directly from the data, but it requires a distance function, i.e., a metric that defines how similar two customers are. Examples include Euclidean distance and cosine distance (one minus cosine similarity), among others. The clustering algorithm used here, k-means, then uses the distance function to divide the customers automatically into k groups.
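The distance functions and the k-means idea described above can be sketched in plain Python. This is a minimal illustration, not the chapter's method (the chapter works in Excel). It assumes each customer is a 0/1 vector over the deals (1 = took the deal); the data and the function names `euclidean_distance`, `cosine_distance`, and `kmeans` are hypothetical. Cosine distance is taken as 1 minus cosine similarity so that, like Euclidean distance, smaller means more similar. The loop is the standard Lloyd-style alternation: assign each customer to the nearest centroid, then move each centroid to the mean of its members.

```python
import math
import random


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat an all-zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, distance=euclidean_distance, max_iter=100, seed=0):
    """Lloyd-style k-means: alternate assignment and centroid-update steps.

    The mean-based update pairs naturally with Euclidean distance; passing
    cosine_distance gives only a rough variant of the algorithm.
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical data: rows are customers, columns are 4 wine deals (1 = took it).
customers = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels, centroids = kmeans(customers, k=2)
print("cluster per customer:", labels)
print("centroids:", centroids)
```

Each centroid is the average deal-taking profile of its cluster, which is how a segment such as "Pinot Noir fans" would be read off: look at which deals have high values in that centroid.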

## Module Objectives

• Explain how clustering works
• Distinguish between unsupervised and supervised approaches
• List examples of clustering
• Identify objects for clustering (e.g., students)
• Identify features for clustering (e.g., number of classes attended)
• Select the distance metric (e.g., Euclidean distance)
• Find clusters in Excel through optimization
• Interpret the discovered clusters
• Explain how k-means works
• Interpret Voronoi diagrams

## Learning Resources

• Module 3 Readings: First half of Chapter 2
• Module 3 Slides: First half of Chapter 2

## Learning Activities

• Module 3 Assignment

## Video

Using Excel Solver for cluster analysis: