How to segment customers using behavior data and AI/ML?
Not all customers are the same, so segmentation divides them into groups that can each receive specific marketing messages. Most businesses define marketing segments as a set of business rules that are usually static and do not adapt to changing customer behaviour. But what if the segments could adapt?
Meet Happy Credit Cards
For our business scenario today, let's assume a credit card enterprise named Happy Credit Cards (HCC). Competing with the likes of American Express, HCC aims to deliver exceptional experiences for its customers. The business wants to understand the customer segments that are core to it so that it can make better decisions.
To find these core customer segments, we have credit card usage data from the last couple of months, sourced from various systems.
How does AI/ML help with segmentation?
In the AI/ML world, ‘clustering’ is the process of identifying relationships between data attributes and grouping similar records together. For marketing use cases, it identifies relationships between customer attributes coming from your CRM or CDP and then groups the customers together as ‘segments’ or ‘clusters’.
How many segments is too many segments?
The challenge with customer segments is to find the unique set of attributes that a group of customers has in common. Often these attributes are exceptions or outliers that marketers cannot identify, so the customers end up in a generic bucket. For a marketer, a higher number of segments always seems attractive because it creates more opportunities to message customers. But over time the segments start overlapping and the same customer gets bombarded with multiple marketing messages.
To find the optimal number of segments, we turned to machine learning algorithms, using the credit card usage data.
Elbow Method
This is a popular method for finding a suitable number of clusters when grouping data. The idea is to cluster the data for a range of cluster counts (say, 1 to 15) and, for each count, calculate the sum of squared distances between each data point and its cluster center. As the number of clusters grows, this variation first drops rapidly and then slows down, forming an elbow in the curve. The number of clusters at that elbow point is the one we use in our algorithm.
For our credit card data, the Elbow Method identifies 8 as the number of clusters.
I did try other methods as well, but my research suggested that the Elbow Method was the most suitable of them all.
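To make the idea concrete, here is a minimal Python sketch of the Elbow Method. It is not the code used on the HCC data: the toy points, the helper names (sq_dist, mean_point, wcss_for_k, find_elbow), the farthest-first starting centroids and the "largest bend" rule for picking the elbow are all assumptions made for illustration. In practice the elbow is often simply read off a plot.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def wcss_for_k(points, k, iterations=20):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-first starting centroids (an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def find_elbow(wcss_by_k):
    """Pick the k where the curve bends most (largest second difference)."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = wcss_by_k[prev_k] - wcss_by_k[k]
        drop_after = wcss_by_k[k] - wcss_by_k[next_k]
        if drop_before - drop_after > best_bend:
            best_k, best_bend = k, drop_before - drop_after
    return best_k


# Toy data: three tight groups of five points each (illustrative, not HCC data).
offsets = [(0, 0), (-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
centers = [(0, 0), (10, 0), (5, 9)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

wcss = {k: wcss_for_k(points, k) for k in range(1, 7)}
elbow = find_elbow(wcss)
print(wcss)
print("elbow at k =", elbow)
```

On these three well-separated toy groups the bend is sharpest at k = 3. Real customer data usually gives a smoother curve, so the choice of elbow involves some judgment.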
How are the segments built?
Before we dive into the outcomes, let's look at some of the algorithms commonly used for similar use cases. The objective is to explain the clustering methods well enough to support the right decision.
K Means Clustering
The main objective is to find k centroids and assign each point to its nearest centroid such that the intra-cluster distance is minimized. The optimal centroids are those that minimize the sum of squared distances between each point and its centroid.
Hence customers whose attributes are closely related end up centred around a common centroid.
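The loop behind K Means can be sketched in a few lines of Python. This is a simplified one-dimensional version on made-up balances: the function name kmeans_1d, the starting centroids and the values are assumptions for illustration, not the HCC pipeline. It alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its members, which is the standard way of driving down the sum of squared distances.

```python
def kmeans_1d(values, centroids, max_iterations=100):
    """Basic k-means on one-dimensional data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [v for v, label in zip(values, labels) if label == j]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:  # no centroid moved, so the result is stable
            break
        centroids = updated
    return centroids, labels


# Hypothetical card balances in dollars, with two arbitrary starting centroids.
balances = [100.0, 150.0, 200.0, 4800.0, 5000.0, 5200.0]
centroids, labels = kmeans_1d(balances, centroids=[0.0, 1000.0])
print(centroids)  # [150.0, 5000.0]
print(labels)     # [0, 0, 0, 1, 1, 1]
```

With real data there are many attributes on different scales (dollars, frequencies, counts), so they would normally be scaled before distances are computed.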
DBSCAN
DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a clustering algorithm whose main objective is to find clusters in regions with a high density of data. Isolated data points are hence considered noise.
DBSCAN doesn’t require the number of clusters to be defined (as K Means does), but it does require the size of the data neighbourhoods and the minimum number of observations that constitute a cluster.
Animation Source – https://dashee87.github.io/data%20science/general/Clustering-with-Scikit-with-GIFs/
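A compact, brute-force Python sketch of DBSCAN shows how its two inputs are used. The parameter names eps (neighbourhood size) and min_samples (minimum observations) are borrowed from scikit-learn's DBSCAN; the function itself, the toy points and the choice to count a point as its own neighbour are assumptions of this sketch, and it is not tuned for large data sets.

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [
            j
            for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed by a cluster as a border point
            continue
        cluster_id += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point, keep expanding
    return labels


# Toy data: two dense groups and one isolated point (illustrative only).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters (two here) falls out of the data, and the isolated point is reported as noise instead of being forced into a segment.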
Gaussian Mixture Model
Gaussian Mixture Models (GMMs) assume that there are a certain number of Gaussian distributions, and each of these distributions represents a cluster. Hence, a Gaussian Mixture Model tends to group the data points belonging to a single distribution together.
A Gaussian distribution, also known as a normal distribution, is a bell-shaped curve. It is symmetric around the mean, so measurements are assumed to fall above and below the mean value with equal probability.
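As a rough illustration, the sketch below fits a mixture of two one-dimensional Gaussians with the expectation-maximization (EM) procedure commonly used for GMMs. The function names, the starting means, the initial variance of 1.0 and the toy values are all assumptions for this example. Unlike K Means, each point receives a probability of belonging to each cluster, which is then turned into a hard label by taking the most likely one.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def fit_gmm_1d(values, means, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: probability that each component generated each value.
        resp = []
        for x in values:
            dens = [
                w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values))
            variances[j] = max(spread / nj, 1e-6)  # floor avoids zero variance
    return weights, means, variances, resp


# Toy values forming two bell-shaped groups (illustrative only).
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = fit_gmm_1d(values, means=[0.0, 6.0])
labels = [r.index(max(r)) for r in resp]  # most likely component per value
print(means)
print(labels)
```

This sketch assumes the starting means are reasonably close to the data; values far from every component can make all densities underflow to zero, which a production implementation would guard against.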
What were the segments created?
As a starting point, I decided to continue with the K Means Clustering method to build the 8 segments that we found using the Elbow Method earlier. A future use case is to create segments using multiple clustering algorithms and measure their performance against business goals.
Applying the clustering algorithm to the data resulted in the graph below, where you can see how each data attribute (column) varies within each cluster. These variations are factored into defining the business rules for each cluster. Feel free to take a guess at what the segment rules would look like based on the graph.
Cluster 1: Customers use the credit card as a loan. Highest balance ($5000) and cash advance (~$5000), low purchase frequency, high cash advance frequency (0.5), high cash advance transactions (16) and low percentage of full payment (3%).
Cluster 2: Customers have a high purchase frequency (0.9), use the installment payment facility the most (highest installment frequency, 0.83), pay in full whenever possible (second-highest percentage of full payment, 25%) and do not use the costly cash advance service.
Cluster 3: Customers are active buyers who pay in full. Highest purchase frequency (0.93), second-highest purchase transactions and one-off purchases, highest percentage of full payment (29%).
Cluster 4: Customers have a high credit limit ($12K) and the highest percentage of full payment. They are a target for credit limit increases and for encouraging higher spending.
Cluster 5: Customers have a low tenure (7 years) and a low balance.
Cluster 6: Customers pay the least in interest charges and are careful with their money. Lowest balance ($104), second-lowest cash advance ($303), percentage of full payment 24%.
Cluster 7: Customers represent an exception or edge case for the business, with a record-high minimum payment level of nearly $28K.
Cluster 8: Customers use their card the least (lowest purchase frequency) and have the lowest purchase amount.
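Descriptions like these can be derived by summarizing each attribute per cluster once every customer has a cluster label. The sketch below assumes the quoted figures are per-cluster averages, which the article does not state explicitly; the function name profile_clusters, the column names and the four rows are hypothetical and only show the mechanics.

```python
from collections import defaultdict


def profile_clusters(rows, labels):
    """Return {cluster: {column: mean value}} for rows of numeric dicts."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {
            col: sum(r[col] for r in members) / len(members)
            for col in members[0]
        }
        for label, members in sorted(grouped.items())
    }


# Hypothetical customers and cluster labels (not the HCC data).
rows = [
    {"balance": 5000.0, "purchases_frequency": 0.1},
    {"balance": 4000.0, "purchases_frequency": 0.3},
    {"balance": 100.0, "purchases_frequency": 0.9},
    {"balance": 300.0, "purchases_frequency": 0.7},
]
labels = [0, 0, 1, 1]

profiles = profile_clusters(rows, labels)
for cluster, summary in profiles.items():
    print(cluster, summary)
```

Comparing these averages across clusters (highest, second highest, lowest) is what turns a numeric profile into a readable segment rule.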