Customer Segmentation using Clustering Algorithm - K-Means Clustering, Hierarchical Agglomerative Clustering (HAC) and DBSCAN

Project Details

Customer segmentation helps businesses gain useful insights, make better decisions, stay competitive and improve customer satisfaction. This research investigates customer segmentation in the retail sector by combining demographic information, purchasing behaviour and sentiment analysis with several clustering methods. K-Means, Hierarchical Agglomerative Clustering (HAC) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) were applied to demographic and behavioural data to identify distinct customer segments. Clustering results were evaluated with the Silhouette Score, the Davies-Bouldin Index and the Calinski-Harabasz Index, which confirmed the effectiveness of the algorithms. The findings show that all three clustering methods identify distinct customer segments based on demographic and behavioural characteristics across product categories, including behavioural RFM (Recency, Frequency, Monetary) clusters. Sentiment analysis of product reviews adds further insight into customer feelings and opinions and illustrates how these sentiments affect purchasing behaviour. Read the full report below.

Language: Python
Libraries: Pandas, Matplotlib, TextBlob, Scikit-Learn
Analysis: K-Means, Hierarchical Agglomerative Clustering (HAC), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Sentiment Analysis
Data Source: Kaggle
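
The three evaluation metrics named in the project summary are all available in scikit-learn. The snippet below is a minimal sketch of how they might be computed for one clustering result. It runs on synthetic data from `make_blobs` as a stand-in for the customer features, and the choice of four clusters is an illustrative assumption, not a result from the report.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption: the real data is not shown here).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Any clustering that yields one label per row can be scored the same way.
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)

scores = {
    # Ranges from -1 to 1; higher means tighter, better-separated clusters.
    "silhouette": silhouette_score(X_scaled, labels),
    # Non-negative; lower means less overlap between clusters.
    "davies_bouldin": davies_bouldin_score(X_scaled, labels),
    # Ratio of between-cluster to within-cluster dispersion; higher is better.
    "calinski_harabasz": calinski_harabasz_score(X_scaled, labels),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because the three metrics point in different directions (two are "higher is better", one is "lower is better"), it helps to report them side by side when comparing algorithms.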
01. K-Means Clustering

K-Means partitions the data into k clusters, assigning each data point to the cluster whose centroid (mean) is nearest. The optimal number of clusters (k) was determined using the elbow method, which computes the sum of squared errors (SSE) for a range of k values and takes the "elbow point", where adding more clusters stops reducing the SSE substantially, as the best value for k.
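
The elbow method can be sketched as follows. This is an illustration on synthetic data, not the project's actual code: the range of k values and the second-difference rule used to pick the elbow automatically are assumptions, and in practice the elbow is often read off a plot of SSE against k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

k_values = list(range(1, 11))  # candidate values of k (assumed range)
sse = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    # inertia_ is the sum of squared distances of points to their nearest centroid.
    sse.append(model.inertia_)

# Heuristic elbow pick (assumption): the k where the SSE curve bends most sharply,
# measured by the largest second difference. Entry i is centred on k_values[i + 1].
second_diff = np.diff(sse, n=2)
elbow_k = k_values[int(np.argmax(second_diff)) + 1]

for k, value in zip(k_values, sse):
    print(f"k={k:2d}  SSE={value:.2f}")
print("Heuristic elbow at k =", elbow_k)
```

The heuristic is only a starting point. When the curve bends gradually, comparing the candidate k values with a second metric such as the Silhouette Score gives a more defensible choice.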

02. Hierarchical Agglomerative Clustering (HAC)

HAC was run with the Ward linkage method, which at each step merges the pair of clusters that gives the smallest increase in total within-cluster variance. A dendrogram was constructed to visualise the cluster hierarchy, and the number of clusters was determined by cutting the dendrogram at a chosen height.
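
A minimal sketch of Ward-linkage clustering and a dendrogram cut, using SciPy alongside scikit-learn. The data is synthetic, and the cut height (half of the largest merge distance) is a hypothetical value chosen only for illustration. The report does not state the height that was used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Each row of Z records one merge: the two clusters joined, the merge
# distance, and the size of the new cluster.
Z = linkage(X_scaled, method="ward")

# no_plot=True returns the dendrogram layout without drawing it;
# drop the argument to draw it with Matplotlib.
tree = dendrogram(Z, no_plot=True)

# Hypothetical cut height: half of the largest merge distance.
cut_height = 0.5 * Z[:, 2].max()
labels = fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(np.unique(labels))

print(f"Cut at height {cut_height:.2f} gives {n_clusters} clusters")
```

A common way to pick the height is to look for the tallest vertical gap in the dendrogram that no horizontal merge line crosses, then cut inside that gap.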

03. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN groups points based on density and marks isolated points as noise. The parameters epsilon (ε), the neighbourhood radius, and minPts, the minimum number of points needed to form a dense region, were tuned to maximise the Silhouette Score, a measure of how similar each point is to its own cluster compared with other clusters.
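
The tuning step could look something like the grid search below. It is a sketch under stated assumptions: the data is synthetic, the candidate values for ε and minPts are made up for illustration, and noise points are excluded before scoring, which the report does not confirm. In scikit-learn, minPts corresponds to the `min_samples` argument.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

best = None
for eps in (0.1, 0.2, 0.3, 0.4, 0.5):      # assumed candidate radii
    for min_pts in (3, 5, 10):             # assumed candidate minPts values
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X_scaled)
        mask = labels != -1                # DBSCAN labels noise points as -1
        n_clusters = len(set(labels[mask]))
        # The Silhouette Score needs at least 2 clusters and fewer clusters than points.
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(X_scaled[mask], labels[mask])
        if best is None or score > best["score"]:
            best = {
                "eps": eps,
                "min_pts": min_pts,
                "score": score,
                "n_clusters": n_clusters,
                "noise_fraction": float(np.mean(~mask)),
            }

print(best)
```

Scoring only the non-noise points can favour settings that discard many points as noise, so the noise fraction is recorded with each result and is worth checking before accepting the top-scoring parameters.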

Want to know more?

Read Full Report

Curious about how advanced clustering techniques can transform raw customer data into actionable insights? Dive into the full report on Customer Segmentation Using Clustering Algorithms and explore the use of K-Means Clustering, Hierarchical Agglomerative Clustering (HAC), and DBSCAN to uncover distinct customer groups.

📊 What you’ll learn:

  • How data-driven segmentation enhances business strategies.
  • The application of clustering algorithms on real-world data.
  • Key metrics and insights that drive customer-focused decision-making.