Starbucks Capstone Challenge

A Udacity Data Scientist Capstone project that explores Starbucks data and builds a prediction model.

Photo by 𝙆 on Unsplash

Project Motivation

Dataset Description

Problem Introduction

Strategy to solve problem

Data Wrangling

Portfolio Dataset

Portfolio data before processing
Portfolio data after processing

Transcript Dataset

Transcript data before processing
Transcript data after processing

Profile Dataset

Profile data before processing
Profile data after processing

Exploratory Data Analysis

1. Cumulative count of members over time

Cumulative sum of number of members
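A cumulative member count like the one plotted above can be sketched in a few lines. This is a minimal illustration, assuming each member record carries a join date; the function name and the sample dates are hypothetical and are not taken from the project's data or code.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_member_counts(join_dates):
    """Return (date, cumulative member count) pairs sorted by date."""
    per_day = Counter(join_dates)          # members who joined on each day
    days = sorted(per_day)                 # chronological order
    totals = accumulate(per_day[d] for d in days)  # running sum over days
    return list(zip(days, totals))


# Hypothetical join dates, for illustration only.
sample_dates = [
    date(2017, 1, 5),
    date(2017, 1, 5),
    date(2017, 3, 1),
    date(2016, 12, 31),
]
cumulative = cumulative_member_counts(sample_dates)
for day, total in cumulative:
    print(day.isoformat(), total)
```

The resulting pairs can be passed directly to any plotting library as the x and y values of a line chart.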

2. Distribution of members’ income

Members’ income distribution
Income group of members

3. Age Distribution per gender of members

Age distribution by gender
Age generation of members

4. Percentage of offers received, viewed, and completed

Offers received vs. offers viewed vs. offers completed
Offer IDs: received, viewed, and completed
Total offers completed by age group
Total offers completed by gender

Merging Datasets

Master dataframe
Splitting data into test and train data
Standardize X_train
Standardize X_test
Merging the X_train and X_test data
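The split-then-standardize sequence in the captions above can be sketched as follows. This is a simplified, standard-library illustration rather than the project's actual code: the helper names, the 25% test fraction, and the sample rows (age, income) are all assumptions. The point it shows is that the scaling statistics are computed on the training rows only and then reused for the test rows, so no information from the test set leaks into training.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with statistics that were fitted elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows (age, income), for illustration only.
rows = [[30, 40000.0], [45, 60000.0], [60, 80000.0], [25, 52000.0]]

X_train, X_test = train_test_split(rows)
means, stds = fit_scaler(X_train)                # fit on training data only
X_train_std = transform(X_train, means, stds)
X_test_std = transform(X_test, means, stds)      # reuse training statistics
```

In a real pipeline the same pattern is usually expressed with a library scaler that is fitted on the training set and then applied to the test set.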

Data Modelling


Unbalanced classes

Which is more important to Starbucks: true positives or true negatives?

Which has the higher cost to the business: false positives or false negatives?

Performance Metrics

Prediction F1 Score
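For reference, the F1 score is the harmonic mean of precision and recall, which makes it more informative than accuracy when classes are unbalanced. The sketch below computes it from scratch for a binary problem; the function name and the label lists are hypothetical examples, not results from this project.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1 score for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Here there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is the F1 score.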

Hyperparameter tuning

Hyperparameter tuning of the LightGBM model
F1 score after hyperparameter tuning
Hyperparameter tuning of the XGBoost model


Calculating feature importance
Features important to predict customer action



Classification report for the final model



