Bank Marketing Campaign Analysis Proposal


Bank Marketing Project Proposal

March 7, 2018
The Team

● Chaitra Hegde: Masters Student at CDS. Active learner of Data Science methodologies and Machine learning techniques.
● Aakash Kaku: Masters Student at CDS. Interested in using data science techniques to make informed business decisions.
● Neelang Parghi: Masters Student in Math/CS. Studying scientific computing but recently interested in data science.
Agenda
● Business Understanding

● Data Exploration and Preparation

● Model Building

● Hyper-parameter Tuning and Model Evaluation

● Result / Outcomes
Business Understanding
● Problem Statement: Improve the marketing campaign of a Portuguese bank by analyzing its past marketing campaign data and recommending which customers to target.
● Problem Motivation: With such a prediction algorithm, the bank can target its customers more precisely and focus its marketing efforts where they are most likely to succeed.
● Banco de Portugal offered its clients fixed-term products such as CDs. Data was collected about each client, the type of contact, and the outcome.
● What can this data tell us about marketing success for this campaign?
● Can these data science techniques be applied to other areas?
Data Exploration and Preparation
Data Exploration and Preparation (1/2)
● All coding done in Python 3.
● Extensive use of pandas, numpy, matplotlib, as well as seaborn and sklearn packages.
● Dataset contained 20 different features on more than 41,000 clients.
● Features were both categorical and numerical. Target variable was binary (“Yes” or “No”).
● Pandas package was imported and a dataframe was created.
● Categorical variables were looked at first. Visualizations were created using the seaborn package.
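A first look at a categorical feature usually asks how the share of "Yes" outcomes varies by category. The sketch below shows that calculation in plain Python rather than with the pandas and seaborn tooling named on the slide. The column names `job` and `y` and the toy records are assumptions made for illustration, not values taken from the dataset.

```python
from collections import defaultdict


def positive_rate_by_category(rows, feature, target="y", positive="yes"):
    """Return {category: share of rows whose target equals `positive`}."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for row in rows:
        tally = counts[row[feature]]
        tally[0] += int(row[target] == positive)
        tally[1] += 1
    return {category: pos / total for category, (pos, total) in counts.items()}


# Toy records, invented for illustration only (hypothetical column names).
toy_rows = [
    {"job": "admin", "y": "yes"},
    {"job": "admin", "y": "no"},
    {"job": "student", "y": "yes"},
]

rates = positive_rate_by_category(toy_rows, "job")
print(rates)
```

A bar chart of these per-category rates carries the same information as a count plot split by the target.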
Data Exploration and Preparation (2/2)
● Many features had missing values. How do we handle this?
● For categorical features, missing values were imputed using other independent variables. For example, cross-tabulation between 'job' and 'education'; 'age' and 'job'; 'home ownership' and 'loan status'.
● Among numerical features, fortunately only one column ('pdays') had any missing values. Unfortunately, missing values made up the majority of that column.
● To handle this, 'pdays' was converted from a numerical feature to a categorical feature using buckets: < 5 days, 6-15 days, etc.
● A heatmap was created with the seaborn package to show any particularly strong correlations between the independent variables and the target variable.
Correlation Heatmap:
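The two preparation steps described on this slide, imputing a categorical value from a related feature and bucketing 'pdays', can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, missing values are represented as `None`, and the bucket edges are guesses, since the slide lists only "< 5 days, 6-15 days, etc.".

```python
from collections import Counter, defaultdict
from typing import Optional


def impute_from_related(rows, target, related):
    """Fill a missing `target` with the most common `target` value seen
    among rows that share the same `related` value (cross-tab style)."""
    observed = defaultdict(Counter)
    for row in rows:
        if row[target] is not None and row[related] is not None:
            observed[row[related]][row[target]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        candidates = observed[new_row[related]]
        if new_row[target] is None and candidates:
            new_row[target] = candidates.most_common(1)[0][0]
        filled.append(new_row)
    return filled


def bucket_pdays(pdays: Optional[float]) -> str:
    """Map a numeric 'pdays' value to a category; edges are assumed."""
    if pdays is None:
        return "missing"
    if pdays <= 5:
        return "0-5 days"
    if pdays <= 15:
        return "6-15 days"
    return "16+ days"


# Toy records, invented for illustration only.
toy_rows = [
    {"job": "admin", "education": "university"},
    {"job": "admin", "education": "university"},
    {"job": "admin", "education": "high school"},
    {"job": "admin", "education": None},
]
filled_rows = impute_from_related(toy_rows, target="education", related="job")
print(filled_rows[3]["education"], bucket_pdays(None), bucket_pdays(10))
```

Keeping "missing" as its own bucket lets the model use the absence of a value as information instead of discarding most of the column.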
Model Building
Model Building (1/2)
Logistic Regression
● sklearn.linear_model.LogisticRegression
● It is a classification model, despite the word "regression" in its name
● Fits a sigmoid function to the data
● Outputs a probability in the [0, 1] range, unlike linear regression models

Decision Tree
● [Link]
● Simple to understand and effective
● Splits the data at every node based on one feature
● Uses information gain as the measure for choosing a split
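The two ideas named on this slide, the sigmoid that maps a linear score to a probability and information gain as a split criterion, can be illustrated with a short standard-library sketch. The function names are hypothetical, and this is not the scikit-learn implementation; entropy-based information gain is assumed here as the impurity measure.

```python
import math
from collections import Counter


def sigmoid(z):
    """Logistic function; branches on the sign of z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# A split that separates the classes perfectly gains one full bit.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
print(sigmoid(0.0), gain)
```

A split that leaves both children as mixed as the parent has a gain of zero, which is why the tree prefers features that separate the classes.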
Model Building (2/2)
Random Forest
● [Link]
● Constructs multiple decision trees and takes the mode of their predictions for an example to make the final decision
● Individual trees are intentionally overfit, and a validation set is used to optimize the forest-level parameters

AdaBoost and Gradient Boosting
● [Link]
● [Link]
● Many decision trees with a single split are constructed
● An instance that is hard to classify gets more attention by being given a larger weight
● Gradient Boosting is a generalized version of AdaBoost
● One weak learner is added at a time, and existing weak learners remain unchanged
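Two mechanisms from this slide lend themselves to a small sketch: the majority vote a random forest takes over its trees, and the way AdaBoost raises the weight of hard-to-classify instances. The code below assumes the classic binary AdaBoost update with labels in {-1, +1} and weights that sum to 1; the scikit-learn classifiers may differ in detail, and the function names are hypothetical.

```python
import math
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common prediction (the mode) across trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def adaboost_reweight(weights, y_true, y_pred):
    """One round of the classic AdaBoost weight update.

    Labels are assumed to be -1 or +1 and weights to sum to 1.
    Returns (alpha, new_weights), where alpha is the learner's vote weight.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified points (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


vote = majority_vote([1, 0, 1])

# Four equally weighted examples; the weak learner gets the last one wrong.
alpha, new_weights = adaboost_reweight(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[1, 1, -1, -1],
    y_pred=[1, 1, -1, 1],
)
print(vote, alpha, new_weights)
```

After the update, the single misclassified example carries half of the total weight, so the next weak learner is pushed to get it right.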
Hyper-Parameter Tuning and Model Evaluation
Hyper-parameter tuning and Model Evaluation
● Used the mean AUC over 5-fold cross-validation as the evaluation metric
● Chose the model with the highest mean AUC
Model                    Hyper-parameters Tuned                         Optimal hyper-parameters                          Mean AUC
Logistic Regression      C: Regularization Coefficient; Type: L1, L2    C = 0.1; L1 Logistic Regression                   0.7903
Decision Trees           Min. Split Value and Min. Leaf Value           Min. Split Value = 1110; Min. Leaf Value = 132    0.7919
Random Forests           Min. Split Value and Min. Leaf Value           Min. Split Value = 189; Min. Leaf Value = 7       0.7979
Gradient Boosted Trees   Min. Split Value and Min. Leaf Value           Min. Split Value = 85; Min. Leaf Value = 37       0.8006
AdaBoost                 Number of Estimators                           Number of Estimators = 1000                       0.8157
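The evaluation metric, mean AUC over 5-fold cross-validation, can be sketched without any library as follows. This is a minimal illustration under stated assumptions: AUC is computed by pairwise comparison of positive and negative scores, folds are interleaved rather than shuffled, and `fit_midpoint_model` is a hypothetical toy model standing in for the classifiers in the table.

```python
def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k interleaved folds (no shuffling)."""
    return [list(range(start, n, k)) for start in range(k)]


def mean_cv_auc(X, y, fit, k=5):
    """Train on k-1 folds, score the held-out fold, and average the AUCs."""
    fold_aucs = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(auc([y[i] for i in test_idx],
                             [score(X[i]) for i in test_idx]))
    return sum(fold_aucs) / len(fold_aucs)


def fit_midpoint_model(X_train, y_train):
    """Toy model: score is the distance above the midpoint of class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x - midpoint


# Toy, perfectly separable data, invented for illustration only.
X = list(range(20))
y = [1 if x >= 10 else 0 for x in X]
result = mean_cv_auc(X, y, fit_midpoint_model, k=5)
print(result)
```

To compare hyper-parameter settings, the same function would be called once per candidate setting and the setting with the highest mean AUC kept.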


Results / Outcome
Best Model and Feature Importance
● Best Model: AdaBoost with 1000 estimators.
● Obtained an AUC of 0.8036 on the test set.
● Below is the Feature Importance Chart for the AdaBoost Model:
Recommendations to the Marketing Team
Significant variables and the corresponding recommendations:

Libor Rate, [Link], [Link]
● Collaborate with economic experts
● Be a fast mover: capture customers before competitors do

Age
● Target relatively older people
● Convey peace of mind, safe investment, and a steady income source as the value proposition

Duration, Mode of Contact: Telephone
● Try to engage customers and have longer calls
● Preferably use the telephone as the mode of contact

Campaign
● Prioritize customers who were part of previous marketing campaigns
Thank You
