PROJECT REPORT
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback
models. In its recent board meeting, members raised concerns about the efficiency of the
marketing campaign currently in use. The board has decided to bring in an analytics professional to
improve the existing campaign.
1. You as an analyst have been tasked with performing a thorough analysis of the data
and coming up with insights to improve the marketing campaign.
A. What is the important technical information about the dataset that a database
administrator would be interested in?
The dataset has a total of 1581 rows and 14 columns, meaning it has 1581 entries and 14
different variables.
First, we take a look at the sample of the rows and columns
General information of data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1581 entries, 0 to 1580
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Age               1581 non-null   int64
 1   Gender            1528 non-null   object
 2   Profession        1581 non-null   object
 3   Marital_status    1581 non-null   object
 4   Education         1581 non-null   object
 5   No_of_Dependents  1581 non-null   int64
 6   Personal_loan     1581 non-null   object
 7   House_loan        1581 non-null   object
 8   Partner_working   1581 non-null   object
 9   Salary            1581 non-null   int64
 10  Partner_salary    1475 non-null   float64
 11  Total_salary      1581 non-null   int64
 12  Price             1581 non-null   int64
 13  Make              1581 non-null   object
dtypes: float64(1), int64(5), object(8)
memory usage: 173.0+ KB
Looking at this basic information about the data:
Of the 14 variables, 6 are numerical and 8 are categorical.
There are also null values in the Gender and Partner_salary variables.
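A minimal sketch of how these figures could be read off with pandas. The small DataFrame below is made-up illustrative data standing in for the real dataset, and the variable names (`df`, `null_counts`, and so on) are assumptions, not taken from the report.

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only).
df = pd.DataFrame({
    "Age": [25, 31, 44],
    "Gender": ["Male", None, "Female"],
    "Salary": [50000, 62000, 80000],
    "Partner_salary": [0.0, float("nan"), 40000.0],
})

n_rows, n_cols = df.shape            # number of entries and variables
null_counts = df.isnull().sum()      # missing values per column

# Split the columns into numerical and non-numerical (categorical) ones.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)
print(null_counts[null_counts > 0])
print(numeric_cols, categorical_cols)
```

On the real data, `df.info()` would print the table shown above in one call.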
B) Take a critical look at the data and do a preliminary analysis of the variables. Do a
quality check of the data so that the variables are consistent. Are there any discrepancies
present in the data?
The previous table shows that there are null values in the Gender and Partner_salary
variables:
Gender has 53 nulls (1581 - 1528) and
Partner_salary has 106 nulls (1581 - 1475)
To fill in the missing data:
For Gender, we fill the nulls with the more frequent of the two categories
In this case the nulls are imputed with ‘Male’, since males are in the majority
For Partner_salary
We use conditional imputation, since the salary variables are related to one another:
Salary + Partner_salary = Total_salary
The condition is that if Partner_working is Yes, then
Partner_salary = Total_salary – Salary
If Partner_working is No, then
Partner_salary = 0
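The two imputation rules above can be sketched in pandas as follows. The data is a made-up toy frame rather than the Austo dataset, and the helper names `missing` and `working` are illustrative assumptions.

```python
import pandas as pd

# Toy data (illustrative values only).
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_working": ["Yes", "No", "Yes", "No"],
    "Salary": [50000, 60000, 70000, 40000],
    "Partner_salary": [float("nan"), float("nan"), 30000.0, 0.0],
    "Total_salary": [80000, 60000, 100000, 40000],
})

# Gender: fill nulls with the most frequent category (the mode).
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# Partner_salary: use Salary + Partner_salary = Total_salary when the
# partner works, and 0 when the partner does not work.
missing = df["Partner_salary"].isnull()
working = df["Partner_working"] == "Yes"
df.loc[missing & working, "Partner_salary"] = df["Total_salary"] - df["Salary"]
df.loc[missing & ~working, "Partner_salary"] = 0

print(df)
```

Only rows where Partner_salary is missing are touched, so existing values are left as recorded.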
Numerical description of data
• Customers span a wide age range, from 22 to 54; the mean age is 31.92
whereas the median age is 29 years.
• The salary of the customers ranges from 30,000 to 99,300, and the mean salary is around
60,000.
• Total_salary is between 30,000 and 171,000.
• The price of the automobiles starts at 18,000 and the highest price is 70,000. The
average price of a car is around 36,000.
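Summary figures of this kind are what `DataFrame.describe()` reports for numerical columns. The sketch below uses made-up values, so its output will not match the numbers quoted above.

```python
import pandas as pd

# Toy numerical data (illustrative values only).
df = pd.DataFrame({
    "Age": [22, 29, 29, 54],
    "Price": [18000, 31000, 36000, 70000],
})

# describe() returns count, mean, std, min, quartiles and max per column;
# the "50%" row is the median.
summary = df.describe().loc[["min", "50%", "mean", "max"]]
print(summary)
```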
No. of entries of categorical variables
GENDER: 4
Male 1252
Female 327
Femal 1
Femle 1
Name: Gender, dtype: int64
PROFESSION: 2
Salaried 896
Business 685
Name: Profession, dtype: int64
EDUCATION: 2
Post Graduate 985
Graduate 596
Name: Education, dtype: int64
HOUSE_LOAN: 2
No 1054
Yes 527
Name: House_loan, dtype: int64
PERSONAL_LOAN: 2
Yes 792
No 789
Name: Personal_loan, dtype: int64
PARTNER_WORKING: 2
Yes 868
No 713
Name: Partner_working, dtype: int64
MARITAL_STATUS: 2
Married 1443
Single 138
Name: Marital_status, dtype: int64
MAKE: 3
Sedan 702
Hatchback 582
SUV 297
Name: Make, dtype: int64
We can see that Gender has 4 distinct values instead of 2: ‘Female’ has been misspelt in two
entries, as ‘Femal’ and ‘Femle’
Since these are clearly spelling mistakes, we can go ahead and correct them to ‘Female’
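One way to apply this correction is a value mapping with `Series.replace`. This is a sketch on a made-up Series; the mapping covers only the two misspellings reported above.

```python
import pandas as pd

# Toy Gender column containing the two misspellings (illustrative only).
gender = pd.Series(["Male", "Female", "Femal", "Femle", "Male"], name="Gender")

# Map each misspelt label to the intended category.
gender_clean = gender.replace({"Femal": "Female", "Femle": "Female"})

print(gender_clean.value_counts())
```

Re-running `value_counts()` afterwards is a quick check that only the two expected categories remain.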
C) Explore all the features of the data separately by using appropriate visualizations and
draw insights that can be utilized by the business.
Univariate analysis of the Numerical fields
• The majority of the customer base is in the 22 to 28 age group
• Most of the automobiles are priced between 18,000 and 35,000
Univariate analysis of Categorical variables
• There are more salaried customers than business customers
• Over 90% of the customers are married
• The majority of the customers are postgraduates
• The numbers of customers with and without a personal loan are almost equal, whereas
customers without a house loan (1054) are twice as many as those with one (527)
• Many customers have either 2 or 3 dependents
• Of the three makes, Sedan is the most preferred automobile.
D) Understanding the relationships among the variables in the dataset is crucial for every
analytical project. Perform analysis on the data fields to gain deeper insights. Comment on
your understanding of the data.
Bi-variate analysis of Numerical variables
• There is a high correlation between Total_salary and Partner_salary
• Price and Age are also highly correlated
• The other numerical variables do not seem to be correlated with one another
• Female customers preferred SUVs and did not prefer hatchbacks at all, whereas male customers
preferred a hatchback or a Sedan.
• Customers with a house loan are the least likely to buy an SUV
• Single customers preferred hatchbacks, while married customers preferred Sedans
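Correlations between numerical variables, like those in the first bullets above, can be computed with `DataFrame.corr()`. The frame below is made-up data in which Price is an exact linear function of Age, purely to make the result easy to verify; it says nothing about the real dataset.

```python
import pandas as pd

# Toy numerical data (illustrative only): Price = 1000 * Age - 4000.
df = pd.DataFrame({
    "Age": [22, 28, 35, 44, 50],
    "Price": [18000, 24000, 31000, 40000, 46000],
    "Partner_salary": [0, 20000, 0, 40000, 30000],
    "Total_salary": [40000, 70000, 60000, 110000, 100000],
})

# Pairwise Pearson correlation between all numerical columns.
corr = df.corr()
print(corr.round(2))
```

A heatmap of `corr` is a common way to present this matrix, but the table itself is enough to spot strongly related pairs.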
E) Employees working on the existing marketing campaign have made the following
remarks. Based on the data and your analysis state whether you agree or disagree with
their observations. Justify your answer based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
If we look at the count plot of Gender vs the Make of automobile,
male customers prefer hatchbacks over SUVs, whereas female customers prefer SUVs over the other two makes
Therefore Steve’s observation is wrong in this case
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
From the table of Profession vs Make
We can see that salaried customers clearly preferred a Sedan more than business customers did
Hence the observation made by Ned Stark is true
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an
easier target for a SUV sale over a Sedan Sale.
From the table of Profession vs Make for male customers
It is seen that salaried male customers prefer a Sedan over an SUV, so it would not be apt to target
them for an SUV sale
Therefore the statement made by Sheldon is not right
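The three claims above rest on contingency tables, which `pd.crosstab` can produce. The sketch below uses a made-up six-row frame, so its counts are illustrative and are not the report's results.

```python
import pandas as pd

# Toy data (illustrative values only).
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
    "Profession": ["Salaried", "Salaried", "Business",
                   "Salaried", "Business", "Salaried"],
    "Make": ["Sedan", "Hatchback", "Hatchback", "SUV", "SUV", "Sedan"],
})

# E1: counts of each Make per Gender.
gender_make = pd.crosstab(df["Gender"], df["Make"])

# E2: share of each Make within a Profession (each row sums to 1).
profession_make = pd.crosstab(df["Profession"], df["Make"], normalize="index")

# E3: the same breakdown restricted to male customers.
males = df[df["Gender"] == "Male"]
male_profession_make = pd.crosstab(males["Profession"], males["Make"])

print(gender_make, profession_make, male_profession_make, sep="\n\n")
```

Normalizing by row (`normalize="index"`) makes groups of different sizes comparable, which matters when one profession or gender has far more customers than the other.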
F) From the given data, comment on the amount spent on purchasing automobiles across
the following categories.
Comment on how a Business can utilize the results from this exercise.
Give justification along with presenting metrics/charts used for arriving at the conclusions.
F1) Gender
Mean of Price across Gender:
Female = 47705
Male = 32416
Median of Price across Gender:
Female = 49000
Male = 29000
Both the mean and the median price are higher for female customers than for male customers.
F2) Personal_loan
Mean of Price across Personal Loan:
Personal Loan: No= 36742
Personal Loan: Yes= 34457
Median of Price across Personal Loan
Personal Loan: No= 32000
Personal Loan: Yes= 31000
The mean and the median purchase price of customers who did not opt for a personal loan
are slightly higher than those of customers who did
G) From the current data set comment if having a working partner leads to purchase of a
higher priced car.
Mean of Price across Partner_working:
Partner_working: No = 36000
Partner_working: Yes = 35267
Median of Price across Partner_working:
Partner_working: No = 31000
Partner_working: Yes = 31000
The mean and median prices are almost the same whether or not the partner is working, which
suggests that the purchase price does not depend on the partner working
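The per-category means and medians in sections F and G can be reproduced with a grouped aggregation. In the sketch below, `price_summary` is a hypothetical helper name and the data is made up, so the printed values are not those of the report.

```python
import pandas as pd

# Toy data (illustrative values only).
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Personal_loan": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 34000, 46000, 50000],
})

def price_summary(frame, by):
    """Return the mean and median Price for each level of column `by`."""
    return frame.groupby(by)["Price"].agg(["mean", "median"])

by_gender = price_summary(df, "Gender")
by_loan = price_summary(df, "Personal_loan")

print(by_gender)
print(by_loan)
```

The same helper could be called with `"Partner_working"` on the real data to obtain the figures for section G.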
H) The main objective of this analysis is to devise an improved marketing strategy to send
targeted information to different groups of potential buyers present in the data. For the
current analysis use Gender and Marital_status - fields to arrive at groups with similar
purchase history.
We can use this chart to divide our customers into 4 different segments -
Married Male, Single Male, Married Female and Single Female
Based on the chart we can also go ahead and assign a car to each segment
1. Married Male – Sedan
2. Single Male – Hatchback
3. Married Female – SUV
4. Single Female – Sedan
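A sketch of how such a segment-to-car assignment could be derived: count the Make purchased within each Marital_status and Gender combination and take the most frequent one. The data is a made-up toy frame, constructed only so that each segment has a clear winner.

```python
import pandas as pd

# Toy data (illustrative values only).
df = pd.DataFrame({
    "Marital_status": ["Married", "Married", "Married", "Single", "Single",
                       "Married", "Married", "Married", "Single"],
    "Gender": ["Male", "Male", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female"],
    "Make": ["Sedan", "Sedan", "Hatchback", "Hatchback", "Hatchback",
             "SUV", "SUV", "Sedan", "Sedan"],
})

# Rows: (Marital_status, Gender) segments; columns: Make; values: counts.
segment_counts = pd.crosstab([df["Marital_status"], df["Gender"]], df["Make"])

# Most frequently purchased Make in each segment.
top_make = segment_counts.idxmax(axis=1)
print(top_make)
```

When two makes are tied within a segment, `idxmax` returns the first one, so ties in real data would need a closer look at `segment_counts`.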
Framing An Analytics Problem
Analyse the dataset and list down the top 5 important variables, along with the business
justifications.
A bank can generate revenue in a variety of ways, such as charging interest, transaction fees and
financial advice. Interest charged on the capital that the bank lends out to customers has
historically been the most significant method of revenue generation. The bank earns profits from
the difference between the interest rates it pays on deposits and other sources of funds, and the
interest rates it charges on the loans it gives out.
GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products, such as
savings accounts, current accounts, investment products, etc. among other offerings. The bank
also cross-sells asset products to its existing customers through personal loans, auto loans,
business loans, etc., and to do so they use various communication methods including cold
calling, e-mails, recommendations on the net banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on risk policy and
customer category class but due to huge competition in the credit card market, the bank is
observing high attrition in credit card spending. The bank makes money only if customers spend
more on credit cards. Given the attrition, the Bank wants to revisit its credit card policy and
make sure that the card given to the customer is the right credit card. The bank will make a profit
only through the customers that show higher intent towards a recommended credit card. (Higher
intent means consumers would want to use the card and hence not be attrite.)
The top 5 important variables in GODIGT Bank’s dataset are:
1. Annual_income_at_source -
Based on the income earned by its customers, GODIGT can segregate them into
different groups by income bracket
2. Cc_active60 -
This variable can be used to track the activity of the credit card and know how often the
customer is using it. GODIGT can use this information to remarket to its customers and
look for probable reasons for reduced credit card usage
3. T+1 month_activity -
This variable can be used to target the customers accordingly and design specific
marketing strategies in order to retain them
4. Avg_spends_3m -
This variable gives the customer’s average credit card spend over the last 3 months.
Based on this, GODIGT can divide its customers into high, medium and low spenders,
provide each group with different offers and target them more efficiently
5. Cc_limit -