
Customer Churn Data Analysis Overview

Great Lakes Question Paper

Uploaded by

Karthi Keyan

Data Analysis of Customer Churn Dataset

1. Introduction
In this analysis, we load the customer churn dataset, examine its metadata, and describe its
structure in detail. The analysis covers the following tasks:

● Importing the necessary libraries.

● Checking the versions of the libraries used.

● Loading the dataset and verifying its structure.

● Examining the metadata and describing each attribute.

2. Library Imports
We start by importing the Python libraries required for this analysis. They cover data
manipulation, modelling, evaluation, and plotting:

import pandas as pd
import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib

3. Library Versions
Next, we will check the versions of the libraries used in the project. This ensures that the
analysis is reproducible and helps with debugging in case of discrepancies in the future.

print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"scikit-learn: {sklearn.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"seaborn: {sns.__version__}")

pandas: 2.2.3
numpy: 2.1.3
scikit-learn: 1.6.1
matplotlib: 3.10.0
seaborn: 0.13.2
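
The same check can be made without importing each library first. The following is a minimal sketch, assuming a hypothetical helper named report_versions; it looks up installed distributions through the standard-library importlib.metadata module and reports a missing package instead of failing.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {distribution name: version string, or "not installed"}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is absent from the current environment.
            report[name] = "not installed"
    return report


# Distribution names as published on PyPI (note "scikit-learn", not "sklearn").
PACKAGES = ["pandas", "numpy", "scikit-learn", "matplotlib", "seaborn"]

versions = report_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver}")
```

Because the helper returns a dictionary, the result could also be saved alongside the analysis output to record the environment it was produced in.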

4. Loading the Data

Next, we load the dataset. Assuming the data is available as a CSV file, we read it into a
pandas DataFrame and perform some preliminary checks.

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('/Users/karthikeyanmoorthy/Downloads/customer_churn.csv')

# View the first 10 records
df.head(10)

# View the last 10 records
df.tail(10)
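
The absolute path used above exists only on the author's machine, so the read fails elsewhere with a generic error. A minimal sketch of a more defensive load follows, assuming a hypothetical helper named load_churn_csv that takes the file location as an argument and checks it before reading.

```python
from pathlib import Path

import pandas as pd


def load_churn_csv(path):
    """Read a CSV file into a DataFrame, failing early with a clear message."""
    # expanduser() resolves a leading "~" to the current user's home directory.
    csv_path = Path(path).expanduser()
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)
```

Calling load_churn_csv with the path shown earlier would reproduce the original read, and any other location can be passed without editing the analysis code.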

5. Dataset Meta-Information

After loading the data, it's important to understand its structure. We first check the shape of
the dataframe, then use the .info() method to inspect its metadata (column names, non-null
counts, and data types).

# Identify the shape, i.e. the number of rows and columns in the dataset
print(f"no. of rows: {df.shape[0]}\nno. of columns: {df.shape[1]}")

no. of rows: 7043
no. of columns: 21

# Retrieve the list of columns along with their data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object
 20  Churn             7043 non-null   object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
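
The output reports TotalCharges as object even though the column holds charge amounts, which suggests that at least some values were not parsed as numbers. A minimal sketch of one way to convert such a column follows, using pd.to_numeric with errors="coerce". The sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only (not taken from the dataset).
sample = pd.DataFrame(
    {
        "customerID": ["A-001", "A-002", "A-003"],
        "TotalCharges": ["29.85", " ", "1889.5"],
    }
)

# errors="coerce" turns values that cannot be parsed as numbers into NaN
# instead of raising, so they can be counted and handled explicitly.
sample["TotalCharges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")

print(sample["TotalCharges"].dtype)
print("unparseable values:", sample["TotalCharges"].isna().sum())
```

Counting the resulting NaN values before deciding how to treat them keeps the conversion from hiding rows that the non-null counts above do not flag.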

6. Attribute Explanation and Data Types

Below is a summary of each attribute in the dataset, explaining what it represents and the
values it can hold:

1. customerID: Unique identifier for each customer (string).

2. gender: Gender of the customer (Male/Female).

3. SeniorCitizen: Whether the customer is a senior citizen (1 = Yes, 0 = No).

4. Partner: Whether the customer has a partner (Yes/No).

5. Dependents: Whether the customer has dependents (Yes/No).

6. tenure: Number of months the customer has been with the company (integer, range).

7. PhoneService: Whether the customer subscribes to phone service (Yes/No).

8. MultipleLines: Whether the customer subscribes to multiple phone lines (Yes/No/No
phone service).

9. InternetService: Type of internet service the customer subscribes to (DSL, Fiber optic,
etc.).

10. OnlineSecurity: Whether the customer subscribes to online security (Yes/No/No
internet service).

11. OnlineBackup: Whether the customer subscribes to online backup (Yes/No/No internet
service).

12. DeviceProtection: Whether the customer subscribes to device protection (Yes/No/No
internet service).

13. TechSupport: Whether the customer subscribes to tech support (Yes/No/No internet
service).

14. StreamingTV: Whether the customer subscribes to streaming TV (Yes/No/No internet
service).

15. StreamingMovies: Whether the customer subscribes to streaming movies (Yes/No/No
internet service).

16. Contract: Type of contract the customer has (Month-to-month/One year/Two year).

17. PaperlessBilling: Whether the customer has paperless billing (Yes/No).

18. PaymentMethod: Payment method used by the customer (Electronic check, Mailed
check, Bank transfer, etc.).

19. MonthlyCharges: Monthly charges for the customer’s service (float, range).

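20. TotalCharges: Total charges incurred by the customer (float, range). Note that .info()
reports this column as object, so it must be converted to a numeric type before use.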

21. Churn: Whether the customer has churned (Yes/No).
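
The value sets and ranges listed above can be checked against the data rather than taken on trust. The following is a minimal sketch, assuming a hypothetical helper named summarize_columns; it reports the minimum and maximum of each numeric column and the distinct values of every other column. The sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd


def summarize_columns(frame):
    """Map each column to (min, max) if numeric, else to its sorted unique values."""
    summary = {}
    for column in frame.columns:
        series = frame[column]
        if pd.api.types.is_numeric_dtype(series):
            summary[column] = (series.min(), series.max())
        else:
            summary[column] = sorted(series.dropna().unique())
    return summary


# Hypothetical sample rows, for illustration only (not taken from the dataset).
sample = pd.DataFrame(
    {
        "Contract": ["Month-to-month", "Two year", "One year", "Two year"],
        "SeniorCitizen": [0, 1, 0, 0],
        "tenure": [1, 34, 2, 45],
    }
)

for column, values in summarize_columns(sample).items():
    print(f"{column}: {values}")
```

On the full dataframe it would make sense to drop customerID first, since an identifier column would otherwise list one value per customer.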

The dataset has been successfully loaded, and the metadata has been examined. Each
attribute has been described, and its data type has been noted. Moving forward, we can perform
exploratory data analysis, data preprocessing, and eventually build predictive models.
