Project Report: Bank Loan Prediction using Excel
Tools Used:
Microsoft Excel / Google Sheets
Section 1: Exploratory Data Analysis (EDA)
1. Finding the Loan Application with the Highest Applicant Income: --> Sort the
'ApplicantIncome' column in descending order. --> Identify the Loan_ID of the first
row.
2. Calculating the Average Loan Amount Applied For: --> Use the formula:
=AVERAGE(B2:B614) (assuming LoanAmount is in column B and rows 2 to 614
contain data).
3. Finding the Property Area with the Highest Loan Approval Rate: --> Apply a filter on
'Loan_Status' to select only 'Y'. --> Use =COUNTIFS(E2:E614, "Urban", G2:G614, "Y") /
COUNTIF(E2:E614, "Urban") (assuming Property_Area is column E and Loan_Status is
column G).
4. Calculating the Approximate Loan Approval Rate: --> Use formula:
=COUNTIF(G2:G614, "Y") / COUNTA(G2:G614).
5. Determining the Group with the Highest Average Loan Amount: --> Use
=AVERAGEIF(D2:D614, "Yes", B2:B614) for self-employed applicants (assuming
Self_Employed is column D). --> Use =AVERAGEIF(F2:F614, "Graduate", B2:B614) for
graduates (assuming Education is column F).
6. Comparing Average Applicant Income for Approved vs. Rejected Loans: --> Use
=AVERAGEIF(G2:G614, "Y", A2:A614) (assuming ApplicantIncome is column A). -->
Compare with =AVERAGEIF(G2:G614, "N", A2:A614).
7. Observing Trends in Applicant Income vs. Loan Amount Scatter Plot: --> Insert a
scatter plot with ApplicantIncome (column A) on X-axis and LoanAmount (column B)
on Y-axis. --> Analyze the trend visually.
8. Calculating Loan Approval Rate for Urban Areas: --> Use =COUNTIFS(E2:E614,
"Urban", G2:G614, "Y") / COUNTIF(E2:E614, "Urban").
Section 2: K-Nearest Neighbors (KNN) Classification
1. Encoding Categorical Data: --> Convert categories into numerical values using Find &
Replace or a lookup table.
2. Computing Euclidean Distance: --> Use formula: =SQRT((A2-A3)^2 + (B2-B3)^2) for
two numerical columns (e.g., ApplicantIncome and LoanAmount).
3. Finding the 3 Nearest Neighbors: --> Sort the computed Euclidean distances in
ascending order. --> Select the top 3 Loan_IDs.
4. Predicting Loan Status Using K = 5: --> Select the 5 nearest Loan_Status values. -->
Use =MODE(G2:G6) to determine the majority class.
5. Counting Approved Loans in K = 20 Nearest Neighbors: --> Count the number of 'Y'
in the 20 closest Loan_Status values using =COUNTIF(G2:G21, "Y").
Section 3: K-Means Clustering
1. Calculating Distance Between Applicant and Initial Cluster Centers: --> Compute
Euclidean distance using =SQRT((A2-A3)^2 + (B2-B3)^2).
2. Assigning Loan_ID LP001872 to a Cluster: --> Compute distances to both cluster
centers. --> Assign to the closer cluster.
3. Recomputing New Cluster Centers: --> Calculate the new mean values for each
cluster using =AVERAGE(range_of_cluster_points). --> Replace the old cluster centers
with new mean values.
Key Insights:
Identified trends in income vs. loan amount requests.
Determined loan approval patterns based on property areas.
Applied KNN classification for loan prediction.
Clustered loan applicants into meaningful groups using K-Means.