0% found this document useful (0 votes)
5 views47 pages

03 Exploratory Data Analysis

The document details an exploratory data analysis of IPL cricket data, utilizing a dataset with 278,205 entries and 58 columns. It includes data loading, basic statistics, and grouping operations to analyze match outcomes, teams, and scores across different seasons. Visualizations of matches per season and first innings scores are also presented.

Uploaded by

Nakka Srikanth
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views47 pages

03 Exploratory Data Analysis

The document details an exploratory data analysis of IPL cricket data, utilizing a dataset with 278,205 entries and 58 columns. It includes data loading, basic statistics, and grouping operations to analyze match outcomes, teams, and scores across different seasons. Visualizations of matches per season and first innings scores are also presented.

Uploaded by

Nakka Srikanth
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

3/4/26, 9:30 PM 03_exploratory_data_analysis

In [70]: import pandas as pd


import numpy as np
import [Link] as plt
import seaborn as sns
pd.set_option('display.max_columns', None)

In [71]: df = pd.read_csv(r"C:\Users\LENOVO\Downloads\mywork\Ipl_Analysis\/data/ipl_cleaned.csv", low_memory=False)

In [72]: [Link]

Out[72]: (278205, 58)

In [73]: [Link]()

[Link] 1/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

<class '[Link]'>
RangeIndex: 278205 entries, 0 to 278204
Data columns (total 58 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 match_id 278205 non-null int64
1 date 278205 non-null object
2 innings 278205 non-null int64
3 batting_team 278205 non-null object
4 bowling_team 278205 non-null object
5 over 278205 non-null int64
6 ball 278205 non-null int64
7 batter 278205 non-null object
8 bat_pos 278205 non-null int64
9 runs_batter 278205 non-null int64
10 balls_faced 278205 non-null int64
11 bowler 278205 non-null object
12 valid_ball 278205 non-null int64
13 runs_extras 278205 non-null int64
14 runs_total 278205 non-null int64
15 runs_bowler 278205 non-null int64
16 runs_not_boundary 278205 non-null bool
17 extra_type 15133 non-null object
18 non_striker 278205 non-null object
19 non_striker_pos 278205 non-null int64
20 wicket_kind 13823 non-null object
21 player_out 13823 non-null object
22 fielders 10013 non-null object
23 runs_target 133903 non-null float64
24 review_batter 872 non-null object
25 team_reviewed 872 non-null object
26 review_decision 872 non-null object
27 umpire 872 non-null object
28 umpires_call 278205 non-null bool
29 player_of_match 278205 non-null object
30 match_won_by 273503 non-null object
31 toss_winner 278205 non-null object
32 toss_decision 278205 non-null object
33 venue 278205 non-null object
34 city 278205 non-null object
35 day 278205 non-null int64

[Link] 2/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

36 month 278205 non-null int64


37 year 278205 non-null int64
38 superover_winner 3896 non-null object
39 result_type 278205 non-null object
40 balls_per_over 278205 non-null int64
41 overs 278205 non-null int64
42 event_match_no 261492 non-null float64
43 stage 278205 non-null object
44 team_runs 278205 non-null int64
45 team_balls 278205 non-null int64
46 team_wicket 278205 non-null int64
47 new_batter 13321 non-null object
48 batter_runs 278205 non-null int64
49 batter_balls 278205 non-null int64
50 bowler_wicket 278205 non-null int64
51 next_batter 13321 non-null object
52 striker_out 278205 non-null bool
53 ipl_season 278205 non-null int64
54 is_super_over 278205 non-null bool
55 win_margin 273503 non-null float64
56 win_type 273503 non-null object
57 is_dl_method 278205 non-null bool
dtypes: bool(5), float64(3), int64(24), object(26)
memory usage: 113.8+ MB

In [74]: [Link]()

[Link] 3/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[74]: match_id innings over ball bat_pos runs_batter balls_faced valid_ball runs_e

count 2.782050e+05 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.00

mean 9.422687e+05 1.482914 9.193839 3.488855 3.612555 1.277378 0.967362 0.963182 0.06

std 3.817198e+05 0.502571 5.681511 1.708263 2.168978 1.651107 0.177687 0.188315 0.34

min 3.359820e+05 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.00

25% 5.483530e+05 1.000000 4.000000 2.000000 2.000000 0.000000 1.000000 1.000000 0.00

50% 1.082601e+06 1.000000 9.000000 3.000000 3.000000 1.000000 1.000000 1.000000 0.00

75% 1.304049e+06 2.000000 14.000000 5.000000 5.000000 1.000000 1.000000 1.000000 0.00

max 1.485779e+06 6.000000 19.000000 7.000000 11.000000 6.000000 1.000000 1.000000 7.00

In [75]: df["match_id"].nunique()

Out[75]: 1169

In [76]: df["ipl_season"].unique()

Out[76]: array([2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
2019, 2020, 2021, 2022, 2023, 2024, 2025], dtype=int64)

In [77]: df["ipl_season"].nunique()

Out[77]: 18

In [78]: df["batter"].nunique()

Out[78]: 703

In [79]: df["bowler"].nunique()

Out[79]: 550

[Link] 4/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [80]: match_df = [Link]("match_id").agg({


"ipl_season": "first",
"date": "first",
"venue": "first",
"city": "first",
"match_won_by": "first",
"win_margin": "first",
"win_type": "first",
"result_type": "first",
"toss_winner": "first",
"toss_decision": "first"
}).reset_index()

match_df.head()

Out[80]: match_id ipl_season date venue city match_won_by win_margin win_type result_type toss_winner toss_decision

M
Royal
2008- Chinnaswamy Kolkata Knight
0 335982 2008 Bengaluru 140.0 runs normal Challengers field
04-18 Stadium, Riders
Bengaluru
Bengaluru

Punjab
Cricket
2008- Chennai Super Chennai
1 335983 2008 Association Mohali 33.0 runs normal bat
04-19 Kings Super Kings
IS Bindra
Stadium, ...

Arun Jaitley
2008- Rajasthan
2 335984 2008 Stadium, Delhi Delhi Capitals 9.0 wickets normal bat
04-19 Royals
Delhi

Wankhede Royal
2008- Mumbai
3 335985 2008 Stadium, Mumbai Challengers 5.0 wickets normal bat
04-20 Indians
Mumbai Bengaluru

Eden
2008- Kolkata Knight Deccan
4 335986 2008 Gardens, Kolkata 5.0 wickets normal bat
04-20 Riders Chargers
Kolkata

[Link] 5/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [81]: team_df = df[df["innings"] == 1].groupby("match_id").agg({


"batting_team": "first",
"bowling_team": "first"
}).reset_index()

team_df = team_df.rename(columns={
"batting_team": "team1",
"bowling_team": "team2"
})

match_info = [Link]("match_id").agg({
"ipl_season": "first",
"date": "first",
"venue": "first",
"city": "first",
"match_won_by": "first",
"win_margin": "first",
"win_type": "first",
"result_type": "first",
"toss_winner": "first",
"toss_decision": "first"
}).reset_index()

match_df = match_info.merge(team_df, on="match_id", how="left")

match_df.head()

[Link] 6/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[81]: match_id ipl_season date venue city match_won_by win_margin win_type result_type toss_winner toss_decision

M
Royal
2008- Chinnaswamy Kolkata Knight
0 335982 2008 Bengaluru 140.0 runs normal Challengers field
04-18 Stadium, Riders
Bengaluru
Bengaluru

Punjab
Cricket C
2008- Chennai Super Chennai
1 335983 2008 Association Mohali 33.0 runs normal bat
04-19 Kings Super Kings
IS Bindra
Stadium, ...

Arun Jaitley
2008- Rajasthan Ra
2 335984 2008 Stadium, Delhi Delhi Capitals 9.0 wickets normal bat
04-19 Royals
Delhi

Wankhede Royal
2008- Mumbai M
3 335985 2008 Stadium, Mumbai Challengers 5.0 wickets normal bat
04-20 Indians
Mumbai Bengaluru

Eden
2008- Kolkata Knight Deccan
4 335986 2008 Gardens, Kolkata 5.0 wickets normal bat
04-20 Riders Chargers C
Kolkata

In [82]: match_df.shape

Out[82]: (1169, 13)

In [83]: matches_per_season = match_df.groupby("ipl_season")["match_id"].count()

matches_per_season

[Link] 7/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[83]: ipl_season
2008 58
2009 57
2010 60
2011 73
2012 74
2013 76
2014 60
2015 59
2016 60
2017 59
2018 60
2019 60
2020 60
2021 60
2022 74
2023 74
2024 71
2025 74
Name: match_id, dtype: int64

In [84]: [Link]()
matches_per_season.plot(kind="line", marker="o")
[Link]("Matches Per Season")
[Link]("Season")
[Link]("Number of Matches")
[Link]()

[Link] 8/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [85]: first_innings = df[


(df["innings"] == 1) &
(df["is_super_over"] == False)
]

first_innings_score = first_innings.groupby(
["match_id", "ipl_season"]
)["team_runs"].max().reset_index()

first_innings_score.head()

[Link] 9/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[85]: match_id ipl_season team_runs

0 335982 2008 222

1 335983 2008 240

2 335984 2008 129

3 335985 2008 165

4 335986 2008 110

In [86]: avg_first_innings = first_innings_score.groupby(


"ipl_season"
)["team_runs"].mean()

avg_first_innings

Out[86]: ipl_season
2008 160.965517
2009 150.263158
2010 164.783333
2011 152.369863
2012 157.540541
2013 155.894737
2014 163.066667
2015 166.254237
2016 162.600000
2017 165.779661
2018 172.466667
2019 166.733333
2020 169.500000
2021 159.316667
2022 171.121622
2023 182.729730
2024 189.591549
2025 188.837838
Name: team_runs, dtype: float64

[Link] 10/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [87]: [Link]()
avg_first_innings.plot(kind="line", marker="o")
[Link]("Average First Innings Score Per Season")
[Link]("Season")
[Link]("Average Score")
[Link]()

In [88]: win_type_counts = match_df["win_type"].value_counts()


win_type_counts

[Link] 11/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[88]: win_type
wickets 615
runs 531
Name: count, dtype: int64

In [89]: [Link]()
win_type_counts.plot(kind="bar")
[Link]("Win Type Distribution")
[Link]("Win Type")
[Link]("Number of Matches")
[Link]()

[Link] 12/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [90]: match_df["toss_win_match_win"] = match_df["toss_winner"] == match_df["match_won_by"]

In [91]: match_df["toss_win_match_win"].value_counts()

Out[91]: toss_win_match_win
True 591
False 578
Name: count, dtype: int64

In [92]: (match_df["toss_win_match_win"].mean()) * 100

[Link] 13/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[92]: 50.556030795551756

In [93]: match_df["toss_decision"].value_counts()

Out[93]: toss_decision
field 764
bat 405
Name: count, dtype: int64

In [94]: toss_decision_win = match_df.groupby("toss_decision")["toss_win_match_win"].mean() * 100


toss_decision_win

Out[94]: toss_decision
bat 45.185185
field 53.403141
Name: toss_win_match_win, dtype: float64

In [95]: chasing_trend = match_df.groupby("ipl_season").apply(


lambda x: (x["win_type"] == "wickets").mean() * 100
)

chasing_trend

C:\Users\LENOVO\AppData\Local\Temp\ipykernel_7896\[Link]: DeprecationWarning: [Link] operated on the g


rouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the
operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby
to silence this warning.
chasing_trend = match_df.groupby("ipl_season").apply(

[Link] 14/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[95]: ipl_season
2008 58.620690
2009 50.877193
2010 46.666667
2011 53.424658
2012 54.054054
2013 48.684211
2014 61.666667
2015 40.677966
2016 65.000000
2017 54.237288
2018 53.333333
2019 58.333333
2020 48.333333
2021 61.666667
2022 50.000000
2023 44.594595
2024 50.704225
2025 50.000000
dtype: float64

In [96]: [Link]()
chasing_trend.plot(kind="line", marker="o")
[Link]("Chasing Win Percentage Per Season")
[Link]("Season")
[Link]("Chasing Win %")
[Link]()

[Link] 15/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [97]: match_df = match_df.merge(


first_innings_score[["match_id", "team_runs"]],
on="match_id",
how="left"
)

match_df = match_df.rename(columns={"team_runs": "first_innings_score"})

In [98]: match_df["score_bucket"] = [Link](


match_df["first_innings_score"],
bins=[0,150,170,190,300],

[Link] 16/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

labels=["Below 150","150-170","170-190","190+"]
)

In [99]: bucket_analysis = match_df.groupby("score_bucket")["win_type"].apply(


lambda x: (x == "wickets").mean() * 100
)

bucket_analysis

C:\Users\LENOVO\AppData\Local\Temp\ipykernel_7896\[Link]: FutureWarning: The default of observed=False is deprecated a


nd will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to ad
opt the future default and silence this warning.
bucket_analysis = match_df.groupby("score_bucket")["win_type"].apply(
Out[99]: score_bucket
Below 150 75.294118
150-170 57.770270
170-190 47.426471
190+ 22.605364
Name: win_type, dtype: float64

In [100… venue_scoring = first_innings_score.merge(


match_df[["match_id","venue"]],
on="match_id"
)

venue_avg = venue_scoring.groupby("venue")["team_runs"].mean().sort_values(ascending=False)

venue_avg.head(10)

[Link] 17/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[100… venue
Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium, Visakhapatnam 208.750000
Sawai Mansingh Stadium, Jaipur 187.235294
Narendra Modi Stadium, Ahmedabad 186.636364
Himachal Pradesh Cricket Association Stadium, Dharamsala 183.133333
Brabourne Stadium 180.400000
Brabourne Stadium, Mumbai 177.411765
Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 175.363636
Barsapara Cricket Stadium, Guwahati 174.600000
M Chinnaswamy Stadium, Bengaluru 172.979798
Arun Jaitley Stadium, Delhi 171.773196
Name: team_runs, dtype: float64

In [101… venue_count = venue_scoring.groupby("venue")["match_id"].count()

venue_analysis = [Link]([venue_avg, venue_count], axis=1)


venue_analysis.columns = ["avg_score", "match_count"]

venue_analysis.sort_values("avg_score", ascending=False).head(10)

[Link] 18/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[101… avg_score match_count

venue

Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium, Visakhapatnam 208.750000 4

Sawai Mansingh Stadium, Jaipur 187.235294 17

Narendra Modi Stadium, Ahmedabad 186.636364 33

Himachal Pradesh Cricket Association Stadium, Dharamsala 183.133333 15

Brabourne Stadium 180.400000 10

Brabourne Stadium, Mumbai 177.411765 17

Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 175.363636 22

Barsapara Cricket Stadium, Guwahati 174.600000 5

M Chinnaswamy Stadium, Bengaluru 172.979798 99

Arun Jaitley Stadium, Delhi 171.773196 97

In [102… df[df["venue"].[Link]("Chandigarh", case=False, na=False)]["venue"].value_counts()

Out[102… venue
Punjab Cricket Association IS Bindra Stadium, Mohali, Chandigarh 14476
Maharaja Yadavindra Singh International Cricket Stadium, New Chandigarh 2561
Name: count, dtype: int64

In [103… match_df[match_df["ipl_season"] == 2025]["venue"].value_counts()

[Link] 19/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[103… venue
Narendra Modi Stadium, Ahmedabad 9
Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 8
Eden Gardens, Kolkata 7
Wankhede Stadium, Mumbai 7
Sawai Mansingh Stadium, Jaipur 7
Arun Jaitley Stadium, Delhi 7
Rajiv Gandhi International Stadium, Uppal, Hyderabad 6
MA Chidambaram Stadium, Chepauk, Chennai 6
Maharaja Yadavindra Singh International Cricket Stadium, New Chandigarh 6
M Chinnaswamy Stadium, Bengaluru 5
Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium, Visakhapatnam 2
Barsapara Cricket Stadium, Guwahati 2
Himachal Pradesh Cricket Association Stadium, Dharamsala 2
Name: count, dtype: int64

In [141… team_scoring = first_innings.groupby(


["match_id", "batting_team"]
)["team_runs"].max().reset_index()

team_avg_score = team_scoring.groupby("batting_team")["team_runs"].mean().sort_values(ascending=False)

team_avg_score

[Link] 20/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[141… batting_team
Lucknow Super Giants 185.090909
Gujarat Titans 184.407407
Sunrisers Hyderabad 170.245098
Chennai Super Kings 170.088235
Royal Challengers Bengaluru 169.478261
Mumbai Indians 168.405594
Punjab Kings 165.604317
Kolkata Knight Riders 165.015873
Rajasthan Royals 164.214953
Delhi Capitals 162.781513
Gujarat Lions 161.928571
Rising Pune Supergiants 161.800000
Deccan Chargers 157.325581
Pune Warriors 148.650000
Kochi Tuskers Kerala 144.142857
Name: team_runs, dtype: float64

In [147… team_match_count = team_scoring.groupby("batting_team")["match_id"].count()

team_analysis = [Link]([team_avg_score, team_match_count], axis=1)


team_analysis.columns = ["avg_score", "match_count"]

team_analysis.sort_values("avg_score", ascending=False)

[Link] 21/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[147… avg_score match_count

batting_team

Lucknow Super Giants 185.090909 33

Gujarat Titans 184.407407 27

Sunrisers Hyderabad 170.245098 102

Chennai Super Kings 170.088235 136

Royal Challengers Bengaluru 169.478261 138

Mumbai Indians 168.405594 143

Punjab Kings 165.604317 139

Kolkata Knight Riders 165.015873 126

Rajasthan Royals 164.214953 107

Delhi Capitals 162.781513 119

Gujarat Lions 161.928571 14

Rising Pune Supergiants 161.800000 15

Deccan Chargers 157.325581 43

Pune Warriors 148.650000 20

Kochi Tuskers Kerala 144.142857 7

In [149… filtered_team_analysis = team_analysis[team_analysis["match_count"] >= 100]

filtered_team_analysis.sort_values("avg_score", ascending=False)

[Link] 22/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[149… avg_score match_count

batting_team

Sunrisers Hyderabad 170.245098 102

Chennai Super Kings 170.088235 136

Royal Challengers Bengaluru 169.478261 138

Mumbai Indians 168.405594 143

Punjab Kings 165.604317 139

Kolkata Knight Riders 165.015873 126

Rajasthan Royals 164.214953 107

Delhi Capitals 162.781513 119

In [151… team1_matches = match_df["team1"].value_counts()


team2_matches = match_df["team2"].value_counts()

total_matches_played = team1_matches.add(team2_matches, fill_value=0)

total_wins = match_df["match_won_by"].value_counts()

win_percentage = (total_wins / total_matches_played) * 100

win_percentage.sort_values(ascending=False)

[Link] 23/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[151… Gujarat Titans 61.666667


Chennai Super Kings 56.349206
Mumbai Indians 54.512635
Lucknow Super Giants 51.724138
Kolkata Knight Riders 51.136364
Rising Pune Supergiants 50.000000
Royal Challengers Bengaluru 48.888889
Rajasthan Royals 48.510638
Sunrisers Hyderabad 47.448980
Punjab Kings 45.075758
Delhi Capitals 44.194757
Gujarat Lions 43.333333
Kochi Tuskers Kerala 42.857143
Deccan Chargers 38.666667
Pune Warriors 26.086957
Name: count, dtype: float64

In [153… chasing_matches = match_df.copy()

# Create column for chasing team


chasing_matches["chasing_team"] = chasing_matches.apply(
lambda row: row["team1"] if row["win_type"] == "runs" else row["team2"],
axis=1
)

In [155… chasing_wins = match_df[match_df["win_type"] == "wickets"]["match_won_by"].value_counts()

In [157… total_chasing = chasing_matches["chasing_team"].value_counts()

In [159… chasing_win_pct = (chasing_wins / total_chasing) * 100

chasing_win_pct.sort_values(ascending=False)

[Link] 24/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[159… Gujarat Lions 92.307692


Kochi Tuskers Kerala 66.666667
Rising Pune Supergiants 61.538462
Gujarat Titans 59.459459
Rajasthan Royals 58.974359
Delhi Capitals 58.196721
Kolkata Knight Riders 56.934307
Royal Challengers Bengaluru 52.238806
Punjab Kings 51.639344
Mumbai Indians 49.019608
Chennai Super Kings 48.611111
Sunrisers Hyderabad 46.464646
Pune Warriors 46.153846
Deccan Chargers 37.931034
Lucknow Super Giants 33.333333
Name: count, dtype: float64

In [ ]:

In [161… defending_wins = match_df[match_df["win_type"] == "runs"]["match_won_by"].value_counts()

In [163… defending_matches = chasing_matches.copy()

defending_matches["defending_team"] = defending_matches.apply(
lambda row: row["team1"],
axis=1
)

total_defending = defending_matches["defending_team"].value_counts()

In [165… defending_win_pct = (defending_wins / total_defending) * 100

defending_win_pct.sort_values(ascending=False)

[Link] 25/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[165… Lucknow Super Giants 60.606061


Gujarat Titans 55.555556
Mumbai Indians 53.146853
Chennai Super Kings 52.941176
Rising Pune Supergiants 46.666667
Sunrisers Hyderabad 46.078431
Kolkata Knight Riders 45.238095
Royal Challengers Bengaluru 44.927536
Rajasthan Royals 42.056075
Deccan Chargers 41.860465
Punjab Kings 40.287770
Delhi Capitals 39.495798
Pune Warriors 30.000000
Kochi Tuskers Kerala 28.571429
Gujarat Lions 7.142857
Name: count, dtype: float64

In [167… match_df = match_df.merge(


[Link]("match_id")["city"].first(),
on="match_id",
how="left"
)

In [191… home_city_map = {
"Chennai Super Kings": ["Chennai"],
"Mumbai Indians": ["Mumbai"],
"Royal Challengers Bengaluru": ["Bengaluru"],
"Kolkata Knight Riders": ["Kolkata"],
"Rajasthan Royals": ["Jaipur"],
"Delhi Capitals": ["Delhi"],
"Sunrisers Hyderabad": ["Hyderabad"],
"Punjab Kings": ["Mohali", "New Chandigarh"],
"Gujarat Titans": ["Ahmedabad"],
"Lucknow Super Giants": ["Lucknow"]
}

In [189… def determine_home(row):


for team, cities in home_city_map.items():
if row["venue"] in cities:
if row["team1"] == team:

[Link] 26/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

return team
elif row["team2"] == team:
return team
return None

match_df["home_team"] = match_df.apply(determine_home, axis=1)

In [187… home_matches = match_df[match_df["home_team"].notna()]

home_win_pct = (
home_matches[home_matches["home_team"] == home_matches["match_won_by"]]
["home_team"].value_counts()
/
home_matches["home_team"].value_counts()
) * 100

home_win_pct.sort_values(ascending=False)

Out[187… Series([], Name: count, dtype: float64)

In [199… primary_home.head(130)

[Link] 27/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[199… ipl_season batting_team venue match_id

0 2008 Chennai Super Kings MA Chidambaram Stadium, Chepauk, Chennai 4

1 2008 Deccan Chargers Rajiv Gandhi International Stadium, Uppal, Hyd... 5

2 2008 Delhi Capitals Arun Jaitley Stadium, Delhi 4

3 2008 Kolkata Knight Riders Eden Gardens, Kolkata 5

4 2008 Mumbai Indians Dr DY Patil Sports Academy, Mumbai 2

... ... ... ... ...

125 2022 Sunrisers Hyderabad Wankhede Stadium, Mumbai 3

126 2023 Chennai Super Kings MA Chidambaram Stadium, Chepauk, Chennai 5

127 2023 Delhi Capitals Arun Jaitley Stadium, Delhi 2

128 2023 Gujarat Titans Narendra Modi Stadium, Ahmedabad 7

129 2023 Kolkata Knight Riders Eden Gardens, Kolkata 3

130 rows × 4 columns

In [201… match_df.iloc[0]

[Link] 28/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[201… match_id 335982


ipl_season 2008
date 2008-04-18
venue M Chinnaswamy Stadium, Bengaluru
city_x Bengaluru
match_won_by Kolkata Knight Riders
win_margin 140.0
win_type runs
result_type normal
toss_winner Royal Challengers Bengaluru
toss_decision field
team1 Kolkata Knight Riders
team2 Royal Challengers Bengaluru
toss_win_match_win False
first_innings_score 222
score_bucket 190+
city_y Bengaluru
home_team None
Name: 0, dtype: object

In [203… def determine_home_dynamic(row):


season = row["ipl_season"]
venue = row["venue"]

team1_home = primary_home[
(primary_home["ipl_season"] == season) &
(primary_home["batting_team"] == row["team1"])
]["venue"].values

team2_home = primary_home[
(primary_home["ipl_season"] == season) &
(primary_home["batting_team"] == row["team2"])
]["venue"].values

if len(team1_home) > 0 and venue == team1_home[0]:


return row["team1"]
elif len(team2_home) > 0 and venue == team2_home[0]:
return row["team2"]
else:
return None

[Link] 29/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

match_df["home_team"] = match_df.apply(determine_home_dynamic, axis=1)

In [205… match_df["home_team"].value_counts()

Out[205… home_team
Kolkata Knight Riders 107
Royal Challengers Bengaluru 106
Delhi Capitals 100
Chennai Super Kings 99
Mumbai Indians 99
Punjab Kings 80
Rajasthan Royals 76
Sunrisers Hyderabad 69
Gujarat Titans 26
Lucknow Super Giants 24
Pune Warriors 22
Deccan Chargers 20
Rising Pune Supergiants 11
Kochi Tuskers Kerala 5
Gujarat Lions 4
Name: count, dtype: int64

In [207… home_matches = match_df[match_df["home_team"].notna()]

home_wins = home_matches[
home_matches["home_team"] == home_matches["match_won_by"]
]

home_win_pct = (
home_wins["home_team"].value_counts() /
home_matches["home_team"].value_counts()
) * 100

home_win_pct.sort_values(ascending=False)

[Link] 30/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[207… home_team
Chennai Super Kings 62.626263
Mumbai Indians 60.606061
Rajasthan Royals 55.263158
Kolkata Knight Riders 54.205607
Gujarat Titans 53.846154
Sunrisers Hyderabad 52.173913
Royal Challengers Bengaluru 46.226415
Lucknow Super Giants 45.833333
Rising Pune Supergiants 45.454545
Punjab Kings 43.750000
Delhi Capitals 43.000000
Kochi Tuskers Kerala 40.000000
Pune Warriors 27.272727
Deccan Chargers 25.000000
Gujarat Lions 25.000000
Name: count, dtype: float64

In [209… season_runs = [Link]("ipl_season")["runs_total"].sum()


season_balls = [Link]("ipl_season")["valid_ball"].sum()

run_rate = (season_runs / season_balls) * 6

run_rate

[Link] 31/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[209… ipl_season
2008 8.310579
2009 7.492784
2010 8.128713
2011 7.727489
2012 7.828801
2013 7.689063
2014 8.204117
2015 8.373992
2016 8.310471
2017 8.411014
2018 8.647596
2019 8.415416
2020 8.289760
2021 8.051699
2022 8.540670
2023 8.993873
2024 9.560464
2025 9.636837
dtype: float64

In [211… import [Link] as plt

[Link]()
run_rate.plot(marker='o')
[Link]("IPL Run Rate Evolution")
[Link]("Season")
[Link]("Run Rate")
[Link]()

[Link] 32/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [213… close_runs = match_df[


(match_df["win_type"] == "runs") &
(match_df["win_margin"] <= 10)
].shape[0]

close_wickets = match_df[
(match_df["win_type"] == "wickets") &
(match_df["win_margin"] <= 2)
].shape[0]

total_matches = match_df.shape[0]

close_percentage = ((close_runs + close_wickets) / total_matches) * 100

[Link] 33/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

close_runs, close_wickets, close_percentage

Out[213… (128, 16, 12.318220701454234)

In [215… match_df["is_close"] = (
((match_df["win_type"] == "runs") & (match_df["win_margin"] <= 10)) |
((match_df["win_type"] == "wickets") & (match_df["win_margin"] <= 2))
)

close_by_season = match_df.groupby("ipl_season")["is_close"].mean() * 100

close_by_season

Out[215… ipl_season
2008 15.517241
2009 14.035088
2010 6.666667
2011 8.219178
2012 10.810811
2013 10.526316
2014 6.666667
2015 16.949153
2016 13.333333
2017 13.559322
2018 16.666667
2019 10.000000
2020 6.666667
2021 18.333333
2022 9.459459
2023 20.270270
2024 12.676056
2025 12.162162
Name: is_close, dtype: float64

In [219… super_over_match = [Link]("match_id")["is_super_over"].any().reset_index()

super_over_match.columns = ["match_id", "is_super_over"]

[Link] 34/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [221… match_df = match_df.merge(


super_over_match,
on="match_id",
how="left"
)

In [223… super_over_count = match_df["is_super_over"].sum()


super_over_pct = (super_over_count / match_df.shape[0]) * 100

super_over_count, super_over_pct

Out[223… (15, 1.2831479897348161)

In [225… dl_match = [Link]("match_id")["is_dl_method"].any().reset_index()

dl_match.columns = ["match_id", "is_dl_method"]

match_df = match_df.merge(
dl_match,
on="match_id",
how="left"
)

In [227… dl_count = match_df["is_dl_method"].sum()


dl_pct = (dl_count / match_df.shape[0]) * 100

dl_count, dl_pct

Out[227… (22, 1.8819503849443968)

In [229… top_runs = [Link]("batter")["runs_batter"].sum().sort_values(ascending=False)

top_runs.head(10)

[Link] 35/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[229… batter
V Kohli 8671
RG Sharma 7048
S Dhawan 6769
DA Warner 6567
SK Raina 5536
MS Dhoni 5439
KL Rahul 5235
AB de Villiers 5181
AM Rahane 5032
CH Gayle 4997
Name: runs_batter, dtype: int64

In [231… batter_stats = [Link]("batter").agg({


"runs_batter": "sum",
"balls_faced": "sum"
})

batter_stats["strike_rate"] = (
batter_stats["runs_batter"] /
batter_stats["balls_faced"]
) * 100

# Apply minimum filter


batter_stats = batter_stats[batter_stats["balls_faced"] >= 1000]

batter_stats.sort_values("strike_rate", ascending=False).head(10)

[Link] 36/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[231… runs_batter balls_faced strike_rate

batter

AD Russell 2655 1525 174.098361

N Pooran 2293 1359 168.727005

SP Narine 1780 1069 166.510758

Abhishek Sharma 1816 1114 163.016158

V Sehwag 2728 1755 155.441595

GJ Maxwell 2820 1818 155.115512

YBK Jaiswal 2166 1417 152.858151

AB de Villiers 5181 3411 151.890941

CH Gayle 4997 3346 149.342499

JC Buttler 4121 2760 149.311594

In [237… # Step 1: Total runs per batter per match


batter_match_runs = [Link](["match_id", "batter"])["runs_batter"].sum().reset_index()

# Step 2: Filter 50+ scores


fifties = batter_match_runs[batter_match_runs["runs_batter"] >= 50]

# Step 3: Count number of 50+ scores per batter


fifties_count = [Link]("batter")["match_id"].count().sort_values(ascending=False)

fifties_count.head(10)

[Link] 37/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[237… batter
V Kohli 72
DA Warner 66
S Dhawan 53
RG Sharma 49
KL Rahul 45
AB de Villiers 44
SK Raina 40
F du Plessis 39
CH Gayle 38
G Gambhir 36
Name: match_id, dtype: int64

In [239… batter_matches = [Link]("batter")["match_id"].nunique()


batter_runs = [Link]("batter")["runs_batter"].sum()

batter_avg = (batter_runs / batter_matches)

batter_avg = batter_avg[batter_matches >= 50] # minimum 50 matches

batter_avg.sort_values(ascending=False).head(10)

Out[239… batter
KL Rahul 38.777778
SE Marsh 36.072464
RD Gaikwad 35.742857
DA Warner 35.690217
CH Gayle 35.439716
JC Buttler 34.630252
MEK Hussey 34.086207
Shubman Gill 33.912281
V Kohli 33.478764
YBK Jaiswal 32.818182
dtype: float64

In [241… top_wickets = [Link]("bowler")["bowler_wicket"].sum().sort_values(ascending=False)

top_wickets.head(10)

[Link] 38/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[241… bowler
YS Chahal 221
B Kumar 198
PP Chawla 192
SP Narine 192
R Ashwin 187
JJ Bumrah 186
DJ Bravo 183
A Mishra 174
RA Jadeja 170
SL Malinga 170
Name: bowler_wicket, dtype: int64

In [243… bowler_stats = [Link]("bowler").agg({


"runs_bowler": "sum",
"valid_ball": "sum",
"bowler_wicket": "sum"
})

bowler_stats["economy"] = (
bowler_stats["runs_bowler"] /
bowler_stats["valid_ball"]
) * 6

# Minimum 1000 balls


bowler_stats = bowler_stats[bowler_stats["valid_ball"] >= 1000]

bowler_stats.sort_values("economy").head(10)

[Link] 39/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[243… runs_bowler valid_ball bowler_wicket economy

bowler

M Muralitharan 1706 1528 64 6.698953

SP Narine 4933 4351 192 6.802574

DW Steyn 2523 2182 97 6.937672

Harbhajan Singh 4030 3416 150 7.078454

Rashid Khan 3781 3202 158 7.084947

SL Malinga 3366 2827 170 7.143969

R Ashwin 5652 4710 187 7.200000

M Kartik 1388 1149 31 7.248042

JJ Bumrah 4059 3359 186 7.250372

SK Warne 1447 1194 57 7.271357

In [245… bowler_stats["strike_rate"] = (
bowler_stats["valid_ball"] /
bowler_stats["bowler_wicket"]
)

bowler_stats = bowler_stats[bowler_stats["bowler_wicket"] >= 50]

bowler_stats.sort_values("strike_rate").head(10)

[Link] 40/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[245… runs_bowler valid_ball bowler_wicket economy strike_rate

bowler

AD Russell 2863 1806 123 9.511628 14.682927

K Rabada 2741 1911 122 8.605965 15.663934

Imran Tahir 1703 1316 82 7.764438 16.048780

HV Patel 3579 2424 151 8.858911 16.052980

MA Starc 1537 1064 65 8.667293 16.369231

SL Malinga 3366 2827 170 7.143969 16.629412

DJ Bravo 4360 3120 183 8.384615 17.049180

YS Chahal 5032 3791 221 7.964126 17.153846

KK Ahmed 2328 1556 89 8.976864 17.483146

Arshdeep Singh 2570 1714 97 8.996499 17.670103

In [247… death_df = df[df["over"] >= 15]

In [249… death_batter = death_df.groupby("batter").agg({


"runs_batter": "sum",
"balls_faced": "sum"
})

death_batter["strike_rate"] = (
death_batter["runs_batter"] /
death_batter["balls_faced"]
) * 100

# Minimum filter
death_batter = death_batter[death_batter["balls_faced"] >= 500]

death_batter.sort_values("strike_rate", ascending=False).head(10)

[Link] 41/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[249… runs_batter balls_faced strike_rate

batter

AB de Villiers 1868 838 222.911695

AD Russell 1432 731 195.896033

V Kohli 1507 808 186.509901

RG Sharma 1527 843 181.138790

Yuvraj Singh 906 501 180.838323

MS Dhoni 3468 1982 174.974773

DJ Bravo 909 522 174.137931

SO Hetmyer 933 537 173.743017

KD Karthik 1904 1098 173.406193

DA Miller 1381 805 171.552795

In [251… death_bowler = death_df.groupby("bowler").agg({


"runs_bowler": "sum",
"valid_ball": "sum",
"bowler_wicket": "sum"
})

death_bowler["economy"] = (
death_bowler["runs_bowler"] /
death_bowler["valid_ball"]
) * 6

death_bowler["strike_rate"] = (
death_bowler["valid_ball"] /
death_bowler["bowler_wicket"]
)

# Minimum 300 balls in death overs


death_bowler = death_bowler[death_bowler["valid_ball"] >= 300]

[Link] 42/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

death_bowler.sort_values("economy").head(10)

Out[251… runs_bowler valid_ball bowler_wicket economy strike_rate

bowler

SP Narine 1287 1051 75 7.347288 14.013333

SL Malinga 1464 1117 108 7.863921 10.342593

JJ Bumrah 1837 1345 101 8.194796 13.316832

R Ashwin 825 601 36 8.236273 16.694444

Kuldeep Yadav 462 334 29 8.299401 11.517241

DW Steyn 877 634 50 8.299685 12.680000

CV Varun 422 304 17 8.328947 17.882353

Rashid Khan 812 576 41 8.458333 14.048780

PP Chawla 641 444 39 8.662162 11.384615

CH Morris 933 646 58 8.665635 11.137931

In [253… powerplay_df = df[df["over"] < 6]

In [257… pp_batter = powerplay_df.groupby("batter").agg({


"runs_batter": "sum",
"balls_faced": "sum"
})

pp_batter["strike_rate"] = (
pp_batter["runs_batter"] /
pp_batter["balls_faced"]
) * 100

pp_batter = pp_batter[pp_batter["balls_faced"] >= 500]

[Link] 43/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

pp_batter.sort_values("strike_rate", ascending=False).head(20)

[Link] 44/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[257… runs_batter balls_faced strike_rate

batter

SP Narine 1150 673 170.876672

Abhishek Sharma 1018 635 160.314961

YBK Jaiswal 1378 869 158.573072

JM Bairstow 944 625 151.040000

P Simran Singh 872 585 149.059829

PP Shaw 1347 919 146.572361

CA Lynn 779 540 144.259259

V Sehwag 1593 1105 144.162896

SA Yadav 936 668 140.119760

JC Buttler 1688 1205 140.082988

RA Tripathi 1013 726 139.531680

DA Warner 3318 2403 138.077403

F du Plessis 2281 1673 136.341901

CH Gayle 2405 1774 135.569335

AC Gilchrist 1385 1033 134.075508

Ishan Kishan 1337 999 133.833834

SK Raina 1560 1171 133.219471

WP Saha 1389 1047 132.664756

Shubman Gill 1714 1310 130.839695

Q de Kock 1931 1480 130.472973

[Link] 45/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

In [261… pp_bowler = powerplay_df.groupby("bowler").agg({


"runs_bowler": "sum",
"valid_ball": "sum",
"bowler_wicket": "sum"
})

pp_bowler["economy"] = (
pp_bowler["runs_bowler"] /
pp_bowler["valid_ball"]
) * 6

pp_bowler["strike_rate"] = (
pp_bowler["valid_ball"] /
pp_bowler["bowler_wicket"]
)

pp_bowler = pp_bowler[pp_bowler["valid_ball"] >= 500]

pp_bowler.sort_values("economy").head(10)

[Link] 46/47
3/4/26, 9:30 PM 03_exploratory_data_analysis

Out[261… runs_bowler valid_ball bowler_wicket economy strike_rate

bowler

SL Malinga 1161 1110 37 6.275676 30.000000

DW Steyn 1199 1140 34 6.310526 33.529412

JC Archer 582 546 23 6.395604 23.739130

B Kumar 2542 2339 80 6.520735 29.237500

P Kumar 1754 1572 40 6.694656 39.300000

JJ Bumrah 1282 1144 41 6.723776 27.902439

Z Khan 1452 1292 52 6.743034 24.846154

SP Narine 1059 924 27 6.876623 34.222222

MM Patel 762 654 29 6.990826 22.551724

Sandeep Sharma 1976 1690 62 7.015385 27.258065

In [ ]:

[Link] 47/47

You might also like