03 Exploratory Data Analysis
In [72]: [Link]
In [73]: df.info()
[Link] 1/47
3/4/26, 9:30 PM 03_exploratory_data_analysis
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 278205 entries, 0 to 278204
Data columns (total 58 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 match_id 278205 non-null int64
1 date 278205 non-null object
2 innings 278205 non-null int64
3 batting_team 278205 non-null object
4 bowling_team 278205 non-null object
5 over 278205 non-null int64
6 ball 278205 non-null int64
7 batter 278205 non-null object
8 bat_pos 278205 non-null int64
9 runs_batter 278205 non-null int64
10 balls_faced 278205 non-null int64
11 bowler 278205 non-null object
12 valid_ball 278205 non-null int64
13 runs_extras 278205 non-null int64
14 runs_total 278205 non-null int64
15 runs_bowler 278205 non-null int64
16 runs_not_boundary 278205 non-null bool
17 extra_type 15133 non-null object
18 non_striker 278205 non-null object
19 non_striker_pos 278205 non-null int64
20 wicket_kind 13823 non-null object
21 player_out 13823 non-null object
22 fielders 10013 non-null object
23 runs_target 133903 non-null float64
24 review_batter 872 non-null object
25 team_reviewed 872 non-null object
26 review_decision 872 non-null object
27 umpire 872 non-null object
28 umpires_call 278205 non-null bool
29 player_of_match 278205 non-null object
30 match_won_by 273503 non-null object
31 toss_winner 278205 non-null object
32 toss_decision 278205 non-null object
33 venue 278205 non-null object
34 city 278205 non-null object
35 day 278205 non-null int64
In [74]: df.describe()
Out[74]: match_id innings over ball bat_pos runs_batter balls_faced valid_ball runs_e
count 2.782050e+05 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.000000 278205.00
mean 9.422687e+05 1.482914 9.193839 3.488855 3.612555 1.277378 0.967362 0.963182 0.06
std 3.817198e+05 0.502571 5.681511 1.708263 2.168978 1.651107 0.177687 0.188315 0.34
min 3.359820e+05 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.00
25% 5.483530e+05 1.000000 4.000000 2.000000 2.000000 0.000000 1.000000 1.000000 0.00
50% 1.082601e+06 1.000000 9.000000 3.000000 3.000000 1.000000 1.000000 1.000000 0.00
75% 1.304049e+06 2.000000 14.000000 5.000000 5.000000 1.000000 1.000000 1.000000 0.00
max 1.485779e+06 6.000000 19.000000 7.000000 11.000000 6.000000 1.000000 1.000000 7.00
In [75]: df["match_id"].nunique()
Out[75]: 1169
In [76]: df["ipl_season"].unique()
Out[76]: array([2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
2019, 2020, 2021, 2022, 2023, 2024, 2025], dtype=int64)
In [77]: df["ipl_season"].nunique()
Out[77]: 18
In [78]: df["batter"].nunique()
Out[78]: 703
In [79]: df["bowler"].nunique()
Out[79]: 550
match_df.head()
Out[80]:
   match_id  ipl_season  date        venue                                              city       match_won_by                 win_margin  win_type  result_type  toss_winner                  toss_decision
0  335982    2008        2008-04-18  M Chinnaswamy Stadium, Bengaluru                   Bengaluru  Kolkata Knight Riders        140.0       runs      normal       Royal Challengers Bengaluru  field
1  335983    2008        2008-04-19  Punjab Cricket Association IS Bindra Stadium, ...  Mohali     Chennai Super Kings          33.0        runs      normal       Chennai Super Kings          bat
2  335984    2008        2008-04-19  Arun Jaitley Stadium, Delhi                        Delhi      Delhi Capitals               9.0         wickets   normal       Rajasthan Royals             bat
3  335985    2008        2008-04-20  Wankhede Stadium, Mumbai                           Mumbai     Royal Challengers Bengaluru  5.0         wickets   normal       Mumbai Indians               bat
4  335986    2008        2008-04-20  Eden Gardens, Kolkata                              Kolkata    Kolkata Knight Riders        5.0         wickets   normal       Deccan Chargers              bat
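# Rename the innings-level team columns to match-level names
team_df = team_df.rename(columns={
    "batting_team": "team1",
    "bowling_team": "team2"
})
# Collapse ball-by-ball rows to one row per match; these columns are
# constant within a match, so "first" just picks one copy
match_info = df.groupby("match_id").agg({
    "ipl_season": "first",
    "date": "first",
    "venue": "first",
    "city": "first",
    "match_won_by": "first",
    "win_margin": "first",
    "win_type": "first",
    "result_type": "first",
    "toss_winner": "first",
    "toss_decision": "first"
}).reset_index()
match_df.head()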
Out[81]:
   match_id  ipl_season  date        venue                                              city       match_won_by                 win_margin  win_type  result_type  toss_winner                  toss_decision  ...
0  335982    2008        2008-04-18  M Chinnaswamy Stadium, Bengaluru                   Bengaluru  Kolkata Knight Riders        140.0       runs      normal       Royal Challengers Bengaluru  field
1  335983    2008        2008-04-19  Punjab Cricket Association IS Bindra Stadium, ...  Mohali     Chennai Super Kings          33.0        runs      normal       Chennai Super Kings          bat            C...
2  335984    2008        2008-04-19  Arun Jaitley Stadium, Delhi                        Delhi      Delhi Capitals               9.0         wickets   normal       Rajasthan Royals             bat            Ra...
3  335985    2008        2008-04-20  Wankhede Stadium, Mumbai                           Mumbai     Royal Challengers Bengaluru  5.0         wickets   normal       Mumbai Indians               bat            M...
4  335986    2008        2008-04-20  Eden Gardens, Kolkata                              Kolkata    Kolkata Knight Riders        5.0         wickets   normal       Deccan Chargers              bat            C...
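The match-level table relies on every match attribute being identical on all deliveries of a match, which is what makes `"first"` a safe aggregation. A minimal sketch of that collapse plus a consistency check, on invented toy data (the frame `balls` and its values are hypothetical, not the notebook's data):

```python
import pandas as pd

# Toy ball-by-ball data; all values are invented for illustration.
balls = pd.DataFrame({
    "match_id":     [1, 1, 1, 2, 2],
    "ipl_season":   [2008, 2008, 2008, 2009, 2009],
    "venue":        ["Ground A"] * 3 + ["Ground B"] * 2,
    "match_won_by": ["Team X"] * 3 + ["Team Y"] * 2,
    "runs_total":   [1, 4, 0, 6, 2],
})

match_cols = ["ipl_season", "venue", "match_won_by"]

# One row per match: take the first copy of each repeated attribute.
match_level = (
    balls.groupby("match_id")
    .agg({col: "first" for col in match_cols})
    .reset_index()
)

# Sanity check: each attribute should have exactly one distinct value per match.
is_constant = balls.groupby("match_id")[match_cols].nunique().eq(1).all().all()

print(match_level)
print("attributes constant within match:", bool(is_constant))
```

If the check fails on real data, `"first"` would silently hide the disagreement, so it is worth running once before trusting the match table.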
In [82]: match_df.shape
matches_per_season
Out[83]: ipl_season
2008 58
2009 57
2010 60
2011 73
2012 74
2013 76
2014 60
2015 59
2016 60
2017 59
2018 60
2019 60
2020 60
2021 60
2022 74
2023 74
2024 71
2025 74
Name: match_id, dtype: int64
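The per-season counts above sum to 1169, which agrees with `df["match_id"].nunique()`. The cell that builds `matches_per_season` is not visible in this export; one plausible way to get such a series, shown on a hypothetical toy `match_df`, is to count distinct match ids per season:

```python
import pandas as pd

# Toy match table; values are invented for illustration.
match_df = pd.DataFrame({
    "match_id":   [1, 2, 3, 4, 5],
    "ipl_season": [2008, 2008, 2009, 2009, 2009],
})

# Distinct matches per season (assumed definition, not the notebook's own cell).
matches_per_season = match_df.groupby("ipl_season")["match_id"].nunique()

# Cross-check: the seasonal counts should add up to the number of matches.
total_matches = match_df["match_id"].nunique()
print(matches_per_season)
print(matches_per_season.sum() == total_matches)
```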
In [84]: plt.figure()
matches_per_season.plot(kind="line", marker="o")
plt.title("Matches Per Season")
plt.xlabel("Season")
plt.ylabel("Number of Matches")
plt.show()
# Final first-innings total per match: the maximum of the running team score
first_innings_score = first_innings.groupby(
    ["match_id", "ipl_season"]
)["team_runs"].max().reset_index()
first_innings_score.head()
avg_first_innings
Out[86]: ipl_season
2008 160.965517
2009 150.263158
2010 164.783333
2011 152.369863
2012 157.540541
2013 155.894737
2014 163.066667
2015 166.254237
2016 162.600000
2017 165.779661
2018 172.466667
2019 166.733333
2020 169.500000
2021 159.316667
2022 171.121622
2023 182.729730
2024 189.591549
2025 188.837838
Name: team_runs, dtype: float64
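Taking the maximum of `team_runs` only yields the innings total if `team_runs` is a running (cumulative) score. A small sketch of the whole chain under that assumption, on invented data; the construction of `team_runs` via `cumsum` is a guess at how the column was built:

```python
import pandas as pd

# Toy deliveries; values are invented for illustration.
balls = pd.DataFrame({
    "match_id":   [1, 1, 1, 1, 2, 2, 2],
    "ipl_season": [2008] * 4 + [2009] * 3,
    "innings":    [1, 1, 2, 2, 1, 1, 2],
    "runs_total": [4, 6, 1, 2, 1, 1, 6],
})

# Keep first-innings deliveries and build a running team score (assumed definition).
first_innings = balls[balls["innings"] == 1].copy()
first_innings["team_runs"] = first_innings.groupby("match_id")["runs_total"].cumsum()

# The last (maximum) running score is the innings total.
first_innings_score = (
    first_innings.groupby(["match_id", "ipl_season"])["team_runs"].max().reset_index()
)

# Average first-innings total per season.
avg_first_innings = first_innings_score.groupby("ipl_season")["team_runs"].mean()
print(avg_first_innings)
```

Because runs per delivery are never negative, the maximum of the running score equals the plain sum of `runs_total`, so either aggregation would give the same totals here.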
In [87]: plt.figure()
avg_first_innings.plot(kind="line", marker="o")
plt.title("Average First Innings Score Per Season")
plt.xlabel("Season")
plt.ylabel("Average Score")
plt.show()
Out[88]: win_type
wickets 615
runs 531
Name: count, dtype: int64
In [89]: plt.figure()
win_type_counts.plot(kind="bar")
plt.title("Win Type Distribution")
plt.xlabel("Win Type")
plt.ylabel("Number of Matches")
plt.show()
In [91]: match_df["toss_win_match_win"].value_counts()
Out[91]: toss_win_match_win
True 591
False 578
Name: count, dtype: int64
Out[92]: 50.556030795551756
In [93]: match_df["toss_decision"].value_counts()
Out[93]: toss_decision
field 764
bat 405
Name: count, dtype: int64
Out[94]: toss_decision
bat 45.185185
field 53.403141
Name: toss_win_match_win, dtype: float64
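The figures above are consistent with each other: 591 of 1169 matches is about 50.56%, and the per-decision rates (45.19% of 405 and 53.40% of 764) add back up to 591 wins. A sketch of how such a flag and its breakdown could be computed, on a hypothetical toy table:

```python
import pandas as pd

# Toy match table; values are invented. The last row has no winner (no result).
match_df = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "C", "B"],
    "match_won_by":  ["A", "A", "B", "C", None],
    "toss_decision": ["field", "bat", "bat", "field", "field"],
})

# True when the toss winner also won the match.
# A missing winner compares as False, so no-result matches count as "not won".
match_df["toss_win_match_win"] = match_df["toss_winner"] == match_df["match_won_by"]

# Overall rate and rate split by toss decision, both in percent.
overall_pct = match_df["toss_win_match_win"].mean() * 100
by_decision = match_df.groupby("toss_decision")["toss_win_match_win"].mean() * 100

print(overall_pct)
print(by_decision)
```

Since the `True`/`False` counts above total 1169, matches without a recorded winner appear to be included in the denominator; dropping them first would raise the percentage slightly.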
chasing_trend
Out[95]: ipl_season
2008 58.620690
2009 50.877193
2010 46.666667
2011 53.424658
2012 54.054054
2013 48.684211
2014 61.666667
2015 40.677966
2016 65.000000
2017 54.237288
2018 53.333333
2019 58.333333
2020 48.333333
2021 61.666667
2022 50.000000
2023 44.594595
2024 50.704225
2025 50.000000
dtype: float64
In [96]: plt.figure()
chasing_trend.plot(kind="line", marker="o")
plt.title("Chasing Win Percentage Per Season")
plt.xlabel("Season")
plt.ylabel("Chasing Win %")
plt.show()
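The cell that builds `chasing_trend` is not visible in this export. One plausible definition, assumed here, treats a win "by wickets" as a successful chase and divides by all matches in the season. A minimal sketch on invented data:

```python
import pandas as pd

# Toy match table; values are invented. None marks a match with no result.
match_df = pd.DataFrame({
    "ipl_season": [2008, 2008, 2008, 2008, 2009],
    "win_type":   ["wickets", "wickets", "runs", None, "wickets"],
})

# Assumption: a win by wickets means the side batting second won.
chase_win = match_df["win_type"].eq("wickets")

# Share of chasing wins per season, in percent (no-result matches stay in the denominator).
chasing_trend = chase_win.groupby(match_df["ipl_season"]).mean() * 100
print(chasing_trend)
```

Whether ties and no-result matches belong in the denominator is a design choice; excluding them would shift the seasonal percentages by a small amount.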
labels=["Below 150","150-170","170-190","190+"]
)
bucket_analysis
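Only the `labels` argument of the bucketing call survives in this export. A sketch of how those four labels could be attached with `pd.cut`; the bin edges and the `batting_first_won` column are assumptions chosen to match the label names:

```python
import numpy as np
import pandas as pd

# Toy first-innings totals; values are invented for illustration.
scores = pd.DataFrame({
    "team_runs":         [120, 150, 169, 170, 189, 190, 240],
    "batting_first_won": [False, False, True, True, False, True, True],
})

# right=False makes intervals closed on the left: [0,150), [150,170), [170,190), [190,inf),
# so a score of exactly 150 lands in "150-170" rather than "Below 150".
scores["score_bucket"] = pd.cut(
    scores["team_runs"],
    bins=[0, 150, 170, 190, np.inf],
    right=False,
    labels=["Below 150", "150-170", "170-190", "190+"],
)

# Win percentage of the side batting first, per bucket.
bucket_analysis = (
    scores.groupby("score_bucket", observed=False)["batting_first_won"].mean() * 100
)
print(bucket_analysis)
```

With the default `right=True`, boundary scores such as 150 or 170 would fall into the lower bucket, which is easy to miss when the labels read "150-170".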
venue_avg = venue_scoring.groupby("venue")["team_runs"].mean().sort_values(ascending=False)
venue_avg.head(10)
Out[100… venue
Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium, Visakhapatnam 208.750000
Sawai Mansingh Stadium, Jaipur 187.235294
Narendra Modi Stadium, Ahmedabad 186.636364
Himachal Pradesh Cricket Association Stadium, Dharamsala 183.133333
Brabourne Stadium 180.400000
Brabourne Stadium, Mumbai 177.411765
Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 175.363636
Barsapara Cricket Stadium, Guwahati 174.600000
M Chinnaswamy Stadium, Bengaluru 172.979798
Arun Jaitley Stadium, Delhi 171.773196
Name: team_runs, dtype: float64
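"Brabourne Stadium" and "Brabourne Stadium, Mumbai" appear as separate rows above, so the same ground is split across two averages. A sketch of merging such aliases before aggregating and reporting the sample size next to the mean; the alias map is a hypothetical example that would need to be extended by inspecting the real venue names:

```python
import pandas as pd

# Toy first-innings totals per venue; values are invented for illustration.
venue_scoring = pd.DataFrame({
    "venue": ["Brabourne Stadium", "Brabourne Stadium, Mumbai", "Eden Gardens, Kolkata"],
    "team_runs": [180, 170, 160],
})

# Hypothetical alias map: short spelling -> canonical spelling.
venue_aliases = {"Brabourne Stadium": "Brabourne Stadium, Mumbai"}

# Series.replace with a dict swaps exact matches only; other names are untouched.
venue_scoring["venue_clean"] = venue_scoring["venue"].replace(venue_aliases)

# Mean score together with the number of innings behind it.
venue_summary = (
    venue_scoring.groupby("venue_clean")["team_runs"]
    .agg(avg_score="mean", matches="count")
    .sort_values("avg_score", ascending=False)
)
print(venue_summary)
```

Keeping the `matches` column next to `avg_score` helps when reading the ranking, since venues with only a handful of innings can top the list by chance.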
venue_analysis.sort_values("avg_score", ascending=False).head(10)
venue
Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 175.363636 22
Out[102… venue
Punjab Cricket Association IS Bindra Stadium, Mohali, Chandigarh 14476
Maharaja Yadavindra Singh International Cricket Stadium, New Chandigarh 2561
Name: count, dtype: int64
Out[103… venue
Narendra Modi Stadium, Ahmedabad 9
Bharat Ratna Shri Atal Bihari Vajpayee Ekana Cricket Stadium, Lucknow 8
Eden Gardens, Kolkata 7
Wankhede Stadium, Mumbai 7
Sawai Mansingh Stadium, Jaipur 7
Arun Jaitley Stadium, Delhi 7
Rajiv Gandhi International Stadium, Uppal, Hyderabad 6
MA Chidambaram Stadium, Chepauk, Chennai 6
Maharaja Yadavindra Singh International Cricket Stadium, New Chandigarh 6
M Chinnaswamy Stadium, Bengaluru 5
Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium, Visakhapatnam 2
Barsapara Cricket Stadium, Guwahati 2
Himachal Pradesh Cricket Association Stadium, Dharamsala 2
Name: count, dtype: int64
team_avg_score = team_scoring.groupby("batting_team")["team_runs"].mean().sort_values(ascending=False)
team_avg_score
Out[141… batting_team
Lucknow Super Giants 185.090909
Gujarat Titans 184.407407
Sunrisers Hyderabad 170.245098
Chennai Super Kings 170.088235
Royal Challengers Bengaluru 169.478261
Mumbai Indians 168.405594
Punjab Kings 165.604317
Kolkata Knight Riders 165.015873
Rajasthan Royals 164.214953
Delhi Capitals 162.781513
Gujarat Lions 161.928571
Rising Pune Supergiants 161.800000
Deccan Chargers 157.325581
Pune Warriors 148.650000
Kochi Tuskers Kerala 144.142857
Name: team_runs, dtype: float64
team_analysis.sort_values("avg_score", ascending=False)
batting_team
filtered_team_analysis.sort_values("avg_score", ascending=False)
batting_team
total_wins = match_df["match_won_by"].value_counts()
win_percentage.sort_values(ascending=False)
chasing_win_pct.sort_values(ascending=False)
In [ ]:
# The defending team is the one stored in team1; a direct column copy
# is equivalent to the row-wise apply and much faster
defending_matches["defending_team"] = defending_matches["team1"]
total_defending = defending_matches["defending_team"].value_counts()
defending_win_pct.sort_values(ascending=False)
In [191… home_city_map = {
    "Chennai Super Kings": ["Chennai"],
    "Mumbai Indians": ["Mumbai"],
    "Royal Challengers Bengaluru": ["Bengaluru"],
    "Kolkata Knight Riders": ["Kolkata"],
    "Rajasthan Royals": ["Jaipur"],
    "Delhi Capitals": ["Delhi"],
    "Sunrisers Hyderabad": ["Hyderabad"],
    "Punjab Kings": ["Mohali", "New Chandigarh"],
    "Gujarat Titans": ["Ahmedabad"],
    "Lucknow Super Giants": ["Lucknow"]
}
return team
elif row["team2"] == team:
return team
return None
# Home wins divided by home matches, per team, in percent
home_win_pct = (
    home_matches[home_matches["home_team"] == home_matches["match_won_by"]]
    ["home_team"].value_counts()
    /
    home_matches["home_team"].value_counts()
) * 100
home_win_pct.sort_values(ascending=False)
In [199… primary_home.head(130)
In [201… match_df.iloc[0]
# Primary home venue(s) of each side in the given season
team1_home = primary_home[
    (primary_home["ipl_season"] == season) &
    (primary_home["batting_team"] == row["team1"])
]["venue"].values
team2_home = primary_home[
    (primary_home["ipl_season"] == season) &
    (primary_home["batting_team"] == row["team2"])
]["venue"].values
In [205… match_df["home_team"].value_counts()
Out[205… home_team
Kolkata Knight Riders 107
Royal Challengers Bengaluru 106
Delhi Capitals 100
Chennai Super Kings 99
Mumbai Indians 99
Punjab Kings 80
Rajasthan Royals 76
Sunrisers Hyderabad 69
Gujarat Titans 26
Lucknow Super Giants 24
Pune Warriors 22
Deccan Chargers 20
Rising Pune Supergiants 11
Kochi Tuskers Kerala 5
Gujarat Lions 4
Name: count, dtype: int64
# Matches in which the home team is also the winner
home_wins = home_matches[
    home_matches["home_team"] == home_matches["match_won_by"]
]
home_win_pct = (
    home_wins["home_team"].value_counts() /
    home_matches["home_team"].value_counts()
) * 100
home_win_pct.sort_values(ascending=False)
Out[207… home_team
Chennai Super Kings 62.626263
Mumbai Indians 60.606061
Rajasthan Royals 55.263158
Kolkata Knight Riders 54.205607
Gujarat Titans 53.846154
Sunrisers Hyderabad 52.173913
Royal Challengers Bengaluru 46.226415
Lucknow Super Giants 45.833333
Rising Pune Supergiants 45.454545
Punjab Kings 43.750000
Delhi Capitals 43.000000
Kochi Tuskers Kerala 40.000000
Pune Warriors 27.272727
Deccan Chargers 25.000000
Gujarat Lions 25.000000
Name: count, dtype: float64
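Dividing one `value_counts()` by another aligns on team name, so a team with home matches but no home wins would be missing from the numerator and come out as NaN instead of 0%. That does not happen in the output above, but a defensive variant is cheap. A sketch on invented data:

```python
import pandas as pd

# Toy home-match table; values are invented. Team C never wins at home.
home_matches = pd.DataFrame({
    "home_team":    ["A", "A", "B", "B", "C"],
    "match_won_by": ["A", "X", "B", "B", "Y"],
})

home_wins = home_matches[home_matches["home_team"] == home_matches["match_won_by"]]

games = home_matches["home_team"].value_counts()
# Reindex wins onto every team that hosted a match, filling absent teams with 0.
wins = home_wins["home_team"].value_counts().reindex(games.index, fill_value=0)

home_win_pct = wins / games * 100
print(home_win_pct.sort_values(ascending=False))
```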
run_rate
Out[209… ipl_season
2008 8.310579
2009 7.492784
2010 8.128713
2011 7.727489
2012 7.828801
2013 7.689063
2014 8.204117
2015 8.373992
2016 8.310471
2017 8.411014
2018 8.647596
2019 8.415416
2020 8.289760
2021 8.051699
2022 8.540670
2023 8.993873
2024 9.560464
2025 9.636837
dtype: float64
plt.figure()
run_rate.plot(marker='o')
plt.title("IPL Run Rate Evolution")
plt.xlabel("Season")
plt.ylabel("Run Rate")
plt.show()
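The cell that computes `run_rate` is not visible in this export. A common definition, assumed here, is total runs divided by legal deliveries, scaled to a six-ball over. A sketch on invented data using the `runs_total` and `valid_ball` columns listed in the schema:

```python
import pandas as pd

# Toy deliveries; values are invented. valid_ball is 0 for wides and no-balls.
balls = pd.DataFrame({
    "ipl_season": [2008] * 4 + [2009] * 3,
    "runs_total": [1, 4, 1, 0, 6, 2, 1],
    "valid_ball": [1, 1, 0, 1, 1, 1, 1],
})

# Sum runs and legal deliveries per season, then convert to runs per over.
totals = balls.groupby("ipl_season")[["runs_total", "valid_ball"]].sum()
run_rate = totals["runs_total"] / totals["valid_ball"] * 6
print(run_rate)
```

Runs from wides and no-balls count toward the numerator while the delivery itself does not count toward the denominator, which matches how run rate is normally quoted.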
# Close chases: won by wickets with at most 2 wickets in hand
close_wickets = match_df[
    (match_df["win_type"] == "wickets") &
    (match_df["win_margin"] <= 2)
].shape[0]
total_matches = match_df.shape[0]
In [215… match_df["is_close"] = (
    ((match_df["win_type"] == "runs") & (match_df["win_margin"] <= 10)) |
    ((match_df["win_type"] == "wickets") & (match_df["win_margin"] <= 2))
)
close_by_season
Out[215… ipl_season
2008 15.517241
2009 14.035088
2010 6.666667
2011 8.219178
2012 10.810811
2013 10.526316
2014 6.666667
2015 16.949153
2016 13.333333
2017 13.559322
2018 16.666667
2019 10.000000
2020 6.666667
2021 18.333333
2022 9.459459
2023 20.270270
2024 12.676056
2025 12.162162
Name: is_close, dtype: float64
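The `is_close` flag combines two thresholds: a margin of at most 10 runs, or at most 2 wickets in hand. A sketch of the flag and a per-season percentage on invented data; the `groupby(...).mean() * 100` step is an assumption about how `close_by_season` was derived:

```python
import pandas as pd

# Toy match table; values are invented. The last row is a no-result match.
match_df = pd.DataFrame({
    "ipl_season": [2008, 2008, 2008, 2009, 2009],
    "win_type":   ["runs", "wickets", "runs", "wickets", None],
    "win_margin": [5.0, 7.0, 40.0, 2.0, float("nan")],
})

# Close if won by <= 10 runs or by <= 2 wickets; missing values compare as False.
match_df["is_close"] = (
    (match_df["win_type"].eq("runs") & match_df["win_margin"].le(10))
    | (match_df["win_type"].eq("wickets") & match_df["win_margin"].le(2))
)

# Percentage of close matches per season (assumed definition).
close_by_season = match_df.groupby("ipl_season")["is_close"].mean() * 100
print(close_by_season)
```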
super_over_count, super_over_pct
# Left join keeps every match, including those without a DL flag
match_df = match_df.merge(
    dl_match,
    on="match_id",
    how="left"
)
dl_count, dl_pct
top_runs.head(10)
Out[229… batter
V Kohli 8671
RG Sharma 7048
S Dhawan 6769
DA Warner 6567
SK Raina 5536
MS Dhoni 5439
KL Rahul 5235
AB de Villiers 5181
AM Rahane 5032
CH Gayle 4997
Name: runs_batter, dtype: int64
# Batting strike rate: runs scored per 100 balls faced
batter_stats["strike_rate"] = (
    batter_stats["runs_batter"] /
    batter_stats["balls_faced"]
) * 100
batter_stats.sort_values("strike_rate", ascending=False).head(10)
batter
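Sorting by raw strike rate tends to surface batters who faced only a few balls. The death-overs cell later in the notebook applies a minimum of 500 balls; the same idea can be applied here. A sketch on invented data, where `min_balls` is a hypothetical threshold set low to suit the toy frame:

```python
import pandas as pd

# Toy deliveries; values are invented. balls_faced is a per-delivery 0/1 flag.
balls = pd.DataFrame({
    "batter":      ["P", "P", "P", "Q", "Q"],
    "runs_batter": [4, 0, 6, 1, 1],
    "balls_faced": [1, 1, 1, 1, 1],
})

# Career totals per batter.
batter_stats = balls.groupby("batter")[["runs_batter", "balls_faced"]].sum()
batter_stats["strike_rate"] = batter_stats["runs_batter"] / batter_stats["balls_faced"] * 100

# Hypothetical sample-size filter; a real analysis would use a far larger cutoff.
min_balls = 3
qualified = batter_stats[batter_stats["balls_faced"] >= min_balls]
print(qualified.sort_values("strike_rate", ascending=False))
```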
fifties_count.head(10)
Out[237… batter
V Kohli 72
DA Warner 66
S Dhawan 53
RG Sharma 49
KL Rahul 45
AB de Villiers 44
SK Raina 40
F du Plessis 39
CH Gayle 38
G Gambhir 36
Name: match_id, dtype: int64
batter_avg.sort_values(ascending=False).head(10)
Out[239… batter
KL Rahul 38.777778
SE Marsh 36.072464
RD Gaikwad 35.742857
DA Warner 35.690217
CH Gayle 35.439716
JC Buttler 34.630252
MEK Hussey 34.086207
Shubman Gill 33.912281
V Kohli 33.478764
YBK Jaiswal 32.818182
dtype: float64
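The cell that builds `batter_avg` is not visible in this export. Batting average is conventionally runs scored divided by number of dismissals. A sketch under the assumption that dismissals are counted from the `player_out` column, which also covers a batter run out at the non-striker's end:

```python
import pandas as pd

# Toy deliveries; values are invented. player_out names the dismissed batter, if any.
balls = pd.DataFrame({
    "batter":      ["P", "P", "Q", "Q", "Q"],
    "runs_batter": [4, 6, 1, 2, 0],
    "player_out":  [None, "P", None, None, None],
})

runs = balls.groupby("batter")["runs_batter"].sum()

# value_counts ignores missing values; batters never dismissed get 0.
dismissals = balls["player_out"].value_counts().reindex(runs.index, fill_value=0)

# Leave the average undefined (NaN) for batters with no dismissals.
batter_avg = runs / dismissals.where(dismissals > 0)
print(batter_avg)
```

As with strike rate, a minimum number of innings or dismissals is usually applied before ranking, otherwise one or two not-out innings can dominate the top of the list.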
top_wickets.head(10)
Out[241… bowler
YS Chahal 221
B Kumar 198
PP Chawla 192
SP Narine 192
R Ashwin 187
JJ Bumrah 186
DJ Bravo 183
A Mishra 174
RA Jadeja 170
SL Malinga 170
Name: bowler_wicket, dtype: int64
# Economy rate: runs conceded per six legal deliveries
bowler_stats["economy"] = (
    bowler_stats["runs_bowler"] /
    bowler_stats["valid_ball"]
) * 6
bowler_stats.sort_values("economy").head(10)
bowler
In [245… bowler_stats["strike_rate"] = (
    bowler_stats["valid_ball"] /
    bowler_stats["bowler_wicket"]
)
bowler_stats.sort_values("strike_rate").head(10)
bowler
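Bowling strike rate divides by wickets, so a bowler with no wickets produces a division by zero (infinity in pandas). A sketch that computes both economy and strike rate from per-delivery data and leaves the strike rate undefined in that case; the toy values are invented:

```python
import pandas as pd

# Toy deliveries; values are invented. valid_ball is 0 for wides and no-balls.
balls = pd.DataFrame({
    "bowler":        ["S", "S", "S", "T", "T"],
    "runs_bowler":   [1, 4, 0, 6, 2],
    "valid_ball":    [1, 1, 1, 1, 0],
    "bowler_wicket": [0, 1, 0, 0, 0],
})

bowler_stats = balls.groupby("bowler")[["runs_bowler", "valid_ball", "bowler_wicket"]].sum()

# Economy: runs conceded per six legal deliveries.
bowler_stats["economy"] = bowler_stats["runs_bowler"] / bowler_stats["valid_ball"] * 6

# Strike rate: legal deliveries per wicket; NaN when the bowler has no wickets.
wickets = bowler_stats["bowler_wicket"]
bowler_stats["strike_rate"] = bowler_stats["valid_ball"] / wickets.where(wickets > 0)

print(bowler_stats.sort_values("strike_rate"))
```

`sort_values` places NaN rows last by default, so wicketless bowlers drop to the bottom of the ranking rather than appearing as infinite values.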
# Strike rate in the death overs: runs per 100 balls faced
death_batter["strike_rate"] = (
    death_batter["runs_batter"] /
    death_batter["balls_faced"]
) * 100
# Minimum filter: keep batters with at least 500 balls faced
death_batter = death_batter[death_batter["balls_faced"] >= 500]
death_batter.sort_values("strike_rate", ascending=False).head(10)
batter
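The cells that select death-over and powerplay deliveries are not visible in this export. The `over` column runs from 0 to 19 in the summary statistics, so it is zero-indexed. A sketch of phase labelling under assumed boundaries (overs 0-5 powerplay, 6-14 middle, 15-19 death), on invented data:

```python
import pandas as pd

# Toy deliveries; values are invented. over is zero-indexed (0..19).
balls = pd.DataFrame({
    "over":        [0, 5, 6, 15, 19, 19],
    "batter":      ["P", "P", "P", "Q", "Q", "Q"],
    "runs_batter": [1, 4, 0, 6, 4, 2],
    "balls_faced": [1, 1, 1, 1, 1, 1],
})

# Assumed phase boundaries: (-1,5] powerplay, (5,14] middle, (14,19] death.
balls["phase"] = pd.cut(
    balls["over"],
    bins=[-1, 5, 14, 19],
    labels=["powerplay", "middle", "death"],
)

# Per-batter totals and strike rate restricted to the death overs.
death = balls[balls["phase"] == "death"]
death_batter = death.groupby("batter")[["runs_batter", "balls_faced"]].sum()
death_batter["strike_rate"] = death_batter["runs_batter"] / death_batter["balls_faced"] * 100
print(death_batter)
```

Labelling every delivery once and filtering on `phase` keeps the powerplay and death-over analyses on the same definition instead of repeating the over thresholds in each cell.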
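# Economy: runs conceded per six legal deliveries in the death overs
death_bowler["economy"] = (
    death_bowler["runs_bowler"] /
    death_bowler["valid_ball"]
) * 6
# Strike rate: legal deliveries per wicket
death_bowler["strike_rate"] = (
    death_bowler["valid_ball"] /
    death_bowler["bowler_wicket"]
)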
death_bowler.sort_values("economy").head(10)
bowler
# Powerplay strike rate: runs per 100 balls faced
pp_batter["strike_rate"] = (
    pp_batter["runs_batter"] /
    pp_batter["balls_faced"]
) * 100
pp_batter.sort_values("strike_rate", ascending=False).head(20)
batter
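# Economy: runs conceded per six legal deliveries in the powerplay
pp_bowler["economy"] = (
    pp_bowler["runs_bowler"] /
    pp_bowler["valid_ball"]
) * 6
# Strike rate: legal deliveries per wicket
pp_bowler["strike_rate"] = (
    pp_bowler["valid_ball"] /
    pp_bowler["bowler_wicket"]
)
pp_bowler.sort_values("economy").head(10)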
bowler
In [ ]: