Lung Cancer Data Analysis Project

Uploaded by

Ekansh Girdhar

INDEX

1. Introduction
2. Features of the Project
3. Code (Syntax)
4. Output
5. Bibliography
6. Conclusion
CERTIFICATE

This is to certify that the project entitled

PROJECT ON DATA ANALYSIS OF LUNG CANCER

is a bona fide work done by EKANSH GIRDHAR of class XII-A and ANUJ SALIL of class XII-B, session 2024-25, and has been carried out under my supervision and guidance.

……………………………
(Signature of Teacher)

……………………………
(Signature of External Examiner)

……………………………
(Signature of Principal)
ACKNOWLEDGEMENTS

I would like to express my sincere gratitude towards my project guide and Informatics Practices teacher, Ms. Pooja Thakur, for guiding me throughout the making of this project. Her valuable advice helped me immensely in completing it successfully. I would also like to thank our school principal, Ms. Meenu Kanwar, for her coordination in extending every possible support.
INTRODUCTION

Lung cancer is one of the most common and serious types of cancer, causing a large number of deaths worldwide. Understanding the disease, its causes, and how it progresses is essential for improving diagnosis, treatment, and survival rates. However, the complexity of lung cancer makes it a challenge for doctors and researchers.

This project uses a detailed dataset of 23,658 lung cancer patients to study the disease. The data includes basic information such as age, gender, and ethnicity, as well as lifestyle details such as smoking history and smoking pack-years (packs smoked per day multiplied by the number of years smoked). It also contains important medical details such as the size and location of the tumor, the stage of cancer, and the type of treatment received. Blood test results and other health indicators, such as hemoglobin, glucose, and creatinine levels, provide additional insights.

By analyzing this information, the project aims to find patterns and connections that can help doctors make better decisions about diagnosis and treatment. These insights could lead to better care for patients and contribute to the ongoing fight against lung cancer.

The dataset used for this project is obtained from [Link] and is given ahead.

I have done the analysis using Python Pandas on a Windows machine, but the project can be run on any machine that supports Python and Pandas. Aside from Pandas, the matplotlib Python module has also been used to visualize the dataset.
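A minimal sketch of the loading step, assuming the CSV uses the column names that appear in the program (Age, Gender, Ethnicity, Smoking_History, Family_History, Smoking_Pack_Years). The rows below are hypothetical and are read from an in-memory string so the sketch runs without the real file; with the downloaded dataset, its path would be passed to pd.read_csv instead. The nrows argument limits how many data rows are read, which is how the program restricts itself to the first 250 records.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real dataset has 23,658 patients.
SAMPLE_CSV = """Age,Gender,Ethnicity,Smoking_History,Family_History,Smoking_Pack_Years
68,Male,Caucasian,Current Smoker,Yes,45.2
54,Female,Asian,Never Smoked,No,0.0
71,Male,African American,Former Smoker,Yes,30.5
62,Female,Hispanic,Former Smoker,No,12.0
"""

# nrows limits how many data rows are read; the header line is always read.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), nrows=3)

print(df.shape)   # (3, 6): three rows, six columns
print(df.dtypes)  # the dtype pandas inferred for each column
```

Only three of the four sample rows are loaded because of nrows=3, in the same way the program loads 250 of the 23,658 records.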

FEATURES OF THE PROJECT

The project is divided into five parts covering reading, analysis and visualization of the data.

It includes:
1. Reading the data from the CSV file
2. Displaying the DataFrame statistics
3. Data analysis (detailed)
4. Data analysis (comprehensive)
5. Plotting graphs using the data from the CSV file (data visualization)
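The statistics part of the program maps each menu choice to one DataFrame attribute. A minimal sketch of that mapping written as a dictionary instead of an if/elif chain (a design alternative, not the project's own code), using a small hypothetical DataFrame; STATISTICS and show_statistic are illustrative names. It also shows that size is the number of rows multiplied by the number of columns.

```python
import pandas as pd

# Hypothetical two-row DataFrame standing in for the patient data.
df = pd.DataFrame({
    "Age": [68, 54],
    "Gender": ["Male", "Female"],
    "Smoking_History": ["Current Smoker", "Never Smoked"],
})

# Each menu choice maps to a label and a function returning one statistic.
STATISTICS = {
    1: ("Transpose", lambda d: d.T),
    2: ("Column names", lambda d: d.columns),
    3: ("Indexes", lambda d: d.index),
    4: ("Shape", lambda d: d.shape),
    5: ("Dimensions", lambda d: d.ndim),
    6: ("Data types", lambda d: d.dtypes),
    7: ("Size", lambda d: d.size),
}


def show_statistic(d, choice):
    """Print and return the statistic for a menu choice, or None if invalid."""
    if choice not in STATISTICS:
        print("Invalid choice")
        return None
    label, func = STATISTICS[choice]
    result = func(d)
    print(f"{label}:")
    print(result)
    return result


show_statistic(df, 4)  # Shape: (2, 3)
show_statistic(df, 7)  # Size: 6, i.e. 2 rows x 3 columns
```

Adding a new statistic then only needs one more dictionary entry rather than another elif branch.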
CODE
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset; nrows=250 loads only the first 250 patient records
df = pd.read_csv("C:\\Users\\anujs\\Downloads\\[Link]", nrows=250)

while True:
    print("Main Menu")
    print("1. Fetch data")
    print("2. Dataframe Statistics")
    print("3. Detailed and comprehensive Data Analysis")
    print("4. Data Visualization")
    print("5. Exit")
    ch = int(input("Enter your choice: "))
    if ch == 1:
        print(df)
    elif ch == 2:
        while True:
            print("Dataframe Statistics Menu")
            print("1. Display the Transpose")
            print("2. Display all column names")
            print("3. Display the indexes")
            print("4. Display the shape")
            print("5. Display the dimension")
            print("6. Display the data types of all columns")
            print("7. Display the size")
            print("8. Modify Dataset")
            print("9. Exit")
            ch2 = int(input("Enter choice: "))
            if ch2 == 1:
                print(df.T)
            elif ch2 == 2:
                print(df.columns)
            elif ch2 == 3:
                print(df.index)
            elif ch2 == 4:
                print(df.shape)
            elif ch2 == 5:
                print(df.ndim)
            elif ch2 == 6:
                print(df.dtypes)
            elif ch2 == 7:
                print(df.size)
            elif ch2 == 8:
                while True:
                    print("Modify the Dataset")
                    print("1. Add a column")
                    print("2. Add a row")
                    print("3. Remove a Row")
                    print("4. Remove a Column")
                    print("5. Exit")
                    choice = int(input("Enter your choice: "))
                    if choice == 1:
                        i = input("Enter the column name: ")
                        b = input("Enter the value to be added: ")
                        df[i] = b  # the same value is stored in every row
                        print(df[i])
                    elif choice == 2:
                        # Row labels are integers, so read the new label as an int
                        i = int(input("Enter the row index: "))
                        b = input("Enter the value to be added: ")
                        df.loc[i] = b  # the same value is stored in every column
                        print(df.loc[i])
                    elif choice == 3:
                        # Convert to int so the label matches the integer index
                        a = int(input("Enter the index of the row to be deleted: "))
                        df.drop(a, inplace=True)
                        print(a, "row deleted")
                    elif choice == 4:
                        a = input("Enter the column to be deleted: ")
                        df.drop(a, axis=1, inplace=True)
                        print(a, "column deleted")
                    elif choice == 5:
                        break
            elif ch2 == 9:
                break
    elif ch == 3:
        while True:
            print("Data Extraction and Analysis:")
            print("1. Display all Records")
            print("2. Extract patients data")
            print("3. Data Analysis")
            print("4. Exit")
            ch3 = int(input("Enter choice: "))
            if ch3 == 1:
                print("Display Data:")
                print(df)
            elif ch3 == 2:
                while True:
                    print("Extract Data:")
                    print("1. Top 5 Records")
                    print("2. Bottom 5 Records")
                    print("3. Specific number of records from the top")
                    print("4. Specific number of records from the bottom")
                    print("5. Sort Based on Category")
                    print("6. Exit")
                    e = int(input("Enter choice: "))
                    if e == 1:
                        print(df.head(5))
                    elif e == 2:
                        print(df.tail(5))
                    elif e == 3:
                        i_top = int(input("Enter no. of records to be extracted from the top: "))
                        print(df.head(i_top))
                    elif e == 4:
                        i_bottom = int(input("Enter no. of records to be extracted from the bottom: "))
                        print(df.tail(i_bottom))
                    elif e == 5:
                        while True:
                            print("Sort based on category:")
                            print("1. Gender")
                            print("2. Ethnicity")
                            print("3. Smoking History of patient")
                            print("4. Smoking History in family")
                            print("5. Tumour Location")
                            print("6. Treatment Used")
                            print("7. Exit")
                            oi = int(input("Enter choice: "))
                            if oi == 1:
                                gender = int(input("Gender: 1-Male | 2-Female: "))
                                if gender == 1:
                                    print(df[df.Gender == "Male"])
                                elif gender == 2:
                                    print(df[df.Gender == "Female"])
                            elif oi == 2:
                                while True:
                                    print("Ethnicity:")
                                    print("1-Hispanic")
                                    print("2-Caucasian")
                                    print("3-African American")
                                    print("4-Asian")
                                    print("5-Other")
                                    print("6. Exit")
                                    ethnicity = int(input("Enter choice: "))
                                    if ethnicity == 1:
                                        print(df[df.Ethnicity == "Hispanic"])
                                    elif ethnicity == 2:
                                        print(df[df.Ethnicity == "Caucasian"])
                                    elif ethnicity == 3:
                                        print(df[df.Ethnicity == "African American"])
                                    elif ethnicity == 4:
                                        print(df[df.Ethnicity == "Asian"])
                                    elif ethnicity == 5:
                                        print(df[df.Ethnicity == "Other"])
                                    elif ethnicity == 6:
                                        break
                            elif oi == 3:
                                while True:
                                    print("Smoking History of patient:")
                                    print("1-Current Smoker")
                                    print("2-Never Smoked")
                                    print("3-Former Smoker")
                                    print("4. Exit")
                                    smoke = int(input("Enter choice: "))
                                    if smoke == 1:
                                        print(df[df.Smoking_History == "Current Smoker"])
                                    elif smoke == 2:
                                        print(df[df.Smoking_History == "Never Smoked"])
                                    elif smoke == 3:
                                        print(df[df.Smoking_History == "Former Smoker"])
                                    elif smoke == 4:
                                        break
                            elif oi == 4:
                                history = int(input("Smoking History: 1-Yes | 2-No: "))
                                if history == 1:
                                    print(df[df.Family_History == "Yes"])
                                elif history == 2:
                                    print(df[df.Family_History == "No"])
                            elif oi == 5:
                                while True:
                                    print("Tumour location:")
                                    print("1-Upper Lobe")
                                    print("2-Middle Lobe")
                                    print("3-Lower Lobe")
                                    print("4. Exit")
                                    loc = int(input("Enter choice: "))
                                    if loc == 1:
                                        print(df[df.Tumor_Location == "Upper Lobe"])
                                    elif loc == 2:
                                        print(df[df.Tumor_Location == "Middle Lobe"])
                                    elif loc == 3:
                                        print(df[df.Tumor_Location == "Lower Lobe"])
                                    elif loc == 4:
                                        break
                            elif oi == 6:
                                while True:
                                    print("Treatment Used:")
                                    print("1-Surgery")
                                    print("2-Radiation Therapy")
                                    print("3-Chemotherapy")
                                    print("4-Targeted Therapy")
                                    print("5. Exit")
                                    method = int(input("Enter choice: "))
                                    if method == 1:
                                        print(df[df.Treatment == "Surgery"])
                                    elif method == 2:
                                        print(df[df.Treatment == "Radiation Therapy"])
                                    elif method == 3:
                                        print(df[df.Treatment == "Chemotherapy"])
                                    elif method == 4:
                                        print(df[df.Treatment == "Targeted Therapy"])
                                    elif method == 5:
                                        break
                            elif oi == 7:
                                break
                    elif e == 6:
                        break
            elif ch3 == 3:
                while True:
                    print("Data Analysis:")
                    print("1. Count of all ethnic groups")
                    print("2. Average age of patients")
                    print("3. Oldest patient")
                    print("4. Youngest patient")
                    print("5. Maximum smoking pack-years of a patient")
                    print("6. Minimum smoking pack-years of a patient")
                    print("7. No. of patients having a history of smoking in both personal and family")
                    print("8. Exit")
                    data = int(input("Enter the choice: "))
                    if data == 1:
                        m = df.groupby("Ethnicity")
                        print(m.size())  # number of patients in each ethnic group
                    elif data == 2:
                        print(df.Age.mean())
                    elif data == 3:
                        print(df.Age.max())
                    elif data == 4:
                        print(df.Age.min())
                    elif data == 5:
                        print(df.Smoking_Pack_Years.max())
                    elif data == 6:
                        print(df.Smoking_Pack_Years.min())
                    elif data == 7:
                        c = 0
                        # Iterate over the actual index labels so this still works after rows are deleted
                        for x in df.index:
                            if (df.loc[x, "Smoking_History"] in ("Current Smoker", "Former Smoker")
                                    and df.loc[x, "Family_History"] == "Yes"):
                                c += 1
                        print(c)
                    elif data == 8:
                        break
            elif ch3 == 4:
                break
    elif ch == 4:
        while True:
            print("Data Visualization Menu")
            print("1. Graph between male and female patients")
            print("2. Pie chart of different ethnicities")
            print("3. Graph of smoking history")
            print("4. Family history")
            print("5. Histogram for age category")
            print("6. Exit")
            ch4 = int(input("Enter choice: "))
            if ch4 == 1:
                gender = df["Gender"].value_counts()
                gender.plot(kind="bar")
                plt.title("Gender Distribution")
                plt.xlabel("Gender")
                plt.ylabel("Number of Patients")
                plt.show()
            elif ch4 == 2:
                m = df.groupby("Ethnicity").size()
                plt.pie(m.values, labels=m.index)
                plt.title("Distribution by Ethnicity")
                plt.show()
            elif ch4 == 3:
                smoking_history = df["Smoking_History"].value_counts()
                smoking_history.plot(kind="bar")
                plt.title("Distribution of Smoking History")
                plt.xlabel("Smoking History")
                plt.ylabel("Number of Patients")
                plt.xticks(rotation=45)
                plt.show()
            elif ch4 == 4:
                family_history = df["Family_History"].value_counts()
                family_history.plot(kind="bar", color="lightgreen", edgecolor="black")
                plt.title("Distribution of Family History")
                plt.xlabel("Family History")
                plt.ylabel("Number of Patients")
                plt.xticks(rotation=45)
                plt.show()
            elif ch4 == 5:
                plt.hist(df["Age"].dropna(), rwidth=0.8)
                plt.title("Distribution by Age")
                plt.xlabel("Age")
                plt.ylabel("Number of Patients")
                plt.show()
            elif ch4 == 6:
                break
    elif ch == 5:
        break
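Option 7 of the Data Analysis menu counts patients who have both a personal smoking history and a family history by looping over every row. A minimal sketch of the same count with a vectorized boolean mask, assuming the same column names and category labels as the program; the four rows are hypothetical. isin() tests membership in a list of labels, and & combines the two conditions element by element.

```python
import pandas as pd

# Hypothetical rows using the program's column names and labels.
df = pd.DataFrame({
    "Smoking_History": ["Current Smoker", "Never Smoked", "Former Smoker", "Former Smoker"],
    "Family_History": ["Yes", "Yes", "Yes", "No"],
})

# Loop version, similar to the program: check each row one at a time.
loop_count = 0
for x in df.index:
    if (df.loc[x, "Smoking_History"] in ("Current Smoker", "Former Smoker")
            and df.loc[x, "Family_History"] == "Yes"):
        loop_count += 1

# Vectorized version: one boolean Series per condition, combined with &.
smoker = df["Smoking_History"].isin(["Current Smoker", "Former Smoker"])
family = df["Family_History"] == "Yes"
vector_count = int((smoker & family).sum())  # True counts as 1 when summed

print(loop_count, vector_count)  # 2 2
```

The mask does not depend on the index labels at all, so it keeps working after rows have been deleted and the labels are no longer 0 to n-1.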
OUTPUT
1. Fetch Data

2. Dataframe Statistics
a. Display the Transpose
b. Display all column names
c. Display the indexes
d. Display the shape
e. Display the dimension
f. Display the data types of all columns
g. Display the size
h. Modify Dataset
(i) Add a column
(ii) Add a row
(iii) Remove a row
(iv) Remove a column

3. Detailed and Comprehensive Data Analysis
a. Display all records
b. Extract patient data
(i) Top 5 records
(ii) Bottom 5 records
(iii) Specific number of records from the top
(iv) Specific number of records from the bottom
(v) Sort based on category: Gender, Ethnicity, Smoking history of patient, Smoking history in family, Tumour location, Treatment used
c. Comprehensive Data Analysis
(i) Count of all ethnic groups
(ii) Average age of patients
(iii) Oldest patient
(iv) Youngest patient
(v) Maximum smoking pack-years of a patient
(vi) Minimum smoking pack-years of a patient
(vii) No. of patients having a history of smoking in both personal and family

4. Data Visualization
a. Graph between male and female patients
b. Pie chart of different ethnicities
c. Graph of smoking history
d. Family history
e. Histogram for age category
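The category options and the ethnic-group count in the outline above rest on two pandas operations: a boolean mask that selects matching rows, and per-group counting. A minimal sketch of both on hypothetical rows, using the Gender and Ethnicity column names from the program. groupby(...).size() (used for the count and the pie chart) and value_counts() give the same counts; groupby orders them by group label, while value_counts orders them by count, largest first.

```python
import pandas as pd

# Hypothetical rows; labels follow the program's menu options.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Ethnicity": ["Asian", "Caucasian", "Asian", "Hispanic", "Asian"],
})

# Filtering: keep only the rows whose Gender is "Male".
males = df[df["Gender"] == "Male"]

# Counting per group in two equivalent ways.
by_group = df.groupby("Ethnicity").size()   # ordered by group label
by_count = df["Ethnicity"].value_counts()   # ordered by count, largest first

print(len(males))           # 3
print(by_group.to_dict())   # counts: Asian 3, Caucasian 1, Hispanic 1
print(by_count.to_dict() == by_group.to_dict())  # True
```

Either Series can be passed to plt.pie as values with its index as labels; the only difference is the order in which the slices appear.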
BIBLIOGRAPHY

The following books and websites helped me in the completion of this project.

• Informatics Practices (NCERT)
• Informatics Practices by Preeti Arora
• [Link]
• [Link]
• [Link]
CONCLUSION

Through this project, I have gained extensive knowledge of the Pandas and Matplotlib (PyPlot) modules and how they work. I have also understood the importance of data analysis through its practical application.


