Python

Python is a high-level, interpreted programming language known for its readability and support for multiple programming paradigms. It is widely used in data analysis due to its extensive libraries like Pandas and NumPy, which facilitate data manipulation, statistical analysis, and visualization. Key features include dynamic typing, platform independence, and a rich ecosystem for machine learning and automation.

Uploaded by

sahil.kumar03200311

What is Python?

Python is an interpreted, high-level, general-purpose programming language.

Created by Guido van Rossum and released in 1991, Python emphasizes code
readability with its clean and easy-to-understand syntax. It supports multiple
programming paradigms such as object-oriented, imperative, functional, and
procedural programming.

Key Features of Python:

• Easy to learn and use: Python's syntax is compact and reads close to plain English.
• Open-source: Freely available for use and modification.
• Extensive Libraries: Rich libraries such as Pandas, NumPy, Matplotlib, and Seaborn.
• Platform Independent: The same Python code generally runs on Windows, macOS, and Linux, provided a Python interpreter is installed.
• Interpreted Language: Code is run by an interpreter without a separate compile step, which shortens the edit-run-debug cycle.
• Dynamic Typing: No need to declare variable types explicitly; types are checked at run time.

Applications of Python in Data Analysis:

Python has become one of the most widely used languages for data analysis because of its support for:

• Data Cleaning and Preprocessing: Libraries like Pandas make it easy to filter, manipulate, and clean datasets.
• Statistical Analysis: Libraries like SciPy and Statsmodels provide robust statistical methods.
• Data Visualization: With Matplotlib and Seaborn, Python allows the creation of interactive and publication-quality graphs.
• Machine Learning & Predictive Analytics: Integrated tools like scikit-learn and TensorFlow.
• Automation: Automating repetitive data processing tasks.

Basics of Python Programming

Variables and Data Types
A variable is a container for storing data values. Python has dynamically typed variables: you don't need to declare the type.

Common Data Types:

• int – integer numbers
• float – decimal numbers
• str – text (strings)
• bool – Boolean values (True, False)
• list – ordered, mutable sequence
• tuple – ordered, immutable sequence
• set – unordered collection of unique items
• dict – key-value pairs
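
As a quick illustration of the types listed above, the sketch below creates one value of each and prints its type. The dictionary name `samples` and all of the sample values are made up for this example.

```python
# One sample value per built-in type (names and values are illustrative only)
samples = {
    "int": 42,
    "float": 3.14,
    "str": "hello",
    "bool": True,
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "set": {1, 2, 2, 3},       # duplicates collapse, so the set holds 1, 2, 3
    "dict": {"key": "value"},
}

for expected_name, value in samples.items():
    # type(value).__name__ is the short name of the value's type
    print(f"{expected_name:<6} -> {type(value).__name__}: {value!r}")

# Collect the type names so they can be compared with the labels above
type_names = [type(v).__name__ for v in samples.values()]
```

Each printed type name should match the label on its left, which is a simple way to confirm how Python classifies a literal.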

Variables in Python
What is a Variable?
A variable in Python is a name bound to a value (an object). The name can be rebound to a different value during program execution; it acts as a reference to the object stored in memory rather than as a fixed storage location.

x = 10

Rules for Naming Variables

1. Must begin with a letter (A–Z or a–z) or an underscore _.
2. The rest of the name can include letters, numbers, and underscores.
3. Case-sensitive – Age and age are different variables.
4. No reserved keywords (e.g., if, for, class, def, etc.).
5. Cannot contain spaces – use underscores (_) instead.
6. Should be meaningful – always use descriptive names for clarity.

Valid Variable Names:

name = "John"
_age = 25
total_marks = 95.5

Invalid Variable Names:

2ndname = "Alice"   # invalid: starts with a digit
my name = "Bob"     # invalid: contains a space
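
The naming rules can also be checked programmatically. The following is a minimal sketch using the standard library's `str.isidentifier()` and `keyword.iskeyword()`; the helper name `is_valid_variable_name` and the list of candidates are hypothetical, chosen only for this illustration. Note that `isidentifier()` also accepts non-ASCII letters, which is slightly broader than rule 1 as stated above.

```python
import keyword


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Candidates taken from the valid/invalid examples, plus one reserved keyword
candidates = ["name", "_age", "total_marks", "2ndname", "my name", "class"]
results = {c: is_valid_variable_name(c) for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r:15} {'valid' if ok else 'invalid'}")
```

Rule 6 (meaningful names) is a style guideline and cannot be checked this way.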

Variable Assignment
Python supports multiple ways of assigning values to variables:

• Single assignment:
x = 5

• Multiple assignment:
x, y, z = 1, 2, 3

• Same value to multiple variables:
a = b = c = 10
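
One detail worth seeing in action is that assigning the same value to several names binds all of them to a single object. The sketch below, with made-up variable names, shows why this is harmless for immutable values such as integers but matters for mutable ones such as lists.

```python
# Chained assignment binds every name to the SAME object.
a = b = c = 10
print(a is b is c)        # True: one int object, three names

# Rebinding one name leaves the others untouched.
a = 99
print(b, c)               # 10 10

# With a mutable value, a change made through one name is visible through the other.
first = second = []
first.append("x")
print(second)             # ['x']

# Multiple assignment also gives a compact way to swap two values.
x, y = 1, 2
x, y = y, x
print(x, y)               # 2 1
```

If independent lists are needed, create them separately, for example `first, second = [], []`.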

Variable Types (Implicit Typing)

Python is dynamically typed: the type of a variable is determined automatically when a value is assigned.

x = 10         # int
y = 3.14       # float
name = "Ram"   # str

Use type() to check the variable type:

print(type(x))  # Output: <class 'int'>
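
To make dynamic typing concrete, here is a small sketch (the variable names are illustrative) showing that a name can be rebound to a value of another type, while type errors are still caught when the code runs.

```python
value = 10
print(type(value).__name__)    # int

value = "ten"                  # the same name now refers to a str
print(type(value).__name__)    # str

# Types are still enforced at run time: adding an int to a str raises TypeError.
try:
    result = value + 1
except TypeError as exc:
    result = None
    print("TypeError:", exc)

# isinstance() is the usual way to branch on a value's type.
print(isinstance(3.14, float))  # True
```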

2. List in Python
What is a List?
A List is a mutable, ordered collection of items. Items can be of any data type (integer,
string, float, another list, etc.).

fruits = ["apple", "banana", "cherry"]

Characteristics of Lists:
• Mutable: You can change, add, or remove items.
• Ordered: Maintains the order of insertion.
• Allows duplicate values.
• Supports indexing and slicing.

Common List Operations:

fruits.append("mango")    # Add item
fruits.remove("banana")   # Remove item
fruits[0] = "kiwi"        # Modify item
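
Indexing and slicing, mentioned in the characteristics above, could look like the following sketch. The list contents are made up for the example.

```python
fruits = ["apple", "banana", "cherry", "banana"]   # duplicates are allowed

print(fruits[0])       # apple   (first item)
print(fruits[-1])      # banana  (last item)
print(fruits[1:3])     # ['banana', 'cherry']  (indexes 1 and 2)
print(fruits[::-1])    # a reversed copy; the original list is unchanged

first_two = fruits[:2]          # slicing returns a new list

fruits.insert(1, "mango")       # insert at index 1
removed = fruits.pop()          # remove and return the last item
print(fruits, removed)
print(len(fruits), fruits.count("banana"))
```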

3. Tuple in Python
What is a Tuple?
A Tuple is an immutable, ordered collection of items. Once created, its elements cannot be modified.

dimensions = (10, 20, 30)

Characteristics of Tuples:

• Immutable: Cannot be modified after creation.
• Ordered: Preserves order of elements.
• Lightweight: Typically uses slightly less memory than a list holding the same items, and can be marginally faster to create and iterate over.
• Useful for fixed collections like coordinates, RGB values, etc.

Tuple Use-Cases:
• Representing read-only data.
• Used as dictionary keys (tuples are hashable when all of their elements are hashable).
• Preferred when data integrity must be preserved.
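
The sketch below illustrates immutability and the dictionary-key use case. The names `labels_by_point` and the grid points are hypothetical and exist only for this example.

```python
dimensions = (10, 20, 30)

# Unpacking assigns each element to its own name.
width, height, depth = dimensions

# Item assignment is rejected because tuples are immutable.
try:
    dimensions[0] = 99
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify a tuple:", error_message)

# Tuples of hashable values can serve as dictionary keys.
labels_by_point = {(0, 0): "origin", (1, 0): "east", (0, 1): "north"}
print(labels_by_point[(0, 0)])
```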

4. Dictionary in Python
What is a Dictionary?
A Dictionary is a mutable collection of key-value pairs. It is used to store data values like a real-life dictionary, where each word (key) has a definition (value). Since Python 3.7, dictionaries preserve the order in which keys were inserted.

student = { "name": "Alice", "age": 20, "course": "Data Science" }

Characteristics of Dictionaries:
• Key-value mapping.
• Keys are unique and must be hashable (for example strings, numbers, or tuples of immutable values).
• Values can be any data type.
• Insertion-ordered in Python 3.7 and later; unordered in earlier versions.

Common Dictionary Operations:

student["age"] = 21      # Modify value
student["grade"] = "A"   # Add new key-value pair
del student["course"]    # Delete a key
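
A few further dictionary operations, sketched on the same `student` example: safe lookup with `get()`, iteration in insertion order, and bulk updates. The default string "not graded" is an arbitrary choice for illustration.

```python
student = {"name": "Alice", "age": 20, "course": "Data Science"}

# get() returns a default instead of raising KeyError for a missing key.
grade = student.get("grade", "not graded")
print(grade)

# Membership tests look at keys, not values.
print("age" in student)

# items() yields key-value pairs in insertion order (Python 3.7+).
for key, value in student.items():
    print(f"{key}: {value}")

# update() changes existing keys and adds new ones; pop() removes and returns a value.
student.update({"age": 21, "grade": "A"})
removed = student.pop("course")
keys_in_order = list(student)
print(keys_in_order)    # ['name', 'age', 'grade']
```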

Pandas and Data Preprocessing

What is Pandas?
Pandas is a powerful and flexible open-source library built on top of NumPy. Its name is derived from "panel data" and is also a play on "Python data analysis". It is specifically designed for data manipulation and analysis.

Key Features:
• Data wrangling: filtering, transformation, merging, reshaping
• Reading/writing from multiple file formats
• Handling missing data efficiently
• Supports labeled axes (rows and columns)

Pandas Data Structures
1. Series
• A one-dimensional array with axis labels.
• Can hold any data type (integers, strings, floats, etc.)
2. DataFrame
• A two-dimensional labeled data structure with columns that can be of different types (like a table in a database or an Excel spreadsheet).
• More commonly used in real-world analysis.
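
A minimal sketch of the two structures follows, assuming pandas is installed. The student names and marks are invented sample data.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
marks = pd.Series([88, 92, 79], index=["Alice", "Bob", "Chitra"], name="marks")
print(marks["Bob"])                       # label-based access

# A DataFrame: columns of different types sharing one row index.
students = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Chitra"],
        "age": [20, 21, 19],
        "marks": [88.0, 92.5, 79.0],
    }
)
print(students.dtypes)                    # one dtype per column
print(students[students["marks"] > 80])   # keep only rows matching a condition

# Select one column for the matching rows and convert it to a plain list.
top_names = students.loc[students["marks"] > 80, "name"].tolist()
print(top_names)
```

Each column of a DataFrame is itself a Series, which is why the same indexing and filtering ideas apply to both.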

Data Importing and Exporting

• Data can be imported from CSV, Excel, SQL, JSON, and more.
• Exporting allows processed data to be saved in a desired format for reporting or further use.

Common functions:

• read_csv(), read_excel(), to_csv(), to_excel(), etc.
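
As an illustration of exporting and importing, the sketch below writes a small DataFrame to CSV text and reads it back. It uses an in-memory buffer (`io.StringIO`) in place of a file so that it runs without touching the disk; in practice a file path would be passed instead. The city and sales values are made up.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["Pune", "Delhi"], "sales": [120, 95]})

# Export: index=False leaves the row index out of the CSV output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Import: read the same text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```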

Data Cleaning & Preprocessing

1. Handling Missing Values
• Missing values are common in real-world datasets.
• They can be handled by:
   • Removing missing data
   • Replacing with mean/median/mode
   • Using forward/backward fill
2. Data Imputation
• Filling missing values using statistics (mean, median, etc.)
• Maintains data consistency and avoids dropping rows/columns unnecessarily.
3. Data Transformation
• Includes renaming columns, converting data types, scaling values, formatting strings, etc.
• Helps prepare data in a standard format for analysis or machine learning.
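
The cleaning steps above can be sketched on a tiny, made-up dataset. The column names, values, and the choice of median imputation are assumptions for illustration; which strategy is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {
        "Temp C": [21.0, np.nan, 23.0, np.nan],
        "city": [" pune", "Delhi ", None, "mumbai"],
    }
)
print(raw.isna().sum())                   # missing values per column

# Handling missing values: three of the options listed above
dropped = raw.dropna()                                      # remove rows with any gap
mean_filled = raw["Temp C"].fillna(raw["Temp C"].mean())    # replace with the mean
forward_filled = raw["Temp C"].ffill()                      # carry last valid value forward

# Transformation: rename a column, impute, tidy strings, convert a dtype
clean = raw.rename(columns={"Temp C": "temp_c"})
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median())
clean["city"] = clean["city"].fillna("unknown").str.strip().str.title()
clean["temp_c"] = clean["temp_c"].astype("float32")
print(clean)
```

Note that forward fill leaves a gap untouched when there is no earlier valid value, so it is often combined with a backward fill or a default.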

Data Visualization

Importance of Visualization
• Helps understand trends, patterns, and relationships in data.
• Makes complex data easier to interpret.
• Essential for reporting insights to stakeholders.
• Aids in data storytelling.

Matplotlib – Basic Visualizations

1. Line Plot
• Shows trends over intervals (e.g., time).
• Good for stock prices, temperature variation.
2. Bar Chart
• Used for comparing quantities across categories.
3. Scatter Plot
• Displays correlation/relationship between two numerical variables.
4. Histogram
• Shows the distribution of a dataset.

Seaborn – Advanced Visualizations

Seaborn is built on top of Matplotlib and offers a high-level interface for attractive
statistical graphics.

1. Box Plot
• Displays distribution and outliers.
• Highlights median and quartiles.
2. Violin Plot
• Combines box plot and kernel density estimation.
3. Pair Plot
• Creates scatter plots for all pairwise combinations of variables in a dataset.
4. Heatmap
• Used for correlation matrices or any matrix-style data.
• Color-coded representation of values.

Supplementary: Introduction to NumPy
What is NumPy?
NumPy (Numerical Python) is the foundation for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrix operations, along with a collection of high-level mathematical functions.

Key Concepts:
• Efficient memory management and performance.
• Array broadcasting (performing operations on arrays of different but compatible shapes).
• Vectorized operations (faster than Python loops).
• Random number generation, linear algebra, statistics, etc.
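
Vectorization and broadcasting are easier to grasp with a short example. The sketch below assumes NumPy is installed; the array contents and the seed are arbitrary illustrative values.

```python
import numpy as np

prices = np.array([100.0, 250.0, 80.0])
quantities = np.array([3, 1, 5])

# Vectorized operation: element-wise multiply with no explicit Python loop.
totals = prices * quantities
print(totals)                            # [300. 250. 400.]

# Broadcasting a scalar: 0.9 is applied to every element.
discounted = prices * 0.9

# Broadcasting a row: a (3,) array is stretched across each row of a (2, 3) array.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
column_means = matrix.mean(axis=0)       # shape (3,)
centered = matrix - column_means         # subtract each column's mean from that column
print(centered)

# Random number generation with a seeded generator for repeatable results.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.mean(), samples.std())
```

Shapes are compatible for broadcasting when, compared from the last dimension backwards, the sizes are equal or one of them is 1.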

Introduction to Data Visualization and Its Importance
What is Data Visualization?
Data visualization refers to the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools make it easier to understand trends, outliers, and patterns in data.

Importance in Data Analysis:

• Helps identify patterns and trends.
• Makes complex data easier to understand.
• Facilitates better decision-making.
• Supports storytelling with data.
• Useful for data exploration and reporting.

Tools commonly used: Matplotlib, Seaborn, Plotly, Tableau, etc.

2. Basic Plots Using Matplotlib

Matplotlib
Matplotlib is the foundational Python library for creating static, animated, and
interactive visualizations.

import matplotlib.pyplot as plt

A. Line Plot

Used to visualize trends over time or continuous variables.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 16]

plt.plot(x, y)          # draw the line
plt.title("Line Plot")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.grid(True)          # show grid lines
plt.show()

B. Bar Plot
Used to compare categories or discrete data.

x = ["Apple", "Banana", "Cherry"]
y = [10, 15, 7]

plt.bar(x, y, color='skyblue')   # one bar per category
plt.title("Bar Plot")
plt.ylabel("Quantity")
plt.show()

C. Scatter Plot
Used to visualize the relationship between two numerical variables.

x = [1, 2, 3, 4, 5]
y = [5, 7, 6, 8, 9]

plt.scatter(x, y, color='red')   # one point per (x, y) pair
plt.title("Scatter Plot")
plt.xlabel("X Value")
plt.ylabel("Y Value")
plt.show()

D. Histogram
Used to show the distribution of a variable (frequency plot).

import numpy as np

data = np.random.randn(1000)   # 1000 random samples (standard normal assumed here)
plt.hist(data, bins=30, color='green')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

3. Advanced Visualization Using Seaborn

What is Seaborn?
Seaborn is a high-level visualization library built on top of Matplotlib. It provides more attractive and informative statistical visualizations with less code.

import seaborn as sns
import matplotlib.pyplot as plt

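The examples below use Seaborn's sample datasets such as tips and iris (titanic is another). They are loaded with sns.load_dataset(), which needs an internet connection the first time a dataset is fetched.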

A. Box Plot

Used to visualize the distribution and detect outliers.

sns.boxplot(x='day', y='total_bill', data=sns.load_dataset('tips'))
plt.title("Box Plot of Total Bill by Day")
plt.show()

B. Violin Plot

Combines boxplot and KDE (kernel density estimation). Shows distribution and probability density.

sns.violinplot(x='day', y='total_bill', data=sns.load_dataset('tips'))
plt.title("Violin Plot of Total Bill by Day")
plt.show()

C. Pair Plot

Used to visualize pairwise relationships in a dataset.

sns.pairplot(sns.load_dataset('iris'), hue='species')
plt.suptitle("Pair Plot of Iris Dataset", y=1.02)   # title placed above the grid of plots
plt.show()

D. Heatmap

Used for visualizing correlation or any matrix data.

iris = sns.load_dataset('iris')
correlation_matrix = iris.corr(numeric_only=True)   # skip the non-numeric species column
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Heatmap of Iris Feature Correlation")
plt.show()
