Python for Data Analysis Basics

Uploaded by

3444900809
Copyright
© All Rights Reserved

Python Session 01: Foundations and Data Analysis Introduction

I. Course Context and Environment Setup

• Prerequisites: Students are expected to have prior experience with Excel, SQL, and
Power BI.

• Course Focus: This training focuses on Python relevant to Data Analysis. The full
Python program consists of four sessions.

• Python Significance: Python is a general-purpose language, making it versatile. It
is one of the most popular programming languages globally today.

o Top organizations use Python for data analysis, including Google, Netflix,
Spotify, Amazon, and Uber.

o Python is used for complex tasks like dynamic pricing and fraud detection
(e.g., J.P. Morgan Chase).

• Integrated Development Environment (IDE):

o We are currently using Google Colab. Colab is cloud-based, which
simplifies setup and avoids installation issues.

o Access Colab by navigating to [Link].

o Alternative environments include VS Code, Jupyter Notebook, PyCharm, and
Anaconda (a Python distribution with a dashboard for launching these tools).

II. Python Execution and Structure

• Interpreter vs. Compiler:

o A Compiler translates the entire program into machine code (for example, an
object file) before any of it runs.

o Python uses an Interpreter, meaning it executes code statement by statement. In a
notebook, once a code block is executed, the results (like variable assignments)
remain available in memory for subsequent blocks.

• Notebook Components: A Google Colab notebook contains:

o Coding Cells: Where executable Python code is written.

o Markdown Cells: Used for documentation, adding headings, titles, defining
problem statements, and general text explanations.

• Adding Code: A new code block can be added by pressing Ctrl+M, then B.
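
The point about results staying in memory between code blocks can be illustrated with a minimal sketch. The two "cells" below are written as one script for convenience; in Colab each part would sit in its own coding cell. The variable name `session_total` is hypothetical.

```python
# Cell 1: run this first. The assignment stays in memory after the cell finishes.
session_total = 10  # hypothetical variable name, chosen for illustration

# Cell 2: run this later. It can still read and update the value from Cell 1.
session_total = session_total + 5
print(session_total)  # 15
```

If Cell 2 is run before Cell 1 in a fresh session, Python raises a NameError because `session_total` does not exist yet.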


III. Python Basics: Syntax, Data Types, and Variables

• Printing Output: Use the print() function to display any value.

• Comments: Use the hash sign (#) to write comments in your code. Comments are
non-executable text used to explain code functionality.

• Case Sensitivity: Python is case sensitive. Keywords, function names, and
variable names must match their defined casing (e.g., print must be lowercase).

• Data Types:

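o Numeric/Integer: Used for whole numbers (Python type int).

o String/Character: Used for textual data, enclosed in single or double quotes
(e.g., "Hello, world!"). Python has no separate character type; a single
character is a string of length 1 (Python type str).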

• Variables: Variables act as temporary storage to hold values.

o Example: a = 10.
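
The basics above (printing, comments, case sensitivity, data types, and variables) can be sketched in a few lines. The names `greeting` and `A` are illustrative and not from the session.

```python
# A comment: Python ignores everything after the hash sign on a line.
a = 10                      # integer (whole number)
greeting = "Hello, world!"  # string (text in quotes)

print(a)               # 10
print(greeting)        # Hello, world!
print(type(a))         # <class 'int'>
print(type(greeting))  # <class 'str'>

# Case sensitivity: `a` and `A` are two different variables.
A = 99
print(a == A)          # False
```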

• Operators:

o Assignment (=): A single equals sign is used to assign a value to a variable
(e.g., x = 100 means assigning 100 to x).

o Comparison (==): A double equals sign is used for comparison (to check if
two values are equal).

o Other Comparison Operators: Include > (greater than), < (less than), >=
(greater than or equal to), <= (less than or equal to), and != (not equal).

o Mathematical Operators: Include +, -, *, and /.

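o Order of Operations (PEMDAS): Parentheses, Exponents (written ** in Python),
Multiplication, Division, Addition, Subtraction. Multiplication and division share
the same precedence and are evaluated left to right, as are addition and
subtraction.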

o Modulus (%): Returns the remainder of a division. This is essential for
determining if a number is even or odd (e.g., an even number % 2 will be 0).
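
As a short sketch of the operators listed above, the example below shows assignment versus comparison, operator precedence, and the even/odd check with modulus. The values are arbitrary and chosen only for illustration.

```python
x = 100            # assignment: store 100 in x
print(x == 100)    # True  (comparison)
print(x != 100)    # False
print(x >= 50)     # True

# Order of operations
print(2 + 3 * 4)    # 14   (multiplication before addition)
print((2 + 3) * 4)  # 20   (parentheses first)
print(8 / 4 * 2)    # 4.0  (same precedence, evaluated left to right)
print(2 ** 3)       # 8    (exponent)

# Modulus: the remainder tells us whether a number is even or odd.
number = 7
is_even = number % 2 == 0
print(is_even)      # False
```

Note that / always returns a float in Python 3, which is why 8 / 4 * 2 prints 4.0 rather than 4.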

IV. Control Structures and Data Structures

• Indentation: Python uses indentation to group statements and define structure. All
statements belonging to a control structure (like a loop or an if statement) must be
properly indented.

• Loops: Used for repeated execution.


o for Loop: Executes a block of code once for each item in a sequence, such as the
numbers produced by range() or the items in a list.

o range() Function: Generates a sequence of numbers.

▪ range(5) returns numbers starting from 0 up to, but not including, 5
(i.e., 0, 1, 2, 3, 4).

▪ To specify a start and end, use two parameters (e.g., range(1, 11) runs
from 1 to 10).
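
A minimal sketch of the two range() forms described above. The running total is an illustrative use of the loop and not part of the session notes.

```python
# range(5) yields 0, 1, 2, 3, 4. The indented line belongs to the loop.
for i in range(5):
    print(i)

# range(1, 11) yields 1 through 10. Here the loop adds them up.
total = 0
for n in range(1, 11):
    total = total + n
print(total)  # 55
```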

• Conditional Statements: Used for decision making.

o The if statement checks a condition. If the condition is true, the indented
code block is executed.

• Lists (Data Structure): A list is a data structure that stores an ordered collection
of items. Lists are defined using square brackets.

o Example: branches = ["US", "Pakistan", "Dubai"].

o When looping through a list, the variable used in the loop (e.g., B in for B in
branches) retains the last iterated value after the loop finishes executing.

o Dry Run: A mental simulation of code execution used to apply the business
logic and verify expected outcomes.
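
The following sketch combines a list, a for loop, and an if statement, and shows that the loop variable keeps its last value. The length check and the name `long_names` are hypothetical business logic added only for illustration. The comments walk through a dry run.

```python
branches = ["US", "Pakistan", "Dubai"]
long_names = []  # hypothetical result list

for branch in branches:
    # Dry run: "US" has length 2, so it is skipped;
    # "Pakistan" and "Dubai" are longer than 2, so they are kept.
    if len(branch) > 2:
        long_names.append(branch)

print(long_names)  # ['Pakistan', 'Dubai']

# After the loop, the loop variable still holds the last item it visited.
print(branch)      # Dubai
```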

V. Functions and Libraries

• Functions: Reusable, defined blocks of code.

o Defining: Functions are defined using the def keyword, followed by the
function name and parameters.

o return Statement: A function that uses return passes a value back to the
code that called it. If you want to display the result, you must use print()
when calling the function.

o Direct Printing: A function that uses print() inside its definition executes the
print statement immediately when called, and does not require an external
print() statement.
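
The difference between returning and printing can be sketched as follows. Both function names are hypothetical.

```python
def add_and_return(a, b):
    # Hands the result back to the caller; nothing is displayed here.
    return a + b

def add_and_print(a, b):
    # Displays the result immediately; no value is handed back.
    print(a + b)

result = add_and_return(2, 3)
print(result)                  # 5 (an explicit print() is needed)

add_and_print(2, 3)            # prints 5 on its own

nothing = add_and_print(1, 1)  # prints 2
print(nothing)                 # None, because the function has no return
```

A returned value can be stored and reused in later calculations, whereas a printed value is only shown on screen.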

• Libraries and Modules: Libraries contain pre-written code and functions that
extend Python's capabilities.

o Crowdsourcing: Python's ecosystem is strong due to crowdsourcing, where
the community contributes to developing libraries.

o Importing Libraries: Use the import statement.

o Aliasing: Use the as keyword to assign a short, conventional name to an
imported library (e.g., as plt or as pd).

• Matplotlib:

o Matplotlib is a library used for data visualization.

o matplotlib.pyplot is the specific module imported for creating plots and
charts.

o Convention: import matplotlib.pyplot as plt.

o Functions Used: plt.bar() (to create a bar chart), plt.show() (to display the
chart), plt.xlabel(), plt.ylabel(), and plt.title() (to add descriptive labels and a
title).
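
A minimal sketch of a bar chart using the functions named above. The sales figures are made-up sample values, and the axis labels and title are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt  # conventional alias

branches = ["US", "Pakistan", "Dubai"]
sales = [120, 95, 140]  # hypothetical sample values, not real data

bars = plt.bar(branches, sales)   # one bar per branch
plt.xlabel("Branch")              # label for the horizontal axis
plt.ylabel("Sales")               # label for the vertical axis
plt.title("Sales by Branch")      # chart title
plt.show()                        # display the chart
```

In Colab the chart appears directly below the cell once plt.show() runs.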

VI. Data Analysis with Pandas and DataFrames

• Pandas Library: One of the most widely used libraries among data analysts.
Pandas is used to perform various data operations, such as loading and
manipulating data.

o Convention: import pandas as pd.

• DataFrames (DF): A DataFrame is the most important data structure for data
analysts. It represents data in a two-dimensional table format (like an Excel sheet
or a database table).

• Reading Data: Data can be loaded using functions like pd.read_csv().

• Exploratory Data Analysis (EDA) Functions: EDA is the first step performed when
receiving new data.

o df.head() (Function): Displays the top 5 records of the DataFrame. You can
specify the number of rows (e.g., df.head(10) displays the top 10).

o df.tail() (Function): Displays the bottom 5 records.

o df.shape (Property): Returns a tuple showing the total rows and total
columns. Note: Because shape is a property, it does not use parentheses ().

o df.columns (Property): Displays the names of all the columns (headings) in
the DataFrame.

o df.info() (Function): Provides details about the DataFrame, including column
names, data types, and the count of non-null values.

o Selecting Specific Columns: To view selected columns, you must pass a
list of column names using double square brackets (e.g.,
df[["PassengerId", "Survived", "Name"]]).

o df.describe() (Function): Calculates and displays descriptive statistics
(count, mean, standard deviation, min, max, and quartiles) for all numeric
columns.
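
The EDA functions above can be tried on a small table. This is a sketch under stated assumptions: the rows are invented placeholder data (not the real Titanic file), and io.StringIO stands in for a CSV file on disk so the example runs without any upload.

```python
import io
import pandas as pd  # conventional alias

# Hypothetical sample data standing in for a real CSV file.
csv_text = """PassengerId,Survived,Name,Age
1,0,Passenger A,22
2,1,Passenger B,38
3,1,Passenger C,26
4,1,Passenger D,35
5,0,Passenger E,35
6,0,Passenger F,54
"""
df = pd.read_csv(io.StringIO(csv_text))  # with a real file: pd.read_csv("path/to/file.csv")

print(df.head())      # first 5 rows
print(df.tail(2))     # last 2 rows
print(df.shape)       # (6, 4): rows, columns (a property, so no parentheses)
print(df.columns)     # column names
df.info()             # column types and non-null counts (prints directly)

selected = df[["PassengerId", "Name"]]  # double brackets: a list of column names
print(selected)

print(df.describe())  # summary statistics for the numeric columns
```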

VII. Next Steps and Assignments

• NumPy Online Course: Students are required to complete a 31-minute online
course on NumPy basics (a foundational library used for array operations in
Python). This must be completed before the next session.

• File Upload: Students must upload the shared CSV file (Supermarket or Titanic
data) to their Google Drive to prepare for data loading exercises in the next session.

• SQL Server with ChatGPT Course: A free coupon will be shared in the next session
for a SQL Server with ChatGPT course.

Analogy for DataFrames:

A DataFrame in Pandas is like a Swiss Army Knife for data analysts. It’s not just a simple
table (like a spreadsheet); it’s equipped with all the tools (functions like .head(), .describe(),
and .info()) built right into it, allowing you to instantly inspect, clean, and summarize data
without needing separate tools for each task.
