Python Notes

This document provides an overview of Python programming, highlighting its features, libraries, and the functioning of its interpreter. It covers key concepts such as identifiers, keywords, statements, expressions, variables, and various types of operators. Additionally, it explains the differences between statements and expressions, as well as the precedence of operators in Python.
Module 1

Python Basic Concepts and Programming


The ABC of Python
• Python is a general-purpose, dynamically typed, high-level,
garbage-collected programming language. Its reference implementation
compiles source code to bytecode and then interprets that bytecode. Every
value in Python is an object, and the language supports procedural,
object-oriented, and functional programming.
• Python is a widely used programming language that offers several unique
features and advantages compared to languages like Java and C++.
Features of Python
• Easy to use and Read
• Dynamically Typed
• High-level
• Compiled and Interpreted
• Garbage Collected
• Purely Object-Oriented
• Cross-platform Compatibility
• Rich Standard Library
• Open Source
Python Popular Frameworks and Libraries

Libraries and their usage:
• Django, Flask, Pyramid, CherryPy: Web development (server-side)
• Tkinter, PyGTK, PyQt, PyJs: GUI-based applications
• TensorFlow, PyTorch, Scikit-learn, Matplotlib, SciPy, etc.: Machine learning
• NumPy, Pandas, etc.: Mathematics
• Pygame: Game development
• Pytest: Testing framework for Python
• NLTK: Natural language processing

Interpreters
Python Interpreter
• Python is one of the most widely used high-level languages today
because of its versatility, portability, and large collection of
libraries and frameworks.
• It is described as an interpreted language because the interpreter
executes a program's instructions one after another at run time.
• There are two ways to execute Python code: interactive mode, where you
type statements at the Python prompt (>>>), and script mode, where you
run code saved in a file.
How interpreters function in Python?
• The reference Python interpreter is written in the C programming
language and is called "CPython".
• In source code analysis, Python reads the source, checks the
indentation rules, and checks for syntax errors. If there are any errors,
Python stops and reports them so that the code can be corrected. Part of
this stage is lexical analysis, which divides the source code into a
list of individual tokens.
• In bytecode generation, the interpreter parses the tokens into an AST
(Abstract Syntax Tree), which it compiles to bytecode. Bytecode can be
cached in files with the .pyc extension.
• Finally, the Python Virtual Machine (PVM) loads the bytecode and
executes its instructions one by one, which produces the program's
results.
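The stages described above can be observed with standard-library modules. The following is a minimal sketch, assuming CPython; the one-line `source` string and the variable names are illustrative and do not come from the original notes.

```python
import ast
import dis
import io
import tokenize

source = "total = 2 + 3\n"  # illustrative one-line program

# Stage 1: lexical analysis splits the source into tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # skip the NEWLINE and ENDMARKER tokens
]
print(tokens)

# Stage 2: the parser builds an abstract syntax tree (AST).
tree = ast.parse(source)
print(type(tree).__name__)

# Stage 3: the AST is compiled to a bytecode object.
code = compile(tree, "<example>", "exec")
dis.dis(code)  # human-readable listing of the bytecode instructions

# Stage 4: the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

The exact bytecode listing printed by `dis.dis` differs between Python versions, so it is best read as an illustration of the idea and not as a fixed reference.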
• Advantages of using an interpreter
• It is flexible, and error localization is easier.
• It is smaller in size.
• It executes the program line by line.
• Disadvantages of using an interpreter
• Translation takes a lot of time.
• Execution is slower.
• It uses a lot of storage.
Parts of Python Programming Language
• Identifiers
• Keywords
• Statements and Expressions
• Variables
• Operators, Precedence and Associativity
• Data Types
• Indentation
• Comments
• Program Execution
• Reading Input
• Print Output
Identifiers
• An identifier is a user-defined name given to a variable, function,
class, module, etc.
• An identifier is a combination of letters, digits, and underscores.
• Identifiers are case-sensitive, i.e., 'num', 'Num', and 'NUM' are three
different identifiers in Python.
• It is a good programming practice to give meaningful names to
identifiers to make the code understandable.
Rules for Naming Python Identifiers
• It cannot be a reserved python keyword.

• It should not contain white space.

• It can be a combination of A-Z, a-z, 0-9, or underscore.

• It should start with an alphabet character or an underscore ( _ ).

• It should not contain any special character other than an underscore ( _ ).


Examples of Python Identifiers
• Valid identifiers:
• var1
• _var1
• _1_var
• var_1
• Invalid identifiers:
• !var1
• 1var
• 1_var
• var#1
• var 1
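The naming rules can be checked programmatically. This is a small sketch that uses the built-in `str.isidentifier()` method and the standard `keyword` module; the helper name `is_valid_identifier` is made up for this example.

```python
import keyword


def is_valid_identifier(name):
    """Return True if name follows the identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["var1", "_var1", "_1_var", "var_1",
              "!var1", "1var", "1_var", "var#1", "var 1", "class"]

results = {name: is_valid_identifier(name) for name in candidates}
for name, valid in results.items():
    print(f"{name!r}: {'valid' if valid else 'invalid'}")
```

The last candidate, `class`, is syntactically well formed but is rejected because it is a reserved keyword.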
Keywords
• Python keywords are reserved words with predefined meanings; they can
be used only for those purposes.
• You never need to import a keyword into your program because keywords
are always available.
• Because keywords have a fixed meaning, you can't use them for other
purposes, such as variable names, in your code.
• You'll get a SyntaxError if you attempt to do so.
• If you assign something to the name of a built-in function or type, you
will not receive a SyntaxError; however, it is still not a good idea.
• Python has thirty-five keywords as of Python 3.8.
Keywords
False await else import pass

None break except in raise

True class finally is return

and continue for lambda try

as def from nonlocal while

assert del global not with

async elif if or yield


Demo Program Keywords
# Importing the keyword module to display Python keywords
import keyword

# Display all Python keywords
print("List of Python Keywords:")
print(keyword.kwlist)

# Using some keywords in a sample program
def demo_keywords():
    # Example of keywords in action
    if True:  # 'if' and 'True' are keywords
        print("This demonstrates the use of 'if' and 'True' keywords.")
    for i in range(3):  # 'for' and 'in' are keywords
        print(f"Looping... Current value: {i}")
    return None  # 'return' and 'None' are keywords

# Call the function
demo_keywords()
Statements and Expressions
• A statement is an instruction that the Python interpreter can execute.
Python statements perform actions, such as assigning values, looping,
or making decisions.
• Characteristics of Statements
• Statements do not produce a value.
• They perform an action or define something (e.g., variables,
functions).
Program to demonstrate Statements
# Assignment statement
x = 10

# Expression statement (in Python 3, print() is a function call)
print("Hello, World!")

# Conditional statement
if x > 5:
    print("x is greater than 5")

# Loop statement
for i in range(5):
    print(i)

# Function definition statement
def greet():
    print("Welcome to Python!")
Expressions
• An expression is a combination of variables, values, operators, and
function calls that the interpreter evaluates to produce a result.
• An expression always returns a value.
• Characteristics of Expressions
• Expressions compute values.
• They can be part of statements (e.g., assignment statements).
Program to demonstrate Expressions
# Mathematical expression
x = 5 + 3  # '5 + 3' is an expression

# Boolean expression
y = x > 5  # 'x > 5' is an expression

# Function call expression
z = len("Python")  # 'len("Python")' is an expression

# Combined expression
result = (x * 2) + (y * 3)  # Combination of multiple expressions
Key Differences Between Statements and Expressions
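One practical way to see the difference: an expression produces a value, so Python can compile it in "eval" mode, whereas a statement cannot be compiled that way. The sketch below relies on that behavior of the built-in `compile()`; the helper name `is_expression` and the sample strings are assumptions made for illustration.

```python
def is_expression(source):
    """Return True if source is a single expression (it evaluates to a value)."""
    try:
        compile(source, "<example>", "eval")  # "eval" mode accepts expressions only
        return True
    except SyntaxError:
        return False


samples = ["5 + 3", "len('Python')", "x = 5 + 3", "for i in range(3): pass"]
classification = {text: is_expression(text) for text in samples}

for text, is_expr in classification.items():
    print(f"{text!r} is {'an expression' if is_expr else 'a statement'}")
```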
Variables
• A variable is a name that refers to a value stored in memory. The name
of a variable is an identifier.
• Because Python infers the type of a variable from the value assigned to
it, we do not need to declare its type.
• Variable names must begin with a letter or an underscore, and the
remaining characters can be letters, digits, or underscores.
• By convention, variable names are written in lowercase. Names are
case-sensitive, so India and india are distinct variables.
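The points above can be illustrated briefly. In this sketch the variable names and values are chosen only for demonstration.

```python
# No type declaration is needed: the type comes from the assigned value.
value = 10
first_type = type(value).__name__

# The same name can later refer to a value of a different type.
value = "ten"
second_type = type(value).__name__

# Names are case-sensitive, so these are two separate variables.
India = "capitalized name"
india = "lowercase name"

print(first_type, second_type)
print(India != india)
```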
Operators
• Operators are special symbols that perform operations on variables
and values. For example,
• print(5 + 6) # 11
Types of Python Operators
1. Arithmetic Operators
2. Assignment Operators
3. Comparison Operators
4. Logical Operators
5. Bitwise Operators
6. Special Operators
Arithmetic Operators
• Arithmetic operators are used to perform mathematical operations
like addition, subtraction, multiplication, etc.
Operator   Operation        Example
+          Addition         5 + 2 = 7
-          Subtraction      4 - 2 = 2
*          Multiplication   2 * 3 = 6
/          Division         4 / 2 = 2.0
//         Floor Division   10 // 3 = 3
%          Modulo           5 % 2 = 1
**         Power            4 ** 2 = 16
Arithmetic Operators in Python
a = 7
b = 2

# addition
print('Sum:', a + b)

# subtraction
print('Subtraction:', a - b)

# multiplication
print('Multiplication:', a * b)

# division
print('Division:', a / b)

# floor division
print('Floor Division:', a // b)

# modulo
print('Modulo:', a % b)

# a to the power b
print('Power:', a ** b)

Output
Sum: 9
Subtraction: 5
Multiplication: 14
Division: 3.5
Floor Division: 3
Modulo: 1
Power: 49
Assignment Operators
• Assignment operators are used to assign values to variables. For
example,
• # assign 5 to x
• x=5
• Here, = is an assignment operator that assigns 5 to x.

• Here's a list of different assignment operators available in Python.


Operator Name Example

= Assignment Operator a=7

+= Addition Assignment a += 1 # a = a + 1

-= Subtraction Assignment a -= 3 # a = a - 3

*= Multiplication Assignment a *= 4 # a = a * 4

/= Division Assignment a /= 3 # a = a / 3

%= Remainder Assignment a %= 10 # a = a % 10

**= Exponent Assignment a **= 10 # a = a ** 10


Assignment Operators
# assign 10 to a
a = 10

# assign 5 to b
b = 5

# assign the sum of a and b to a
a += b  # a = a + b

print(a)

# Output: 15
Comparison Operators
• Comparison operators compare two values/variables and return a
boolean result: True or False.
a = 5
b = 2
print(a > b)  # True

Operator   Meaning                    Example
==         Is Equal To                3 == 5 gives us False
!=         Not Equal To               3 != 5 gives us True
>          Greater Than               3 > 5 gives us False
<          Less Than                  3 < 5 gives us True
>=         Greater Than or Equal To   3 >= 5 gives us False
<=         Less Than or Equal To      3 <= 5 gives us True
Comparison Operators Example Program
a=5
b=2
# equal to operator
print('a == b =', a == b)

# not equal to operator


print('a != b =', a != b)

# greater than operator


print('a > b =', a > b)

# less than operator


print('a < b =', a < b)

# greater than or equal to operator


print('a >= b =', a >= b)

# less than or equal to operator


print('a <= b =', a <= b)
Logical Operators
• Logical operators combine or negate Boolean expressions.
a = 5
b = 6
print((a > 2) and (b >= 6))  # True

Operator   Example   Meaning
and        a and b   Logical AND: True only if both the operands are True
or         a or b    Logical OR: True if at least one of the operands is True
not        not a     Logical NOT: True if the operand is False and vice-versa
Logical Operators Program
# logical AND
print(True and True)  # True
print(True and False)  # False

# logical OR
print(True or False)  # True

# logical NOT
print(not True)  # False
Bitwise operators
• Bitwise operators act on operands as if they were strings of binary
digits. They operate bit by bit, hence the name.
• For example, 2 is 10 in binary, and 7 is 111.
• In the table below, let x = 10 (0000 1010 in binary) and y = 4
(0000 0100 in binary).

Operator   Meaning               Example
&          Bitwise AND           x & y = 0 (0000 0000)
|          Bitwise OR            x | y = 14 (0000 1110)
~          Bitwise NOT           ~x = -11 (1111 0101)
^          Bitwise XOR           x ^ y = 14 (0000 1110)
>>         Bitwise right shift   x >> 2 = 2 (0000 0010)
<<         Bitwise left shift    x << 2 = 40 (0010 1000)
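The table entries can be verified with a few lines of code. This sketch uses the same values, x = 10 and y = 4; the helper `to_bits` is a made-up name, and it shows only the low 8 bits of each result, which is why ~x appears as 1111 0101.

```python
x = 10  # 0000 1010
y = 4   # 0000 0100


def to_bits(value):
    """Format the low 8 bits of value as two groups of four binary digits."""
    bits = format(value & 0xFF, "08b")
    return bits[:4] + " " + bits[4:]


results = {
    "x & y": x & y,
    "x | y": x | y,
    "~x": ~x,
    "x ^ y": x ^ y,
    "x >> 2": x >> 2,
    "x << 2": x << 2,
}

for expression, value in results.items():
    print(f"{expression} = {value} ({to_bits(value)})")
```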


Special operators
• Python language offers some special types of operators like
the identity operator and the membership operator.
• They are described below with examples.
• Identity operators
• In Python, is and is not are used to check if two values are located at the same
memory location.

• It's important to note that having two variables with equal values doesn't
necessarily mean they are identical.
Operator   Meaning                                                                    Example
is         True if the operands are identical (refer to the same object)              x is True
is not     True if the operands are not identical (do not refer to the same object)   x is not True
Identity operators in Python
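x1 = 5
y1 = 5
x2 = 'Hello'
y2 = 'Hello'
x3 = [1, 2, 3]
y3 = [1, 2, 3]

print(x1 is not y1)  # prints False (CPython reuses one object for small integers)

print(x2 is y2)  # prints True (CPython typically shares identical string literals)

print(x3 is y3)  # prints False (each list display creates a new object)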
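Because the results for integers and strings depend on interpreter caching, lists give the clearest picture of the difference between equality (==) and identity (is). A short sketch, with variable names chosen for illustration:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with equal contents
alias = first        # a second name for the very same list object

equal_values = first == second    # compares contents
same_object = first is second     # compares identity
alias_is_same = alias is first

alias.append(4)                   # changing the object through one name...
visible_through_first = first     # ...is visible through the other name

print(equal_values, same_object, alias_is_same)
print(visible_through_first)
```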


Membership operators

• In Python, in and not in are the membership operators.


• They are used to test whether a value or variable is found in a
sequence or other collection (string, list, tuple, set, or dictionary).
• In a dictionary, we can only test for the presence of a key, not the
value.
Operator   Meaning                                               Example
in         True if value/variable is found in the sequence       5 in x
not in     True if value/variable is not found in the sequence   5 not in x
Membership operators in Python
message = 'Hello world'
dict1 = {1: 'a', 2: 'b'}

# check if 'H' is present in message string
print('H' in message)  # prints True

# check if 'hello' is absent from message string
print('hello' not in message)  # prints True

# check if 1 is a key in dict1
print(1 in dict1)  # prints True

# check if 'a' is a key in dict1 ('a' is a value, not a key)
print('a' in dict1)  # prints False
Precedence of Python Operators
• The combination of values, variables, operators, and function calls is
termed an expression.
• The Python interpreter can evaluate a valid expression.
• For example:
>>> 5 - 7
-2
• When an expression contains more than one operator, precedence
determines which operation is carried out first. For example,
multiplication has higher precedence than subtraction.
# Multiplication has higher precedence than subtraction
>>> 10 - 4 * 2
2
• We can change this order using parentheses (), which have higher
precedence than multiplication.
# Parentheses () have higher precedence
>>> (10 - 4) * 2
12
Operators Meaning

() Parentheses

** Exponent

+x, -x, ~x Unary plus, Unary minus, Bitwise NOT

*, /, //, % Multiplication, Division, Floor division, Modulus

+, - Addition, Subtraction

<<, >> Bitwise shift operators

& Bitwise AND

^ Bitwise XOR

| Bitwise OR

==, !=, >, >=, <, <=, is, is not, in, not in Comparisons, Identity, Membership operators

not Logical NOT

and Logical AND

or Logical OR
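# Precedence of or & and
meal = "fruit"
money = 0

# 'and' is evaluated before 'or', so the condition is read as
# meal == "fruit" or (meal == "sandwich" and money >= 2)
if meal == "fruit" or meal == "sandwich" and money >= 2:
    print("Lunch being delivered")
else:
    print("Can't deliver lunch")
# Output: Lunch being delivered

# Parentheses force 'or' to be evaluated first
meal = "fruit"
money = 0

if (meal == "fruit" or meal == "sandwich") and money >= 2:
    print("Lunch being delivered")
else:
    print("Can't deliver lunch")
# Output: Can't deliver lunch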
Associativity of Python Operators
• We have seen in the previous concept that more than one operator
exists in the same group. These operators have the same precedence.
• When two operators have the same precedence, associativity helps to
determine the order of operations.
• Associativity is the order in which an expression is evaluated that has
multiple operators of the same precedence. Almost all the operators
have left-to-right associativity.
• For example, multiplication and floor division have the same
precedence. Hence, if both of them are present in an expression, the
left one is evaluated first.
# Left-right associativity
# Output: 3
print(5 * 2 // 3)

# Shows left-right associativity


# Output: 0
print(5 * (2 // 3))

Output
3
0
Note: Exponent operator ** has right-to-left associativity in Python.

# Shows the right-left associativity of **


# Output: 512, Since 2**(3**2) = 2**9
print(2 ** 3 ** 2)

# If 2 needs to be exponentiated first, use ()


# Output: 64
print((2 ** 3) ** 2)
Non associative operators

• Some operators, like assignment operators and comparison operators, do
not have associativity in Python. There are separate rules for sequences
of this kind of operator, and they cannot be expressed as associativity.
• For example, x < y < z means neither (x < y) < z nor x < (y < z).
x < y < z is equivalent to x < y and y < z, and is evaluated from left
to right.
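The three readings of a chained comparison can be compared directly. In this sketch the values 1, 3, and 2 are chosen only because they make the readings disagree.

```python
x, y, z = 1, 3, 2

chained = x < y < z             # how Python actually reads the chain
expanded = (x < y) and (y < z)  # the equivalent expanded form
left_grouped = (x < y) < z      # True < 2, i.e. 1 < 2
right_grouped = x < (y < z)     # 1 < False, i.e. 1 < 0

print(chained, expanded, left_grouped, right_grouped)
```

The grouped forms compare a Boolean with a number, which is legal because bool is a subclass of int, but it is rarely what the programmer intended.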
# Initialize x, y, z
x=y=z=1

# Expression is invalid
# (Non-associative operators)
# SyntaxError: invalid syntax

x = y = z+= 2

File "<string>", line 8


x = y = z+= 2
^
SyntaxError: invalid syntax
Data Types
• Python Data types are the classification or categorization of
data items.
• It represents the kind of value that tells what operations can be
performed on a particular data.
• Since everything is an object in Python programming, Python
data types are classes and variables are instances (objects) of
these classes.
Data Types
• Numeric – int, float, complex
• Sequence Type – string, list, tuple
• Mapping Type – dict
• Boolean – bool
• Set Type – set, frozenset
• Binary Types – bytes, bytearray, memoryview
1. Numeric Data Types in Python
• The numeric data type in Python represents data that has a numeric
value. A numeric value can be an integer, a floating-point number, or a
complex number. These values are defined as the int, float, and complex
classes in Python.
• Integers – This value is represented by int class. It contains positive or negative
whole numbers (without fractions or decimals). In Python, there is no limit to
how long an integer value can be.

• Float – This value is represented by the float class. It is a real number with a
floating-point representation. It is specified by a decimal point. Optionally, the
character e or E followed by a positive or negative integer may be appended to
specify scientific notation.

• Complex Numbers – A complex number is represented by the complex class.
It is specified as (real part) + (imaginary part)j. For example: 2+3j
a = 5
print("Type of a:", type(a))

b = 5.0
print("\nType of b:", type(b))

c = 2 + 4j
print("\nType of c:", type(c))

Output
Type of a: <class 'int'>

Type of b: <class 'float'>

Type of c: <class 'complex'>
Sequence Data Types in Python
• The sequence Data Type in Python is the ordered collection of
similar or different Python data types.
• Sequences allow storing of multiple values in an organized and
efficient fashion.
• There are several sequence data types of Python:
• Python String
• Python List
• Python Tuple
String Data Type
• Strings in Python are immutable sequences of Unicode characters. A
string is a collection of one or more characters enclosed in single
quotes, double quotes, or triple quotes.
• Python has no separate character data type; a character is simply a
string of length one. Strings are represented by the str class.
• Creating String
• Strings in Python can be created using single quotes, double quotes, or
even triple quotes.
• Example: This Python code showcases various string creation
methods. It uses single quotes, double quotes, and triple quotes to
create strings with different content and includes a multiline string.
The code also demonstrates printing the strings and checking their
data types.
s1 = 'Welcome to the World of AI'
print("String with Single Quotes: ", s1)

# check data type
print(type(s1))

s2 = "Techno Space"
print("String with Double Quotes: ", s2)

s3 = '''I'm a Geek and I live in a world of "Geeks"'''
print("String with Triple Quotes: ", s3)

s4 = '''MCA
MBA
MTech'''
print("Multiline String: ", s4)
Accessing elements of String
• In Python, individual characters of a string can be accessed by
indexing. Negative indexing allows negative address references to access
characters from the back of the string, e.g. -1 refers to the last
character, -2 refers to the second-last character, and so on.
• Example: This Python code demonstrates how to work with a string named
s. It initializes the string with "Unity in Diversity". It then shows how
to access the first character ("U") using an index of 0 and the last
character ("y") using a negative index of -1.
s = "Unity in Diversity"

# accessing first character of string
print(s[0])

# accessing last character of string
print(s[-1])
List Data Type
• Lists are similar to arrays in other languages: a list is an ordered
collection of data.
• Lists are very flexible because the items in a list do not need to be
of the same type.
• Creating a List in Python
• Lists in Python can be created by placing a sequence of items inside
square brackets [].
• Example: This Python code demonstrates list creation. It starts with an
empty list, then creates and prints a list of integers, and finally
creates and prints a list that mixes strings and integers.
# Empty list
a = []

# list with int values
a = [1, 2, 3]
print(a)

# list with mixed int and string
b = ["Go", "For", "It", 4, 5]
print(b)
Python Access List Items
• To access list items, refer to the index number. Use the index
operator [ ] to access an item in a list.
• In Python, negative sequence indexes represent positions from the
end of the list.
• Instead of having to compute the offset as in List[len(List)-3], it is
enough to just write List[-3]. Negative indexing means beginning from
the end: -1 refers to the last item, -2 refers to the second-last item,
etc.
a = ["Invented", "For", "Life"]
print("Accessing element from the list")
print(a[0])
print(a[2])
print("Accessing element using negative indexing")
print(a[-1])
print(a[-3])
Tuple Data Type
• Just like a list, a tuple is also an ordered collection of Python
objects.
• The main difference between a tuple and a list is that tuples are
immutable, i.e., a tuple cannot be modified after it is created.
• It is represented by the tuple class.
• Creating a Tuple in Python
• In Python Data Types, tuples are created by placing a sequence of values
separated by a ‘comma’ with or without the use of parentheses for grouping
the data sequence. Tuples can contain any number of elements and of any
datatype (like strings, integers, lists, etc.).
• Note: Tuples can also be created with a single element, but it is a bit tricky.
Having one element in the parentheses is not sufficient, there must be a
trailing ‘comma’ to make it a tuple.
# initialize an empty tuple
t1 = ()

t2 = ('Keep', 'Reinventing')
print("\nTuple with the use of String: ", t2)
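The note about single-element tuples is easy to get wrong, so here is a short sketch of it. The variable names are illustrative.

```python
not_a_tuple = (5)    # parentheses alone only group the value: this is an int
single = (5,)        # the trailing comma makes it a one-element tuple
also_single = 5,     # the parentheses are optional; the comma is what matters

print(type(not_a_tuple).__name__)
print(type(single).__name__, len(single))
print(single == also_single)
```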
Access Tuple Items
• In order to access the tuple items refer to the index number.
Use the index operator [ ] to access an item in a tuple.
• The index must be an integer. Nested tuples are accessed
using nested indexing.
• The code creates a tuple named t1 with five elements: 1, 2, 3, 4,
and 5. Then it prints the first, last, and third-last elements of the
tuple using indexing.
t1 = tuple([1, 2, 3, 4, 5])
print("First element of tuple")
print(t1[0])
print("\nLast element of tuple")
print(t1[-1])

print("\nThird last element of tuple")


print(t1[-3])
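Two points from this section that the example above does not exercise are nested indexing and immutability. A minimal sketch, using a made-up tuple named `nested`:

```python
nested = (1, (2, 3), 4)

# Nested indexing: the first index selects the inner tuple,
# the second index selects an item inside it.
inner_value = nested[1][0]
print(inner_value)

# Tuples are immutable, so item assignment raises a TypeError.
try:
    nested[0] = 99
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```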
Boolean Data Type
• The Boolean data type has one of two built-in values, True or
False. It is denoted by the class bool.
• Non-Boolean objects can also be evaluated in a Boolean context and
determined to be truthy (true) or falsy (false).
• Note – True and False with capital 'T' and 'F' are the valid Booleans;
lowercase true and false are ordinary undefined names, so using them
raises a NameError.
print(type(True))
print(type(False))
print(type(true))  # NameError: name 'true' is not defined
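To illustrate how non-Boolean objects behave in a Boolean context, the built-in `bool()` can be applied to a few values. The sample values below are chosen for illustration.

```python
samples = [0, 1, -3, "", "text", [], [0], None]

# bool() shows how each object is treated in a condition such as `if obj:`.
truthiness = [bool(obj) for obj in samples]

for obj, result in zip(samples, truthiness):
    print(f"{obj!r:>8} -> {result}")
```

Zero, empty containers, the empty string, and None are falsy; most other objects are truthy. Note that [0] is truthy because the list itself is not empty.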
4. Set Data Type in Python
• In Python, a set is an unordered collection of items that is iterable,
mutable, and has no duplicate elements. The order of elements in a set is
undefined, and a set may contain elements of various types.
• Create a Set in Python
• Sets can be created by using the built-in set() function with an
iterable object, or by placing the items inside curly braces, separated
by commas. The elements of a set need not be of the same type; mixed
data type values can also be passed to the set.
• Example: The code shows how to create sets from different types of
values, such as a string and a list.
# initializing empty set
s1 = set()

s1 = set("One State Many World")
print("Set with the use of String: ", s1)

s2 = set(["One", "State", "Many", "World"])
print("Set with the use of List: ", s2)
Access Set Items
• Set items cannot be accessed by referring to an index; since sets are
unordered, the items have no index. But you can loop through the set
items using a for loop, or ask if a specified value is present in a set
by using the in keyword.
• Example: This Python code creates a set named set1 with the values
"Go", "For", and "It". The code then prints the initial set, prints the
elements of the set in a loop, and checks if the value "Go" is in the
set using the in operator.
set1 = set(["Go", "For", "It"])
print("\nInitial set")
print(set1)
print("\nElements of set: ")
for i in set1:
    print(i, end=" ")
print("Go" in set1)
5. Dictionary Data Type in Python
• A dictionary in Python is a collection of data values used to store
data like a map. Unlike other Python data types that hold only a single
value as an element, a dictionary holds key: value pairs. Since
Python 3.7, dictionaries preserve insertion order.
• The key-value structure makes lookups efficient. Each key is separated
from its value by a colon (:), and each key-value pair is separated from
the next by a comma.

Create a Dictionary in Python
• In Python, a dictionary can be created by placing a sequence of
key: value pairs within curly braces {}, separated by commas.
• Values in a dictionary can be of any datatype and can be duplicated,
whereas keys can't be repeated and must be immutable.
• A dictionary can also be created by the built-in function dict().
• An empty dictionary can be created by writing empty curly braces {}.
Note – Dictionary keys are case sensitive; the same name in different
cases is treated as distinct keys.
• Example: This code creates and prints dictionaries. The first
dictionary is empty. The second dictionary has integer keys and string
values. The third dictionary is created using the dict() function.
# initialize empty dictionary
d = {}

d = {1: 'Welcome', 2: 'To', 3: 'Python'}
print(d)

# creating dictionary using dict() constructor
d1 = dict({1: 'Welcome', 2: 'To', 3: 'Python'})
print(d1)
Accessing Key-value in Dictionary
• In order to access the items of a dictionary refer to its key name. Key
can be used inside square brackets.
• There is also a method called get() that will also help in accessing the
element from a dictionary.
• Example
• The code accesses elements in a dictionary.
• Here's what it does:
• It creates a dictionary d with keys and values
{1: 'Welcome', 'To': 'Our', 3: 'Python'}.
• It prints the value of the element with the key 'To', which is 'Our'.
• It prints the value of the element with the key 3, which is 'Python'.
d = {1: 'Welcome', 'To': 'Our', 3: 'Python'}

# Accessing an element using key
print(d['To'])

# Accessing an element using get
print(d.get(3))
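The two access styles differ when a key is missing: square brackets raise a KeyError, while get() returns None or a default that you supply. A short sketch using the same dictionary contents; the key 99 and the fallback text are made up for the example.

```python
d = {1: "Welcome", "To": "Our", 3: "Python"}

found = d.get(3)                    # key exists: returns its value
missing = d.get(99)                 # key absent: returns None
fallback = d.get(99, "not found")   # key absent: returns the given default

try:
    d[99]                           # key absent: square brackets raise KeyError
except KeyError:
    raised_key_error = True
else:
    raised_key_error = False

print(found, missing, fallback, raised_key_error)
```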
Indentation
• Indentation refers to the spaces at the beginning of a
code line.
• Where in other programming languages the indentation
in code is for readability only, the indentation in Python
is very important.
• Python uses indentation to indicate a block of code.
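# Correct: the body of the if statement is indented
if 5 > 2:
    print("Five is greater than two!")

# Incorrect: the body is not indented, so Python reports an error
if 5 > 2:
print("Five is greater than two!")

# The number of spaces is up to you, but it must be at least one
if 5 > 2:
 print("Five is greater than two!")
if 5 > 2:
        print("Five is greater than two!")

# Incorrect: lines in the same block must use the same indentation
if 5 > 2:
 print("Five is greater than two!")
        print("Five is greater than two!")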
Comments
• Comments can be used to explain Python code.
• Comments can be used to make the code more
readable.
• Comments can be used to prevent execution when
testing code.
Creating a Comment
• Comments start with a #, and Python will ignore them:
#This is a comment
print("Hello, World!")
• Comments can be placed at the end of a line, and
Python will ignore the rest of the line:
print("Hello, World!") #This is a comment

Multiline Comments
• Python does not really have a syntax for multiline comments.
• To add a multiline comment you could insert a # for each line:
#This is a comment
#written in
#more than just one line
print("Hello, World!")
• Alternatively, you can use a multiline string. Python ignores string
literals that are not assigned to a variable, so a triple-quoted string
can serve as a comment:
"""
This is a comment
written in
more than just one line
"""
print("Hello, World!")
Program Execution
• Python program execution is the process of running Python code,
transforming it from high-level instructions into machine-executable
operations.
• Here’s an overview of how this works:
1. Source Code:
• You write your program in a .py file using Python syntax.
• print("Hello, World!")
2. Compilation:
• The Python interpreter first compiles the source code into bytecode (an
intermediate representation).
• Cached bytecode files have the extension .pyc and are stored in a
__pycache__ directory; CPython writes them for imported modules.
• This step is transparent to the user and happens automatically when the
program is executed.
• Bytecode is platform-independent.
3. Execution in the Python Virtual Machine (PVM):
• The bytecode is then executed by the Python Virtual Machine (PVM).
• The PVM is a part of the Python interpreter and is responsible for
carrying out the bytecode instructions on the host system.
• This step involves interpreting bytecode instructions one by one and
executing them.
4. Runtime:
• During execution, the interpreter manages memory, handles exceptions,
and interacts with operating system resources (like files, databases, or
networks).
• Key components at runtime include:
• Memory Management (for variables, data structures, etc.)
• Dynamic Typing (Python determines types at runtime)
• Garbage Collection (automatic cleanup of unused objects)
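The compilation step can be triggered by hand with the standard `py_compile` module. The sketch below assumes CPython with default settings (no custom cache location configured); the file name hello.py is made up, and the file is written to a temporary directory so nothing is left behind.

```python
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Step 1: write a small source file (hello.py is a made-up name).
    source_path = pathlib.Path(tmp) / "hello.py"
    source_path.write_text('print("Hello, World!")\n')

    # Step 2: compile it to bytecode without running it.
    bytecode_path = pathlib.Path(py_compile.compile(str(source_path)))

    suffix = bytecode_path.suffix        # extension of the bytecode file
    folder = bytecode_path.parent.name   # directory that holds it
    print(suffix)
    print(folder)
```

The name of the bytecode file also encodes the interpreter version, so a cached file produced by one Python version is not reused by another.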
Execution Modes
• Direct Execution (Interpreter Mode):
• Run the file directly using the python command, passing the name of
your script:
• python <filename>.py
• Interactive Mode:
• Start the interpreter and run code interactively:
• python
• >>> print("Hello")
• Integrated Development Environment (IDE):
• Use an IDE like PyCharm, VS Code, or Jupyter Notebook to write and
execute code.
• Compiled Executable:
• Tools like pyinstaller can package a Python program into an executable
file for distribution.
Reading Input
• Developers often have a need to interact with users, either to
get data or to provide some sort of result.
• Most programs today use a dialog box as a way of asking the
user to provide some type of input.
• Python provides built-in functions to read input from the keyboard:

• input(prompt): available in Python 3
• raw_input(prompt): available only in Python 2.x
• input():
• This function takes the input from the user and returns it as a
string. The type of the returned object is always <class 'str'>.
• It does not evaluate the input as an expression; it returns exactly
what was typed as a string.
• When the input function is called, it pauses the program and waits for
the user's input. When the user presses Enter, the program resumes and
input() returns what the user typed.
inp = input('STATEMENT')
Example:
>>> name = input('What is your name?\n')  # \n is a newline; it causes a line break
What is your name?
Ram
>>> print(name)
Ram

Note: text that follows a # is a comment in Python.


How the input function works in Python:
• When the input() function executes, program flow stops until the user
has given input.
• The prompt, i.e., the text or message displayed on the output screen to
ask the user to enter a value, is optional.
• Whatever you enter as input, the input function converts it into a
string. Even if you enter an integer value, input() returns it as a
string. You need to explicitly convert it into an integer in your code
using typecasting.
Demo Program
# Program to check input
# type in Python

num = input ("Enter number :")


print(num)
name1 = input("Enter name : ")
print(name1)

# Printing type of input value


print ("type of number", type(num))
print ("type of name", type(name1))
raw_input()
• This function exists only in Python 2.x; in Python 3 its role is taken by input().
• This function takes exactly what is typed from the keyboard,
converts it to string, and then returns it to the variable in which
we want to store it.

# Python program showing


# a use of raw_input()

g = raw_input("Enter your name : ")


print g
Print Output
• Python print() function prints the message to the screen or any other standard output device.
• print() Function Syntax
• Syntax : print(value(s), sep=' ', end='\n', file=sys.stdout, flush=False)
• Parameters:
• value(s): Any value, and as many as you like. Each is converted to a string before being printed.

• sep='separator' : (Optional) Specifies how to separate the objects if there is more than one. Default: ' '

• end='end' : (Optional) Specifies what to print at the end. Default: '\n'

• file : (Optional) An object with a write method. Default: sys.stdout

• flush : (Optional) A Boolean, specifying if the output is flushed (True) or buffered (False). Default: False


How print() works in Python?
• You can pass variables, strings, numbers, or other data types
as one or more parameters when using the print() function.
• Then, these parameters are represented as strings by their
respective str() functions.
• To create a single output string, the transformed strings are
concatenated with spaces between them.

name = "Arjun"
age = 25

print("Hello, my name is", name, "and I am", age, "years old.")
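The sep, end, and file parameters can be illustrated with a short sketch. Writing into an in-memory io.StringIO buffer is an assumption made here so that the exact text produced by print() can be inspected; any object with a write method would work.

```python
import io

# An in-memory text buffer stands in for a file or the screen.
buffer = io.StringIO()

# sep replaces the default space between values; end replaces the default newline.
print("2024", "01", "15", sep="-", end="!\n", file=buffer)

captured = buffer.getvalue()
print(captured, end="")   # shows 2024-01-15!
```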


Type Conversion
• In programming, type conversion is the process of converting data of
one type to another. For example: converting int data to str.
• There are two types of type conversion in Python.
• Implicit Conversion - automatic type conversion
• Explicit Conversion - manual type conversion
Implicit Type Conversion
• Python automatically converts one data type to another. This is
known as implicit type conversion.
• Converting integer to float
integer_number = 123
float_number = 1.23

new_number = integer_number + float_number

# display new value and resulting data type


print("Value:",new_number)
print("Data Type:",type(new_number))
Note
• We get TypeError, if we try to add str and int. For example, '12' + 23.
Python is not able to use Implicit Conversion in such conditions.
• Python has a solution for these types of situations which is known as
Explicit Conversion.
Explicit Type Conversion
• In Explicit Type Conversion, users convert the data type of an object
to required data type.
• We use the built-in functions like int(), float(), str(), etc to perform
explicit type conversion.
• This type of conversion is also called typecasting because the user
casts (changes) the data type of the objects.
Example: Addition of string and integer
Using Explicit Conversion
num_string = '12'
num_integer = 23
print("Data type of num_string before Type Casting:",type(num_string))
# explicit type conversion
num_string = int(num_string)
print("Data type of num_string after Type Casting:",type(num_string))
num_sum = num_integer + num_string
print("Sum:",num_sum)
print("Data type of num_sum:",type(num_sum))
Points to Remember
1. Type Conversion is the conversion of an object from one data type to
another data type.
2. Implicit Type Conversion is automatically performed by the Python
interpreter.
3. Python avoids the loss of data in Implicit Type Conversion.
4. Explicit Type Conversion is also called Type Casting; the data types of
objects are converted using predefined functions by the user.
5. In Type Casting, loss of data may occur as we force the object to a
specific data type.
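The loss of data mentioned in the last point can be seen in a small sketch. The variable names are illustrative only.

```python
num_float = 3.99

as_int = int(num_float)         # int() drops the fractional part (truncates toward zero)
as_rounded = round(num_float)   # round() goes to the nearest integer instead
negative = int(-3.99)           # truncation toward zero, not rounding down

print(as_int, as_rounded, negative)

# A string that does not look like an integer cannot be cast directly.
try:
    int("3.99")
except ValueError as error:
    print("Conversion failed:", error)
```

Converting the text "3.99" to a whole number therefore needs two steps, for example int(float("3.99")).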
Python Type conversion using ord(),
hex(), oct()
• ord(): This function converts a character to its integer
Unicode code point.
• hex(): This function converts an integer to a
hexadecimal string.
• oct(): This function converts an integer to an
octal string.
# initializing a character
s = '4'

# printing character converting to integer


c = ord(s)
print ("After converting character to integer : ",end="")
print (c)

# printing integer converting to hexadecimal string


c = hex(56)
print ("After converting 56 to hexadecimal string : ",end="")
print (c)

# printing integer converting to octal string


c = oct(56)
print ("After converting 56 to octal string : ",end="")
print (c)
Python Type conversion using tuple(),
set(), list()
• tuple(): This function converts an iterable to a tuple.
• set(): This function converts an iterable to a set (duplicates are dropped and order is not preserved).
• list(): This function converts an iterable to a list.
# initializing string
s = 'geeks'

# printing string converting to tuple


c = tuple(s)
print ("After converting string to tuple : ",end="")
print (c)

# printing string converting to set


c = set(s)
print ("After converting string to set : ",end="")
print (c)

# printing string converting to list


c = list(s)
print ("After converting string to list : ",end="")
print (c)
type() function
• The type() function is mostly used for debugging purposes.
• The type() function can be called with either one argument or
three arguments.
• If a single argument type(obj) is passed, it returns the type of
the given object. If three arguments type(name, bases, dict)
are passed, it returns a new type object (a new class).

Python type() function Syntax

• Syntax: type(object) or type(name, bases, dict)


• Parameters :
• object: Used in the one-argument form. The type()
function returns the type of this object.
• name : a string giving the name of the new class. It becomes
the __name__ attribute.
• bases : tuple of classes from which the new class derives. It
becomes the __bases__ attribute.
• dict : a dictionary that holds the namespace for the class. Its
contents become the __dict__ attribute.
• Return: the type of the object (one argument) or a new class (three arguments).
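A minimal sketch of the three-argument form follows. The class name Point and its attributes are made up for illustration; the result behaves like a class written with the class keyword.

```python
# Build a class dynamically: name, base classes, and namespace dictionary.
Point = type(
    "Point",
    (object,),
    {"x": 0, "describe": lambda self: f"Point at x={self.x}"},
)

p = Point()

print(type(p))          # one-argument form: reports the class of p
print(Point.__name__)   # the name passed as the first argument
print(Point.__bases__)  # the tuple passed as the second argument
print(p.describe())
```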
How type() Function Works in Python?
• In the given example, we are printing the type of variable x. We will
determine the type of an object in Python.
• x = 10
• print(type(x))
• Output
• <class 'int'>
• The is operator in Python is used to compare the identity of two
objects, not their value. It checks whether two references point to the
same object in memory.
• Syntax
• a is b
• Returns True if a and b reference the same object.
• Returns False if they do not.
• x = [1, 2, 3]
• y = [1, 2, 3]
• z=x

• print(x is y) # False, because `x` and `y` are two different objects with
the same content
• print(x is z) # True, because `z` is a reference to `x`
Key Points:
• is vs ==:
• is compares the identity of objects (memory location).
• == compares the value of objects.
• a = [1, 2, 3]
• b = [1, 2, 3]

• print(a == b) # True, because their values are the same


• print(a is b) # False, because they are different objects
Used with Singleton Objects:
• The is operator is commonly used for checking None, as there is only
one instance of None in Python.
• x = None
• if x is None:
•     print("x is None")
Control Flow Statements
• Control flow statements in Python dictate the flow of execution in a
program based on conditions or loops.
• These statements are used to execute specific blocks of code depending on
various conditions or to repeat code multiple times.
if Statement in Python
• The if statement runs a block of code only when a condition
holds true.
• If the condition is false, the block is skipped.
• Syntax:
if condition:
    # Statements to execute if
    # condition is true

Flowchart of if Statement in Python
Basic Conditional Check with if
Statement
# if statement example
if 10 > 5:
    print("10 greater than 5")

print("Program ended")
if else Statement
• An else block can be attached to an if statement; it is
executed when the if condition is false.
• Python if-else Statement Syntax
if condition:
    # Executes this block if
    # condition is true
else:
    # Executes this block if
    # condition is false
Flow Chart of if-else Statement
if else statement Example
# if..else statement example
x = 3
if x == 4:
    print("Yes")
else:
    print("No")
Nested if Statement
• An if statement can also be placed inside another if statement.
• This conditional statement is called a nested if statement.
• The inner if condition is checked only if the outer
if condition is true, so the inner block runs only when both
conditions are satisfied.

Syntax
if condition1:
    # Executes when condition1 is true
    if condition2:
        # Executes when condition1 and condition2 are both true
    # inner if block ends here
# outer if block ends here
Flow chart of Nested If Statement
Nested if Example
# Nested if statement example
num = 10

if num > 5:
    print("Bigger than 5")

    if num <= 15:
        print("Between 5 and 15")
if-elif Statement
• The if-elif statement is a shorthand for a chain of nested if..else statements.
• An optional else block can be added at the end; it is
executed if none of the preceding if or elif conditions is
true.
• Syntax
if condition:
    statement
elif condition:
    statement
...
else:
    statement
Flow Chart of Python if-elif Statement
Example if-elif-else Structure
# if-elif statement example
letter = "A"

if letter == "B":
    print("letter is B")

elif letter == "C":
    print("letter is C")

elif letter == "A":
    print("letter is A")

else:
    print("letter isn't A, B or C")
Can We Use Elif in Nested If?
Are You Allowed to Nest If Statements
Inside Other If Statements in Python?
• What is the Difference Between if-else and Nested If Statements in
Python?
What is the Maximum Number of Elif Clauses
You Can Have in a Conditional?
Looping Statements
• Loops are used to repeat a block of code multiple times.
• Statements used to control loops and change the course of iteration
are called control statements.
• A loop does not create a new scope in Python: names assigned inside
the loop body, including the loop variable, remain available after the
loop ends.

While Loop in Python
• A while loop is used to execute a block of statements
repeatedly as long as a given condition is true.
• When the condition becomes false, the line immediately after
the loop in the program is executed.
• Python While Loop Syntax:

while expression:
    statement(s)
While Loop Example
count = 0
while (count < 3):
    count = count + 1
    print("Luck Favors Brave")
Using else statement with While Loop
in Python
• The else clause is only executed when your while condition
becomes false.
• If you break out of the loop, or if an exception is raised, it won’t
be executed.
• Syntax of While Loop with else statement:
while condition:
    # execute these statements
else:
    # execute these statements

count = 0
while (count < 3):
    count = count + 1
    print("Hello Geek")
else:
    print("In Else Block")
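The interaction between break and the else clause of a while loop can be sketched as follows. The function name find_first_negative is hypothetical: the else branch runs only when the loop ends because its condition became false, not when break is used.

```python
def find_first_negative(values):
    """Return the index of the first negative number, or None if there is none."""
    index = 0
    while index < len(values):
        if values[index] < 0:
            break              # leaving via break skips the else clause
        index += 1
    else:
        # Reached only when the condition became false without a break.
        return None
    return index


print(find_first_negative([3, -1, 2]))
print(find_first_negative([1, 2, 3]))
```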
Infinite While Loop
• If we want a block of code to execute infinite number of time,
we can use the while loop in Python to do so.
• The code below uses a while loop with the condition (count == 0). This loop
runs as long as count is equal to 0.
• Since count is set to 0 and never changed inside the loop, the loop
executes indefinitely because the condition is always true.
count = 0
while (count == 0):
    print("Game on")
for loop
• Python's for loop is designed to repeatedly execute a code
block while iterating through a list, tuple, dictionary, or other
iterable objects of Python.
• The process of traversing a sequence is known as iteration.
• Syntax of the for Loop
for value in sequence:
    code block

# Python program to show how the for loop works

# Creating a sequence which is a list of numbers
numbers = [4, 2, 6, 7, 3, 5, 8, 10, 6, 1, 9, 2]

# variable to store the square of the number
square = 0

# Creating an empty list
squares = []

# Creating a for loop
for value in numbers:
    square = value ** 2
    squares.append(square)
print("The list of squares is", squares)
Using else Statement with for Loop
• A for loop executes its code block once for each element
until the sequence is exhausted.
• An else block written right after the for loop is
executed once the loop has finished iterating.
• The else block comes into play only if the loop completes
normally. It won't be executed if we exit the loop with
break or if an error is thrown.
# Python program to show how an if-else
# statement works inside a for loop

string = "Python Loop"

# Initiating a loop
for s in string:
    # giving a condition in if block
    if s == "o":
        print("If block")
    # if condition is not satisfied then else block will
    # be executed
    else:
        print(s)
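As a minimal sketch of an else clause attached to the for loop itself (rather than to an if), the hypothetical function below tests whether a number is prime. The else branch runs only when the loop finishes without hitting break.

```python
def is_prime(n):
    """Return True if n is a prime number (trial division sketch)."""
    if n < 2:
        return False
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            break          # a divisor was found, so the else clause is skipped
    else:
        # The loop ran to completion: no divisor exists.
        return True
    return False


print([n for n in range(1, 20) if is_prime(n)])
```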
Break Statement
• The break statement in Python is used to terminate the loop in
which it is present.
• After that, control passes to the first statement after the
loop, if there is one.
• If the break statement is present in a nested loop, it
terminates only the innermost loop that contains the break
statement.

Syntax of Break Statement
for / while loop:
    # statement(s)
    if condition:
        break
    # statement(s)
# loop end
Working of Break Statement
Example Program
# Python program to demonstrate
# break statement
s = 'karnataka'
# Using for loop
for letter in s:
    print(letter)
    # break the loop as soon it sees 'a'
    # or 'k'
    if letter == 'a' or letter == 'k':
        break
print("Out of for loop")
print()
i = 0
# Using while loop
while True:
    print(s[i])

    # break the loop as soon it sees 'a'
    # or 'k'
    if s[i] == 'a' or s[i] == 'k':
        break
    i += 1

print("Out of while loop")
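The behaviour of break inside nested loops can be sketched with two counters. The names row, col, and pairs are illustrative: the break leaves only the inner loop, and the outer loop carries on with its next value.

```python
pairs = []

for row in range(3):
    for col in range(3):
        if col == 2:
            break                  # exits the inner loop only
        pairs.append((row, col))
    # execution resumes here, and the outer loop continues

print(pairs)
```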


Continue Statement
• Like break, continue is a loop control statement. Instead of
terminating the loop, it forces the loop to move on to the
next iteration.
• When the continue statement is executed in the loop, the code
inside the loop following the continue statement is
skipped and the next iteration of the loop begins.
Syntax of Continue Statement
for / while loop:
    # statement(s)
    if condition:
        continue
    # statement(s)
Continue Statement
Continue Statement Example Program
# Python program to
# demonstrate continue
# statement

# loop from 1 to 10
for i in range(1, 11):

    # If i is equals to 6,
    # continue to next iteration
    # without printing
    if i == 6:
        continue
    else:
        # otherwise print the value
        # of i
        print(i, end = " ")
Sequences – Strings
• Strings are sequences of characters enclosed in quotes (single ' ',
double " ", or triple ''' ''').

string1 = 'Hello'
string2 = "World"
string3 = '''Python Programming'''
String Operations:
Operation       Description                   Example
Concatenation   Combine two strings           a + b → "Hello" + "World" → "HelloWorld"
Repetition      Repeat a string               a * 3 → "Hi" * 3 → "HiHiHi"
Indexing        Access a specific character   a[0] → "Python"[0] → 'P'
Slicing         Access substring              a[1:4] → "Python"[1:4] → 'yth'
Membership      Check substring               'H' in "Hello" → True
Length          Length of the string          len(a) → len("Python") → 6


What does MCA mean to you?

Answer in a word/phrase/sentence/
paragraph
Built-In Functions
Commonly Used Built-In Functions for Strings:

Function   Description                                 Example
len()      Returns length of a string                  len("Hello") → 5
str()      Converts to a string                        str(123) → "123"
lower()    Converts all characters to lowercase        "HELLO".lower() → "hello"
upper()    Converts all characters to uppercase        "hello".upper() → "HELLO"
strip()    Removes leading/trailing whitespaces        " hello ".strip() → "hello"
split()    Splits the string based on a delimiter      "a,b,c".split(',') → ['a', 'b', 'c']
join()     Joins elements of a list into a string      ','.join(['a', 'b', 'c']) → "a,b,c"
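These string functions are often combined. The sketch below, using made-up sample text, cleans a comma-separated line by chaining split(), strip(), lower(), and join().

```python
raw = "  Alice, BOB ,carol  "

# Split on commas, then trim whitespace and normalise case for each piece.
names = [part.strip().lower() for part in raw.split(",")]

# Join the cleaned pieces back together with a different separator.
joined = ";".join(names)

print(names)
print(joined)
print(len(joined))
```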
Commonly Used Modules
• A module is a file containing Python code (functions, classes, or
variables).
• Examples: math, os, sys, random, etc.
Important Modules and Common Functions:

Module   Function        Description                                      Example
math     sqrt(x)         Returns square root                              math.sqrt(16) → 4.0
os       getcwd()        Returns current working directory                os.getcwd() → '/home/user'
sys      argv            Returns list of command-line arguments           sys.argv[0] → script_name.py
random   randint(a, b)   Returns random integer between a and b           random.randint(1, 10) → e.g. 5
                         (both inclusive)
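A module must be imported before its functions can be used. The sketch below imports three of the modules listed above; the dice-roll example is an illustration, and its result changes from run to run.

```python
import math
import os
import random

root = math.sqrt(16)          # square root, returned as a float
cwd = os.getcwd()             # current working directory as a string
roll = random.randint(1, 6)   # random integer from 1 to 6, both ends included

print(root)
print(cwd)
print(roll)
```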
Function Definition and Calling the Function
• Functions are defined using the def keyword followed by the function
name, parentheses, and a colon.
• The function's body is indented, and it contains the code to be
executed when the function is called.
• Functions can also take parameters (inputs) and return values
(outputs).
Defining a Function in Python
def function_name(parameters):
    """
    Optional docstring: Describes the function.
    """
    # Function body: Write the logic here
    return result  # Optional
• function_name: The name you give to your function.
• parameters: Variables that the function accepts as input (optional).
• return: Sends back a value (optional).
• Example 1: Function Without Parameters and Return Value
def greet():
    print("Hello, welcome to Python programming!")
• Calling the Function:
greet()  # Output: Hello, welcome to Python programming!
• Example 2: Function with Parameters
def greet_person(name):
    print(f"Hello, {name}! Welcome to Python programming!")
• Calling the Function:
greet_person("Alice")  # Output: Hello, Alice! Welcome to Python programming!
Function with a Return Value
def add_numbers(a, b):
    return a + b
• Calling the Function:
result = add_numbers(5, 3)
print(result)  # Output: 8
Function with Default Parameters
def greet_person(name="Guest"):
    print(f"Hello, {name}! Welcome to Python programming!")
• Calling the Function:
greet_person()  # Output: Hello, Guest! Welcome to Python programming!
greet_person("Charlie")  # Output: Hello, Charlie! Welcome to Python programming!
Using a Function with Multiple Parameters
def calculate_area(length, width):
    return length * width
• Calling the Function:
area = calculate_area(10, 5)
print(f"Area: {area}")  # Output: Area: 50
The return Statement and void Function
• The return Statement
• The return statement is used in a function to send a value back to the caller.
Once a return statement is executed, the function stops execution.
def function_name(parameters):
    # Logic
    return value  # Sends value back to the caller

def square(number):
    return number * number

result = square(5)
print(result)  # Output: 25
• The function square takes a parameter number and returns the square of it.
• The returned value can be assigned to a variable or used directly.
A function can return multiple values as a
tuple.
def get_coordinates():
    return 10, 20

x, y = get_coordinates()
print(x, y)  # Output: 10 20
• If no return statement is present, the function implicitly returns None.
Void Functions
• A void function is a function that does not return a value.
• It is primarily used for performing an action (like printing output or modifying
a global variable).
• Syntax
def function_name(parameters):
    # Logic without a return statement
• Example
def print_message():
    print("Hello, this is a void function!")

print_message()
# Output: Hello, this is a void function!
• The function print_message performs an action (printing a message) but does
not return anything.
Important Points
• Calling a void function does not produce a meaningful return value.
• A function without a return statement returns None by default.
• Example Showing Implicit None:
def do_nothing():
    pass

print(do_nothing())  # Output: None
Comparison of return Statement and Void
Functions

Aspect          Function with return               Void Function
Purpose         Returns a value to the caller      Performs an action, no return
Return Value    Explicit value                     Implicitly returns None
Usage Example   Calculations, data manipulation    Printing, updating variables
Scope and Lifetime of Variables
• In Python, variables are the containers for storing data values.
Unlike other languages like C/C++/JAVA, Python is not “statically
typed”.
• We do not need to declare variables before using them or declare
their type.
• A variable is created the moment we first assign a value to it.
• The location where we can find a variable and also access it if
required is called the scope of a variable.
Python Local variable
• Local variables are those that are initialized within a function
and are unique to that function.
• It cannot be accessed outside of the function.

def f():
    # local variable
    s = "Welcome to the World of AI"
    print(s)

# Driver code
f()

Output
Welcome to the World of AI
If we try to use this local variable outside
the function, let's see what happens.
def f():
    # local variable
    s = "I love Geeksforgeeks"
    print("Inside Function:", s)

# Driver code
f()
print(s)

Output
NameError: name 's' is not defined
Python Global variables
• Global variables are the ones that are defined
outside any function and do not belong to any one function. They
can be used by any part of the program.
# This function uses global variable s
def f():
    print(s)

# Global scope
s = "ML is the subset of AI"
f()

Output
ML is the subset of AI
Global and Local Variables with the
Same Name
# This function has a local variable with
# the same name as the global variable s.
def f():
    s = "Blockchain."
    print(s)

# Global scope
s = "Neural Network"
f()
print(s)

Output
Blockchain.
Neural Network
Lifetime of Variables
• The lifetime of a variable refers to the duration the variable exists in
memory during program execution.
• Lifetime of Local Variables
• Local variables are created when a function is called.
• They are destroyed when the function completes execution.
def my_function():
    local_var = 10  # Created
    print(local_var)

my_function()
# local_var is destroyed after the function ends.
Lifetime of Global Variables
• Global variables persist throughout the program's execution and are
destroyed when the program ends.
• global_var = "I exist until the program ends"
• Garbage Collection
• In Python, objects that are no longer referenced are automatically
cleaned up by the garbage collector to free memory.

•Local Scope: Variables inside functions.


•Global Scope: Variables outside functions, accessible everywhere unless shadowed.
•Nonlocal Scope: Variables in outer functions, accessible to inner functions.
•Lifetime: Local variables exist during function execution, while global variables persist until the
program ends.
Python Nonlocal keyword
• In Python, the nonlocal keyword is used in the case of nested
functions.
• It works similarly to the global keyword, but instead of
referring to a global variable, it makes a name refer to the
variable of the nearest enclosing function.

# Python program to demonstrate
# nonlocal keyword
print("Value of a using nonlocal is : ", end="")

def outer():
    a = 5
    def inner():
        nonlocal a
        a = 10
    inner()
    print(a)

outer()

# demonstrating without non local
# inner function not changing the value of outer a
# prints 5
print("Value of a without using nonlocal is : ", end="")

def outer():
    a = 5
    def inner():
        a = 10
    inner()
    print(a)

outer()

Output
Value of a using nonlocal is : 10
Value of a without using nonlocal is : 5
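A common use of nonlocal is a function that remembers state between calls. The sketch below is illustrative, and the names make_counter and increment are hypothetical: the inner function updates a variable that belongs to the enclosing function.

```python
def make_counter():
    count = 0                  # lives in the enclosing function's scope

    def increment():
        nonlocal count         # rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)
```

Each call to make_counter() creates a separate count, so two counters do not interfere with each other.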
Default Parameters
• In Python, you can specify default values for function parameters.
• If a value is not provided during the function call, the default value is used.
• Syntax
• def function_name(param1=default_value):
• # Function body
• Example
def greet(name="Guest"):
    print(f"Hello, {name}!")

greet()  # Output: Hello, Guest!
greet("Alice")  # Output: Hello, Alice!
• Default parameters must come after non-default parameters in the function
signature.
Keyword Arguments
• Keyword arguments are passed to a function by explicitly naming the
parameters in the call.
• They allow arguments to be passed in any order.
def introduce(name, age):
    print(f"My name is {name} and I am {age} years old.")

introduce(age=25, name="Alice")
# Output: My name is Alice and I am 25 years old.
*args (Arbitrary Positional Arguments)
• *args allows you to pass a variable number of positional arguments to
a function.
• It collects arguments into a tuple.
def add_numbers(*args):
    return sum(args)

print(add_numbers(1, 2, 3))  # Output: 6
print(add_numbers(10, 20))  # Output: 30
Key Points:
•Use *args when the number of inputs is uncertain.
•It must be declared after regular parameters.
**kwargs (Arbitrary Keyword Arguments)
• **kwargs allows you to pass a variable number of keyword arguments.
• It collects arguments into a dictionary.

def display_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

display_info(name="Alice", age=25, city="New York")

# Output:
# name: Alice
# age: 25
# city: New York
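Regular parameters, *args, and **kwargs can appear together in one signature, in that order. The sketch below uses a hypothetical function describe_order to show where each argument ends up, and how * and ** unpack a list and a dictionary at the call site.

```python
def describe_order(customer, *items, **options):
    """Collect one required value, extra positionals (tuple), and extra keywords (dict)."""
    return {"customer": customer, "items": items, "options": options}


order = describe_order("Alice", "pen", "notebook", gift_wrap=True, express=False)
print(order)

# The same operators unpack existing collections when calling a function.
positional = ["Bob", "ink"]
keywords = {"express": True}
second_order = describe_order(*positional, **keywords)
print(second_order)
```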
Command Line Arguments
• Command line arguments are inputs provided to a script when it is
executed from the command line.
• In Python, these arguments can be accessed using the sys module.
import sys

def main():
    args = sys.argv  # List of command line arguments
    print(f"Arguments passed: {args}")

if __name__ == "__main__":
    main()

Save the following script as [Link]


Run the script from the terminal:
• $ python [Link] arg1 arg2

Output
Arguments passed: ['[Link]', 'arg1', 'arg2']

sys.argv[0] contains the script name.

sys.argv[1:] contains the actual arguments.
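Command line arguments always arrive as strings, so numeric arguments need explicit conversion. The sketch below assumes a hypothetical script that adds up the numbers it is given; a sample list stands in for sys.argv so the example runs without a terminal.

```python
def total_from_args(argv):
    """Add up the numeric arguments that follow the script name."""
    # argv[0] is the script name, so conversion starts at index 1.
    return sum(float(value) for value in argv[1:])


# Stand-in for sys.argv after a run with three numeric arguments (hypothetical script name).
example_argv = ["sum_args.py", "2.5", "4", "1.5"]
print(total_from_args(example_argv))
```

In a real script the function would be called with sys.argv after importing sys.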
Summary
• Default Parameters: Predefined parameter values used when no
argument is provided.
• Keyword Arguments: Explicitly name parameters in function calls.
• *args: Pass variable-length positional arguments as a tuple.
• **kwargs: Pass variable-length keyword arguments as a dictionary.
• Command Line Arguments: Input arguments passed when running a
script.
Module-2
Python Collection Objects, Classes
• Python provides several built-in collection types to store and manage
data efficiently. These include lists, tuples, sets, and dictionaries.
Each collection type is suited for specific use cases.
1. Lists
2. Tuples
3. Sets
4. Dictionaries
Strings
• Python string is the collection of the characters surrounded by single quotes, double quotes, or triple
quotes.
• Computers do not store characters directly; internally, each character is stored as a number encoded as a
combination of 0s and 1s.
• Creating String in Python
• We can create a string by enclosing the characters in single-quotes or double- quotes.
• Python also provides triple-quotes to represent the string, but it is generally used for multiline string
or docstrings.
#Using single quotes
str1 = 'Hello Python'
print(str1)
#Using double quotes
str2 = "Hello Python"
print(str2)

#Using triple quotes
str3 = '''Triple quotes are generally used to
represent a multiline string or
docstring'''
print(str3)
Strings indexing and splitting
• Like other languages, the indexing of the Python strings starts from 0. For
example, The string "HELLO" is indexed as given in the below figure.
Consider the following example:
str = "HELLO"
print(str[0])
print(str[1])
print(str[2])
print(str[3])
print(str[4])
# It raises an IndexError because index 6 is out of range
print(str[6])

• The index operator [] is used to access the individual characters of the string.
To access a substring of the given string, we use the slice operator, written with
a : (colon) inside the brackets. Consider the following example.

Here, we must notice that the upper bound
given in the slice operator is always exclusive,
i.e., if str = 'HELLO' is given, then str[1:3] will
include str[1] = 'E', str[2] = 'L' and
nothing else.
Consider the following example:
# Given String
str = "MITTMCA"
# Start 0th index to end
print(str[0:])
# Starts 1st index to 4th index
print(str[1:5])
# Starts 2nd index to 3rd index
print(str[2:4])
# Starts 0th to 2nd index
print(str[:3])
# Starts 4th to 6th index
print(str[4:7])

Output
MITTMCA
ITTM
TT
MIT
MCA
Basic String Operations
1. Concatenation
Joining two or more strings using the + operator.
Example:
str1 = "Hello"
str2 = "World"
result = str1 + " " + str2
print(result) # Output: Hello World

2. Repetition
Repeat a string multiple times using the * operator.
Example:
str1 = "Python "
print(str1 * 3) # Output: Python Python Python

3. Membership Test
Use the in operator to check if a substring exists in a string.
Example:
str1 = "Python programming"
print("Python" in str1) # Output: True
Accessing Characters by Index Number

• Strings are indexed, starting from 0 for the first character and -1 for
the last character.
• Syntax
• string[index]
• Example
• str1 = "Hello"
• print(str1[0]) # Output: H
• print(str1[-1]) # Output: o

String Slicing and Joining
• String Slicing
• Extract a substring using slicing syntax:
Syntax:
• string[start:stop:step]
• Example
• str1 = "Hello, World!"
• print(str1[0:5]) # Output: Hello
• print(str1[::-1]) # Output: !dlroW ,olleH (reverse)
• Joining Strings
• Join multiple strings into one using join().
• Syntax
• [Link](iterable)
• Example
• words = ["Python", "is", "awesome"]
• sentence = " ".join(words)
• print(sentence) # Output: Python is awesome
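The start:stop:step syntax above allows a few more patterns than the two shown. The sketch below reuses the same sample text; the variable names are illustrative.

```python
text = "Hello, World!"

middle = text[7:12]       # characters at index 7 up to, but not including, 12
last_six = text[-6:]      # negative start counts from the end
every_other = text[::2]   # step of 2 takes every second character

print(middle)
print(last_six)
print(every_other)

# Slices and join() combine naturally, for example to rebuild a phrase.
print(" ".join([text[:5], middle]))
```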
String methods:

✓[Link]()
✓[Link]()
✓[Link]()
✓rstrip()
✓lstrip()
✓[Link]()
✓[Link]() : gives position of first occurrence of the string
passed
✓[Link]()
String Concatenation
• Concatenation means joining two or more strings together.
• To concatenate strings, we use + operator.
• When we work with numbers, + will be an operator for addition, but
when used with strings it is a joining operator.

Example:
s1 = "Hello"
s2 = "Good Morning!"
s3 = s1 + " " + s2
print(s3)

Output:
Hello Good Morning!
String repetition
• The * symbol normally represents multiplication, but when the
operand on the left side of the * is a sequence such as a string or
a list, it becomes the repetition operator.
• The repetition operator makes multiple copies of the sequence and
joins them all together. For example, "Hi" * 3 gives "HiHiHi". Lists
can be created the same way using the repetition operator, *.
Example:
• numbers = [1] * 5
• print(numbers)
Output: [1, 1, 1, 1, 1]
• n = [0, 1, 2] * 3
• print(n)
Output: [0, 1, 2, 0, 1, 2, 0, 1, 2]
Format Function

• format() is a string method that performs variable substitution and value
formatting. It lets you insert values into a string at chosen positions.
• Formatters work by placing one or more replacement fields (placeholders),
each written as a pair of curly brackets "{}", into a string and calling the str.format()
method.
• Pass to the format() method the value you wish to insert into the string. When
the program runs, the value is printed in the place where the placeholder {} is
positioned.
Single Formatter:
Single formatters can be defined as those where there is only one placeholder.

>>>print("today is {}".format('Monday'))
today is Monday

Multiple Formatter:
• Let’s say if there is another variable substitution required in a
sentence, this can be done by adding another set of curly brackets where we
want substitution and passing a second value into format().
• Python will then replace the placeholders by values that are passed as the
parameters.

>>>print("today is {} and its {}".format('Monday', 'raining'))


today is Monday and its raining
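year = int(input("enter a year: "))
# divisible by 4, except century years that are not divisible by 400
if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
    print(f"{year}, is a leap year")
else:
    print(f"{year}, is not a leap year")
Output: enter a year: 2010
2010, is not a leap year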
OR

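year = int(input("enter a year: "))

# divisible by 4, except century years that are not divisible by 400
if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
    print("{} is a leap year".format(year))
else:
    print("{} is not a leap year".format(year))
Output: enter a year: 2020
2020 is a leap year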
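Placeholders can also carry an index, a name, or a format specification. The sketch below shows a few commonly used forms of str.format() and the equivalent f-string syntax; the sample values are made up.

```python
# A format specification after the colon controls how the value is shown.
pi_text = "{:.2f}".format(3.14159)           # two digits after the decimal point

# Numbered placeholders pick arguments by position.
ordered = "{1} before {0}".format("b", "a")

# Named placeholders pick arguments by keyword.
named = "{name} is {age}".format(name="Arjun", age=25)

# f-strings accept the same format specifications.
padded = f"{42:>5}"                          # right-aligned in a field of width 5

print(pi_text)
print(ordered)
print(named)
print(repr(padded))
```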
List:
➢List is a container that holds many objects under a single name.
➢List can be written as a list of comma-separated values (items)
between square brackets.
➢A list is an ordered and changeable collection of items.
➢Lists are similar to arrays in C. One difference between them is that
the items belonging to a list can be of different data types.
➢They are indexed in the same way as strings.
➢Lists can be nested just like arrays, i.e., you can have a list of lists.
➢Lists are mutable.
• Syntax:
List_name = [item1 , item2 , item3]
List_name = []
List_name[index]
list1 = ['abcd', 786, 2.23, 'john', 70.2]
list2 = [123, 'john']

print(list1) # Prints complete list
print(list1[0]) # Prints first element of the list
print(list1[1:3]) # Prints elements starting from 2nd till 3rd
print(list1[2:]) # Prints elements starting from 3rd element
print(list2 * 2) # Prints list two times
print(list1 + list2) # Prints concatenated lists
list1[0] = 7634 # Replaces 'abcd' with the new value 7634
print(list1) # Prints the modified list
Output:
['abcd', 786, 2.23, 'john', 70.2]
abcd
[786, 2.23]
[2.23, 'john', 70.2]
[123, 'john', 123, 'john']
['abcd', 786, 2.23, 'john', 70.2, 123, 'john']
[7634, 786, 2.23, 'john', 70.2]
List predefined Methods:

✓list.append('item') : adds item to the end
✓list.extend(items) : adds multiple items to the end of the list
✓list.remove('item') : removes the first occurrence of item
✓list.insert(i, item) : adds a single item at index i
✓list.sort() : sorts the list in place
✓list.pop(i) : removes and returns the item at the given position
✓list.pop() : removes and returns the last item
✓list.index('item') : gives the index of the first occurrence
✓list.clear() : removes all items from the list
✓list.count(item) : returns the number of times item appears in the list
✓list.reverse() : reverses the elements of the list in place
Example:
fruits = ['apple', 'banana', 'orange']
fruits.append('kiwi')
Output: ['apple', 'banana', 'orange', 'kiwi']
• append() adds a new element to the end of the list.
fruits.append('apple')
Output: ['apple', 'banana', 'orange', 'kiwi', 'apple']

fruits.remove('apple')
Output: ['banana', 'orange', 'kiwi', 'apple']

• remove() deletes the first occurrence of the given value. If the
same element appears more than once, only the one nearest the
beginning of the list is removed.
fruits = ['banana', 'orange', 'kiwi', 'apple']
fruits.insert(2, 'strawberry')
Output: ['banana', 'orange', 'strawberry', 'kiwi', 'apple']
• To add an element at a particular position, use insert() and
specify the index at which the element should be added.

fruits.pop()
Output: 'apple'
fruits.pop(2)
Output: 'strawberry'
print(fruits)
Output: ['banana', 'orange', 'kiwi']
• pop() removes and returns the last element by default. To remove a
particular element, pass its index to pop().
fruits = ['banana', 'orange', 'kiwi']
fruits.extend(['apple', 'strawberry'])
Output: ['banana', 'orange', 'kiwi', 'apple', 'strawberry']

• append() adds exactly one element per call, whereas extend() adds
every element of another list at once.
fruits = ['banana', 'orange', 'kiwi']
fruits.append(['apple', 'strawberry'])
Output: ['banana', 'orange', 'kiwi', ['apple', 'strawberry']]
• Passing a list to append() adds it as a single nested element.

fruits.count('banana')
Output: 1
• If the item is not present, count() returns 0.
• Functions which can be used in lists:

min(list)
max(list)
len(list)
sum(list)
Example:
L = [23,12,34,5,58,16,18]
len(L)
Output: 7

L.sort()
print(L)
Output: [5, 12, 16, 18, 23, 34, 58]

L.sort(reverse=True)
print(L)
Output: [58, 34, 23, 18, 16, 12, 5]

max(L)
Output: 58

min(L)
Output: 5
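One detail the example above leaves implicit is that sort() changes the list in place and returns None. A short sketch of the difference from the built-in sorted(), which returns a new list:

```python
L = [23, 12, 34, 5, 58, 16, 18]

ascending = sorted(L)   # new sorted list; L itself is left untouched
result = L.sort()       # sorts L in place and returns None

print(ascending)
print(result)
print(L)
print(sum(L), len(L))   # sum() and len() work on any list of numbers
```

Assigning the result of L.sort() to a variable is therefore a common mistake; use sorted(L) when the original order must be kept.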
# nested list

l1=[1,2,3]
l2=[2,3,4]
l1.append(l2)
print(l1)
• Output: [1, 2, 3, [2, 3, 4]]
l1[3]
• Output: [2, 3, 4]
l1[3][2]
• Output: 4
Tuple:
➢A tuple is another sequence data type that is similar to the list.
➢It is a collection of immutable objects.
➢Tuple is ordered and cannot be changed.
➢Duplicate values can be present.
➢The main differences between lists and tuples are:
Lists are enclosed in brackets ( [ ] ), and their elements and size can be
changed, while tuples are enclosed in parentheses ( ( ) ) and cannot be updated.
➢Tuples can be thought of as read-only lists.
➢Tuples can be defined without parentheses, e.g. t = 1, 2, 3.
➢A tuple can be used to assign multiple values at a time.
t1 = ('abcd', 786 , 2.23, 'john', 70.2 )
t2 = (123, 'john')
print(t1)       # Prints complete tuple
print(t1[0])    # Prints first element of the tuple
print(t1[1:3])  # Prints the 2nd and 3rd elements
print(t1[2:])   # Prints elements starting from the 3rd element
print(t2 * 2)   # Prints the tuple two times
print(t1 + t2)  # Prints concatenated tuples
Output:
('abcd', 786, 2.23, 'john', 70.2)
abcd
(786, 2.23)
(2.23, 'john', 70.2)
(123, 'john', 123, 'john')
('abcd', 786, 2.23, 'john', 70.2, 123, 'john')
t1[0] = 7634

TypeError                                 Traceback (most recent call last)
<ipython-input-103-33476bbf5b52> in <module>
----> 1 t1[0] = 7634

TypeError: 'tuple' object does not support item assignment

Only two methods are available for tuples: count and index

Example:
t = (1,2,3,2,1,2,4,5)
t.count(2)
Output: 3
t = (1,2,'hi')
t.index(1)
Output: 0
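The points about defining tuples without parentheses and assigning multiple values at a time can be sketched as follows (the variable names are illustrative only):

```python
point = 3, 4          # packing: the parentheses are optional
x, y = point          # unpacking: assigns 3 to x and 4 to y
x, y = y, x           # swapping two values through a temporary tuple

single = (5,)         # a one-element tuple needs the trailing comma
not_a_tuple = (5)     # without the comma this is just the integer 5

print(point, x, y)
print(type(single).__name__, type(not_a_tuple).__name__)
```

The number of names on the left must match the number of items in the tuple, otherwise Python raises a ValueError.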
Dictionary:
➢Dictionaries are enclosed by curly braces ( { } ) and values can be assigned and
accessed using square brackets ( [] ).

➢Dictionaries are changeable and are indexed by key. Since Python 3.7 they
preserve the order in which keys were inserted.

➢ Dictionary is a collection of key-value pairs.


Dictionary_name = {key1 : val1 , key2 : val2}

➢Keys act as indexes and must be unique, but the values stored under
different keys can be duplicated.
Example:
>>> camera = {'sony':200, 'nikon': 200}
>>> camera.update({'canon':500})
>>> print(camera)
Output: {'sony': 200, 'nikon': 200, 'canon': 500}
>>> camera['xyz'] = 1000
>>> print(camera)
Output: {'sony': 200, 'nikon': 200, 'canon': 500, 'xyz': 1000}
>>> camera.keys()
Output: dict_keys(['sony', 'nikon', 'canon', 'xyz'])
>>> camera.values()
Output: dict_values([200, 200, 500, 1000])
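Beyond update(), keys() and values(), a few other dictionary operations come up often. A minimal sketch reusing the camera example (the prices are the illustrative values from above):

```python
camera = {'sony': 200, 'nikon': 200}

# get() returns a default instead of raising KeyError for a missing key
canon_price = camera.get('canon', 0)

# items() yields (key, value) pairs for looping
summary = [f"{brand}: {price}" for brand, price in camera.items()]

camera['sony'] = 250             # assigning to an existing key overwrites it
removed = camera.pop('nikon')    # pop() removes a key and returns its value

print(canon_price, summary, removed, camera)
```

Accessing camera['canon'] directly would raise a KeyError here, which is why get() is the safer choice when a key may be absent.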
File Handling
• File handling is the mechanism by which a Python program reads data
from disk files or writes data back to disk files.
• File handling lets us store data entered through a Python program
permanently in a disk file and read it back later.
• Data files can be stored in 2 ways:
1. Text file: Stores information as characters. If the data is “hello” it will
take 5 bytes, and the floating-point value 12.45 will also take 5 bytes.
Each line is terminated by a special character called EOL (‘\n’, ‘\r’, or a
combination of both).
2. Binary file: Data is stored according to its data type and hence no
translation occurs. It stores the information in the same format as in the
memory.
‘b’ appended to the mode opens the file in binary mode.
File input and output operations:
• To perform operation on file we have to perform the following
steps:
✓Open file
✓Use file to Read or Write
✓Close file

Syntax:
Filevariable = open(filename, mode)
# Do read and write operations
Filevariable.close()
File - Modes

✓‘r’ : Default option. Used when the file will only be read
✓‘w’: used for only writing- an existing file with same name will be erased.
✓‘a’: opens the file for appending. Data written to the file is added to the end
✓‘a+’: opens the file for appending and reading; writes always go to the end of the file
✓‘r+’: opens the file for both reading and writing
✓‘w+’: opens for reading and writing. Creates the file if it does not exist, otherwise truncates it
✓‘x’: creates a new file and opens it for writing; fails if the file already exists
# open a file
file1 = open("[Link]", "r")
# read the file
read_content = file1.read()
print(read_content)
# open file in current directory
file1 = open("[Link]")          # equivalent to 'r' or 'rt'
file1 = open("[Link]", 'w')     # write in text mode
file1 = open("[Link]", 'r+b')   # read and write in binary mode
File- Read Methods
• read()
The read method returns the entire contents of the file as a single
string
• readline()
The readline method reads one line from the file and returns it as a
string
• readlines()
The readlines method returns the contents of the entire file as a list
of strings, where each item in the list represents one line of the
file.
• seek(offset)
Moves the file cursor to the given position.
• tell()
Returns the current position of the file cursor.
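The open, use, close steps and the read methods above can be tried together in one short sketch. The file name notes.txt and the temporary directory are assumptions made for this illustration; the with statement closes the file automatically, so no explicit close() call is needed:

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:            # 'w' creates or truncates the file
    f.write("first line\nsecond line\nthird line\n")

with open(path, "r") as f:
    first = f.readline()              # reads up to and including '\n'
    position = f.tell()               # cursor position after the first line
    rest = f.readlines()              # remaining lines as a list of strings
    f.seek(0)                         # move the cursor back to the start
    whole = f.read()                  # entire file as one string

print(repr(first), position, rest)
print(whole)
```

Because seek(0) rewinds the cursor, the final read() returns all three lines even though they were already consumed once.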
Class Definition
• A class in Python is a blueprint for creating objects. It encapsulates data
(attributes) and methods (functions) that operate on the data.
• Syntax
class ClassName:
    # Class attributes and methods
• Example
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        print(f"{self.name} makes a sound.")

dog = Animal("Dog")
dog.speak()  # Output: Dog makes a sound.
Constructors
• A constructor is a special method defined using __init__.
• It is automatically called when an object is created and is typically
used to initialize attributes.
• Syntax
class ClassName:
    def __init__(self, parameters):
        # Initialization
__init__() Function
• __init__() function is a constructor method in Python. It
initializes the object’s state when the object is created.
• If the child class does not define its own __init__() method, it
will automatically inherit the one from the parent class.
# Parent Class: Person
class Person:
    def __init__(self, name, idnumber):
        self.name = name
        self.idnumber = idnumber

# Child Class: Employee
class Employee(Person):
    def __init__(self, name, idnumber, salary, post):
        super().__init__(name, idnumber)  # Calls Person's __init__()
        self.salary = salary
        self.post = post
super() Function
• super() function is used to call the parent class’s methods.
• In particular, it is commonly used in the child class’s __init__()
method to initialize inherited attributes.
• This way, the child class can leverage the functionality of the
parent class.

# Parent Class: Person
class Person:
    def __init__(self, name, idnumber):
        self.name = name
        self.idnumber = idnumber

    def display(self):
        print(self.name)
        print(self.idnumber)

# Child Class: Employee
class Employee(Person):
    def __init__(self, name, idnumber, salary, post):
        super().__init__(name, idnumber)  # Using super() to call Person's __init__()
        self.salary = salary
        self.post = post
• Example of a constructor
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def display(self):
        print(f"Name: {self.name}, Age: {self.age}")

john = Person("John", 25)
john.display()  # Output: Name: John, Age: 25
Inheritance
• Inheritance allows a class (child class) to inherit attributes and
methods from another class (parent class).
• It enables code reuse and enhances modularity.
• Syntax
class ParentClass:
    # Parent class implementation

class ChildClass(ParentClass):
    # Additional implementation for child class
Creating a Parent Class
# A Python program to demonstrate inheritance
class Person(object):
    # Constructor
    def __init__(self, name, id):
        self.name = name
        self.id = id

    # Display the person's name and id
    def Display(self):
        print(self.name, self.id)

# Driver code
emp = Person("Satyam", 102)  # An Object of Person
emp.Display()
Creating a Child Class
class Emp(Person):
    def Print(self):
        print("Emp class called")

Emp_details = Emp("Mayank", 103)
# Calling parent class function
Emp_details.Display()
# Calling child class function
Emp_details.Print()
Example of Inheritance
class Vehicle:
    def __init__(self, brand):
        self.brand = brand

    def display(self):
        print(f"Brand: {self.brand}")

class Car(Vehicle):
    def __init__(self, brand, model):
        super().__init__(brand)  # Call parent constructor
        self.model = model

    def display(self):
        super().display()
        print(f"Model: {self.model}")

car = Car("Toyota", "Corolla")
car.display()
# Output:
# Brand: Toyota
# Model: Corolla
Types of Python Inheritance
1. Single Inheritance: A child class inherits from one parent class.
2. Multiple Inheritance: A child class inherits from more than one
parent class.
3. Multilevel Inheritance: A class is derived from a class which is also
derived from another class.
4. Hierarchical Inheritance: Multiple classes inherit from a single
parent class.
5. Hybrid Inheritance: A combination of more than one type of
inheritance.
# 1. Single Inheritance
class Person:
    def __init__(self, name):
        self.name = name

class Employee(Person):  # Employee inherits from Person
    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary

# 2. Multiple Inheritance
class Job:
    def __init__(self, salary):
        self.salary = salary

class EmployeePersonJob(Employee, Job):  # Inherits from both Employee and Job
    def __init__(self, name, salary):
        Employee.__init__(self, name, salary)  # Initialize Employee
        Job.__init__(self, salary)  # Initialize Job

# 3. Multilevel Inheritance
class Manager(EmployeePersonJob):  # Inherits from EmployeePersonJob
    def __init__(self, name, salary, department):
        EmployeePersonJob.__init__(self, name, salary)  # Explicitly initialize EmployeePersonJob
        self.department = department

# 4. Hierarchical Inheritance
class AssistantManager(EmployeePersonJob):  # Inherits from EmployeePersonJob
    def __init__(self, name, salary, team_size):
        EmployeePersonJob.__init__(self, name, salary)  # Explicitly initialize EmployeePersonJob
        self.team_size = team_size

# 5. Hybrid Inheritance (Multiple + Multilevel)
class SeniorManager(Manager, AssistantManager):  # Inherits from both Manager and AssistantManager
    def __init__(self, name, salary, department, team_size):
        Manager.__init__(self, name, salary, department)  # Initialize Manager
        AssistantManager.__init__(self, name, salary, team_size)  # Initialize AssistantManager
# Creating objects to show inheritance

# Single Inheritance
emp = Employee("John", 40000)
print(emp.name, emp.salary)

# Multiple Inheritance
emp2 = EmployeePersonJob("Alice", 50000)
print(emp2.name, emp2.salary)

# Multilevel Inheritance
mgr = Manager("Bob", 60000, "HR")
print(mgr.name, mgr.salary, mgr.department)

# Hierarchical Inheritance
asst_mgr = AssistantManager("Charlie", 45000, 10)
print(asst_mgr.name, asst_mgr.salary, asst_mgr.team_size)

# Hybrid Inheritance
sen_mgr = SeniorManager("David", 70000, "Finance", 20)
print(sen_mgr.name, sen_mgr.salary, sen_mgr.department,
      sen_mgr.team_size)
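With multiple and hybrid inheritance, Python decides which parent method runs first using the method resolution order (MRO). The sketch below uses hypothetical classes (Base, Left, Right, Child) rather than the ones above, and relies on super() in every class so that each __init__ runs exactly once:

```python
class Base:
    def __init__(self):
        self.log = ["Base"]

class Left(Base):
    def __init__(self):
        super().__init__()          # next class in the MRO, not always Base
        self.log.append("Left")

class Right(Base):
    def __init__(self):
        super().__init__()
        self.log.append("Right")

class Child(Left, Right):           # diamond: both parents share Base
    def __init__(self):
        super().__init__()
        self.log.append("Child")

mro_names = [cls.__name__ for cls in Child.__mro__]
child = Child()
print(mro_names)
print(child.log)
```

Calling each parent's __init__ explicitly, as in the example above, also works, but a shared ancestor is then initialized once per parent rather than once overall.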
Overloading in Python
• Python does not support method overloading in the traditional sense.
• Instead, it allows a single method to handle multiple scenarios using
default arguments or variable arguments (*args and **kwargs).
• Method Overloading with Default Arguments
class Calculator:
    def add(self, a, b=0, c=0):
        return a + b + c

calc = Calculator()
print(calc.add(5))          # Output: 5
print(calc.add(5, 10))      # Output: 15
print(calc.add(5, 10, 15))  # Output: 30
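The variable-argument approach mentioned above can be sketched in the same way. FlexibleCalculator and its round_to option are hypothetical names used only for this illustration:

```python
class FlexibleCalculator:
    def add(self, *numbers, **options):
        # *numbers collects any count of positional arguments into a tuple;
        # **options collects keyword arguments into a dict
        total = sum(numbers)
        if "round_to" in options:
            total = round(total, options["round_to"])
        return total

calc = FlexibleCalculator()
no_args = calc.add()
three_args = calc.add(5, 10, 15)
rounded = calc.add(1.234, 2.345, round_to=1)
print(no_args, three_args, rounded)
```

Unlike default arguments, *args places no upper limit on the number of values passed in.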
Operator Overloading
• Operator overloading allows defining custom behavior for operators like +, -,
etc., by overriding special methods such as __add__ and __sub__.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __str__(self):
        return f"({self.x}, {self.y})"

p1 = Point(1, 2)
p2 = Point(3, 4)
p3 = p1 + p2  # Overloaded '+' operator
print(p3)     # Output: (4, 6)
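The __sub__ method mentioned above follows the same pattern as __add__. The sketch below, using a separate illustrative class named Point2D, also defines __eq__ so that two points with equal coordinates compare as equal:

```python
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __sub__(self, other):
        # Called for: self - other
        return Point2D(self.x - other.x, self.y - other.y)

    def __eq__(self, other):
        # Called for: self == other
        if not isinstance(other, Point2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Point2D({self.x}, {self.y})"

diff = Point2D(3, 4) - Point2D(1, 2)
same = diff == Point2D(2, 2)
print(diff, same)
```

Returning NotImplemented for unsupported types lets Python fall back to its default comparison instead of raising an error.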
Unit 3
Data Pre-processing and Data
Wrangling
Introduction
• Data pre-processing and wrangling are essential steps in the data
analytics pipeline.
• These processes involve cleaning, transforming, and organizing raw
data to make it suitable for analysis.
Data Pre-processing
• Data pre-processing is the initial step in preparing raw data
for analysis.
• It includes techniques to handle missing values, detect and
remove outliers, normalize data, and more.
• Steps in Data Pre-processing
1. Loading the Data
2. Handling Missing Values
3. Removing Duplicates
4. Outlier Detection and Removal
5. Data Normalization and Scaling
Steps in Data Pre-processing
• Loading the Data
Use Python libraries like pandas or numpy to load datasets.
import pandas as pd
df = pd.read_csv('[Link]')
print(df.head())
• Handling Missing Values
Missing values can be handled by removing or imputing them.
# Removing rows with missing values
df.dropna(inplace=True)

# Imputing missing values (assign back instead of using inplace on a column)
df['Age'] = df['Age'].fillna(df['Age'].mean())
• Removing Duplicates
Duplicate rows can skew analysis results.
df.drop_duplicates(inplace=True)
• Outlier Detection and Removal
Outliers can be detected using statistical methods like IQR or Z-score.
# Removing outliers using IQR
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['column'] >= Q1 - 1.5 * IQR) & (df['column'] <= Q3 + 1.5 * IQR)]
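The Z-score method mentioned above can be sketched in a similar way. The sample values and the threshold of 3 standard deviations are assumptions chosen for this illustration, not values from the notes:

```python
import pandas as pd

# Illustrative data: eleven typical values and one extreme value
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100])

# Z-score: how many standard deviations each value lies from the mean
z_scores = (values - values.mean()) / values.std()

threshold = 3  # common rule of thumb; adjust to the data
filtered = values[z_scores.abs() <= threshold]

print(z_scores.round(2).tolist())
print(filtered.tolist())
```

The Z-score approach assumes roughly bell-shaped data; for skewed data or very small samples the IQR rule is often the safer choice.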
• Data Normalization and Scaling
Standardize features to a consistent scale.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df['scaled_column'] = scaler.fit_transform(df[['column']])
Data Wrangling
• Data wrangling focuses on reshaping and transforming data to make it
analysis-ready.
• It often involves filtering, merging, reshaping, and aggregating datasets.
• Steps in Data Wrangling
1. Filtering Data
2. Merging and Joining Datasets
3. Reshaping Data
4. Aggregating Data
5. Converting Data Types
• Filtering Data
Extract specific rows or columns.
filtered_data = df[df['Age'] > 30]
• Merging and Joining Datasets
Combine multiple datasets into one.
# Inner Join
merged_df = pd.merge(df1, df2, on='common_column', how='inner')
• Reshaping Data
Transform data using pivoting or melting.
# Pivoting
pivoted = df.pivot_table(values='value', index='category', columns='sub_category')

# Melting
melted = df.melt(id_vars='category', value_vars=['col1', 'col2'])
• Aggregating Data
Summarize data using group-by operations.
aggregated = df.groupby('category')['value'].sum()
print(aggregated)
• Converting Data Types
Ensure data is in the correct format.
df['date'] = pd.to_datetime(df['date'])
df['numeric_column'] = pd.to_numeric(df['numeric_column'])
Example Workflow: Data Pre-processing and Wrangling

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv('[Link]')

# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].median())

# Remove duplicates
df.drop_duplicates(inplace=True)

# Normalize a column
scaler = StandardScaler()
df['Normalized_Income'] = scaler.fit_transform(df[['Income']])

# Filter rows based on a condition
filtered_df = df[df['Age'] > 30]

# Group by and aggregate
aggregated = df.groupby('Gender')['Income'].mean()

# Merge with another dataset
df2 = pd.read_csv('additional_data.csv')
merged_df = pd.merge(df, df2, on='ID', how='inner')

# Reshape data
melted_df = df.melt(id_vars='ID', value_vars=['Age', 'Income'],
                    var_name='Metric', value_name='Value')
print(merged_df.head())
Key Python Libraries for Data Pre-processing
and Wrangling
• pandas: Provides DataFrames for data manipulation.
• numpy: Efficient operations on numerical data.
• scikit-learn: Tools for preprocessing, scaling, and encoding.
• matplotlib / seaborn: Visualization for identifying trends and outliers.
Acquiring Data with Python: Loading from CSV files,
Accessing SQL databases.
• Data acquisition is the process of gathering and importing data for
analysis.
• Python provides powerful tools and libraries to load data from CSV
files and access SQL databases efficiently.
• Loading Data from CSV Files
• CSV (Comma-Separated Values) is a common format for storing tabular data.
Python's pandas library is widely used to load and manipulate CSV files.
Loading Data from a CSV File

Syntax
import pandas as pd
df = pd.read_csv('[Link]')

Example
import pandas as pd
# Load the CSV file
df = pd.read_csv('[Link]')
# Display the first 5 rows
print(df.head())
Specifying Delimiters
• If the file uses a delimiter other than a comma, specify it using the
delimiter parameter.
df = pd.read_csv('[Link]', delimiter='\t')  # Tab-separated file
• Handling Missing Values While Loading
• Use the na_values parameter to define placeholders for missing data.
df = pd.read_csv('[Link]', na_values=['NA', 'NULL'])
• Skipping Rows
• Skip unnecessary rows while loading the data.
df = pd.read_csv('[Link]', skiprows=2)  # Skip the first two rows

Accessing SQL Databases
• Python provides libraries such as sqlite3, SQLAlchemy, and pandas to
interact with SQL databases.
• These tools allow you to connect, query, and retrieve data from
relational databases.
Using SQLite
• SQLite is a lightweight, file-based database often used for small
applications.
• The sqlite3 library enables Python to connect to SQLite databases.
• Steps to Access Data:
1. Connect to the database.
2. Execute SQL queries.
3. Fetch and load data into a DataFrame for analysis.
import sqlite3
import pandas as pd

# Connect to the SQLite database
conn = sqlite3.connect('[Link]')

# Write an SQL query
query = "SELECT * FROM employees"

# Load data into a pandas DataFrame
df = pd.read_sql_query(query, conn)

# Display the data
print(df.head())

# Close the connection
conn.close()
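To try the connect, query and fetch steps without an existing database file, the sketch below builds a small in-memory SQLite database using only the standard library. The employees table, its columns and the sample rows are assumptions made for this illustration:

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Alice", 50000), ("Bob", 42000), ("Charlie", 61000)],
)
conn.commit()

# '?' placeholders pass values safely instead of building the SQL by hand
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary",
    (45000,),
).fetchall()

conn.close()
print(rows)
```

The same query string could be passed to pd.read_sql_query() together with an open connection to get a DataFrame instead of a list of tuples.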
Using SQLAlchemy
• SQLAlchemy is a more advanced library that supports multiple
database systems like MySQL, PostgreSQL, and SQLite.
• Steps to Access Data:
• Install SQLAlchemy:
pip install sqlalchemy
• Create an engine to connect to the database.
from sqlalchemy import create_engine
import pandas as pd

# Create a connection engine
engine = create_engine('sqlite:///[Link]')

# Execute an SQL query and load the data into a DataFrame
df = pd.read_sql_query('SELECT * FROM employees', engine)

# Display the data
print(df.head())
Using MySQL or PostgreSQL
• To connect to MySQL or PostgreSQL databases, you can use libraries like
mysql-connector or psycopg2.
• Steps for MySQL:
• Install the required library
pip install mysql-connector-python
Connect to the database and execute a query.
import mysql.connector
import pandas as pd

# Connect to the MySQL database
conn = mysql.connector.connect(
    host="localhost",
    user="username",
    password="password",
    database="testdb"
)

# Query the data
query = "SELECT * FROM employees"
df = pd.read_sql(query, conn)

# Display the data
print(df.head())

# Close the connection
conn.close()
Comparison of CSV and SQL Data Sources
Aspect          CSV Files                       SQL Databases
Storage Type    Flat file                       Structured, relational data
Size            Suitable for small datasets     Handles large datasets
Access Speed    Slower for large data           Optimized for querying
Flexibility     Static, predefined structure    Dynamic, supports relations


Advantages of Using Python for Data
Acquisition
• Flexibility: Support for multiple file formats and databases.
• Integration: Easily integrates with data manipulation tools like pandas.
• Scalability: Handles small to large datasets efficiently.
• Automation: Enables automated data retrieval and processing.
Cleansing Data with Python
• Data cleansing is a critical step in preparing raw data for analysis.
• It involves removing irrelevant information, normalizing values, and
formatting the data to ensure consistency and reliability.
1. Stripping Out Extraneous Information
• Extraneous information includes unnecessary or irrelevant data that
does not contribute to the analysis.
• Stripping it out ensures a cleaner dataset.
• Removing Leading and Trailing Whitespace
• Use the strip() method to clean strings.
# Cleaning whitespace in a column
import pandas as pd

data = {'Name': [' Alice ', ' Bob ', ' Charlie ']}
df = pd.DataFrame(data)

df['Name'] = df['Name'].str.strip()
print(df)
# Output:
#       Name
# 0    Alice
# 1      Bob
# 2  Charlie
Dropping Irrelevant Columns
• Remove unnecessary columns from a DataFrame.
df = df.drop(columns=['IrrelevantColumn'])
• Removing Rows with Noise
Filter rows that contain irrelevant or noisy data.

# Drop rows where 'Name' contains noise like "unknown"
df = df[~df['Name'].str.contains('unknown', case=False)]
2. Normalizing Data
• Normalization involves scaling data to a standard range (e.g., 0–1) or
ensuring a consistent format for categorical data.
• Numerical Normalization
Scale numerical values to a uniform range.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {'Score': [50, 200, 150]}
df = pd.DataFrame(data)

scaler = MinMaxScaler()
df['Normalized_Score'] = scaler.fit_transform(df[['Score']])
print(df)
# Output:
#    Score  Normalized_Score
# 0     50          0.000000
# 1    200          1.000000
# 2    150          0.666667
Standardizing Case in Text Data
• Ensure uniform case (lowercase or uppercase) for consistency.
df['Name'] = df['Name'].str.lower()
print(df)
# Output:
#       Name
# 0    alice
# 1      bob
# 2  charlie
Encoding Categorical Variables
• Convert categories to numeric values.

df['Category'] = df['Category'].map({'A': 1, 'B': 2, 'C': 3})
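Mapping categories to 1, 2, 3 implies an order between them. When the categories have no natural order, one-hot encoding is a common alternative. A minimal sketch using pd.get_dummies (the sample column is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'B']})

# One new indicator column per distinct category value
one_hot = pd.get_dummies(df['Category'], prefix='Category')

# Attach the indicator columns next to the original data
encoded = pd.concat([df, one_hot], axis=1)
print(encoded)
```

Each row has exactly one indicator set, so no artificial ranking between A, B and C is introduced.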


3. Formatting Data
• Data formatting involves ensuring values are in the correct type or
structure required for analysis.
• Converting Data Types
Use pandas to convert columns to appropriate types.

df['Date'] = pd.to_datetime(df['Date'])
df['Amount'] = pd.to_numeric(df['Amount'])

Formatting Dates
Convert dates into a standard format.
df['Formatted_Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Padding or Trimming Values
Ensure fixed-length values, such as adding leading zeros.
df['ID'] = df['ID'].astype(str).str.zfill(5)  # Pad ID to 5 digits (converted to text first)

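Replacing Invalid Data
Replace invalid or inconsistent data with a default value.
# Turn non-numeric entries such as 'unknown' into NaN, then fill with the median
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
df['Age'] = df['Age'].fillna(df['Age'].median())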
Example: Full Workflow
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data
data = {
    'Name': [' Alice ', ' Bob ', 'unknown', ' Charlie '],
    'Score': [50, 200, None, 150],
    'Date': ['2024-12-01', '2024-12-02', '2024/12/03', None],
    'Category': ['A', 'B', 'C', 'B']
}

# Create DataFrame
df = pd.DataFrame(data)

# Step 1: Strip out extraneous information
df['Name'] = df['Name'].str.strip()
df = df[~df['Name'].str.contains('unknown', case=False)].copy()

# Step 2: Normalize data
scaler = MinMaxScaler()
df['Score'] = df['Score'].fillna(df['Score'].mean())  # Handle missing values
df['Normalized_Score'] = scaler.fit_transform(df[['Score']])

# Step 3: Format data
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')  # Convert to datetime
df['Category'] = df['Category'].map({'A': 1, 'B': 2, 'C': 3})  # Encode categories

print(df)
Output
      Name  Score       Date  Category  Normalized_Score
0    Alice   50.0 2024-12-01         1          0.000000
1      Bob  200.0 2024-12-02         2          1.000000
3  Charlie  150.0        NaT         2          0.666667
Key Python Libraries for Data Cleansing
• pandas: For data manipulation and cleaning.
• numpy: For handling numerical data.
• scikit-learn: For normalization and scaling.
Combining and Merging Data Sets in Python
• Combining and merging data sets is a critical step in data analysis to
consolidate data from multiple sources into a unified structure.
• Python's pandas library provides powerful tools for these operations.
1. Combining Data Sets
• Combining data sets refers to stacking data either vertically (adding
rows) or horizontally (adding columns).
• Concatenation
• Concatenation is used to append data frames either row-wise or
column-wise.
• Syntax
pd.concat([df1, df2], axis=0)  # Vertical stacking (default axis=0)
pd.concat([df1, df2], axis=1)  # Horizontal stacking
Example (Vertical Stacking):
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Charlie', 'David']})

combined_df = pd.concat([df1, df2], axis=0)
print(combined_df)
Output
   ID     Name
0   1    Alice
1   2      Bob
0   3  Charlie
1   4    David
Example (Horizontal Stacking)
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'Age': [25, 30]})

combined_df = pd.concat([df1, df2], axis=1)
print(combined_df)
Output
   ID   Name  Age
0   1  Alice   25
1   2    Bob   30
Appending Data
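• Appending is similar to concatenation but is specifically used to add
rows to a DataFrame.
• Older code used df1.append(df2, ignore_index=True). DataFrame.append() was
deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() with
ignore_index=True gives the same result.
• Syntax
pd.concat([df1, df2], ignore_index=True)

• Example
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3], 'Name': ['Charlie']})

appended_df = pd.concat([df1, df2], ignore_index=True)
print(appended_df)
Output
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie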
2. Merging Data Sets
• Merging combines datasets based on common keys or columns. This
is equivalent to SQL-style joins (inner, outer, left, and right joins).
• Syntax
pd.merge(df1, df2, on='common_column', how='join_type')
• Join Types:
• Inner Join (default): Includes rows with matching keys in both DataFrames.
• Left Join: Includes all rows from the left DataFrame and matching rows from
the right.
• Right Join: Includes all rows from the right DataFrame and matching rows
from the left.
• Outer Join: Includes all rows from both DataFrames, filling missing values with
NaN.
Example 1: Inner Join
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 3], 'Age': [25, 35]})

merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Output
   ID   Name  Age
0   1  Alice   25
Example 2: Left Join
merged_df = pd.merge(df1, df2, on='ID', how='left')
print(merged_df)

Output
   ID   Name   Age
0   1  Alice  25.0
1   2    Bob   NaN
Example 3: Outer Join
merged_df = pd.merge(df1, df2, on='ID', how='outer')
print(merged_df)

Output
   ID   Name   Age
0   1  Alice  25.0
1   2    Bob   NaN
2   3    NaN  35.0
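The right join is the one join type not demonstrated above. The sketch below reuses the same two DataFrames and also shows the indicator=True option of pd.merge, which records where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 3], 'Age': [25, 35]})

# Right join: keep every row of df2, fill missing Name values with NaN
right_df = pd.merge(df1, df2, on='ID', how='right')
print(right_df)

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
with_source = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(with_source)
```

The _merge column is a quick way to check which keys failed to match before deciding on a join type.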
Merging on Multiple Keys
• Merging can also be done on multiple columns.
df1 = pd.DataFrame({'ID': [1, 2], 'Dept': ['HR', 'Finance'], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Dept': ['HR', 'Finance'], 'Salary': [5000, 6000]})

merged_df = pd.merge(df1, df2, on=['ID', 'Dept'], how='inner')
print(merged_df)

Output
   ID     Dept   Name  Salary
0   1       HR  Alice    5000
1   2  Finance    Bob    6000
3. Practical Workflow: Combining and Merging Example
import pandas as pd

# DataFrames
sales_2023 = pd.DataFrame({'Month': ['Jan', 'Feb'], 'Sales': [1000, 1500]})
sales_2024 = pd.DataFrame({'Month': ['Mar', 'Apr'], 'Sales': [2000, 2500]})

products = pd.DataFrame({'ProductID': [1, 2], 'Product': ['A', 'B']})
sales_details = pd.DataFrame({'ProductID': [1, 2], 'Sales': [3000, 4000]})

# Combine sales data for 2023 and 2024
combined_sales = pd.concat([sales_2023, sales_2024], axis=0)

# Merge products with sales details
merged_data = pd.merge(products, sales_details, on='ProductID', how='inner')

print("Combined Sales:")
print(combined_sales)
print("\nMerged Data:")
print(merged_data)
Output
Combined Sales:
  Month  Sales
0   Jan   1000
1   Feb   1500
0   Mar   2000
1   Apr   2500

Merged Data:
   ProductID Product  Sales
0          1       A   3000
1          2       B   4000
Key Takeaways
• concat: For stacking DataFrames vertically or horizontally.
• append: Added rows in older pandas; removed in pandas 2.0, so use concat with ignore_index=True.
• merge: For combining datasets based on keys using SQL-like joins.
• Flexibility: Use these methods to handle various data sources and
formats seamlessly.
Reshaping and Pivoting in Python
• Reshaping and pivoting are key data manipulation techniques used to
reorganize data for better analysis and visualization.
• Python's pandas library provides powerful tools for these operations,
including melt(), pivot(), pivot_table(), and others.
Reshaping Data
• Reshaping involves changing the structure of the dataset, such as
transforming data from wide to long format or vice versa.
• 1.1 Melting
• Melting transforms a dataset from a wide format (columns as variables) to a
long format (rows as variables).
• Syntax
pd.melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name=None)
id_vars: Columns to keep unchanged.
value_vars: Columns to unpivot.
var_name: Name for the new "variable" column.
value_name: Name for the new "value" column.
import pandas as pd

# Original data in wide format
data = {
    'ID': [1, 2],
    'Math': [90, 85],
    'Science': [95, 80],
    'English': [85, 88]
}
df = pd.DataFrame(data)

# Melting the data
melted_df = pd.melt(df, id_vars=['ID'],
                    var_name='Subject', value_name='Score')
print(melted_df)
Output
   ID  Subject  Score
0   1     Math     90
1   2     Math     85
2   1  Science     95
3   2  Science     80
4   1  English     85
5   2  English     88
2. Pivoting Data
• Pivoting reshapes data by converting rows into columns or
summarizing data based on an index.
• 2.1 Using pivot()
• The pivot() method is used to reorganize data from a long format to a wide
format. It requires unique combinations of index and columns.
df.pivot(index='index_column', columns='column_to_pivot',
values='values_column')
# Long format data
data = {
    'ID': [1, 1, 2, 2],
    'Subject': ['Math', 'Science', 'Math', 'Science'],
    'Score': [90, 95, 85, 80]
}
df = pd.DataFrame(data)

# Pivot the data
pivoted_df = df.pivot(index='ID',
                      columns='Subject', values='Score')
print(pivoted_df)
Output
Subject  Math  Science
ID
1          90       95
2          85       80
2.2 Using pivot_table()
• Unlike pivot(), pivot_table() can handle duplicate values by applying
an aggregation function. By default, it uses the mean.
• Syntax
df.pivot_table(index='index_column', columns='column_to_pivot',
values='values_column', aggfunc='aggregation_function')

Example:
data = {
    'ID': [1, 1, 2, 2],
    'Subject': ['Math', 'Math', 'Science', 'Science'],
    'Score': [90, 95, 80, 85]
}
df = pd.DataFrame(data)

# Pivot table with aggregation
pivot_table_df = df.pivot_table(index='ID', columns='Subject',
                                values='Score', aggfunc='max')
print(pivot_table_df)
Output
Subject  Math  Science
ID
1        95.0      NaN
2         NaN     85.0
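The difference between pivot() and pivot_table() shows up as soon as an index/column pair repeats. In the sketch below (the scores data is illustrative), pivot() raises a ValueError while pivot_table() falls back to its default aggregation, the mean:

```python
import pandas as pd

scores = pd.DataFrame({
    'ID': [1, 1],
    'Subject': ['Math', 'Math'],   # the same ID/Subject pair appears twice
    'Score': [90, 95],
})

try:
    scores.pivot(index='ID', columns='Subject', values='Score')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates the duplicates; the default aggfunc is the mean
mean_table = scores.pivot_table(index='ID', columns='Subject', values='Score')
print(mean_table)
```

Choosing aggfunc explicitly (for example 'max' or 'sum') makes the intended handling of duplicates visible in the code.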
3. Reshaping with MultiIndex

• For more complex datasets with hierarchical structures, pandas
allows reshaping data with multi-level indices.

Example
data = {
    'Region': ['North', 'North', 'South', 'South'],
    'Year': [2021, 2022, 2021, 2022],
    'Sales': [200, 250, 300, 400]
}
df = pd.DataFrame(data)

# Pivot Region against Year
multi_index_df = df.pivot(index='Region', columns='Year',
                          values='Sales')
print(multi_index_df)
Output
Year    2021  2022
Region
North    200   250
South    300   400
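The pivot above produces ordinary single-level row and column labels. As a sketch of an actual multi-level index on the same illustrative sales data, set_index() with two columns builds a MultiIndex, and unstack() and stack() move a level between rows and columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Year': [2021, 2022, 2021, 2022],
    'Sales': [200, 250, 300, 400],
})

# Two index levels: (Region, Year)
indexed = df.set_index(['Region', 'Year'])
north_2022 = indexed.loc[('North', 2022), 'Sales']   # select by both levels

# unstack() moves the 'Year' level from the rows into the columns
wide = indexed['Sales'].unstack('Year')

# stack() reverses it, giving back a Series with a two-level index
long_again = wide.stack()

print(indexed)
print(wide)
print(long_again)
```

Here unstack('Year') gives the same table as the pivot() call above, which is a useful way to think about what pivoting does.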
Practical Example: Reshaping and Pivoting
• Scenario: A retail company tracks sales data across different stores
and products.
Example:
import pandas as pd
# Sales data
data = {
    'Store': ['A', 'A', 'B', 'B'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Sales': [100, 150, 200, 250]
}
df = pd.DataFrame(data)

# Reshape to long format using melt
# (only Store is kept as an id column, so Product and Sales are both unpivoted)
long_format = pd.melt(df, id_vars=['Store'],
                      var_name='Attribute', value_name='Value')
print("Long Format:\n", long_format)

# Reshape to wide format using pivot
wide_format = df.pivot(index='Store', columns='Product',
                       values='Sales')
print("\nWide Format:\n", wide_format)
Output:
Long Format:
  Store Attribute    Value
0     A   Product   Apples
1     A   Product  Bananas
2     B   Product   Apples
3     B   Product  Bananas
4     A     Sales      100
5     A     Sales      150
6     B     Sales      200
7     B     Sales      250

Wide Format:
Product  Apples  Bananas
Store
A           100      150
B           200      250
Key Takeaways
• melt(): Convert wide data to long format for easier analysis.
• pivot(): Transform long data to wide format with unique values.
• pivot_table(): Handle duplicate values using aggregation functions like
mean, sum, etc.
• Multi-indexing: Useful for hierarchical or grouped data reshaping.
• Reshaping and pivoting are essential for tailoring data to meet specific
analysis or visualization needs.
• With pandas, these operations are intuitive and powerful.
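The melt() and pivot() takeaways can be seen as inverse operations in a short round trip. This is a sketch on a made-up two-store table; the names wide, long_df, and restored are illustrative.

```python
import pandas as pd

# Wide table: one column per product (hypothetical data)
wide = pd.DataFrame({
    'Store': ['A', 'B'],
    'Apples': [100, 200],
    'Bananas': [150, 250],
})

# Wide -> long: one row per (Store, Product) pair
long_df = wide.melt(id_vars='Store', var_name='Product', value_name='Sales')

# Long -> wide again: the pairs are unique, so pivot() is sufficient
restored = long_df.pivot(index='Store', columns='Product', values='Sales').reset_index()
restored.columns.name = None  # drop the leftover 'Product' axis label

print(long_df)
print(restored)
```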
Data Transformation in Python
• Data transformation involves converting data into a more useful and
understandable format.
• It is a key step in preparing data for analysis and machine learning.
• Common transformation tasks include mapping values, encoding
categorical data, aggregating, scaling, and feature engineering.

Why Transform Data?


• To normalize data for better analysis.
• To encode categorical variables for machine learning models.
• To aggregate or summarize data for visualization.
• To extract or derive new features for predictive models.
2. Common Data Transformation Techniques
• Scaling and Normalization
• Encoding Categorical Variables
• Mapping Values
• Binning
• Aggregation and Grouping
• Log Transformation
• Feature Extraction
2.1 Scaling and Normalization
• Min-max scaling (often called normalization) rescales data to a fixed
range (e.g., 0 to 1).
• Standardization transforms data to have a mean of 0 and a standard
deviation of 1.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Sample data
data = {'Age': [25, 35, 45], 'Salary': [50000, 70000, 100000]}
df = pd.DataFrame(data)

# Min-Max Scaling
scaler = MinMaxScaler()
df['Scaled_Salary'] = scaler.fit_transform(df[['Salary']])

# Standardization
std_scaler = StandardScaler()
df['Normalized_Age'] = std_scaler.fit_transform(df[['Age']])

print(df)

Output
   Age  Salary  Scaled_Salary  Normalized_Age
0   25   50000            0.0       -1.224745
1   35   70000            0.4        0.000000
2   45  100000            1.0        1.224745
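The two transformations reduce to simple formulas, which can be checked by hand with NumPy. This is a sketch using the sample values above; it assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np

salary = np.array([50000, 70000, 100000], dtype=float)
age = np.array([25, 35, 45], dtype=float)

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1]
scaled = (salary - salary.min()) / (salary.max() - salary.min())

# Standardization: (x - mean) / std, result has mean 0 and std 1
standardized = (age - age.mean()) / age.std()

print(scaled)
print(standardized)
```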
2.2 Encoding Categorical Variables
• Categorical variables need to be converted into numeric format for
machine learning models.
• One-Hot Encoding: Creates binary columns for each category.
• Label Encoding: Assigns integer values to categories.
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Data
df = pd.DataFrame({'City': ['New York', 'Paris', 'London']})

# One-Hot Encoding (columns follow the sorted categories:
# London, New York, Paris)
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(df[['City']]).toarray()

# Label Encoding (labels are assigned in sorted order)
label_encoder = LabelEncoder()
df['City_Label'] = label_encoder.fit_transform(df['City'])

print("One-Hot Encoded:\n", one_hot)
print("Label Encoded:\n", df)

Output
One-Hot Encoded:
 [[0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]
Label Encoded:
        City  City_Label
0  New York           1
1     Paris           2
2    London           0
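The same two encodings can also be produced with pandas alone, which may be convenient for quick exploration. This is a sketch, not a replacement for the scikit-learn encoders; pd.get_dummies and the category accessor are used here as stand-ins.

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York', 'Paris', 'London']})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df['City'], prefix='City')

# Label encoding: integer codes follow the sorted category order
codes = df['City'].astype('category').cat.codes

print(one_hot)
print(codes.tolist())
```

Integer labels imply an order (London < New York < Paris) that the cities do not have, which is why one-hot encoding is usually preferred for unordered categories.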
2.3 Mapping Values
• Use map() or replace() to transform column values based on a
mapping.
import pandas as pd

# Sample DataFrame
data = {'City': ['New York', 'Paris', 'London', 'Paris', 'New York']}
df = pd.DataFrame(data)

# Mapping City names to codes using a dictionary
city_mapping = {'New York': 1, 'Paris': 2, 'London': 3}
df['City_Code'] = df['City'].map(city_mapping)

# Displaying the updated DataFrame
print(df)

Output
       City  City_Code
0  New York          1
1     Paris          2
2    London          3
3     Paris          2
4  New York          1
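One detail worth knowing about map(): values missing from the dictionary become NaN. This is a small sketch with a made-up unmapped city ('Tokyo') and an illustrative sentinel code of -1.

```python
import pandas as pd

cities = pd.Series(['New York', 'Paris', 'Tokyo'])
city_mapping = {'New York': 1, 'Paris': 2}  # 'Tokyo' is deliberately missing

# Unmapped values come back as NaN, which also turns the column into floats
mapped = cities.map(city_mapping)

# Replace the gaps with a sentinel and restore an integer dtype
codes = mapped.fillna(-1).astype(int)

print(mapped)
print(codes)
```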
2.4 Binning
• Binning groups continuous data into discrete intervals or categories.
# Data
df = pd.DataFrame({'Age': [25, 35, 45, 55, 65]})

# Define bins
bins = [0, 30, 50, 70]
labels = ['Young', 'Middle-Aged', 'Senior']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels)
print(df)

Output
   Age    Age_Group
0   25        Young
1   35  Middle-Aged
2   45  Middle-Aged
3   55       Senior
4   65       Senior
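Values that fall exactly on a bin edge depend on which side of the interval is closed. The sketch below uses the same bins with two edge values (30 and 50, chosen for illustration) to show the effect of the right parameter of pd.cut.

```python
import pandas as pd

ages = pd.Series([30, 50])
bins = [0, 30, 50, 70]
labels = ['Young', 'Middle-Aged', 'Senior']

# Default: intervals are closed on the right, i.e. (0, 30], (30, 50], (50, 70]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are closed on the left, i.e. [0, 30), [30, 50), [50, 70)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(right_closed.tolist())
print(left_closed.tolist())
```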
2.5 Aggregation and Grouping
• Summarize data by aggregating rows based on groups.
# Data
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT'],
    'Salary': [3000, 5000, 4000, 6000]
})

# Group by and aggregate
grouped = df.groupby('Department').agg({'Salary': 'mean'})
print(grouped)

Output
            Salary
Department
HR          3500.0
IT          5500.0
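Several aggregations can be computed in one pass by handing agg() a list of function names. This is a sketch on the same data; the variable name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT'],
    'Salary': [3000, 5000, 4000, 6000],
})

# One row per department, one column per aggregation
summary = df.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])
print(summary)
```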
2.6 Log Transformation
• Apply logarithmic transformations to reduce skewness in data.

import pandas as pd
import numpy as np

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Salary': [50000, 70000, 100000]}
df = pd.DataFrame(data)

# Applying log transformation to the 'Salary' column
df['Log_Salary'] = np.log(df['Salary'])

# Displaying the updated DataFrame
print(df)

Output
      Name  Salary  Log_Salary
0    Alice   50000   10.819778
1      Bob   70000   11.156251
2  Charlie  100000   11.512925
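np.log is undefined at zero, so data containing zeros is often transformed with log(1 + x) instead. This is a sketch with made-up values, using np.log1p and its inverse np.expm1.

```python
import numpy as np

values = np.array([0.0, 9.0, 99.0])  # hypothetical data that includes a zero

# log1p computes log(1 + x), so a zero input maps to zero
logged = np.log1p(values)

# expm1 computes exp(x) - 1 and undoes the transformation
restored = np.expm1(logged)

print(logged)
print(restored)
```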
2.7 Feature Extraction
• Create new features based on existing ones.
Example
# Data
df = pd.DataFrame({'Date': ['2024-12-01', '2024-12-02']})

# Extract year, month, and day
dates = pd.to_datetime(df['Date'])
df['Year'] = dates.dt.year
df['Month'] = dates.dt.month
df['Day'] = dates.dt.day

print(df)

Output
         Date  Year  Month  Day
0  2024-12-01  2024     12    1
1  2024-12-02  2024     12    2
3. Practical Workflow for Data Transformation
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Sample data
data = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 35, 45],
    'Salary': [50000, 70000, 100000],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)

# Step 1: Scaling numerical columns
scaler = MinMaxScaler()
df['Scaled_Salary'] = scaler.fit_transform(df[['Salary']])

# Step 2: Encoding categorical column
label_encoder = LabelEncoder()
df['City_Code'] = label_encoder.fit_transform(df['City'])

# Step 3: Feature extraction
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 30, 50],
                         labels=['Young', 'Middle-Aged'])
print(df)

Output
   ID     Name  Age  Salary      City  Scaled_Salary  City_Code    Age_Group
0   1    Alice   25   50000  New York            0.0          1        Young
1   2      Bob   35   70000     Paris            0.4          2  Middle-Aged
2   3  Charlie   45  100000    London            1.0          0  Middle-Aged
Key Libraries for Data Transformation
• pandas: For data manipulation and transformation.
• numpy: For numerical transformations.
• scikit-learn: For scaling, encoding, and feature engineering.
• Data transformation is a fundamental step in the data preparation
pipeline. By effectively applying these techniques, you can improve
the quality, consistency, and analytical utility of your datasets.
String Manipulation and Regular Expressions in
Python
• String manipulation and regular expressions (regex) are essential for
working with text data.
• Python provides built-in string methods and the re module for
efficient string processing.
• 2.1 Basic Functions in re
• re.search(pattern, string)
• Searches for the first occurrence of the pattern in the string.
import re
result = re.search(r"\d+", "The year is 2024")
print(result.group()) # 2024
• re.match(pattern, string)
• Matches the pattern only at the beginning of the string.
result = re.match(r"The", "The year is 2024")
print(result.group()) # The
• re.findall(pattern, string)
• Returns all occurrences of the pattern in the string.
result = re.findall(r"\d+", "There are 3 cats and 4 dogs.")
print(result) # ['3', '4']
• re.sub(pattern, replacement, string)
• Replaces occurrences of the pattern with the replacement string.
result = re.sub(r"\d+", "X", "There are 3 cats and 4 dogs.")
print(result) # There are X cats and X dogs.

• re.split(pattern, string)
• Splits the string based on the pattern.
result = re.split(r"\s", "Split this string")
print(result) # ['Split', 'this', 'string']
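Because re.match() and re.search() return None when nothing matches, calling .group() on the result unguarded raises an AttributeError. The sketch below contrasts the two functions with re.fullmatch(); the sample strings are made up.

```python
import re

text = "Year 2024 report"

# match() only looks at the start of the string, so digits in the middle are missed
at_start = re.match(r"\d+", text)

# search() scans the whole string and returns the first hit
anywhere = re.search(r"\d+", text)

# fullmatch() succeeds only if the entire string fits the pattern
whole = re.fullmatch(r"\d+", "2024")
partial = re.fullmatch(r"\d+", "2024!")

# Guard against None before calling .group()
found = anywhere.group() if anywhere else None
print(at_start, found, whole is not None, partial)
```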
2.2 Regex Metacharacters and Special Sequences
Pattern   Description                                              Example (pattern, string)
.         Matches any character except newline.                    r"c.t", "cat"
^         Matches the start of a string.                           r"^The", "The cat"
$         Matches the end of a string.                             r"end$", "the end"
\d        Matches any digit (0-9).                                 r"\d", "A1B2C3"
\w        Matches any word character (letter, digit, underscore).  r"\w", "Hello_123"
\s        Matches any whitespace character.                        r"\s", "a b c"
*         Matches 0 or more occurrences.                           r"ab*", "abc abbb"
+         Matches 1 or more occurrences.                           r"ab+", "abc abbb"
?         Matches 0 or 1 occurrence.                               r"ab?", "abc ab"
{n}       Matches exactly n occurrences.                           r"a{2}", "caaat"
[abc]     Matches any one of a, b, or c.                           r"[aeiou]", "cat"
|         Matches either pattern.
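Running a few of the table's patterns through re.findall() shows how the quantifiers behave. The choice of re.findall() and the alternation example ("cat|dog") are assumptions made for illustration; the pattern/string pairs for the quantifiers come from the table.

```python
import re

# * allows zero or more 'b' after 'a'; + requires at least one
star = re.findall(r"ab*", "abc abbb")
plus = re.findall(r"ab+", "abc abbb")

# ? allows at most one 'b'
optional = re.findall(r"ab?", "abc ab")

# {2} needs exactly two consecutive 'a' characters
exactly_two = re.findall(r"a{2}", "caaat")

# | matches either alternative (hypothetical example string)
either = re.findall(r"cat|dog", "cat, dog, bird")

print(star, plus, optional, exactly_two, either)
```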
2.3 Using Regex for Validation
Example: Validate Email Address
email = "example@[Link]"
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
• Example: Validate Phone Number
phone = "123-456-7890"
pattern = r"^\d{3}-\d{3}-\d{4}$"
if re.match(pattern, phone):
    print("Valid phone number")
else:
    print("Invalid phone number")
2.4 Practical Example: Extracting Data
• Extract Hashtags from Text:
text = "Learn #Python and #Regex in 2024!"
hashtags = re.findall(r"#\w+", text)
print(hashtags) # ['#Python', '#Regex']
• Extract URLs from Text:
text = "Visit [Link] or [Link]
urls = re.findall(r"https?://[^\s]+", text)
print(urls) # ['[Link] '[Link]
3. Combining String Manipulation and Regex
• Example: Process a Log File
import re

log = "2024-12-01 ERROR: Failed to connect to server\n2024-12-02 INFO: Connection successful"
errors = re.findall(r"(\d{4}-\d{2}-\d{2}) ERROR: (.+)", log)
for date, message in errors:
    print(f"Date: {date}, Error: {message}")

Output
Date: 2024-12-01, Error: Failed to connect to server
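For logs with several levels, named groups and the re.MULTILINE flag make each line easy to turn into a record. This is a sketch: the third log line and the names pattern, entries, and level_counts are made up for illustration.

```python
import re
from collections import Counter

log = (
    "2024-12-01 ERROR: Failed to connect to server\n"
    "2024-12-02 INFO: Connection successful\n"
    "2024-12-03 ERROR: Timeout"  # hypothetical extra line
)

# With re.MULTILINE, ^ and $ anchor at the start and end of every line
pattern = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+): (?P<message>.+)$",
    re.MULTILINE,
)

# One dict per log line, keyed by the group names
entries = [m.groupdict() for m in pattern.finditer(log)]
level_counts = Counter(entry["level"] for entry in entries)

print(entries[0])
print(level_counts)
```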
Conclusion
• String Manipulation: Built-in Python methods handle tasks like
splitting, joining, formatting, and cleaning strings.
• Regular Expressions: The re module is a powerful tool for pattern
matching and text extraction.
• Combining these techniques allows for robust text processing in data
analysis, validation, and parsing workflows.
Unit 4
Web Scraping And Numerical Analysis
What is Web Scraping?
• Web scraping, also called web data extraction, is an automated
process of collecting publicly available web data from targeted
websites.
• Instead of gathering data manually, web scraping software can be
used to acquire a vast amount of information automatically, making
the process much faster.
Why is Web Scraping Important?
• Some websites can contain a very large amount of invaluable data such as
stock prices, product details, sports stats, you name it.
• If you want to access this information, you either have to use whatever format
the website uses or copy and paste the information manually into a new
document.
• This can be pretty tedious when you want to extract a lot of information from
a website and here is where web scraping can help.
• Instead of scraping this data manually, in most cases, software tools called web
scrapers are preferred because they are less expensive compared to human
labor and they work at a faster rate.
• Web scrapers can run on your PC or in a data center.
How Does Web Scraping Work?
• Step 1: Retrieving content from a website

• Step 2: Extracting the required data

• Step 3: Store parsed data


Web Scraping
• Web scraping is the process of extracting structured data from
websites for analysis, automation, or integration.
• For example, you might scrape product details from an e-commerce
website or collect weather data from a forecasting site.
1. Concepts in Web Scraping
1. HTTP Requests
• Web scraping relies on the HTTP protocol to fetch webpages.
• Key request types include:
• GET: Retrieves data (e.g., loading a webpage).
• POST: Sends data to the server (e.g., form submission).
2. HTML Structure
• Websites are built using HTML. Scraping involves parsing this structure to extract the
desired elements like tags (<h1>, <p>, <table>), attributes, or classes.
3. Ethical Considerations
• Always follow a website's Terms of Service and check for a robots.txt file, which
specifies the allowed scraping practices.
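Python's standard library can interpret robots.txt rules. The sketch below parses an inline, made-up rule set rather than downloading a real file; the user agent name and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: everything under /private/ is off limits
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch() reports whether the given user agent may request the URL
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data")
allowed = parser.can_fetch("my-scraper", "https://example.com/products")

print(blocked, allowed)
```

For a live site, set_url() followed by read() would load the rules from the site's own robots.txt instead.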
Fetching Web Pages
• Fetching a webpage is the first step in scraping.
• This involves sending an HTTP GET request to the URL and retrieving
the server's response.
• Python’s requests library simplifies HTTP communication.
• The server's response includes metadata (headers) and the content
(HTML).
• This content can then be parsed and processed.
Example Program: Fetching a Web Page
import requests

# Fetch a web page
url = "[Link]
response = requests.get(url)

# Print the HTML content of the page
print(response.text)
Submitting Forms
• Some web applications require user input via forms (e.g., search
queries, login forms).
• Scraping these pages involves automating form submissions.
• A form submission uses an HTTP POST request.
• Form data is sent as key-value pairs, representing input field names
and values.
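The key-value encoding of form data can be inspected without sending anything over the network. This sketch uses the standard library's urlencode to produce the application/x-www-form-urlencoded body that a typical HTML form submits; the field names and values are made up.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields
form_data = {"username": "mca_student", "query": "data science"}

# Fields are joined with '&', and spaces are encoded as '+'
body = urlencode(form_data)

# The server side reverses the encoding
decoded = parse_qs(body)

print(body)
print(decoded)
```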
Example Program: Submitting a Form
import requests
# URL of the form submission endpoint
url = "[Link]
# Data to submit
form_data = {
"username": "mca_student",
"password": "secure_password"
}
# Submit the form using POST
response = requests.post(url, data=form_data)
# Print the server's response
print(response.json())
Downloading Web Pages After Form
Submission
• Some pages are only accessible after form submission (e.g.,
dashboards). Maintaining a session is critical for these scenarios.
• A session preserves cookies and authentication details across
requests.
• Use requests.Session() for login and subsequent requests.
Example Program: Downloading Pages After Login
import requests
# Form submission URL
login_url = "[Link]
download_url = "[Link]
# Form data
login_data = {"username": "user", "password": "pass"}
# Create a session to maintain cookies
session = requests.Session()
# Submit the form
session.post(login_url, data=login_data)
# Access another page after login
response = session.get(download_url)
# Save the downloaded content
with open("downloaded_page.html", "w", encoding="utf-8") as file:
    file.write(response.text)
Parsing HTML with CSS Selectors
• After fetching a webpage, you need to extract relevant data.
• CSS selectors are powerful tools to identify HTML elements based on
tags, classes, or IDs.
• CSS selectors can target:
• Tags: div, p, h1
• Classes: .class-name
• IDs: #id-name
• The BeautifulSoup library in Python parses HTML and enables CSS
selector-based extraction.
Example Program: Using CSS Selectors
from bs4 import BeautifulSoup
import requests

# Fetch the page
url = "[Link]
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Extract elements using CSS selectors
titles = soup.select("h1, h2, h3")
for title in titles:
    print(title.text)
Numerical Analysis with NumPy
• Numerical analysis is a branch of mathematics focused on algorithms for
solving problems involving continuous variables.
• Python's NumPy library is a cornerstone for such computations.
• 1. Introduction to NumPy
• NumPy (Numerical Python) is a Python library designed for efficient numerical
computations. It supports:
• Multi-dimensional arrays and matrices.
• High-level mathematical functions like linear algebra, Fourier transforms, and
statistical operations.
• Key Features:
• Speed: NumPy is faster than Python lists due to its underlying implementation in C.
• Memory Efficiency: Stores data compactly.
2. Creating Arrays
• Arrays are central to NumPy. Unlike Python lists, NumPy arrays allow
for vectorized operations.
• A NumPy array is a grid of values of the same type, indexed by a tuple
of non-negative integers.
• Array dimensions are referred to as axes.
Example Program: Creating Arrays
import numpy as np

# 1D Array
arr = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr)

# 2D Array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)
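The ideas of axes and a single element type can be inspected through array attributes. This is a short sketch on the 2D array from the example; the array named mixed is added only for illustration.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the number of axes; shape gives the length along each axis
print(arr_2d.ndim, arr_2d.shape)

# Reductions take an axis: axis=0 collapses rows, axis=1 collapses columns
col_sums = arr_2d.sum(axis=0)
row_sums = arr_2d.sum(axis=1)

# All elements share one type, so mixing ints and floats upcasts to float
mixed = np.array([1, 2.5])

print(col_sums, row_sums, mixed.dtype)
```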
Array Operations
• NumPy enables element-wise operations.
• For example:
• Adding 10 to each element: arr + 10
• Multiplying each element by 2: arr * 2
Example Program: Array Operations

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Element-wise addition
print("Add 10 to each element:", arr + 10)

# Element-wise multiplication
print("Multiply each element by 2:", arr * 2)
4. Statistical Analysis
• NumPy provides built-in functions for statistical analysis, including
mean, median, standard deviation, and variance.
• Mean: Average of elements.
• Standard Deviation: Measures the dispersion of data points.
Example Program: Statistics with NumPy
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Mean and Median
print("Mean:", np.mean(arr))
print("Median:", np.median(arr))

# Standard Deviation
print("Standard Deviation:", np.std(arr))
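np.std() computes the population standard deviation by default (it divides by n). When the data is a sample, passing ddof=1 divides by n - 1 instead. A sketch on the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Population statistics: divide by n
pop_var = np.var(arr)
pop_std = np.std(arr)

# Sample statistics: divide by n - 1
sample_var = np.var(arr, ddof=1)
sample_std = np.std(arr, ddof=1)

print(pop_var, pop_std)
print(sample_var, sample_std)
```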
5. Matrix Operations
• Matrices are 2D arrays used for linear algebra computations.
• Dot Product: Computes the product of two matrices.
• Transpose: Swaps rows and columns.
Example Program: Matrix Operations
import numpy as np

# Define matrices
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
result = np.dot(matrix_a, matrix_b)
print("Matrix Multiplication:\n", result)

# Transpose
print("Transpose of matrix_a:\n", matrix_a.T)
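A common source of confusion is that * on arrays is element-wise, not a matrix product. The sketch below compares it with the @ operator, which for 2D arrays gives the same result as np.dot.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix product: rows of a combined with columns of b
product = matrix_a @ matrix_b

# Element-wise product: each entry multiplied by its counterpart
elementwise = matrix_a * matrix_b

print(product)
print(elementwise)
```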
Unit 5
Data Visualization
• Data visualization transforms raw data into graphical representations,
enabling easier interpretation and insights.
• Python's libraries like NumPy, Matplotlib, Seaborn, and Pandas provide
powerful tools for creating meaningful visualizations.
• Data visualization is an easier way of presenting the data, however
complex it is, to analyze trends and relationships amongst variables with
the help of pictorial representation.
• The following are the advantages of Data Visualization
• Easier representation of complex data
• Highlights good and bad performing areas
• Explores relationship between data points
• Identifies data patterns even for larger data points
Best Practices to be followed during Data
Visualization
• Ensure appropriate usage of shapes, colors, and size while building
visualization

• Plots/graphs using a co-ordinate system are more pronounced

• Knowledge of suitable plot with respect to the data types brings more
clarity to the information

• Usage of labels, titles, legends and pointers conveys information
clearly to a wider audience
Data Visualization with NumPy Arrays
• NumPy arrays are a foundational data structure for numerical
computation in Python.
• They integrate seamlessly with visualization libraries like Matplotlib
and Seaborn, enabling efficient and flexible data visualization.
• Visualizing data stored in NumPy arrays provides a better
understanding of patterns, distributions, and relationships.
Why Use NumPy Arrays for Visualization?
1. Efficient Data Handling: NumPy arrays are faster and more memory-
efficient than Python lists.
2. Vectorized Operations: Operations on NumPy arrays are element-
wise, making data manipulation easy before visualization.
3. Integration with Visualization Libraries: NumPy arrays are natively
supported by libraries like Matplotlib and Seaborn.
1. Setting Up NumPy for Visualization
• Install the necessary libraries
• pip install numpy matplotlib
• Import the libraries in your Python script
• import numpy as np
• import matplotlib.pyplot as plt
2. Basic Visualization with NumPy and
Matplotlib
• 2.1 Plotting NumPy Array Data
• Example: Line Plot
import numpy as np
import matplotlib.pyplot as plt

# Create a NumPy array
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x)

# Plot
plt.plot(x, y, label="Sine Wave")
plt.title("Line Plot of NumPy Array")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend()
plt.show()
2.2 Scatter Plot with NumPy Data
• Scatter plots visualize relationships between two variables.
• Example: Scatter Plot
# Random data using NumPy
x = np.random.rand(50) # 50 random values between 0 and 1
y = np.random.rand(50)

# Scatter plot
plt.scatter(x, y, color='blue', alpha=0.7)
plt.title("Scatter Plot of NumPy Array")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
3. Visualizing Array Transformations
• You can perform mathematical operations on NumPy arrays and
visualize the results.
• Example: Multiple Trigonometric Functions
# Create x values
x = np.linspace(0, 10, 100)
# Compute y values
y1 = np.sin(x)
y2 = np.cos(x)
# Plot both functions
plt.plot(x, y1, label="Sine")
plt.plot(x, y2, label="Cosine", linestyle="--")
plt.title("Sine and Cosine Functions")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
4. Statistical Visualization with NumPy Arrays
• NumPy provides functions for statistical analysis, such as mean, std,
and percentile, which are useful for creating visualizations.
• Example: Histogram of Random Data
# Generate random data
data = np.random.randn(1000) # 1000 random values from a normal distribution

# Plot histogram
plt.hist(data, bins=30, color='green', edgecolor='black')
plt.title("Histogram of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
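The bar heights that a histogram draws can be computed with NumPy alone, which helps when the numbers are needed as well as the picture. This is a sketch; the seeded generator and variable names are assumptions made for reproducibility.

```python
import numpy as np

# Seeded generator so the sample is reproducible (seed value is arbitrary)
rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

# counts[i] is the number of values falling between edges[i] and edges[i + 1]
counts, edges = np.histogram(data, bins=30)

# Summary statistics often shown alongside a histogram
quartiles = np.percentile(data, [25, 50, 75])

print(counts.sum(), len(edges))
print(quartiles)
```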
5. Advanced Visualizations with NumPy Arrays
• Bar Charts with Aggregated Data
• Bar charts are useful for comparing categorical data.

# Categories and values
categories = np.array(['A', 'B', 'C', 'D'])
values = np.array([23, 45, 56, 78])

# Plot bar chart
plt.bar(categories, values, color='orange')
plt.title("Bar Chart Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
Practical Workflow Example: Real-World Visualization
• Task: Compare Monthly Sales Data Using Line and Bar Charts
import numpy as np
import matplotlib.pyplot as plt

# Simulated sales data
months = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])
sales_2023 = np.array([100, 150, 200, 250, 300, 350])
sales_2024 = np.array([120, 160, 210, 260, 310, 370])

# Line plot
plt.plot(months, sales_2023, label="2023 Sales", marker='o')
plt.plot(months, sales_2024, label="2024 Sales", marker='s', linestyle='--')

# Bar chart overlay
bar_width = 0.3
x_indices = np.arange(len(months))
plt.bar(x_indices - bar_width / 2, sales_2023, width=bar_width,
        label="2023 Sales")
plt.bar(x_indices + bar_width / 2, sales_2024, width=bar_width,
        label="2024 Sales")

plt.title("Monthly Sales Comparison")
plt.xlabel("Months")
plt.ylabel("Sales")
plt.legend()
plt.xticks(x_indices, months) # Align bar labels with months
plt.show()
Data Visualization with Matplotlib
• Matplotlib is the foundational library for data visualization in Python,
offering a wide range of customizable plots.
• Matplotlib: A versatile library for creating static, animated, and
interactive visualizations.
• pyplot: A module in Matplotlib that mimics MATLAB-like plotting
functionalities.
• Basic Workflow:
1. Create data (e.g., using NumPy).
2. Use plt.plot() or other functions to create plots.
3. Customize the graph with titles, labels, legends, etc.
1.2 Plotting Graphs
• Line Plot
• A line plot is used to visualize trends over a continuous range.
• Example Program: Line Plot
import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x)

# Plot
plt.plot(x, y, label="Sine Wave")
plt.title("Line Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
1.3 Controlling Graph Appearance
• Matplotlib allows control over graph styles, line properties, and
marker attributes.
• Example: Customizing Line Style and Markers
plt.plot(x, y, color='red', linestyle='--', linewidth=2, marker='o')
plt.title("Customized Line Plot")
plt.show()
1.4 Adding Text to Graphs
• You can annotate plots by adding titles, labels, and custom text.
• Example: Adding Annotations
plt.plot(x, y)
plt.title("Adding Annotations")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.text(5, 0, "Midpoint", fontsize=12, color='blue')
plt.show()
1.5 More Graph Types
• Bar Chart
• Bar charts are used to represent categorical data.
categories = ['A', 'B', 'C']
values = [10, 15, 7]

plt.bar(categories, values, color='green')
plt.title("Bar Chart Example")
plt.show()
• Histogram
• Histograms are used to represent data distributions.
data = np.random.randn(1000)

plt.hist(data, bins=30, color='purple', edgecolor='black')
plt.title("Histogram Example")
plt.show()
• Scatter Plot
• Scatter plots visualize relationships between two variables.
x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y, color='orange')
plt.title("Scatter Plot Example")
plt.show()
1.6 Patches
• Patches in Matplotlib allow you to add shapes like circles, rectangles,
and polygons to plots.
• Example: Adding a Rectangle
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
ax.add_patch(Rectangle((0.1, 0.2), 0.5, 0.3, color='cyan'))
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.title("Rectangle Patch Example")
plt.show()
2. Advanced Data Visualization with Seaborn
• Seaborn is a higher-level library built on Matplotlib that provides a
more aesthetic interface for creating advanced visualizations.
• Seaborn is a powerful Python library built on top of Matplotlib,
designed to simplify the creation of attractive and informative
statistical graphics.
Key Features of Seaborn
• High-Level Interface: Seaborn provides a more convenient and user-
friendly interface compared to Matplotlib, enabling users to create
complex visualizations with fewer lines of code.
• Built-in Themes and Color Palettes: The library includes aesthetically
pleasing default styles and color palettes, which can be customized to
enhance visual appeal.
• Integration with Pandas: Seamless integration with Pandas
DataFrames allows for easy manipulation and visualization of
structured datasets.
Advanced Visualization Techniques
1. Pair Plots
• Pair plots are an effective way to visualize relationships between multiple
variables in a dataset.
• They create a matrix of scatter plots for each pair of variables, allowing
quick identification of correlations.
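import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load sample dataset
iris = sns.load_dataset('iris')

# Create pair plot
sns.pairplot(iris, hue='species')
# suptitle() titles the whole grid; y=1.02 keeps it clear of the top row
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()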
2. Heatmaps
• Heatmaps are ideal for visualizing matrix-like data where values are
represented by colors.
• They are particularly useful for displaying correlations between multiple
variables.

# Compute correlation matrix (numeric columns only; 'species' is text)
correlation_matrix = iris.corr(numeric_only=True)

# Create heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Heatmap of Correlations in Iris Dataset')
plt.show()
3. Facet Grids
• Facet grids allow the creation of a grid of plots based on subsets of your
dataset.
• This technique is useful for visualizing the distribution of data across different
categories.
g = sns.FacetGrid(iris, col='species')
g.map(sns.histplot, 'sepal_length')
plt.show()
• Statistical Visualization
• Seaborn simplifies the process of performing and visualizing statistical
analyses:
• Regression Plots: Use regplot() or lmplot() to visualize linear relationships between
variables along with regression lines.
sns.regplot(x='sepal_length', y='sepal_width', data=iris)
plt.title('Regression Plot of Sepal Length vs Width')
plt.show()

• Box Plots and Violin Plots: These plots provide insights into the distribution
and frequency of data points across different categories.
Customization in Seaborn
• Customization Options
• Seaborn offers extensive customization options to enhance the aesthetics and
clarity of visualizations:
• Custom Color Palettes: Users can define custom color palettes using
sns.set_palette().
• Style Settings: Adjust overall styles using sns.set_style() for a polished look.
• You can enhance Seaborn plots by adding themes and color palettes.
• Example: Using Themes
sns.set_theme(style="darkgrid")
sns.scatterplot(x=x, y=y, color="red")
plt.title("Scatter Plot with Seaborn Theme")
plt.show()
3. Time Series Analysis with Pandas
• Time series analysis involves statistical techniques to analyze time-
ordered data points, often collected at regular intervals.
• The Pandas library in Python provides robust tools for manipulating
and analyzing time series data, making it a popular choice for data
scientists and analysts.
• Time series analysis is used to examine data points collected over
time intervals. Pandas makes time series analysis easier with its
datetime capabilities.
Creating Time Series Data
• To create a time series in Pandas, you can use the pd.date_range()
function to generate a range of dates and then create a Series or
DataFrame.
import pandas as pd

# Create a date range
date_range = pd.date_range(start='2020-01-01', periods=10, freq='D')

# Create a time series
ts = pd.Series(range(len(date_range)), index=date_range)
print(ts)
Resampling Time Series Data
• Resampling involves changing the frequency of the time series data.
• This can be useful for aggregating data over different time periods
(e.g., from daily to monthly).

# Resample to get the mean for every 3 days
resampled_ts = ts.resample('3D').mean()
print(resampled_ts)
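Working through the ten-day series by hand makes the binning concrete: days 0 to 2 form the first bin, days 3 to 5 the second, and so on, with a final bin holding the single leftover day. This is a sketch that rebuilds the series so it runs on its own.

```python
import pandas as pd

date_range = pd.date_range(start='2020-01-01', periods=10, freq='D')
ts = pd.Series(range(len(date_range)), index=date_range)

# Each 3-day bin is labelled by its first day
three_day_mean = ts.resample('3D').mean()
three_day_sum = ts.resample('3D').sum()

print(three_day_mean)
print(three_day_sum)
```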
Time Series Manipulation

• Pandas provides several functions to manipulate time series data:


• Shifting Data: The shift() function allows you to shift the data forward or backward in
time, which is useful for calculating lagged values.
ts_shifted = ts.shift(1) # Shift by one period
• Rolling Windows: Use rolling windows to compute statistics over a specified window
size, such as moving averages.
rolling_mean = ts.rolling(window=3).mean()
• Handling Missing Data
• Pandas allows you to handle missing values in time series using methods like fillna(),
which fills missing values with a specified value, and ffill(), which carries the last
valid observation forward.
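The three operations above can be combined on a small series with a gap. This is a sketch with made-up values; the names filled, lagged, and moving_avg are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start='2020-01-01', periods=4, freq='D')
ts = pd.Series([1.0, np.nan, 3.0, 4.0], index=index)  # second day is missing

# Forward fill: the gap takes the previous day's value
filled = ts.ffill()

# Lag by one period: the first entry has no predecessor and becomes NaN
lagged = filled.shift(1)

# 2-day moving average: undefined until the window is full
moving_avg = filled.rolling(window=2).mean()

print(pd.DataFrame({'filled': filled, 'lagged': lagged, 'moving_avg': moving_avg}))
```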

Visualization of Time Series Data
• Visualizing time series data is essential for understanding trends and
patterns.
• You can use Matplotlib or Seaborn alongside Pandas for effective
visualization.
import matplotlib.pyplot as plt

# Plot the original time series
ts.plot(title='Time Series Example')
plt.xlabel('Date')
plt.ylabel('Values')
plt.show()
Practical Application: Combining Libraries
• Task: Create a multi-layered plot using Seaborn and Matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)
noise = y + np.random.normal(scale=0.2, size=100)

# Seaborn plot
sns.scatterplot(x=x, y=noise, label="Noisy Sine", color="blue")

# Add a Matplotlib line
plt.plot(x, y, label="True Sine", color="red", linestyle="--")

plt.title("Multi-Layered Plot with Seaborn and Matplotlib")
plt.legend()
plt.show()
