Python-Sem5 (Pranv Sir)

The document provides an introduction to Python, highlighting its features as a high-level, interpreted, and object-oriented programming language suitable for beginners. It covers Python's history, applications, environment setup, keywords, identifiers, indentation, comments, variables, constants, literals, and data types. Additionally, it discusses the importance of Python's readability and its extensive libraries for various programming tasks.

Programming in Python

BCA Semester - 5

Unit-1
Introduction to Python

Overview of Python
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to
be highly readable. It frequently uses English keywords where other languages use punctuation, and it has
fewer syntactical constructions than other languages.
• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to Perl and PHP.
• Python is Interactive − You can sit at a Python prompt and interact with the interpreter
directly to write your programs.
• Python is Object-Oriented − Python supports the object-oriented style of programming,
which encapsulates code within objects.
• Python is a Beginner's Language − Python is a great language for beginner-level programmers
and supports the development of a wide range of applications, from simple text processing to WWW
browsers to games.

History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research
Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, and Unix
shell and other scripting languages. Python is copyrighted. Its source code is available under the Python
Software Foundation License, an open-source license that is compatible with the GNU General Public License
(GPL). Python is now maintained by a core development team, although Guido van Rossum still holds a vital
role in directing its progress.

Release Dates of Different Versions

Version                                                                 Release Date
Python 1.0 (first standard release)                                     January 1994
Python 1.6 (Last minor version)                                         September 5, 2000
Python 2.0 (Introduced list comprehensions)                             October 16, 2000
Python 2.7 (Last minor version)                                         July 3, 2010
Python 3.0 (Emphasis on removing duplicative constructs and modules)    December 3, 2008
Python 3.5 (Last updated version)                                       September 13, 2015

Python Features
A simple language that is easy to learn: Python has a very simple and elegant syntax. It is much easier to
read and write Python programs than programs in languages like C++, Java or C#. Python makes programming fun
and allows you to focus on the solution rather than the syntax. If you are a newbie, it is a great choice to
start your journey with Python.

Free and open-source: You can freely use and distribute Python, even for commercial use. Not only can you
use and distribute software written in it, you can even make changes to Python's source code. Python
has a large community constantly improving it in each iteration.

Portability: You can move Python programs from one platform to another and run them without any changes.
Python runs on almost all platforms, including Windows, Mac OS X and Linux.

Extensible and Embeddable: Suppose an application requires high performance. You can combine
pieces of C/C++ or other languages with Python code. This gives your application high performance as
well as scripting capabilities which other languages may not provide out of the box.

A high-level, interpreted language: Unlike C/C++, you don't have to worry about daunting tasks like memory
management, garbage collection and so on. Likewise, when you run Python code, it automatically converts your
code to the language your computer understands. You don't need to worry about any lower-level operations.

Large standard libraries to solve common tasks: Python has a number of standard libraries which make the life of a
programmer much easier since you don't have to write all the code yourself. For example: need to connect to a MySQL
database on a Web server? You can use the MySQLdb library with import MySQLdb (note that MySQLdb is a
third-party package that must be installed separately; it is not part of the standard library). Standard libraries in
Python are well tested and widely used, so they are unlikely to break your application.

Object-oriented: Everything in Python is an object. Object oriented programming (OOP) helps you solve a complex
problem intuitively. With OOP, you are able to divide these complex problems into smaller sets by creating objects.

Python Environment Setup: Python is available on a wide variety of platforms including Linux and Mac OS X. Let's
understand how to set up our Python environment.

Applications of Python

Web Applications: You can create scalable Web Apps using frameworks and CMS (Content Management Systems)
that are built on Python. Some of the popular platforms for creating Web Apps are Django, Flask, Pyramid, Plone
and Django CMS. Sites like Mozilla, Reddit, Instagram and PBS are written in Python.

Scientific and Numeric Computing: There are numerous libraries available in Python for scientific and numeric
computing. Libraries like SciPy and NumPy are used in general-purpose computing, and there are
specific libraries like EarthPy for earth science, AstroPy for astronomy and so on. The language is also heavily
used in machine learning, data mining and deep learning.

Creating software Prototypes: Python is slow compared to compiled languages like C++ and Java. It might not be a
good choice if resources are limited and efficiency is a priority. However, Python is a great language for creating
prototypes. For example, you can use Pygame (a library for creating games) to create your game's prototype first. If
you like the prototype, you can use a language like C++ to create the actual game.

Good Language to Teach Programming: Python is used by many companies to teach programming to kids and
beginners. It is a good language with a lot of features and capabilities. Yet it is one of the easiest languages
to learn because of its simple, easy-to-use syntax.

Getting Python
The most up-to-date source code, binaries, documentation, news, etc., are available on the official
website of Python (python.org). You can download the Python documentation from the same site.
The documentation is available in HTML, PDF, and PostScript formats.

Setting up PATH
Programs and other executable files can be in many directories, so operating systems provide a search path that lists
the directories that the OS searches for executables. The path is stored in an environment variable, which is a named
string maintained by the operating system. This variable contains information available to the command shell and
other programs.

The path variable is named PATH in Unix or Path in Windows (Unix is case sensitive; Windows is not). In Mac OS,
the installer handles the path details. To invoke the Python interpreter from any particular directory, you must add
the Python directory to your path.

Python Environment Variables


Here are important environment variables that are recognized by Python −

No.  Variable & Description

1    PYTHONPATH
     It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files
     imported into a program. It should include the Python source library directory and the directories
     containing Python source code. PYTHONPATH is sometimes preset by the Python installer.

2    PYTHONSTARTUP
     It contains the path of an initialization file containing Python source code. The file is executed every
     time you start the interpreter. It is named .pythonrc.py in Unix and it contains commands that load
     utilities or modify PYTHONPATH.

3    PYTHONCASEOK
     It is used in Windows to instruct Python to find the first case-insensitive match in an import statement.
     Set this variable to any value to activate it.

4    PYTHONHOME
     It is an alternative module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH
     directories to make switching module libraries easy.

Integrated Development Environment


You can run Python from a Graphical User Interface (GUI) environment as well, if you have a GUI application
on your system that supports Python.
• Unix − IDLE is the very first Unix IDE for Python.
• Windows − PythonWin is the first Windows interface for Python and is an IDE with a GUI.
• Macintosh − The Macintosh version of Python along with the IDLE IDE is available from the main
website, downloadable as either MacBinary or BinHex'd files.
If you are not able to set up the environment properly, you can ask your system admin for help.
Make sure the Python environment is properly set up and working. Note − All the examples
given in subsequent chapters are executed with the Python 2.4.3 version available on the CentOS flavor of Linux.
We have already set up a Python programming environment online, so that you can execute all the available
examples online while you are learning the theory. Feel free to modify any example and
execute it online.

Starting The Interpreter


After installation, the Python interpreter lives in the installed directory. By default it is
/usr/local/bin/pythonX.X in Linux/Unix and C:\PythonXX in Windows, where the 'X' denotes the version
number. To invoke it from the shell or the command prompt we need to add this location to the search
path. The search path is a list of directories (locations) where the operating system searches for executables. For
example, in the Windows command prompt, we can type set path=%path%;c:\python33 (python33 means
version 3.3; it might be different in your case) to add the location to the path for that particular session. In Mac
OS we need not worry about this, as the installer takes care of the search path.

Now there are various ways to start Python.

1. Immediate mode: Typing python in the command line will invoke the interpreter in immediate mode.
We can directly type in Python expressions and press enter to get the output. >>> is the Python prompt.
It tells us that the interpreter is ready for our input. Try typing in 1 + 1 and press enter. We get 2 as the
output. This prompt can be used as a calculator. To exit this mode type exit() or quit() and press enter.
2. Script mode: This mode is used to execute a Python program written in a file. Such a file is called a script.
Scripts can be saved to disk for future use. Python scripts have the extension .py, meaning that the
filename ends with .py. To execute the file in script mode we simply write python followed by the file name
at the command prompt.

We can use any text editing software to write a Python script file.
We just need to save it with the .py extension. But using an IDE can make our life a lot easier. An IDE is a piece
of software that provides useful features like code hinting, syntax highlighting and checking, file explorers
etc. to the programmer for application development. Using an IDE can get rid of redundant tasks and
significantly decrease the time required for application development. IDLE is a graphical user interface (GUI)
that can be installed along with the Python programming language and is available from the official website.

Python Keywords
Keywords are the reserved words in Python. We cannot use a keyword as a variable name, function name or
any other identifier. They are used to define the syntax and structure of the Python language. In Python,
keywords are case sensitive. There are 33 keywords in Python 3.3. This number can vary slightly over
time. All the keywords except True, False and None are in lowercase and they must be written as they are. The
list of all the keywords is given below.
Keywords in Python programming language

False      class      finally    is         return
None       continue   for        lambda     try
True       def        from       nonlocal   while
and        del        global     not        with
as         elif       if         or         yield
assert     else       import     pass
break      except     in         raise

Looking at all the keywords at once and trying to figure out what they mean might be overwhelming. Each one
becomes clearer once it is seen in an example.
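
Because the number of keywords varies between versions, it can help to ask the running interpreter for its own list. The following is a minimal sketch using the standard keyword module; the variable name reserved_words is only illustrative.

```python
import keyword

# keyword.kwlist holds the reserved words of the interpreter that is running.
reserved_words = keyword.kwlist
print(len(reserved_words))
print(reserved_words)

# keyword.iskeyword() reports whether a given string is a reserved word.
print(keyword.iskeyword("while"))   # True
print(keyword.iskeyword("count"))   # False
```

The count printed on the first line depends on the Python version in use.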

Python Identifiers
An identifier is the name given to entities like classes, functions, variables etc. in Python. It helps to
differentiate one entity from another.

Rules for writing identifiers


1. Identifiers can be a combination of letters in lowercase (a to z) or uppercase (A to Z) or digits (0 to 9)
or an underscore (_). Names like myClass, var_1 and print_this_to_screen, all are valid example.
2. An identifier cannot start with a digit. 1variable is invalid, but variable1 is perfectly fine.
3. Keywords cannot be used as identifiers.
4. We cannot use special symbols like !, @, #, $, % etc. in our identifier.
5. Identifier can be of any length.
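
The rules above can be checked programmatically. This is a rough sketch, not an official validator: is_valid_identifier is a hypothetical helper that combines the built-in str.isidentifier() method with keyword.iskeyword(). Note that str.isidentifier() also accepts non-English letters, which goes slightly beyond rule 1.

```python
import keyword

def is_valid_identifier(name):
    """Hypothetical helper: True if name can be used as an identifier."""
    # isidentifier() covers rules 1, 2 and 4; iskeyword() covers rule 3.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["myClass", "var_1", "1variable", "global", "a@b"]:
    print(candidate, is_valid_identifier(candidate))
```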

Things to care about


Python is a case-sensitive language. This means Variable and variable are not the same. Always give
identifiers names that make sense. While c = 10 is valid, writing count = 10 would make more sense, and it would
be easier to figure out what it does even when you look at your code after a long gap. Multiple words can be
separated using an underscore, as in this_is_a_long_variable. We can also use the camel-case style of writing, i.e.,
capitalize the first letter of every word except the initial word, without any spaces. For example:
camelCaseExample

Python Indentation
Most programming languages like C, C++ and Java use braces { } to define a block of code. Python uses
indentation. A code block (body of a function, loop etc.) starts with indentation and ends with the first
unindented line. The amount of indentation is up to you, but it must be consistent throughout that block.
Generally four spaces are used for indentation and are preferred over tabs. Here is an example.

for i in range(1,11):
    print(i)
    if i == 5:
        break

The enforcement of indentation in Python makes the code look neat and clean. This results in Python
programs that look similar and consistent. Indentation can be ignored in line continuation. But it's a good
idea to always indent. It makes the code more readable.

Python Comments
Comments are very important while writing a program. They describe what is going on inside a program so that
a person looking at the source code does not have a hard time figuring it out. You might forget the key
details of the program you just wrote in a month's time. So taking time to explain these concepts in the form of
comments is always fruitful. In Python, we use the hash (#) symbol to start writing a comment. It extends up
to the newline character. Comments are for programmers, for better understanding of a program. The Python
interpreter ignores comments.

Multi-line comments
If we have comments that extend over multiple lines, one way of doing it is to use a hash (#) at the beginning of
each line. Another way of doing this is to use triple quotes, either ''' or """. These triple quotes are generally
used for multi-line strings. But they can be used as multi-line comments as well. Unless they are
docstrings, they do not generate any extra code.
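
The two comment styles described above might look like this in practice. It is a small sketch; the variable total is only illustrative.

```python
# This is a single-line comment; the interpreter ignores it.
total = 2 + 3  # a comment can also follow code on the same line

# A longer explanation can be spread
# over several lines, each starting with a hash.

"""This triple-quoted string is not assigned to anything,
so it acts as a multi-line comment."""

print(total)  # 5
```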

Docstring in Python
Docstring is short for documentation string. It is a string that occurs as the first statement in a module,
function, class, or method definition. We should describe what a function or class does in its docstring. Triple
quotes are used while writing docstrings. For example:
def double(num):
    """Function to double the value"""
    return 2*num

The docstring is available to us as the attribute __doc__ of the function. Run the following code in the shell
after running the above program.
print(double.__doc__)

Variable
In most programming languages a variable is a named location used to store data in memory. Each
variable must have a unique name called an identifier. It is helpful to think of variables as containers that hold
data which can be changed later in the program. Non-technically, you can think of a variable as a
bag to store books in, and those books can be replaced at any time. Note: In Python we don't assign values
to the variables; instead Python gives the reference of the object (value) to the variable.
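
The note about references can be illustrated with a short sketch. The names number and alias are only illustrative; the is operator checks whether two names refer to the same object.

```python
number = 10               # the name number refers to the object 10
alias = number            # alias now refers to the same object
print(alias is number)    # True

number = 1.5              # rebinding number does not affect alias
print(number, alias)      # 1.5 10
```

Assignment rebinds a name to another object; it does not copy data into a fixed memory slot.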

Constants
A constant is a type of variable whose value cannot be changed. It is helpful to think of constants as
containers that hold information which cannot be changed later. Non-technically, you can think of a constant
as a bag to store some books, and those books cannot be replaced once placed inside the bag.
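
Python itself does not stop a name from being reassigned, so constants are usually expressed by convention: names written in capital letters that the rest of the program agrees not to change. A minimal sketch, with PI, GRAVITY and circle_area as hypothetical names:

```python
# Hypothetical constants, written in uppercase by convention.
PI = 3.14
GRAVITY = 9.8

def circle_area(radius):
    """Return the area of a circle using the PI constant above."""
    return PI * radius * radius

print(circle_area(2))  # 12.56
```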

Literals
A literal is raw data given in a variable or constant. In Python, there are various types of literals. They are as
follows:

Numeric Literals
Numeric Literals are immutable (unchangeable). Numeric literals can belong to 3 different numerical types
Integer, Float and Complex.

String literals
A string literal is a sequence of characters surrounded by quotes. We can use single, double or triple
quotes for a string. A character literal is a single character surrounded by single or double quotes.

Boolean literals
A Boolean literal can have any of the two values: True or False.

Special literals
Python contains one special literal, None. We use it to specify that a field has not been created.
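
The kinds of literals listed above can be seen side by side in the following sketch. All variable names are illustrative only.

```python
# Numeric literals: integer, float and complex.
integer_literal = 42
binary_literal = 0b1010        # an integer written in binary (10 in decimal)
float_literal = 10.5
complex_literal = 3 + 4j

# String literals with single, double and triple quotes.
single = 'python'
double = "python"
multi_line = """first line
second line"""

# Boolean and special literals.
is_ready = True
nothing_yet = None

print(binary_literal, complex_literal.real, complex_literal.imag)  # 10 3.0 4.0
print(single == double)        # True
print(nothing_yet is None)     # True
```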

Data types in Python


Every value in Python has a datatype. Since everything is an object in Python programming, data types are
actually classes and variables are instances (objects) of these classes. There are various data types in Python.
Some of the important types are listed below.

Python List
List is an ordered sequence of items. It is one of the most used datatypes in Python and is very flexible. The
items in a list do not all need to be of the same type. Declaring a list is pretty straightforward. Items separated
by commas are enclosed within brackets [ ].
a = [1, 2.2, 'python']

Lists are mutable, meaning the value of an element of a list can be altered.
>>> a = [1,2,3]
>>> a[2]=4
>>> a
[1, 2, 4]
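
Besides item assignment, lists support indexing, slicing and growing in place. A short sketch with an illustrative list:

```python
a = [5, 10, 15, 20, 25, 30, 35, 40]

print(a[2])      # 15 (indexing starts at 0)
print(a[0:3])    # [5, 10, 15]
print(a[5:])     # [30, 35, 40]

a.append(45)     # lists can grow after creation
print(len(a))    # 9
```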

Python Tuple
Tuple is an ordered sequence of items, the same as a list. The only difference is that tuples are immutable. Tuples
once created cannot be modified. Tuples are used to write-protect data and are usually faster than lists, as they
cannot change dynamically. A tuple is defined within parentheses () where items are separated by commas.
t = (5,'program', 1+3j)
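
Reading from a tuple works like reading from a list, but assigning to an item raises an error. A minimal sketch of that behaviour:

```python
t = (5, 'program', 1+3j)

print(t[1])      # program
print(t[0:2])    # (5, 'program')

try:
    t[0] = 10    # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)
```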

Python Strings
String is sequence of Unicode characters. We can use single quotes or double quotes to represent strings.
Multi-line strings can be denoted using triple quotes, ''' or """. Like list and tuple, slicing operator [ ] can be
used with string. Strings are immutable.
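
Slicing and immutability of strings can be sketched as follows. The string s is only an example value.

```python
s = 'Hello world!'

print(s[4])       # o
print(s[6:11])    # world
print(s[-1])      # !

try:
    s[5] = 'd'    # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)
```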

Python Set
Set is an unordered collection of unique items. Set is defined by values separated by comma inside braces { }.
Items in a set are not ordered. We can perform set operations like union, intersection on two sets. Set have
unique values. They eliminate duplicates.
>>> a = {1,2,2,3,3,3}
>>> a
{1, 2, 3}
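
Union and intersection, mentioned above, are available as the | and & operators. In this sketch the results are passed through sorted() only to make the printed order predictable, since sets themselves have no order.

```python
a = {1, 2, 3}
b = {3, 4, 5}

print(sorted(a | b))   # union: [1, 2, 3, 4, 5]
print(sorted(a & b))   # intersection: [3]

try:
    a[0]               # sets are unordered, so indexing is not supported
except TypeError as error:
    print("TypeError:", error)
```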

Python Dictionary
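Dictionary is a collection of key-value pairs. (Before Python 3.7 the order of items was not guaranteed; from
Python 3.7 onwards dictionaries preserve insertion order.) It is generally used when we have a huge amount of
data. Dictionaries are optimized for retrieving data. We must know the key to retrieve the value. In Python,
dictionaries are defined within braces {} with each item being a pair in the form key:value. Values can be of
any type; keys must be of an immutable (hashable) type such as a number, string or tuple.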
>>> d = {1:'value','key':2}
>>> type(d)
<class 'dict'>
We use key to retrieve the respective value. But not the other way around.
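
Retrieving values by key might look like the following sketch. The get() method is used to show a lookup that does not fail when a key is missing; the keys and values are illustrative.

```python
d = {1: 'value', 'key': 2}

print(d[1])                              # value
print(d['key'])                          # 2
print(d.get('missing', 'no such key'))   # no such key

d['new'] = 3.5                           # add a new key-value pair
print(len(d))                            # 3

try:
    d[2]                                 # 2 is a value here, not a key
except KeyError as error:
    print("KeyError:", error)
```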

Conversion between data types
We can convert between different data types by using different type conversion functions like int(), float(),
str() etc.
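
A few conversions are sketched below. Note that int() truncates towards zero, and that converting a string only works when its contents are compatible with the target type.

```python
print(float(5))          # 5.0
print(int(10.6))         # 10
print(int(-10.6))        # -10
print(str(25))           # 25
print(int('100') + 1)    # 101
print(list('hello'))     # ['h', 'e', 'l', 'l', 'o']

try:
    int('1p')            # not a valid integer literal
except ValueError as error:
    print("ValueError:", error)
```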

Python Output Using print() function


We use the print() function to output data to the standard output device (screen). We can also output data
to a file, but this will be discussed later.

Output formatting
Sometimes we would like to format our output to make it look attractive. This can be done by using the
str.format() method. This method is available on any string object.
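
The following sketch shows print() with its sep and end keyword arguments, and output formatting with the string format() method. The variable names and messages are illustrative.

```python
print('The value of a is', 5)             # The value of a is 5
print(1, 2, 3, 4, sep='*')                # 1*2*3*4
print(1, 2, 3, 4, sep='#', end='&\n')     # 1#2#3#4&

x, y = 5, 10
message = 'The value of x is {} and y is {}'.format(x, y)
print(message)                            # The value of x is 5 and y is 10

# Placeholders can also be filled by position or by name.
by_position = 'I love {1} and {0}'.format('bread', 'butter')
by_name = 'Hello {name}, {greeting}'.format(greeting='Good morning', name='John')
print(by_position)                        # I love butter and bread
print(by_name)                            # Hello John, Good morning
```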

Python Input
Up till now, our programs were static. The values of variables were defined or hard coded into the source
code. To allow flexibility we might want to take input from the user. In Python, we have the input()
function to allow this. The syntax for input() is
input([prompt])
where prompt is an optional string that is displayed on the screen.
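
input() always returns a string, so numeric input has to be converted before use. The sketch below wraps this in a hypothetical helper, read_number, whose reader parameter defaults to input; passing a stand-in function lets the example run without a keyboard.

```python
def read_number(prompt, reader=input):
    """Hypothetical helper: read text with reader and convert it to int."""
    text = reader(prompt)      # input() returns a str such as '10'
    return int(text)

# Simulate a user typing 10; in a real program call read_number('Enter a number: ').
value = read_number('Enter a number: ', reader=lambda prompt: '10')
print(value + 5)  # 15
```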

Python Import
When our program grows bigger, it is a good idea to break it into different modules.
A module is a file containing Python definitions and statements. Python modules have a filename that ends
with the extension .py. Definitions inside a module can be imported into another module or the interactive
interpreter in Python. We use the import keyword to do this. For example, we can import the math module
by typing in import math.
import math
print(math.pi)
Now all the definitions inside the math module are available in our scope. We can also import some specific
attributes and functions only, using the from keyword.
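
Importing specific names with the from keyword might look like this. The sketch uses pi and sqrt from the standard math module.

```python
import math
from math import pi, sqrt

# Names imported with "from" are used without the module prefix.
print(pi == math.pi)   # True
print(sqrt(16))        # 4.0
```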

What are operators in python?


Operators are special symbols in Python that carry out arithmetic or logical computation. The value that the
operator operates on is called the operand.

Arithmetic operators
Arithmetic operators are used to perform mathematical operations like addition, subtraction, multiplication
etc.

Operator   Meaning                                                                        Example
+          Add two operands or unary plus                                                 x + y
-          Subtract right operand from the left or unary minus                            x - y
*          Multiply two operands                                                          x * y
/          Divide left operand by the right one (always results in a float)               x / y
%          Modulus - remainder of the division of left operand by the right               x % y (remainder of x/y)
//         Floor division - division that results in a whole number adjusted to the       x // y
           left on the number line
**         Exponent - left operand raised to the power of right                           x ** y (x to the power y)
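
The operators in the table can be tried with a pair of sample values. Here x = 15 and y = 4 are arbitrary; the last line shows what "adjusted to the left on the number line" means for a negative result.

```python
x = 15
y = 4

print(x + y)     # 19
print(x - y)     # 11
print(x * y)     # 60
print(x / y)     # 3.75
print(x % y)     # 3
print(x // y)    # 3
print(-x // y)   # -4 (floor division rounds down, not towards zero)
print(x ** y)    # 50625
```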

Comparison operators
Comparison operators are used to compare values. They return either True or False according to the condition.

Operator   Meaning                                                                                  Example
>          Greater than - True if left operand is greater than the right                            x > y
<          Less than - True if left operand is less than the right                                  x < y
==         Equal to - True if both operands are equal                                               x == y
!=         Not equal to - True if operands are not equal                                            x != y
>=         Greater than or equal to - True if left operand is greater than or equal to the right    x >= y
<=         Less than or equal to - True if left operand is less than or equal to the right          x <= y
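
A quick sketch of the comparison operators with two arbitrary sample values:

```python
x = 10
y = 12

print(x > y)    # False
print(x < y)    # True
print(x == y)   # False
print(x != y)   # True
print(x >= y)   # False
print(x <= y)   # True
```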

Logical operators
Logical operators are the and, or, not operators.

Operator   Meaning                                                 Example
and        True if both the operands are true                      x and y
or         True if either of the operands is true                  x or y
not        True if operand is false (complements the operand)      not x
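
The logical operators can be sketched as below. The last two lines show a detail the table leaves out: with non-Boolean operands, and and or return one of the operands rather than strictly True or False.

```python
x = True
y = False

print(x and y)   # False
print(x or y)    # True
print(not x)     # False

print(0 or 'default')   # default
print('a' and 'b')      # b
```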

Bitwise operators
Bitwise operators act on operands as if they were strings of binary digits. They operate bit by bit, hence the
name. For example, 2 is 10 in binary and 7 is 111. In the table below, let x = 10 (0000 1010 in binary) and y =
4 (0000 0100 in binary).

Operator   Meaning                Example
&          Bitwise AND            x & y = 0 (0000 0000)
|          Bitwise OR             x | y = 14 (0000 1110)
~          Bitwise NOT            ~x = -11 (1111 0101)
^          Bitwise XOR            x ^ y = 14 (0000 1110)
>>         Bitwise right shift    x >> 2 = 2 (0000 0010)
<<         Bitwise left shift     x << 2 = 40 (0010 1000)
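
The values in the table can be reproduced with the same x and y. The built-in format() function with the '08b' format is used here only to display a result as eight binary digits.

```python
x = 10   # 0000 1010
y = 4    # 0000 0100

print(x & y)     # 0
print(x | y)     # 14
print(~x)        # -11
print(x ^ y)     # 14
print(x >> 2)    # 2
print(x << 2)    # 40

print(format(x | y, '08b'))   # 00001110
```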

Assignment operators
Assignment operators are used in Python to assign values to variables. a = 5 is a simple assignment
that assigns the value 5 on the right to the variable a on the left. There are various compound operators in
Python like a += 5, which adds 5 to the variable and then assigns the result back to it. It is equivalent to a = a + 5.

Operator   Example    Equivalent to
=          x = 5      x = 5
+=         x += 5     x = x + 5
-=         x -= 5     x = x - 5
*=         x *= 5     x = x * 5
/=         x /= 5     x = x / 5
%=         x %= 5     x = x % 5
//=        x //= 5    x = x // 5
**=        x **= 5    x = x ** 5
&=         x &= 5     x = x & 5
|=         x |= 5     x = x | 5
^=         x ^= 5     x = x ^ 5
>>=        x >>= 5    x = x >> 5
<<=        x <<= 5    x = x << 5
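
A short sketch that chains several compound assignments on one variable; the comments give the value of x after each step.

```python
x = 5
x += 5     # x is now 10
x *= 2     # x is now 20
x //= 3    # x is now 6
x **= 2    # x is now 36
x %= 7     # x is now 1
x <<= 3    # x is now 8
print(x)   # 8
```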


Special operators
Python language offers some special type of operators like the identity operator or the membership
operator. They are described below with examples.

Identity operators
is and is not are the identity operators in Python. They are used to check if two values (or variables) are
located in the same part of memory. Two variables that are equal are not necessarily identical.

Operator   Meaning                                                                      Example
is         True if the operands are identical (refer to the same object)               x is True
is not     True if the operands are not identical (do not refer to the same object)    x is not True
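
The difference between equality and identity is easiest to see with two lists that hold the same items. The variable names in this sketch are illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]     # equal contents, but a separate object
same = first           # another name for the first object

print(first == second)      # True
print(first is second)      # False
print(first is same)        # True
print(first is not second)  # True
```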

Membership operators
in and not in are the membership operators in Python. They are used to test whether a value or variable is
found in a sequence (string, list, tuple, set and dictionary). In a dictionary we can only test for the presence of
a key, not a value.

Operator   Meaning                                                 Example
in         True if value/variable is found in the sequence         5 in x
not in     True if value/variable is not found in the sequence     5 not in x
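
A sketch of membership tests on a string and a dictionary. The last line uses the dictionary's values() method, which is one way to test for a value rather than a key.

```python
x = 'Hello world'
y = {1: 'a', 2: 'b'}

print('H' in x)            # True
print('hello' not in x)    # True (the test is case sensitive)
print(1 in y)              # True  (1 is a key)
print('a' in y)            # False ('a' is a value, not a key)
print('a' in y.values())   # True
```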

What is a Namespace in Python?
So now that we understand what names are, we can move on to the concept of namespaces. Simply put,
a namespace is a collection of names. In Python, you can imagine a namespace as a mapping of every name
you have defined to its corresponding object. Different namespaces can co-exist at a given time but are
completely isolated. A namespace containing all the built-in names is created when we start the Python
interpreter and exists as long as we don't exit. This is the reason that built-in functions like id(), print() etc. are
always available to us from any part of the program. Each module creates its own global namespace.

These different namespaces are isolated. Hence, the same name that may exist in different modules does not
collide. Modules can have various functions and classes. A local namespace is created when a function is
called, which has all the names defined in it. The same is the case with a class. The following diagram may help to
clarify this concept.

Python Variable Scope


Although there are various unique namespaces defined, we may not be able to access all of them from every
part of the program. The concept of scope comes into play. Scope is the portion of the program from where
a namespace can be accessed directly without any prefix. At any given moment, there are at least three
nested scopes.

1. Scope of the current function which has local names
2. Scope of the module which has global names
3. Outermost scope which has built-in names
When a reference is made inside a function, the name is searched in the local namespace, then in the global
namespace and finally in the built-in namespace. If there is a function inside another function, a new scope is
nested inside the local scope.
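
A minimal sketch of nested scopes, using a module-level variable a and two nested functions named outer_function and inner_function (all illustrative names):

```python
a = 10                      # lives in the global namespace of the module

def outer_function():
    b = 20                  # local to outer_function

    def inner_function():
        c = 30              # local to inner_function
        # Reading a and b works: names are looked up in the local scope,
        # then the enclosing scope, then the global scope.
        return a + b + c

    return inner_function()

print(outer_function())     # 60
```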

Here, the variable a is in the global namespace. Variable b is in the local namespace of outer_function() and c
is in the nested local namespace of inner_function(). When we are in inner_function(), c is local to us, b is
nonlocal and a is global. We can read as well as assign new values to c but can only read b and a from
inner_function(). If we try to assign a value to b, a new variable b is created in the local namespace which
is different from the nonlocal b. The same thing happens when we assign a value to a. However, if we declare a
as global, all references and assignments go to the global a. Similarly, if we want to rebind the variable b, it
must be declared as nonlocal.
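
The difference between creating a new local name and rebinding an outer one can be sketched as follows. The helper functions shadowing and rebinding are hypothetical names added for illustration.

```python
a = 10

def outer_function():
    b = 20

    def shadowing():
        a = 1    # creates a new local a; the global a is untouched
        b = 2    # creates a new local b; outer_function's b is untouched
        return a + b

    def rebinding():
        global a
        nonlocal b
        a = 30   # rebinds the global a
        b = 40   # rebinds outer_function's b

    shadowing()
    after_shadowing = (a, b)   # still (10, 20)
    rebinding()
    after_rebinding = (a, b)   # now (30, 40)
    return after_shadowing, after_rebinding

result = outer_function()
print(result)   # ((10, 20), (30, 40))
print(a)        # 30
```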

What is the if...else statement in Python?


Decision making is required when we want to execute code only if a certain condition is satisfied. The
if…elif…else statement is used in Python for decision making.
if test expression:
    statement(s)
Here, the program evaluates the test expression and will execute statement(s) only if the test expression is
True. If the test expression is False, the statement(s) are not executed. In Python, the body of the if statement
is indicated by the indentation. The body starts with an indentation and the first unindented line marks the end.
Python interprets non-zero values as True. None and 0 are interpreted as False.
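
The sketch below shows a plain if statement and then checks how several values are treated as conditions. describe is a hypothetical helper; the sample list also includes empty and non-empty containers, which Python treats as false and true respectively.

```python
num = 3
if num > 0:
    print(num, "is a positive number.")

def describe(value):
    """Hypothetical helper: report how an if statement treats value."""
    if value:
        return "true"
    return "false"

for value in [1, -1, 0, None, "", "text", [], [0]]:
    print(repr(value), "is treated as", describe(value))
```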

Python if...else Statement


if test expression:
    Body of if
else:
    Body of else

Python if...elif...else Statement


if test expression:
    Body of if
elif test expression:
    Body of elif
else:
    Body of else
The elif is short for else if. It allows us to check for multiple expressions. If the condition for if is False, it
checks the condition of the next elif block and so on. If all the conditions are False, the body of else is executed.
Only one block among the several if...elif...else blocks is executed according to the condition. The if block can
have only one else block. But it can have multiple elif blocks.
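
A small sketch of an if...elif...else chain. classify is a hypothetical function name; exactly one of the three branches runs for any given number.

```python
def classify(num):
    """Hypothetical helper: describe the sign of num."""
    if num > 0:
        return "Positive number"
    elif num == 0:
        return "Zero"
    else:
        return "Negative number"

print(classify(3.4))   # Positive number
print(classify(0))     # Zero
print(classify(-4.5))  # Negative number
```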

What is for loop in Python?


The for loop in Python is used to iterate over a sequence (list, tuple, string) or other iterable objects.
Iterating over a sequence is called traversal.
Syntax of for Loop
for val in sequence:
    Body of for

Here, val is the variable that takes the value of the item inside the sequence on each iteration. The loop
continues until we reach the last item in the sequence. The body of the for loop is separated from the rest of the
code using indentation.
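
As a sketch of traversal, the loop below adds up the items of a list. The list contents are arbitrary sample values.

```python
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

total = 0
for val in numbers:        # val takes each item of the list in turn
    total = total + val

print("The sum is", total)  # The sum is 48
```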

The range() function


We can generate a sequence of numbers using the range() function. range(10) will generate numbers from 0 to 9
(10 numbers). We can also define the start, stop and step size as range(start, stop, step size). The step size
defaults to 1 if not provided. This function does not store all the values in memory, as that would be inefficient.
Instead it remembers the start, stop and step size and generates the next number on the go. To force this function to
output all the items, we can use the function list(). We can use the range() function in for loops to iterate
through a sequence of numbers. It can be combined with the len() function to iterate through a sequence
using indexing.
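
The behaviour described above can be sketched as follows. The list genre is an illustrative example for iterating by index.

```python
print(range(10))               # range(0, 10): the numbers are not stored
print(list(range(10)))         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(2, 8)))       # [2, 3, 4, 5, 6, 7]
print(list(range(2, 20, 3)))   # [2, 5, 8, 11, 14, 17]

# Combining range() with len() to iterate using indexes.
genre = ['pop', 'rock', 'jazz']
for i in range(len(genre)):
    print("I like", genre[i])
```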

for loop with else


A for loop can have an optional else block as well. The else part is executed when the items in the sequence used
in the for loop are exhausted. The break statement can be used to stop a for loop. In that case, the else part is ignored.
Hence, a for loop's else part runs if no break occurs. Here is an example to illustrate this.
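
A sketch of both cases: a loop that runs to completion, and a search loop in which break skips the else part. The function contains is a hypothetical helper.

```python
digits = [0, 1, 5]
for i in digits:
    print(i)
else:
    print("No items left.")      # runs because the loop was not broken

def contains(items, target):
    """Hypothetical helper: search items for target using for...else."""
    for item in items:
        if item == target:
            result = "found"
            break                # leaving via break skips the else part
    else:
        result = "not found"     # runs only if no break occurred
    return result

print(contains(digits, 5))       # found
print(contains(digits, 7))       # not found
```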

What is while loop in Python?


The while loop in Python is used to iterate over a block of code as long as the test expression (condition) is
true. We generally use this loop when we don't know beforehand the number of times to iterate.
Syntax of while Loop in Python
while test_expression:
    Body of while
In a while loop, the test expression is checked first. The body of the loop is entered only if the test_expression
evaluates to True. After one iteration, the test expression is checked again. This process continues until the
test_expression evaluates to False. In Python, the body of the while loop is determined through indentation.
The body starts with indentation and the first unindented line marks the end. Python interprets any non-zero
value as True. None and 0 are interpreted as False.
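
A sketch of a while loop that adds the natural numbers up to n. The counter i must be updated inside the body, otherwise the condition would never become False.

```python
n = 10

total = 0
i = 1
while i <= n:
    total = total + i
    i = i + 1        # update the counter so the loop can end

print("The sum is", total)  # The sum is 55
```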

What is the use of break and continue in Python?


In Python, break and continue statements can alter the flow of a normal loop. Loops iterate over a block of
code until test expression is false, but sometimes we wish to terminate the current iteration or even the
whole loop without checking test expression. The break and continue statements are used in these cases.

Python break statement


The break statement terminates the loop containing it. Control of the program flows to the statement
immediately after the body of the loop. If break statement is inside a nested loop (loop inside another loop),
break will terminate the innermost loop. Syntax of break
break

Python continue statement

The continue statement is used to skip the rest of the code inside a loop for the current iteration only. Loop
does not terminate but continues on with the next iteration. Syntax of Continue
continue
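
The difference between break and continue can be sketched with two small loops. The function names and the sample word are assumptions made for this illustration.

```python
def letters_before(word, stop_at):
    """Collect letters until stop_at is met, then leave the loop (break)."""
    seen = []
    for ch in word:
        if ch == stop_at:
            break            # terminates the whole loop
        seen.append(ch)
    return seen

def letters_without(word, skip):
    """Collect every letter except skip (continue)."""
    seen = []
    for ch in word:
        if ch == skip:
            continue         # skips only the current iteration
        seen.append(ch)
    return seen

print(letters_before("string", "i"))
print(letters_without("string", "i"))
```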

What is pass statement in Python?


In Python programming, pass is a null statement. The difference between a comment and the pass statement in
Python is that, while the interpreter ignores a comment entirely, pass is not ignored. However, nothing
happens when pass is executed. It results in no operation (NOP). Syntax of pass
pass
We generally use it as a placeholder. Suppose we have a loop or a function that is not implemented yet, but
we want to implement it in the future. Such a block cannot have an empty body; the interpreter would complain.
So, we use the pass statement to construct a body that does nothing.
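
A short sketch of pass used as a placeholder follows. The names not_ready_yet and EmptyShell are hypothetical.

```python
def not_ready_yet():
    pass                 # placeholder body: calling the function does nothing

class EmptyShell:
    pass                 # a class body may not be empty either

for _ in range(3):
    pass                 # a loop that deliberately does nothing

print(not_ready_yet())   # a function without a return statement gives None
```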

What is a function in Python?


In Python, function is a group of related statements that perform a specific task. Functions help break our
program into smaller and modular chunks. As our program grows larger and larger, functions make it more
organized and manageable. Furthermore, it avoids repetition and makes code reusable. Syntax of Function:

def function_name(parameters):
    """docstring"""
    statement(s)
The function definition shown above consists of the following components.

1. Keyword def marks the start of the function header.
2. A function name to uniquely identify it. Function naming follows the same rules of writing identifiers
in Python.
3. Parameters (arguments) through which we pass values to a function. They are optional.
4. A colon (:) to mark the end of the function header.
5. Optional documentation string (docstring) to describe what the function does.
6. One or more valid Python statements that make up the function body. Statements must have the same
indentation level (usually 4 spaces).
7. An optional return statement to return a value from the function.

For example:

def greet(name):
    """This function greets to
    the person passed in as
    parameter"""
    print("Hello, " + name + ". Good morning!")

How to call a function in python?


Once we have defined a function, we can call it from another function, program or even the Python prompt.
To call a function we simply type the function name with appropriate parameters.
>>> greet('Paul')
Hello, Paul. Good morning!
Docstring
The first string after the function header is called the docstring and is short for documentation string. It is
used to explain in brief, what a function does. Although optional, documentation is a good programming
practice. Unless you can remember what you had for dinner last week, always document your code. In the
above example, we have a docstring immediately below the function header. We generally use triple quotes
so that docstring can extend up to multiple lines. This string is available to us as __doc__ attribute of the
function.
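
As a quick illustration of the __doc__ attribute, the greet() function from above is repeated here so that the snippet runs on its own.

```python
def greet(name):
    """This function greets to
    the person passed in as
    parameter"""
    print("Hello, " + name + ". Good morning!")

# The docstring is stored on the function object itself
print(greet.__doc__)
```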

The return statement
The return statement is used to exit a function and go back to the place from where it was called. Syntax of
return:
return [expression_list]
This statement can contain expression which gets evaluated and the value is returned. If there is no
expression in the statement or the return statement itself is not present inside a function, then the function
will return the None object.
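
The two cases above (a return with an expression, and no return at all) can be sketched like this. Both function names are assumptions made for the example.

```python
def absolute_value(num):
    """Return the absolute value of the number passed in."""
    if num >= 0:
        return num       # exits the function with this value
    return -num

def no_return():
    x = 1                # no return statement anywhere in the body

print(absolute_value(2))
print(absolute_value(-4))
print(no_return())       # the missing return gives the None object
```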

Scope and Lifetime of variables


Scope of a variable is the portion of a program where the variable is recognized. Parameters and variables
defined inside a function are not visible from outside. Hence, they have a local scope. Lifetime of a variable is
the period throughout which the variable exists in memory. The lifetime of variables inside a function is as
long as the function executes. They are destroyed once we return from the function. Hence, a function does
not remember the value of a variable from its previous calls. Here is an example to illustrate the scope of a
variable inside a function.
def my_func():
    x = 10
    print("Value inside function:",x)

x = 20
my_func()
print("Value outside function:",x)

Output:
Value inside function: 10
Value outside function: 20
Here, we can see that the value of x is 20 initially. Even though the function my_func() set x to 10,
it did not affect the value outside the function. This is because the variable x inside the function is
different (local to the function) from the one outside. Although they have the same name, they are two
different variables with different scopes. On the other hand, variables outside of the function are visible from
inside. They have a global scope. We can read these values from inside the function but cannot change
(write) them. In order to modify the value of variables outside the function, they must be declared as global
variables using the keyword global.
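
A minimal sketch of reading a global variable, shadowing it and modifying it with the global keyword is given below. The variable counter and all function names are hypothetical.

```python
counter = 0                  # a variable with global scope

def get_counter():
    return counter           # reading a global needs no declaration

def shadow():
    counter = 100            # plain assignment creates a new local variable
    return counter

def increment():
    global counter           # declare that we mean the global variable
    counter += 1

print(shadow())              # the local value
print(get_counter())         # the global is unchanged by shadow()
increment()
print(get_counter())         # the global was modified by increment()
```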

Types of Functions
Basically, we can divide functions into the following two types:
 Built-in functions - Functions that are built into Python.
 User-defined functions - Functions defined by the users themselves.

What is recursion in Python?


Recursion is the process of defining something in terms of itself. A physical world example would be to place
two parallel mirrors facing each other. Any object in between them would be reflected recursively.

Python Recursive Function


We know that in Python, a function can call other functions. It is even possible for the function to call itself.
This type of construct is termed a recursive function. Following is an example of a recursive function to
find the factorial of an integer. Factorial of a number is the product of all the integers from 1 to that number.
For example, the factorial of 6 (denoted as 6!) is 1*2*3*4*5*6 = 720.

Example of recursive function


# An example of a recursive function to
# find the factorial of a number
def calc_factorial(x):
    """This is a recursive function
    to find the factorial of an integer"""
    if x == 1:
        return 1
    else:
        return (x * calc_factorial(x-1))
num = 4
print("The factorial of", num, "is", calc_factorial(num))

In the above example, calc_factorial() is a recursive function as it calls itself. When we call this function with
a positive integer, it will recursively call itself by decreasing the number. Each function call multiplies the
number by the factorial of the number minus one (x-1) until the number is equal to one. At that point the
recursion stops and the pending multiplications are carried out.
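
The chain of recursive calls for calc_factorial(4) can be traced by hand as shown in the comments below. The function is repeated so that the snippet runs on its own.

```python
def calc_factorial(x):
    """Recursive factorial, valid for positive integers."""
    if x == 1:
        return 1                         # base case: stops the recursion
    return x * calc_factorial(x - 1)

# calc_factorial(4)                      1st call with 4
# 4 * calc_factorial(3)                  2nd call with 3
# 4 * 3 * calc_factorial(2)              3rd call with 2
# 4 * 3 * 2 * calc_factorial(1)          4th call with 1
# 4 * 3 * 2 * 1                          return from 4th call as x == 1
# 4 * 3 * 2                              return from 3rd call
# 4 * 6                                  return from 2nd call
# 24                                     return from 1st call
print(calc_factorial(4))
```

Note that this version has no guard for 0 or negative numbers, so it should only be called with a positive integer.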

Advantages of Recursion
1. Recursive functions make the code look clean and elegant.
2. A complex task can be broken down into simpler sub-problems using recursion.
3. Sequence generation is easier with recursion than using some nested iteration.
Disadvantages of Recursion
1. Sometimes the logic behind recursion is hard to follow through.
2. Recursive calls are expensive (inefficient) as they take up a lot of memory and time.
3. Recursive functions are hard to debug.

What are lambda functions in Python?


In Python, an anonymous function is a function that is defined without a name. While normal functions are
defined using the def keyword, in Python anonymous functions are defined using the lambda keyword.
Hence, anonymous functions are also called lambda functions. A lambda function in Python has the following
syntax:
lambda arguments: expression
Lambda functions can have any number of arguments but only one expression. The expression is evaluated
and returned. Lambda functions can be used wherever function objects are required.

Use of Lambda Function in python


We use lambda functions when we require a nameless function for a short period of time. In Python, we
generally use it as an argument to a higher-order function (a function that takes in other functions as
arguments). Lambda functions are used along with built-in functions like filter(), map() etc.
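
As a small sketch, the lambda functions below are passed to the built-in filter() and map() functions. The names double and my_list and the sample numbers are assumptions for this example.

```python
# A lambda bound to a name, only to show the syntax
double = lambda x: x * 2
print(double(5))

my_list = [1, 5, 4, 6, 8, 11, 3, 12]     # hypothetical sample data

# filter() keeps the items for which the lambda returns True
evens = list(filter(lambda x: x % 2 == 0, my_list))

# map() applies the lambda to every item
doubled = list(map(lambda x: x * 2, my_list))

print(evens)
print(doubled)
```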

What are modules in Python?


A module is a file containing Python statements and definitions. A file containing Python code, e.g.
example.py, is called a module and its module name would be example. We use modules to break down
large programs into small, manageable and organized files. Furthermore, modules provide reusability of
code. We can define our most used functions in a module and import it, instead of copying their definitions
into different programs.

How to import modules in Python?


We can import the definitions inside a module to another module or the interactive interpreter in Python.
We use the import keyword to do this. To import our previously defined module example we type the
following in the Python prompt.
>>> import example

Import with renaming


We can import a module by renaming it as follows.
# import module by renaming it
import math as m
print("The value of pi is", m.pi)

Python from...import statement


We can import specific names from a module without importing the module as a whole. Here is an example.
# import only pi from math module
from math import pi
print("The value of pi is", pi)

Python Module Search Path


While importing a module, Python looks in several places. The interpreter first looks for a built-in module and then (if
not found) searches the list of directories defined in sys.path. The search is in this order:
1. The current directory.
2. PYTHONPATH (an environment variable with a list of directories).
3. The installation-dependent default directory.
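
The search path can be inspected directly. A minimal sketch follows; the directories printed depend entirely on the installation.

```python
import sys

# sys.path is an ordinary list of directory names, searched in order
for directory in sys.path:
    print(directory)
```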

What are packages?


We don't usually store all of our files in our computer in the same location. We use a well-organized
hierarchy of directories for easier access. Similar files are kept in the same directory, for example, we may
keep all the songs in the "music" directory. Analogous to this, Python has packages for directories and
modules for files.

As our application program grows larger in size with a lot of modules, we place similar modules in one package and
different modules in different packages. This makes a project (program) easy to manage and conceptually clear.
Similarly, as a directory can contain sub-directories and files, a Python package can have sub-packages and modules.

A directory must contain a file named __init__.py in order for Python to consider it a regular package. This file can
be left empty but we generally place the initialization code for that package in this file. Here is an example.
Suppose we are developing a game, one possible organization of packages and modules could be as shown in the figure.

What is a file?
A file is a named location on disk used to store related information. It is used to permanently store data in
non-volatile memory (e.g. hard disk). Since random access memory (RAM) is volatile and loses its data when the
computer is turned off, we use files for future use of the data. When we want to read from or write to a file,
we need to open it first. When we are done, it needs to be closed, so that the resources that are tied to the
file are freed. Hence, in Python, a file operation takes place in the following order.
1. Open a file
2. Read or write (perform operation)
3. Close the file

How to open a file?


Python has a built-in function open() to open a file. This function returns a file object, also called a handle, as
it is used to read or modify the file accordingly.
>>> f = open("[Link]") # open file in current directory
>>> f = open("C:/Python33/[Link]") # specifying full path
We can specify the mode while opening a file. In mode, we specify whether we want to read 'r', write 'w' or
append 'a' to the file. We also specify if we want to open the file in text mode or binary mode. The default is
reading in text mode. In this mode, we get strings when reading from the file. On the other hand, binary
mode returns bytes and this is the mode to be used when dealing with non-text files like image or exe files.

Python File Modes

Mode Description

'r' Open a file for reading. (default)

'w' Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.

'x' Open a file for exclusive creation. If the file already exists, the operation fails.

'a' Open for appending at the end of the file without truncating it. Creates a new file if it does not exist.

't' Open in text mode. (default)

'b' Open in binary mode.

'+' Open a file for updating (reading and writing)

f = open("[Link]") # equivalent to 'r' or 'rt'


f = open("[Link]",'w') # write in text mode
f = open("[Link]",'r+b') # read and write in binary mode

How to close a file Using Python?


When we are done with operations to the file, we need to properly close the file. Closing a file will free up
the resources that were tied with the file and is done using Python close() method. Python has a garbage
collector to clean up unreferenced objects but, we must not rely on it to close the file.

f = open("[Link]",encoding = 'utf-8')
# perform file operations
f.close()
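
A sketch of a safer alternative to calling close() by hand is the with statement, which closes the file when the block is left, even if an exception is raised inside it. The file name demo.txt and the temporary directory are assumptions made so that the snippet is self-contained.

```python
import os
import tempfile

# hypothetical file in a temporary directory, so nothing real is overwritten
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

# No explicit f.close() is needed: leaving the with block closed the file
print(f.closed)
```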

How to write to File Using Python?


In order to write into a file in Python, we need to open it in write 'w', append 'a' or exclusive creation 'x'
mode. We need to be careful with the 'w' mode as it will overwrite the file if it already exists. All
previous data are erased. Writing a string or sequence of bytes (for binary files) is done using the write()
method. This method returns the number of characters written to the file.

with open("[Link]",'w',encoding = 'utf-8') as f:


    f.write("my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")

How to read files in Python?


To read a file in Python, we must open the file in reading mode. There are various methods available for this
purpose. We can use the read(size) method to read in size number of characters (or bytes, in binary mode). If the
size parameter is not specified, it reads and returns up to the end of the file.

>>> f = open("[Link]",'r',encoding = 'utf-8')
>>> f.read(4) # read the first 4 characters
'This'
>>> f.read(4) # read the next 4 characters
' is '
>>> f.read() # read in the rest till end of file
'my first file\nThis file\ncontains three lines\n'
>>> f.read() # further reading returns empty string
''
Python File Methods
There are various methods available with the file object. Some of them have been used in above examples.
Here is the complete list of methods in text mode with a brief description.
Python File Methods

Method Description

close() Close an open file. It has no effect if the file is already closed.

detach() Separate the underlying binary buffer from the TextIOBase and return it.

fileno() Return an integer number (file descriptor) of the file.

flush() Flush the write buffer of the file stream.

isatty() Return True if the file stream is interactive.

read(n) Read at most n characters from the file. Reads till end of file if it is negative or None.

readable() Returns True if the file stream can be read from.

readline(n=-1) Read and return one line from the file. Reads in at most n bytes if specified.

readlines(n=-1) Read and return a list of lines from the file. Reads in at most n bytes/characters if specified.

seek(offset,from=SEEK_SET) Change the file position to offset bytes, in reference to from (start, current, end).

seekable() Returns True if the file stream supports random access.

tell() Returns the current file location.

truncate(size=None) Resize the file stream to size bytes. If size is not specified, resize to current location.

writable() Returns True if the file stream can be written to.

write(s) Write string s to the file and return the number of characters written.

writelines(lines) Write a list of lines to the file.
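
A few of the methods from the table can be tried out as sketched below. The file name demo.txt, the temporary directory and the three sample lines are assumptions made so that the snippet is self-contained.

```python
import os
import tempfile

# hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.writelines(["This is my first file\n", "This file\n", "contains three lines\n"])

with open(path, "r", encoding="utf-8") as f:
    first = f.read(4)            # read the first 4 characters
    f.seek(0)                    # move the file position back to the start
    first_line = f.readline()    # read one complete line
    remaining = f.readlines()    # read the remaining lines as a list

print(first)
print(first_line, end="")
print(remaining)
```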

What is Directory in Python?


If there are a large number of files to handle in your Python program, you can arrange your code within
different directories to make things more manageable. A directory or folder is a collection of files and sub
directories. Python has the os module, which provides us with many useful methods to work with directories
(and files as well).

Get Current Directory


We can get the present working directory using the getcwd() method. This method returns the current
working directory in the form of a string. We can also use the getcwdb() method to get it as a bytes object.
>>> import os
>>> os.getcwd()
'C:\\Program Files\\PyScripter'
>>> os.getcwdb()
b'C:\\Program Files\\PyScripter'

What are iterators in Python?


Iterators are everywhere in Python. They are elegantly implemented within for loops, comprehensions,
generators etc. but hidden in plain sight. An iterator in Python is simply an object that can be iterated upon: an
object which returns data one element at a time. Technically speaking, a Python iterator object must
implement two special methods, __iter__() and __next__(), collectively called the iterator protocol. An
object is called iterable if we can get an iterator from it. Most built-in containers in Python, such as list, tuple
and string, are iterables. The iter() function (which in turn calls the __iter__() method) returns an iterator
from them.

Iterating Through an Iterator in Python


We use the next() function to manually iterate through all the items of an iterator. When we reach the end
and there is no more data to be returned, it will raise StopIteration. Following is an example.
# define a list
my_list = [4, 7, 0, 3]
# get an iterator using iter()
my_iter = iter(my_list)
## iterate through it using next()
#prints 4
print(next(my_iter))

#prints 7
print(next(my_iter))
## next(obj) is same as obj.__next__()
#prints 0
print(my_iter.__next__())
#prints 3
print(my_iter.__next__())
## This will raise error, no items left
next(my_iter)

What are generators in Python?


There is a lot of overhead in building an iterator in Python; we have to implement a class with __iter__() and
__next__() methods, keep track of internal state, raise StopIteration when there are no values left to be
returned etc. This is both lengthy and counter-intuitive. Generators come to the rescue in such situations.
Python generators are a simple way of creating iterators. All the overhead we mentioned above is
automatically handled by generators in Python. Simply speaking, a generator is a function that returns an
object (iterator) which we can iterate over (one value at a time).

How to create a generator in Python?


It is fairly simple to create a generator in Python. It is as easy as defining a normal function with yield
statement instead of a return statement. If a function contains at least one yield statement (it may contain
other yield or return statements), it becomes a generator function. Both yield and return will return some
value from a function. The difference is that, while a return statement terminates a function entirely, yield
statement pauses the function saving all its states and later continues from there on successive calls.
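
To make the comparison concrete, here is a rough sketch of the same counter written first as an iterator class and then as a generator function. The names CountUpTo and count_up_to are assumptions made for this illustration.

```python
class CountUpTo:
    """Iterator class: all the bookkeeping is written by hand."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration      # signal that no items are left
        self.current += 1
        return self.current

def count_up_to(limit):
    """Generator function: state is kept automatically between yields."""
    current = 0
    while current < limit:
        current += 1
        yield current

print(list(CountUpTo(3)))
print(list(count_up_to(3)))
```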

Differences between Generator function and a Normal function


Here is how a generator function differs from a normal function.
 Generator function contains one or more yield statement.
 When called, it returns an object (iterator) but does not start execution immediately.
 Methods like __iter__() and __next__() are implemented automatically. So we can iterate through
the items using next().
 Once the function yields, the function is paused and the control is transferred to the caller.
 Local variables and their states are remembered between successive calls.
 Finally, when the function terminates, StopIteration is raised automatically on further calls.
Here is an example to illustrate all of the points stated above. We have a generator function named
my_gen() with several yield statements.
# A simple generator function
def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n
    n += 1
    print('This is printed second')
    yield n
    n += 1
    print('This is printed at last')
    yield n
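
A sketch of how my_gen() behaves when it is used is given below. The function is repeated so that the snippet runs on its own; the names a, values and finished are chosen only for this illustration.

```python
def my_gen():
    n = 1
    print('This is printed first')
    yield n
    n += 1
    print('This is printed second')
    yield n
    n += 1
    print('This is printed at last')
    yield n

a = my_gen()                 # returns a generator object; nothing runs yet
values = [next(a), next(a), next(a)]   # each next() resumes after the last yield
print(values)

try:
    next(a)                  # the function has terminated by now
    finished = False
except StopIteration:
    finished = True
print(finished)

# A for loop calls next() for us and stops at StopIteration
for item in my_gen():
    print(item)
```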

CS-33: Programming in Python

Unit-2
OOP Using Python

Python Errors and Built-in Exceptions


When writing a program, we will, more often than not, encounter errors. An error caused by not following the
proper structure (syntax) of the language is called a syntax error or parsing error.
>>> if a < 3
  File "<interactive input>", line 1
    if a < 3
           ^
SyntaxError: invalid syntax
We can notice here that a colon is missing in the if statement. Errors can also occur at runtime and these are
called exceptions. They occur, for example, when a file we try to open does not exist (FileNotFoundError),
when a number is divided by zero (ZeroDivisionError), when a module we try to import is not found (ImportError) etc.
Whenever this type of runtime error occurs, Python creates an exception object. If not handled properly, it
prints a traceback to that error along with some details about why that error occurred.

Illegal operations can raise exceptions. There are plenty of built-in exceptions in Python that are raised when
corresponding errors occur. They are defined in the built-in namespace, alongside the built-in functions and attributes.
Some of the common built-in exceptions in Python programming, along with the errors that cause them, are
tabulated below.

Python Built-in Exceptions

Exception Cause of Error

AssertionError Raised when assert statement fails.

AttributeError Raised when attribute assignment or reference fails.

EOFError Raised when the input() function hits end-of-file condition.

FloatingPointError Raised when a floating point operation fails.

GeneratorExit Raised when a generator's close() method is called.

ImportError Raised when the imported module is not found.

IndexError Raised when index of a sequence is out of range.

KeyError Raised when a key is not found in a dictionary.

KeyboardInterrupt Raised when the user hits interrupt key (Ctrl+c or delete).

MemoryError Raised when an operation runs out of memory.

NameError Raised when a variable is not found in local or global scope.

NotImplementedError Raised by abstract methods.

OSError Raised when system operation causes system related error.

OverflowError Raised when result of an arithmetic operation is too large to be represented.

ReferenceError Raised when a weak reference proxy is used to access a garbage collected referent.

RuntimeError Raised when an error does not fall under any other category.

StopIteration Raised by next() function to indicate that there is no further item to be returned by iterator.

SyntaxError Raised by parser when syntax error is encountered.

IndentationError Raised when there is incorrect indentation.

TabError Raised when indentation consists of inconsistent tabs and spaces.

SystemError Raised when interpreter detects internal error.

SystemExit Raised by sys.exit() function.

TypeError Raised when a function or operation is applied to an object of incorrect type.

UnboundLocalError Raised when a reference is made to a local variable in a function or method, but no value has been bound to that variable.

UnicodeError Raised when a Unicode-related encoding or decoding error occurs.

UnicodeEncodeError Raised when a Unicode-related error occurs during encoding.

UnicodeDecodeError Raised when a Unicode-related error occurs during decoding.

UnicodeTranslateError Raised when a Unicode-related error occurs during translating.

ValueError Raised when a function gets argument of correct type but improper value.

ZeroDivisionError Raised when second operand of division or modulo operation is zero.
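
The full set of built-in exceptions, including those not shown in the table above, can be listed from the interpreter. This is only a sketch of one way to do it using the standard builtins module; the variable name exception_names is an assumption.

```python
import builtins

# Keep every built-in name that is a class derived from BaseException
exception_names = sorted(
    name
    for name, obj in vars(builtins).items()
    if isinstance(obj, type) and issubclass(obj, BaseException)
)

print(len(exception_names), "built-in exception classes found")
print(exception_names)
```

The exact list varies slightly between Python versions.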

We can also define our own exceptions in Python (if required). We can handle these built-in and
user-defined exceptions in Python using try, except and finally statements.

What are exceptions in Python?


Python has many built-in exceptions which force your program to output an error when something in it
goes wrong. When an exception occurs, it stops the current process and is passed to the calling
process until it is handled. If not handled, our program will crash. For example, suppose function A calls function B,
which in turn calls function C, and an exception occurs in function C. If it is not handled in C, the exception
passes to B and then to A. If never handled, an error message is printed and our program comes to a sudden,
unexpected halt.
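
The A, B, C scenario described above can be sketched as follows. The function names func_a, func_b and func_c are hypothetical stand-ins for A, B and C.

```python
def func_c():
    return 1 / 0                 # the exception occurs here and is not handled

def func_b():
    return func_c()              # not handled here either, so it passes on

def func_a():
    try:
        return func_b()
    except ZeroDivisionError as err:
        # the exception travelled from C through B and is finally handled in A
        return "handled in A: " + str(err)

print(func_a())
```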

Catching Exceptions in Python


In Python, exceptions can be handled using a try statement. A critical operation which can raise exception is
placed inside the try clause and the code that handles exception is written in except clause. It is up to us,
what operations we perform once we have caught the exception. Here is a simple example.

# import module sys to get the type of exception
import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print("The entry is", entry)
        r = 1/int(entry)
        break
    except:
        print("Oops!",sys.exc_info()[0],"occurred.")
        print("Next entry.")
        print()
print("The reciprocal of",entry,"is",r)

In this program, we loop through the entries of randomList until we find one that has a valid reciprocal. The portion
that can cause an exception is placed inside the try block. If no exception occurs, the except block is skipped and
normal flow continues. But if any exception occurs, it is caught by the except block. Here, we print the name of the
exception using the exc_info() function of the sys module and move on to the next entry. We can see that the
value 'a' causes ValueError and 0 causes ZeroDivisionError.

Catching Specific Exceptions in Python


In the above example, we did not mention any exception in the except clause. This is not a good
programming practice as it will catch all exceptions and handle every case in the same way. We can specify
which exceptions an except clause will catch. A try clause can have any number of except clauses to handle
them differently, but only one will be executed in case an exception occurs. We can use a tuple of values to
specify multiple exceptions in an except clause. Here is an example pseudo code.
try:
    # do something
    pass
except ValueError:
    # handle ValueError exception
    pass
except (TypeError, ZeroDivisionError):
    # handle multiple exceptions
    # TypeError and ZeroDivisionError
    pass
except:
    # handle all other exceptions
    pass
Raising Exceptions
In Python programming, exceptions are raised when corresponding errors occur at run time, but we can
also raise them deliberately using the keyword raise. We can optionally pass a value to the exception to clarify
why that exception was raised.
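
A minimal sketch of raising an exception with a clarifying value is shown below. The function set_age and its message text are assumptions made for this example.

```python
def set_age(age):
    """Return age unchanged, but refuse negative values."""
    if age < 0:
        # the string passed in explains why the exception was raised
        raise ValueError("age cannot be negative")
    return age

try:
    set_age(-5)
except ValueError as err:
    message = str(err)
    print("Caught:", message)
```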

try...finally
The try statement in Python can have an optional finally clause. This clause is executed no matter what, and
is generally used to release external resources. For example, we may be connected to a remote data center
through the network or working with a file or working with a Graphical User Interface (GUI). In all these
circumstances, we must clean up the resource once used, whether it was successful or not. These actions
(closing a file, GUI or disconnecting from network) are performed in the finally clause to guarantee
execution. Here is an example of file operations to illustrate this.
try:
    f = open("[Link]",encoding = 'utf-8')
    # perform file operations
finally:
    f.close()
This type of construct makes sure the file is closed even if an exception occurs.

Python Custom Exceptions


Python has many built-in exceptions which force your program to output an error when something in it
goes wrong. However, sometimes you may need to create custom exceptions that serve your purpose.

In Python, users can define such exceptions by creating a new class. This exception class has to be derived,
either directly or indirectly, from the Exception class. Most of the built-in exceptions are also derived from this
class. When we are developing a large Python program, it is a good practice to place all the user-defined
exceptions that our program raises in a separate file. Many standard modules do this. They define their
exceptions separately as [Link] or [Link] (generally but not always).

A user-defined exception class can implement everything a normal class can do, but we generally make them
simple and concise. Most implementations declare a custom base class and derive other exception classes
from this base class. This concept is made clearer in the following example.

Example: User-Defined Exception in Python


In this example, we will illustrate how user-defined exceptions can be used in a program to raise and catch
errors. This program will ask the user to enter a number until they guess a stored number correctly. To help
them figure it out, a hint is provided on whether their guess is greater than or less than the stored number.
# define Python user-defined exceptions
class Error(Exception):
    """Base class for other exceptions"""
    pass
class ValueTooSmallError(Error):
    """Raised when the input value is too small"""
    pass
class ValueTooLargeError(Error):
    """Raised when the input value is too large"""
    pass
# our main program
# user guesses a number until he/she gets it right
# you need to guess this number
number = 10
while True:
    try:
        i_num = int(input("Enter a number: "))
        if i_num < number:
            raise ValueTooSmallError
        elif i_num > number:
            raise ValueTooLargeError
        break
    except ValueTooSmallError:
        print("This value is too small, try again!")
        print()
    except ValueTooLargeError:
        print("This value is too large, try again!")
        print()
print("Congratulations! You guessed it correctly.")
What is Assertion?
Assertions are statements that assert or state a fact confidently in your program. For example, while writing
a division function, you're confident the divisor shouldn't be zero, so you assert that the divisor is not equal to zero.
Assertions are simply boolean expressions that check whether a condition is true. If it is true, the
program does nothing and moves to the next line of code. However, if it's false, the program stops and
throws an error.

It is also a debugging tool, as it brings the program to a halt as soon as an error occurs and shows
the point in the program at which the error occurred.

Python assert Statement


Python has a built-in assert statement to use assertion conditions in a program. The assert statement has a
condition or expression which is supposed to be always true. If the condition is false, assert halts the program
and gives an AssertionError. Syntax for using assert in Python:
assert <condition>
assert <condition>,<error message>
In Python we can use the assert statement in two ways, as mentioned above.
1. The assert statement has a condition, and if the condition is not satisfied the program will stop and give
AssertionError.
2. The assert statement can also have a condition and an optional error message. If the condition is not satisfied,
assert stops the program and gives AssertionError along with the error message.
Let's take an example, where we have a function which calculates the average of the values passed by the
user, and the list of values should not be empty. We will use the assert statement to check the parameter, and if
the length of the passed list is zero, the program halts.
Example 1: Using assert without Error Message
def avg(marks):
    assert len(marks) != 0
    return sum(marks)/len(marks)
mark1 = []
print("Average of mark1:",avg(mark1))
When we run the above program, the output will be:
AssertionError
We got an error because we passed an empty list mark1 to avg(): the assert condition became false, so
assert stopped the program and gave AssertionError. Now let's also pass another list which satisfies the assert
condition, and add an error message, to see what our output will be.
def avg(marks):
    assert len(marks) != 0,"List is empty."
    return sum(marks)/len(marks)
mark2 = [55,88,78,90,79]
print("Average of mark2:",avg(mark2))
mark1 = []
print("Average of mark1:",avg(mark1))
We passed a non-empty list, mark2, and then an empty list, mark1, to the avg() function. We got output for
mark2, but after that we got the error AssertionError: List is empty. The mark2 list satisfied the assert
condition, so the program continued to run. However, mark1 does not satisfy the condition, so assert raises an
AssertionError.
Key Points to Remember
 Assertions are conditions or boolean expressions that are always supposed to be true in the
code.
 The assert statement takes an expression and an optional message.
 The assert statement is used to check types, values of arguments and the output of a function.
 The assert statement is used as a debugging tool, as it halts the program at the point where an error
occurs.
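As a minimal sketch of the points above, the avg() function from the earlier examples can use assert to check both the type and the values of its argument. The specific checks and messages are illustrative assumptions, not part of the original example.

```python
def avg(marks):
    # Check the type of the argument.
    assert isinstance(marks, list), "marks must be a list."
    # Check the values of the argument.
    assert len(marks) != 0, "List is empty."
    assert all(isinstance(m, (int, float)) for m in marks), "marks must be numbers."
    return sum(marks) / len(marks)


print("Average:", avg([55, 88, 78, 90, 79]))

try:
    avg([])
except AssertionError as error:
    # The optional message is carried by the exception.
    print("AssertionError:", error)
```

Note that Python skips assert statements when it is run with the -O option, so assertions suit debugging checks rather than validation of user input.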

Introduction to OOPs in Python


Python is a multi-paradigm programming language, meaning that it supports different programming approaches.
One popular approach to solving a programming problem is to create objects. This is known as
Object-Oriented Programming (OOP). An object has two characteristics:
 attributes
 behavior
Let's take an example:
Parrot is an object,
 name, age, color are attributes
 singing, dancing are behavior
The concept of OOP in Python focuses on creating reusable code. This concept is also known as DRY (Don't
Repeat Yourself). In Python, the concept of OOP follows some basic principles:
Inheritance: A process of using the details of an existing class in a new class without modifying the existing class.
Encapsulation: Hiding the private details of a class from other objects.
Polymorphism: A concept of using a common operation in different ways for different data input.
Class
A class is a blueprint for the object. We can think of a class as a sketch of a parrot with labels. It contains all
the details about the name, colors, size, etc. Based on these descriptions, we can study the parrot.
Here, parrot is an object. An example of a class for a parrot:
class Parrot:
    pass
Here, we use the class keyword to define an empty class Parrot. From a class, we construct instances. An instance
is a specific object created from a particular class.

Object
An object (instance) is an instantiation of a class. When a class is defined, only the description of the object is
defined; no memory or storage is allocated for an object until one is created. An example of an object of the
Parrot class:
obj = Parrot()
Here, obj is an object of class Parrot. Suppose we have the details of some parrots. Now, we are going to show how to
build the class and objects of parrots. Example of creating a class and objects in Python:
class Parrot:
    # class attribute
    species = "bird"

    # instance attribute
    def __init__(self, name, age):
        self.name = name
        self.age = age

# instantiate the Parrot class
blu = Parrot("Blu", 10)
woo = Parrot("Woo", 15)

# access the class attributes
print("Blu is a {}".format(blu.__class__.species))
print("Woo is also a {}".format(woo.__class__.species))

# access the instance attributes
print("{} is {} years old".format(blu.name, blu.age))
print("{} is {} years old".format(woo.name, woo.age))
When we run the program, the output will be:
Blu is a bird
Woo is also a bird
Blu is 10 years old
Woo is 15 years old
In the above program, we create a class named Parrot. Then, we define attributes, which are the
characteristics of an object. Then, we create instances of the Parrot class. Here, blu and woo are references
to our new objects.

Then, we access the class attribute using __class__.species. Class attributes are the same for all instances of a
class. Similarly, we access the instance attributes using blu.name and blu.age. Instance attributes, however,
are different for every instance of a class. To learn more about classes and objects, see Python Classes and
Objects.
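The difference between class attributes and instance attributes can be sketched as follows. The new species values are made up for illustration; the point is only where each assignment takes effect.

```python
class Parrot:
    species = "bird"          # class attribute, shared by all instances

    def __init__(self, name, age):
        self.name = name      # instance attributes, one set per object
        self.age = age


blu = Parrot("Blu", 10)
woo = Parrot("Woo", 15)

Parrot.species = "parrot"     # changes what every instance sees
print(blu.species, woo.species)

blu.species = "macaw"         # creates an instance attribute on blu only
print(blu.species, woo.species, Parrot.species)
```

Assigning through the class changes the shared value, while assigning through an instance only shadows the class attribute for that one object.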

Methods
Methods are functions defined inside the body of a class. They are used to define the behaviors of an object.
Example of creating methods in Python:
class Parrot:
    # instance attributes
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # instance method
    def sing(self, song):
        return "{} sings {}".format(self.name, song)

    def dance(self):
        return "{} is now dancing".format(self.name)

# instantiate the object
blu = Parrot("Blu", 10)

# call our instance methods
print(blu.sing("'Happy'"))
print(blu.dance())
When we run the program, the output will be:
Blu sings 'Happy'
Blu is now dancing
In the above program, we define two methods, sing() and dance(). These are called instance methods
because they are called on an instance object, i.e. blu.

Inheritance
Inheritance is a way of creating a new class that uses the details of an existing class without modifying it. The newly
formed class is a derived class (or child class). Similarly, the existing class is a base class (or parent
class). Example of the use of inheritance in Python:
# parent class
class Bird:
    def __init__(self):
        print("Bird is ready")

    def whoisThis(self):
        print("Bird")

    def swim(self):
        print("Swim faster")

# child class
class Penguin(Bird):
    def __init__(self):
        # call super() function
        super().__init__()
        print("Penguin is ready")

    def whoisThis(self):
        print("Penguin")

    def run(self):
        print("Run faster")

peggy = Penguin()
peggy.whoisThis()
peggy.swim()
peggy.run()
When we run this program, the output will be:
Bird is ready
Penguin is ready
Penguin
Swim faster
Run faster
In the above program, we created two classes, Bird (parent class) and Penguin (child class). The child class
inherits the functions of the parent class. We can see this from the swim() method. The child class also modifies
the behavior of the parent class. We can see this from the whoisThis() method. Furthermore, we extend the
functions of the parent class by creating a new run() method. Additionally, we call super().__init__() inside the
__init__() method of the child class. This is because we want to run the content of the parent class's __init__()
method in the child class as well.

Encapsulation
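Using OOP in Python, we can restrict access to methods and variables. This prevents data from direct
modification, which is called encapsulation. In Python, we denote private attributes using an underscore as a prefix,
i.e. single “_” or double “__”. A single underscore is only a naming convention, while a double underscore makes
Python rename the attribute internally (name mangling). Example of data encapsulation in Python: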
class Computer:
    def __init__(self):
        self.__maxprice = 900

    def sell(self):
        print("Selling Price: {}".format(self.__maxprice))

    def setMaxPrice(self, price):
        self.__maxprice = price

c = Computer()
c.sell()

# change the price
c.__maxprice = 1000
c.sell()

# using setter function
c.setMaxPrice(1000)
c.sell()
When we run this program, the output will be:
Selling Price: 900
Selling Price: 900
Selling Price: 1000
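In the above program, we defined a class Computer. We use the __init__() method to store the maximum selling
price of the computer. We tried to modify the price from outside the class. However, the selling price did not
change, because Python treats __maxprice as a private attribute: inside the class the name is stored as
_Computer__maxprice, so the assignment c.__maxprice = 1000 only creates a new, unrelated attribute on c.
To change the value, we used a setter function, setMaxPrice(), which takes price as a parameter.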
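What happens to a double-underscore attribute can be seen by listing the attributes of the object. The sketch below reuses the Computer class in shortened form; the getter get_max_price() is a hypothetical helper added only for this illustration.

```python
class Computer:
    def __init__(self):
        self.__maxprice = 900      # stored on the object as _Computer__maxprice

    def get_max_price(self):       # hypothetical getter for the illustration
        return self.__maxprice


c = Computer()
c.__maxprice = 1000                # creates a new, unrelated attribute on c
print(c.get_max_price())           # 900
print(sorted(vars(c)))             # ['_Computer__maxprice', '__maxprice']
print(c._Computer__maxprice)       # 900
```

The last line shows that the attribute is renamed rather than truly hidden, so encapsulation in Python relies on convention as much as on the language.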

Polymorphism
Polymorphism is the ability (in OOP) to use a common interface for multiple forms (data types). Suppose we
need to color a shape, and there are multiple shape options (rectangle, square, circle). We could use the
same method to color any shape. This concept is called polymorphism. Example of using polymorphism in
Python:

class Parrot:
    def fly(self):
        print("Parrot can fly")

    def swim(self):
        print("Parrot can't swim")

class Penguin:
    def fly(self):
        print("Penguin can't fly")

    def swim(self):
        print("Penguin can swim")

# common interface
def flying_test(bird):
    bird.fly()

# instantiate objects
blu = Parrot()
peggy = Penguin()

# passing the object
flying_test(blu)
flying_test(peggy)
When we run the above program, the output will be:
Parrot can fly
Penguin can't fly
In the above program, we defined two classes, Parrot and Penguin. Both have a common method, fly(),
but its behavior differs between them. To allow polymorphism, we created a common interface, the
flying_test() function, which can take any object. Then we passed the objects blu and peggy to the flying_test()
function, and it worked for both.

Key Points to Remember:


 Programming becomes easier and more efficient.
 Classes are sharable, so code can be reused.
 The productivity of programmers increases.
 Data is safe and secure with data abstraction.

What are classes and objects in Python?


Python is an object-oriented programming language. Unlike procedure-oriented programming, where the
main emphasis is on functions, object-oriented programming stresses objects. An object is simply a collection
of data (variables) and methods (functions) that act on that data. A class is a blueprint for the object.
We can think of a class as a sketch (prototype) of a house. It contains all the details about the floors, doors,
windows, etc. Based on these descriptions we build the house. The house is the object. As many houses can be
made from one description, we can create many objects from a class. An object is also called an instance of a
class, and the process of creating this object is called instantiation.

Defining a Class in Python


Just as function definitions begin with the keyword def, in Python we define a class using the keyword class.
The first string in the class body is called the docstring and gives a brief description of the class. Although not
mandatory, it is recommended. Here is a simple class definition.
class MyNewClass:
    '''This is a docstring. I have created a new class'''
    pass
A class creates a new local namespace where all its attributes are defined. Attributes may be data or
functions. There are also special attributes that begin with double underscores (__). For example,
__doc__ gives us the docstring of that class. As soon as we define a class, a new class object is created with
the same name. This class object allows us to access the different attributes as well as to instantiate new
objects of that class.

class MyClass:
    "This is my second class"
    a = 10
    def func(self):
        print('Hello')

# Output: 10
print(MyClass.a)

# Output: <function MyClass.func at 0x0000000003079BF8>
print(MyClass.func)

# Output: 'This is my second class'
print(MyClass.__doc__)
When you run the program, the output will be:
10
<function MyClass.func at 0x7feaa932eae8>
This is my second class
Creating an Object in Python
We saw that the class object can be used to access different attributes. It can also be used to create new
object instances (instantiation) of that class. The procedure to create an object is similar to a function call.
>>> ob = MyClass()
This will create a new instance object named ob. We can access the attributes of objects using the object name
as a prefix. Attributes may be data or methods. The methods of an object are the corresponding functions of its class. Any
function object that is a class attribute defines a method for objects of that class. This means that, since
MyClass.func is a function object (an attribute of the class), ob.func will be a method object.
class MyClass:
    "This is my second class"
    a = 10
    def func(self):
        print('Hello')

# create a new MyClass
ob = MyClass()

# Output: <function MyClass.func at 0x000000000335B0D0>
print(MyClass.func)

# Output: <bound method MyClass.func of <__main__.MyClass object at 0x000000000332DEF0>>
print(ob.func)

# Calling function func()
# Output: Hello
ob.func()
You may have noticed the self parameter in the function definition inside the class, yet we called the method
simply as ob.func() without any arguments. It still worked. This is because, whenever an object calls its
method, the object itself is passed as the first argument. So, ob.func() translates into MyClass.func(ob). In
general, calling a method with a list of n arguments is equivalent to calling the corresponding function with
an argument list that is created by inserting the method's object before the first argument.

For these reasons, the first argument of a function in a class must be the object itself. This is conventionally
called self. It can be named otherwise, but we highly recommend following the convention. By now you should be
familiar with class objects, instance objects, function objects, method objects and their differences.
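The equivalence between calling a method on an object and calling the class function with the object as first argument can be checked directly. In this small sketch func() returns a string instead of printing, which is a variation made only so that the two calls can be compared.

```python
class MyClass:
    a = 10

    def func(self):
        return "Hello from " + type(self).__name__


ob = MyClass()

# Both calls run the same function with ob as the first argument.
print(ob.func())
print(MyClass.func(ob))

# The method object remembers its object and its underlying function.
print(ob.func.__self__ is ob)
print(ob.func.__func__ is MyClass.func)
```

Both of the last two lines print True: a method object is simply the class's function object paired with the instance it was looked up on.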

Constructors in Python
Class functions that begin with a double underscore (__) are called special functions, as they have a special
meaning. Of particular interest is the __init__() function. This special function is called whenever a
new object of that class is instantiated. This type of function is also called a constructor in Object-Oriented
Programming (OOP). We normally use it to initialize all the variables.
class ComplexNumber:
    def __init__(self,r = 0,i = 0):
        self.real = r
        self.imag = i

    def getData(self):
        print("{0}+{1}j".format(self.real,self.imag))

# Create a new ComplexNumber object
c1 = ComplexNumber(2,3)

# Call getData() function
# Output: 2+3j
c1.getData()

# Create another ComplexNumber object
# and create a new attribute 'attr'
c2 = ComplexNumber(5)
c2.attr = 10

# Output: (5, 0, 10)
print((c2.real, c2.imag, c2.attr))

# but c1 object doesn't have attribute 'attr'
# AttributeError: 'ComplexNumber' object has no attribute 'attr'
c1.attr
In the above example, we define a new class to represent complex numbers. It has two functions, __init__()
to initialize the variables (defaults to zero) and getData() to display the number properly. An interesting thing
to note in the above example is that attributes of an object can be created on the fly. We created a new attribute
attr for object c2 and read it as well. But this did not create that attribute for object c1.

What is Inheritance?
Inheritance is a powerful feature in object-oriented programming. It refers to defining a new class with little
or no modification to an existing class. The new class is called the derived (or child) class and the one from which
it inherits is called the base (or parent) class. Python inheritance syntax:
class BaseClass:
    Body of base class
class DerivedClass(BaseClass):
    Body of derived class
The derived class inherits features from the base class and can add new features of its own. This results in
re-usability of code.

Example of Inheritance in Python


To demonstrate the use of inheritance, let us take an example. A polygon is a closed figure with 3 or more
sides. Say, we have a class called Polygon defined as follows.
class Polygon:
    def __init__(self, no_of_sides):
        self.n = no_of_sides
        self.sides = [0 for i in range(no_of_sides)]

    def inputSides(self):
        self.sides = [float(input("Enter side "+str(i+1)+" : ")) for i in range(self.n)]

    def dispSides(self):
        for i in range(self.n):
            print("Side",i+1,"is",self.sides[i])
This class has data attributes to store the number of sides, n, and the magnitude of each side as a list, sides.
The method inputSides() takes in the magnitude of each side and, similarly, dispSides() displays them. A
triangle is a polygon with 3 sides. So, we can create a class called Triangle which inherits from Polygon. This
makes all the attributes available in class Polygon readily available in Triangle. We don't need to define them
again (code re-usability).
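These notes do not list the Triangle class itself, so the following is only a sketch of what it could look like. Polygon is repeated in shortened form so that the block runs on its own, the sides are assigned directly instead of being read with inputSides(), and the findArea() method (Heron's formula) is an assumed extra feature added for illustration.

```python
class Polygon:
    def __init__(self, no_of_sides):
        self.n = no_of_sides
        self.sides = [0 for i in range(no_of_sides)]

    def dispSides(self):
        for i in range(self.n):
            print("Side", i + 1, "is", self.sides[i])


class Triangle(Polygon):
    def __init__(self):
        # Reuse the base-class initializer with a fixed number of sides.
        Polygon.__init__(self, 3)

    def findArea(self):
        # Assumed extra method: area by Heron's formula.
        a, b, c = self.sides
        s = (a + b + c) / 2
        return (s * (s - a) * (s - b) * (s - c)) ** 0.5


t = Triangle()
t.sides = [3.0, 4.0, 5.0]   # set directly instead of calling inputSides()
t.dispSides()               # inherited from Polygon
print("Area:", t.findArea())
```

Triangle defines only what is new (a fixed side count and the area calculation); storing and displaying the sides comes from Polygon.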

Method Overriding in Python


In the above example, notice that the __init__() method was defined in both classes, Triangle as well as Polygon.
When this happens, the method in the derived class overrides that in the base class. That is to say, __init__()
in Triangle gets preference over the one in Polygon.

Generally, when overriding a base method, we tend to extend the definition rather than simply replace it.
This is done by calling the method in the base class from the one in the derived class (calling
Polygon.__init__() from __init__() in Triangle). A better option is to use the built-in function super().
So, super().__init__(3) is equivalent to Polygon.__init__(self,3) and is preferred. You can learn more about
the super() function in Python.

Two built-in functions, isinstance() and issubclass(), are used to check inheritance. The function isinstance()
returns True if the object is an instance of the class or of other classes derived from it. Similarly, issubclass()
returns True if a class is derived from another class. Each and every class in
Python inherits from the base class object.
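A short sketch of both checks, using empty stand-in classes named after the Polygon and Triangle classes discussed above:

```python
class Polygon:
    pass


class Triangle(Polygon):
    pass


t = Triangle()
print(isinstance(t, Triangle))        # True
print(isinstance(t, Polygon))         # True: Triangle derives from Polygon
print(isinstance(t, int))             # False
print(issubclass(Triangle, Polygon))  # True
print(issubclass(Polygon, Triangle))  # False
print(issubclass(Polygon, object))    # True: every class inherits from object
```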

What is operator overloading in Python?


Python operators work for built-in classes. But the same operator behaves differently with different types. For
example, the + operator will perform arithmetic addition on two numbers, merge two lists and concatenate
two strings. This feature in Python, which allows the same operator to have a different meaning according to the
context, is called operator overloading. So what happens when we use operators with objects of a user-defined
class? Let us consider the following class, which tries to simulate a point in a 2-D coordinate system.
class Point:
    def __init__(self, x = 0, y = 0):
        self.x = x
        self.y = y
Now, run the code and try to add two points in the Python shell.
>>> p1 = Point(2,3)
>>> p2 = Point(-1,2)
>>> p1 + p2
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'Point' and 'Point'
Overloading the + Operator in Python
To overload the + sign, we need to implement the __add__() function in the class. With great power comes
great responsibility. We can do whatever we like inside this function, but it is sensible to return a Point
object holding the coordinate sum.
class Point:
    def __init__(self, x = 0, y = 0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0},{1})".format(self.x,self.y)

    def __add__(self,other):
        x = self.x + other.x
        y = self.y + other.y
        return Point(x,y)
Now let's try that addition again.
>>> p1 = Point(2,3)
>>> p2 = Point(-1,2)
>>> print(p1 + p2)
(1,5)

Python magic methods or special functions for operator overloading


Binary Operators:
Operator    Magic Method
+           __add__(self, other)
-           __sub__(self, other)
*           __mul__(self, other)
/           __truediv__(self, other)
//          __floordiv__(self, other)
%           __mod__(self, other)
**          __pow__(self, other)
>>          __rshift__(self, other)
<<          __lshift__(self, other)
&           __and__(self, other)
|           __or__(self, other)
^           __xor__(self, other)

Comparison Operators:
Operator    Magic Method
<           __lt__(self, other)
>           __gt__(self, other)
<=          __le__(self, other)
>=          __ge__(self, other)
==          __eq__(self, other)
!=          __ne__(self, other)

Assignment Operators:
Operator    Magic Method
-=          __isub__(self, other)
+=          __iadd__(self, other)
*=          __imul__(self, other)
/=          __itruediv__(self, other)
//=         __ifloordiv__(self, other)
%=          __imod__(self, other)
**=         __ipow__(self, other)
>>=         __irshift__(self, other)
<<=         __ilshift__(self, other)
&=          __iand__(self, other)
|=          __ior__(self, other)
^=          __ixor__(self, other)

Unary Operators:
Operator    Magic Method
-           __neg__(self)
+           __pos__(self)
~           __invert__(self)
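As a sketch of how one method from each of the comparison, assignment and unary groups can be used, the Point class from the previous section can be extended as follows. What each operator means for a point (for example, comparing by distance from the origin) is a choice made here for illustration only.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0},{1})".format(self.x, self.y)

    def __eq__(self, other):
        # == : equal when both coordinates match
        return (self.x, self.y) == (other.x, other.y)

    def __lt__(self, other):
        # <  : compare squared distance from the origin (one possible choice)
        return self.x ** 2 + self.y ** 2 < other.x ** 2 + other.y ** 2

    def __neg__(self):
        # unary - : reflect the point through the origin
        return Point(-self.x, -self.y)

    def __iadd__(self, other):
        # += : update this point in place and return it
        self.x += other.x
        self.y += other.y
        return self


p1 = Point(1, 1)
p2 = Point(-2, -3)
print(p1 < p2)             # True: 2 < 13
print(-p2 == Point(2, 3))  # True
p1 += p2
print(p1)                  # (-1,-2)
```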

Search Algorithms
A search algorithm is a method for finding an item or group of items with specific properties within a
collection of items. We refer to the collection of items as a search space. The search space might be
something concrete, such as a set of electronic medical records, or something abstract, such as the set of all
integers. A large number of problems that occur in practice can be formulated as search problems.

Many of the algorithms presented earlier in this book can be viewed as search algorithms. We formulated
finding an approximation to the roots of a polynomial as a search problem, and looked at three algorithms—
exhaustive enumeration, bisection search, and Newton-Raphson—for searching the space of possible
answers. In this section, we will examine two algorithms for searching a list. Each meets the specification:

def search(L, e):
    """Assumes L is a list.
    Returns True if e is in L and False otherwise"""

The astute reader might wonder if this is not semantically equivalent to the Python expression e in L. The
answer is yes, it is. And if one is unconcerned about the efficiency of discovering whether e is in L, one
should simply write that expression.

Linear Search and Using Indirection to Access Elements


Python uses the following algorithm to determine if an element is in a list:
def search(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return True
    return False

Binary Search and Exploiting Assumptions


Getting back to the problem of implementing search(L, e), is O(len(L)) the best we can do? Yes, if we know
nothing about the relationship of the values of the elements in the list and the order in which they are
stored. In the worst case, we have to look at each element in L to determine whether L contains e. But
suppose we know something about the order in which elements are stored, e.g., suppose we know that we
have a list of integers stored in ascending order. We could change the implementation so that the search
stops when it reaches a number larger than the number for which it is searching:

def search(L, e):
    """Assumes L is a list, the elements of which are in
    ascending order.
    Returns True if e is in L and False otherwise"""
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i] > e:
            return False
    return False
This would improve the average running time. However, it would not change the worst-case complexity of
the algorithm, since in the worst case each element of L is examined.

Sorting Algorithms
We have just seen that if we happen to know that a list is sorted, we can exploit that information to greatly
reduce the time needed to search a list. Does this mean that when asked to search a list one should first sort
it and then perform the search?

Let O(sortComplexity(L)) be the complexity of sorting a list. Since we know that we can always search a list in
O(len(L)) time, the question of whether we should first sort and then search boils down to the question, is
(sortComplexity(L) + log(len(L))) < len(L)? The answer, sadly, is no. One cannot sort a list without looking at
each element in the list at least once, so it is not possible to sort a list in sub-linear time.

Does this mean that binary search is an intellectual curiosity of no practical import? Happily, no. Suppose
that one expects to search the same list many times. It might well make sense to pay the overhead of sorting
the list once, and then amortize the cost of the sort over many searches. If we expect to search the list k
times, the relevant question becomes, is (sortComplexity(L) + k*log(len(L))) less than k*len(L)? As k becomes
large, the time required to sort the list becomes increasingly irrelevant.

How big k needs to be depends upon how long it takes to sort a list. If, for example, sorting were exponential
in the size of the list, k would have to be quite large.
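To get a feel for how large k must be, the comparison above can be evaluated numerically. The sketch below assumes a sort that costs n*log2(n) steps and ignores all constant factors, so the numbers it prints are only a rough illustration; the function name break_even_searches is made up for this example.

```python
import math


def break_even_searches(n):
    """Smallest k for which n*log2(n) + k*log2(n) < k*n,
    i.e. sorting once and then binary searching k times costs less
    than k linear searches (unit-cost model, n >= 1)."""
    sort_cost = n * math.log2(n)
    k = 1
    while sort_cost + k * math.log2(n) >= k * n:
        k += 1
    return k


for n in (10, 1000, 1000000):
    print(n, break_even_searches(n))
```

Under these assumptions the break-even point grows only about as fast as log(n), which is why sorting first pays off quickly when a list is searched repeatedly.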

Hash Tables
If we put merge sort together with binary search, we have a nice way to search lists. We use merge sort to
preprocess the list in O(n*log(n)) time, and then we use binary search to test whether elements are in the
list in O(log(n)) time. If we search the list k times, the overall time complexity is O(n*log(n) + k*log(n)). This is
good, but we can still ask, is logarithmic the best that we can do for search when we are willing to do some
preprocessing?

When we introduced the type dict we said that dictionaries use a technique called hashing to do the lookup
in time that is nearly independent of the size of the dictionary. The basic idea behind a hash table is simple.
We convert the key to an integer, and then use that integer to index into a list, which can be done in
constant time. In principle, values of any immutable type can be easily converted to an integer. After all, we
know that the internal representation of each object is a sequence of bits, and any sequence of bits can be
viewed as representing an integer. For example, the internal representation of 'abc' is the string of bits
011000010110001001100011, which can be viewed as a representation of the decimal integer 6,382,179. Of
course, if we want to use the internal representation of strings as indices into a list, the list is going to have
to be pretty darn long.
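The conversion of 'abc' to an integer can be reproduced in Python, and a common way around the impractically long list is to reduce the integer modulo the length of a much shorter list and to let each list entry hold a small bucket of key/value pairs. The sketch below illustrates that idea under those assumptions; the names str_to_int and StrTable are made up for the example and this is not how Python's dict is actually implemented.

```python
def str_to_int(s):
    """Interpret the ASCII bytes of s as one big-endian integer."""
    return int.from_bytes(s.encode("ascii"), "big")


print(format(str_to_int("abc"), "024b"))   # 011000010110001001100011
print(str_to_int("abc"))                   # 6382179


class StrTable:
    """Hypothetical fixed-size hash table mapping strings to values."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Reduce the (possibly huge) integer to a valid list index.
        return self.buckets[str_to_int(key) % len(self.buckets)]

    def add(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None


table = StrTable(17)
table.add("abc", 1)
table.add("xyz", 2)
table.add("abc", 3)
print(table.get("abc"), table.get("xyz"), table.get("missing"))
```

Different keys can land in the same bucket, so a lookup still scans that bucket; with enough buckets each one stays short and lookups take nearly constant time.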

CS-33: Programming in Python

Unit-3
Plotting using PyLab

Often text is the best way to communicate information, but sometimes there is a lot of truth to the Chinese
proverb, “A picture's meaning can express ten thousand words”. Yet most programs rely on textual output
to communicate with their users. Why? Because in many programming languages presenting visual data is
too hard. Fortunately, it is simple to do in Python.

Plotting Using PyLab


PyLab is a Python module, distributed with Matplotlib rather than with the standard library, that provides many
of the facilities of MATLAB, “a high-level
technical computing language and interactive environment for algorithm development, data visualization,
data analysis, and numeric computation.” Later in the book, we will look at some of the more advanced
features of PyLab, but in this chapter we focus on some of its facilities for plotting data. A complete user’s
guide for PyLab is at the Web site [Link]/users/[Link]. There are also a number of
Web sites that provide excellent tutorials. We will not try to provide a user’s guide or a complete tutorial
here. Instead, in this chapter we will merely provide a few example plots and explain the code that
generated them. Other examples appear in later chapters.

Pylab is a programming environment, built on a set of unofficial Python tools and libraries, that turns Python
into a high-performance scientific computing platform. The name pylab comes in part from the resemblance
of the resulting environment to MATLAB. The components of pylab have developed largely independently,
so there's no unique or "official" distribution. Nonetheless, there are at least four core components (in
addition to the standard Python distribution) required to have an environment that can reasonably be
considered a pylab environment. These are:

NumPy: this is a set of high-performance libraries (implemented in Fortran and C) that implement
contiguous-memory multidimensional arrays, BLAS and LAPACK linear algebra routines and many other
useful numerical tools. This is listed first because all other components depend on it.

Matplotlib: this is pylab's plotting library. It is set up to seem familiar to users accustomed to Matlab's
plotting utilities, but it is in many ways much more powerful and flexible (It lets you choose between many
different backend renderers, for example, and allows you to build plots in an object-oriented manner).
Matplotlib is an excellent 2D and 3D graphics library for generating scientific figures. Some of the many
advantages of this library include:
 Easy to get started
 Support for LATEX formatted labels and texts
 Great control of every element in a figure, including figure size and DPI.
 High-quality output in many formats, including PNG, PDF, SVG, EPS.
 GUI for interactively exploring figures and support for headless generation of figure files (useful for
batch jobs).
One of the key features of matplotlib is that all aspects of the figure can be controlled programmatically (i.e.,
without needing to muck around with the GUI). This is important for reproducibility and convenient when
one needs to regenerate the figure with updated data or change its appearance. More information is available at the
Matplotlib web page: [Link]

Matplotlib is automatically included as part of the interactive pylab namespace. If you need to import it
in its own namespace (e.g., in a non-interactive script or module), import its pyplot module explicitly,
conventionally under the name plt.

SciPy: this is a set of mostly distinct modules implementing a variety of really useful scientific computing
tasks, including signal processing, FFTs, optimization, statistics, interpolation, numerical integration, etc. It
contains a lot of stuff that you would expect to see in any good scientific programming environment. Though
it depends strongly on Numpy, it is not as completely integrated into the pylab environment as are the other
components. As a result, you'll need to explicitly import most modules from scipy even if you've already
imported the pylab namespace.

Let’s start with a simple example that uses pylab.plot to produce a plot. Executing

import pylab
pylab.figure(1) #create figure 1
pylab.plot([1,2,3,4], [1,7,3,5]) #draw on figure 1
pylab.show() #show figure on screen

will cause a window to appear on your computer monitor. Its exact appearance may depend on the
operating system on your machine, but it will look similar to the following:

Parts of a Chart:

Functions used for plotting:

plot():
The two parameters of pylab.plot must be sequences of the same length. The first specifies the
x-coordinates of the points to be plotted, and the second specifies the y-coordinates. In the example above, they
provide a sequence of four <x, y> coordinate pairs, [(1,1), (2,7), (3,3), (4,5)]. These are plotted in order. As each
point is plotted, a line is drawn connecting it to the previous point. plot() is a versatile command, and will take an
arbitrary number of arguments. For example, to plot x versus y, pass x and y as the first two arguments. For every
x, y pair of arguments, there is an optional third argument, the format string, which indicates the color and line
type of the plot. The letters and symbols of the format string are from MATLAB, and you concatenate a color string
with a line style string. The default format string is 'b-', which is a solid blue line.

show():
pylab.show() causes the window to appear on the computer screen. If that line were not present, the figure
would still have been produced, but it would not have been displayed. pylab.show() causes the process
running Python to be suspended until the figure is closed. The usual workaround is to ensure that
pylab.show() is the last line of code to be executed.

xlabel():
pylab.xlabel() sets the title of the x-axis. Pass it a string that describes the values shown on the x-axis.

ylabel():
pylab.ylabel() sets the title of the y-axis. Pass it a string that describes the values shown on the y-axis.

title():
pylab.title() displays a title for the entire plot. Pass it a string that is relevant to the entire plot area.

hist():
Plots a histogram: computes and draws the histogram of x. The return value is a tuple (n, bins, patches), or ([n0,
n1, ...], bins, [patches0, patches1,...]) if the input contains multiple data. Multiple data can be provided via x
as a list of datasets of potentially different length ([x0, x1, ...]), or as a 2-D ndarray in which each column is a
dataset. Note that the ndarray form is transposed relative to the list form.

legend():
Places a legend on the axes. To make a legend for lines which already exist on the axes (via plot for instance),
simply call this function with an iterable of strings, one for each legend item. For example:
pylab.plot([1, 2, 3])
pylab.legend(['A simple line'])
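The functions described above can be combined as in the following sketch. It uses matplotlib.pyplot, which provides the same plotting functions as pylab, and it selects the non-interactive "Agg" backend and saves the figures to files instead of calling show(), so that it also runs on a machine without a display. The data values, labels and file names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")              # render to files; no window is needed
import matplotlib.pyplot as plt

# Figure 1: two lines with format strings, axis labels, title and legend.
plt.figure(1)
plt.plot([1, 2, 3, 4], [1, 7, 3, 5], "ro--", label="series A")  # red circles, dashed line
plt.plot([1, 2, 3, 4], [2, 3, 4, 6], "b-", label="series B")    # solid blue line
plt.xlabel("x value")
plt.ylabel("y value")
plt.title("Two labelled lines")
plt.legend()
plt.savefig("lines.png")

# Figure 2: a histogram; hist() returns the counts, bin edges and patches.
plt.figure(2)
n, bins, patches = plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
plt.title("Histogram of nine values")
plt.savefig("hist.png")
print(n, bins)
```

Here legend() is called without arguments and takes its text from the label given to each plot() call, which is an alternative to passing a list of strings.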

Plotting Mortgages, an Extended Example


A collapse in U.S. housing prices helped trigger a severe economic meltdown in the fall of 2008. One of the
contributing factors was that many homeowners had taken on mortgages that ended up having unexpected
consequences. In the beginning, mortgages were relatively simple beasts. One borrowed money from a bank
and made a fixed-size payment each month for the life of the mortgage, which typically ranged from fifteen
to thirty years. At the end of that period, the bank had been paid back the initial loan (the principal) plus
interest, and the homeowner owned the house “free and clear.”

Towards the end of the twentieth century, mortgages started getting a lot more complicated. People could
get lower interest rates by paying “points” at the time they took on the mortgage. A point is a cash payment
of 1% of the value of the loan. People could take mortgages that were “interest-only” for a period of time.
That is to say, for some number of months at the start of the loan the borrower paid only the accrued
interest and none of the principal. Other loans involved multiple rates. Typically the initial rate (called a
“teaser rate”) was low, and then it went up over time. Many of these loans were variable-rate: the rate to
be paid after the initial period would vary depending upon some index intended to reflect the cost to the
lender of borrowing on the wholesale credit market. In an earlier chapter, we worked our way through a hierarchy
of mortgages as a way of illustrating the use of subclassing.

We concluded that chapter by observing that “our program should be producing plots designed to show how
the mortgage behaves over time.” An enhanced class Mortgage adds methods that make it convenient to
produce such plots. The methods plotPayments and plotBalance are simple one-liners, but they do use a form of
pylab.plot that we have not yet seen. The function findPayment, which is used in Mortgage, and the core of the
class are shown below.
def findPayment(loan, r, m):
    """Assumes: loan and r are floats, m an int
       Returns the monthly payment for a mortgage of size
       loan at a monthly rate of r for m months"""
    return loan*((r*(1+r)**m)/((1+r)**m - 1))

class Mortgage(object):
    """Abstract class for building different kinds of mortgages"""
    def __init__(self, loan, annRate, months):
        """Create a new mortgage"""
        self.loan = loan
        self.rate = annRate/12.0
        self.months = months
        self.paid = [0.0]
        self.owed = [loan]
        self.payment = findPayment(loan, self.rate, months)
        self.legend = None #description of mortgage
    def makePayment(self):
        """Make a payment"""
        self.paid.append(self.payment)
        reduction = self.payment - self.owed[-1]*self.rate
        self.owed.append(self.owed[-1] - reduction)
    def getTotalPaid(self):
        """Return the total amount paid so far"""
        return sum(self.paid)
    def __str__(self):
        return self.legend
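
As a rough illustration of how the class might be used, the sketch below repeats findPayment and Mortgage and adds a small subclass. The subclass name Fixed and the loan figures are assumptions made for this example only; they do not come from the text above.

```python
def findPayment(loan, r, m):
    """Monthly payment for a loan at monthly rate r over m months."""
    return loan*((r*(1+r)**m)/((1+r)**m - 1))

class Mortgage(object):
    """Abstract class for building different kinds of mortgages"""
    def __init__(self, loan, annRate, months):
        self.loan = loan
        self.rate = annRate/12.0
        self.months = months
        self.paid = [0.0]
        self.owed = [loan]
        self.payment = findPayment(loan, self.rate, months)
        self.legend = None
    def makePayment(self):
        self.paid.append(self.payment)
        reduction = self.payment - self.owed[-1]*self.rate
        self.owed.append(self.owed[-1] - reduction)
    def getTotalPaid(self):
        return sum(self.paid)
    def __str__(self):
        return self.legend

class Fixed(Mortgage):
    """Hypothetical fixed-rate mortgage, used only for this example."""
    def __init__(self, loan, annRate, months):
        Mortgage.__init__(self, loan, annRate, months)
        self.legend = 'Fixed, ' + str(round(annRate*100, 2)) + '%'

mortgage = Fixed(200000.0, 0.07, 360)   # assumed loan, annual rate, term
for _ in range(mortgage.months):
    mortgage.makePayment()

print(mortgage)                            # description set by the subclass
print(round(mortgage.getTotalPaid(), 2))   # total of all payments
print(round(abs(mortgage.owed[-1]), 2))    # remaining balance after the last payment
```

The lists paid and owed grow by one entry per payment, which is what makes them convenient inputs for a plotting call later on.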

Fibonacci Sequences, Revisited


While this implementation of the recurrence is obviously correct, it is terribly inefficient. Try, for example,
running fib(120), but don’t wait for it to complete. The complexity of the implementation is a bit hard to
derive, but it is roughly O(fib(n)). That is, its growth is proportional to the growth in the value of the result,
and the growth rate of the Fibonacci sequence is substantial. For example, fib(120) is
8,670,007,398,507,948,658,051,921. If each recursive call took a nanosecond, fib(120) would take
hundreds of millions of years to finish. Let’s try and figure out why this implementation takes so long. Given the tiny
amount of code in the body of fib, it’s clear that the problem must be the number of times that fib calls
itself. As an example, look at the tree of calls associated with the invocation fib(6).

Notice that we are computing the same values over and over again. For example fib gets called with 3 three
times, and each of these calls provokes four additional calls of fib. It doesn’t require a genius to think that it
might be a good idea to record the value returned by the first call, and then look it up rather than compute it
each time it is needed. This is called memoization, and is the key idea behind dynamic programming.
def fastFib(n, memo = {}):
    """Assumes n is an int >= 0, memo used only by recursive calls
       Returns Fibonacci of n"""
    if n == 0 or n == 1:
        return 1
    try:
        return memo[n]
    except KeyError:
        result = fastFib(n-1, memo) + fastFib(n-2, memo)
        memo[n] = result
        return result

If you try running fastFib, you will see that it is indeed quite fast: fastFib(120) returns almost instantly. What is
the complexity of fastFib? It computes the result exactly once for each value from 0 to n. Therefore, under the
assumption that dictionary lookup can be done in constant time, the time complexity of fastFib(n) is O(n).
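
The difference in the number of calls can be checked directly. The sketch below is an illustration, not the code from the text: it adds a call counter (a one-element list, an assumption of this example) to a naive recursive fib and to a memoized fastFib, and it passes memo explicitly instead of relying on a shared default dictionary.

```python
def fib(n, counter):
    """Naive recursive Fibonacci with fib(0) = fib(1) = 1.
    counter is a one-element list used to count calls."""
    counter[0] += 1
    if n == 0 or n == 1:
        return 1
    return fib(n-1, counter) + fib(n-2, counter)

def fastFib(n, counter, memo=None):
    """Memoized version; memo maps n to an already computed result."""
    counter[0] += 1
    if memo is None:
        memo = {}          # fresh dictionary for each top-level call
    if n == 0 or n == 1:
        return 1
    try:
        return memo[n]
    except KeyError:
        result = fastFib(n-1, counter, memo) + fastFib(n-2, counter, memo)
        memo[n] = result
        return result

naive_calls, fast_calls = [0], [0]
naive_value = fib(20, naive_calls)
fast_value = fastFib(20, fast_calls)
# Same answer, very different amounts of work.
print(naive_value, fast_value, naive_calls[0], fast_calls[0])
```

The naive version makes 2*fib(n) - 1 calls, while the memoized version makes a number of calls that grows linearly with n.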

Dynamic programming and the 0/1 Knapsack algorithm


One of the optimization problems we looked at in Chapter 17 was the 0/1 knapsack problem. Recall that we
looked at a greedy algorithm that ran in n log n time, but was not guaranteed to find an optimal solution. We
also looked at a brute-force algorithm that was guaranteed to find an optimal solution, but ran in
exponential time. Finally, we discussed the fact that the problem is inherently exponential in the size of the
input. In the worst case, one cannot find an optimal solution without looking at all possible answers.

Fortunately, the situation is not as bad as it seems. Dynamic programming provides a practical method for
solving most 0/1 knapsack problems in a reasonable amount of time. As a first step in deriving such a
solution, we begin with an exponential solution based on exhaustive enumeration. The key idea is to think
about exploring the space of possible solutions by constructing a rooted binary tree that enumerates all
states that satisfy the weight constraint. A rooted binary tree is an acyclic directed graph in which
• There is exactly one node with no parents. This is called the root.
• Each non-root node has exactly one parent.
• Each node has at most two children. A childless node is called a leaf.
Each node in the search tree for the 0/1 knapsack problem is labeled with a quadruple that denotes a partial
solution to the knapsack problem.
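
One way to picture the search tree is as a recursion in which the left branch takes the next item (if it fits) and the right branch skips it. The sketch below is a minimal illustration under assumed data: the item tuples (name, value, weight), the function name max_val and the capacity are hypothetical. A memo keyed on (number of items left, remaining capacity) avoids solving the same subproblem twice.

```python
def max_val(items, avail, memo=None):
    """items: tuple of (name, value, weight); avail: remaining capacity.
    Returns (best total value, tuple of chosen items)."""
    if memo is None:
        memo = {}
    key = (len(items), avail)      # identifies the subproblem
    if key in memo:
        return memo[key]
    if not items or avail == 0:
        result = (0, ())           # leaf: nothing left to decide
    else:
        first, rest = items[0], items[1:]
        # Right branch: do not take the first item.
        skip_val, skip_items = max_val(rest, avail, memo)
        result = (skip_val, skip_items)
        if first[2] <= avail:
            # Left branch: take the first item, since it fits.
            take_val, take_items = max_val(rest, avail - first[2], memo)
            take_val += first[1]
            if take_val > skip_val:
                result = (take_val, take_items + (first,))
    memo[key] = result
    return result

# Assumed example data: (name, value, weight)
items = (("a", 6, 3), ("b", 7, 3), ("c", 8, 2), ("d", 9, 5))
best_value, chosen = max_val(items, 5)
print(best_value, sorted(name for name, _, _ in chosen))
```

Without the memo the same function is the exhaustive enumeration; with it, each (items left, capacity) pair is solved once.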

Dynamic programming and divide and conquer


Like divide-and-conquer algorithms, dynamic programming is based upon solving independent subproblems
and then combining those solutions. There are, however, some important differences. Divide-and-conquer
algorithms are based upon finding subproblems that are substantially smaller than the original problem. For
example, merge sort works by dividing the problem size in half at each step. In contrast, dynamic
programming involves solving problems that are only slightly smaller than the original problem.

For example, computing the 19th Fibonacci number is not a substantially smaller problem than computing
the 20th Fibonacci number. Another important distinction is that the efficiency of divide-and-conquer
algorithms does not depend upon structuring the algorithm so that the same problems are solved
repeatedly. In contrast, dynamic programming is efficient only when the number of distinct subproblems is
significantly smaller than the total number of subproblems.

CS-33: Programming in Python

Unit-4
Regular Expressions
RegEx Introduction:

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. For example,
^a...s$
The above code defines a RegEx pattern. The pattern is: any five-letter string starting with a and ending
with s. A pattern defined using RegEx can be used to match against a string.

Expression    String       Matched?
^a...s$       abs          No match
              alias        Match
              abyss        Match
              Alias        No match
              An abacus    No match

Python has a module named re to work with RegEx. Here's an example:


import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

Here, we used the re.match() function to search for pattern within test_string. The function returns a match
object if the search is successful. If not, it returns None. There are several other functions defined in the re
module to work with RegEx. Before we explore them, let's learn about regular expressions themselves. If you
already know the basics of RegEx, jump to Python RegEx.

Specify Pattern Using RegEx


To specify regular expressions, metacharacters are used. In the above example, ^ and $ are metacharacters.

MetaCharacters
Metacharacters are characters that are interpreted in a special way by a RegEx engine. Here's a list of
metacharacters: [] . ^ $ * + ? {} () \ |

[] - Square brackets: Square brackets specify a set of characters you wish to match.

Expression    String       Matched?
[abc]         a            1 match
              ac           2 matches
              Hey Jude     No match
              abc de ca    5 matches

. - Period: A period matches any single character (except newline '\n').

Expression    String    Matched?
..            a         No match
              ac        1 match
              acd       1 match
              acde      2 matches (contains 4 characters)


^ - Caret: The caret symbol ^ is used to check if a string starts with a certain character.

Expression    String    Matched?
^a            a         1 match
              abc       1 match
              bac       No match
^ab           abc       1 match
              acb       No match (starts with a but not followed by b)

$ - Dollar: The dollar symbol $ is used to check if a string ends with a certain character.

Expression    String     Matched?
a$            a          1 match
              formula    1 match
              cab        No match

* - Star: The star symbol * matches zero or more occurrences of the pattern to its left.

Expression    String    Matched?
ma*n          mn        1 match
              man       1 match
              maaan     1 match
              main      No match (a is not followed by n)
              woman     1 match

+ - Plus: The plus symbol + matches one or more occurrences of the pattern to its left.

Expression    String    Matched?
ma+n          mn        No match (no a character)
              man       1 match
              maaan     1 match
              main      No match (a is not followed by n)
              woman     1 match

? - Question Mark: The question mark symbol ? matches zero or one occurrence of the pattern to its left.

Expression    String    Matched?
ma?n          mn        1 match
              man       1 match
              maaan     No match (more than one a character)
              main      No match (a is not followed by n)
              woman     1 match
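
The three quantifier tables above can be reproduced with a few lines of Python. This is a small sketch using re.search; the variable names are chosen for the example.

```python
import re

strings = ["mn", "man", "maaan", "main", "woman"]
patterns = ["ma*n", "ma+n", "ma?n"]

# For each pattern, collect the strings that contain a match anywhere.
results = {p: [s for s in strings if re.search(p, s)] for p in patterns}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Running the three patterns against the same strings makes the difference between "zero or more", "one or more" and "zero or one" easy to see side by side.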

{} - Braces: Consider this code: {n,m}. This means at least n, and at most m repetitions of the pattern to its left.

Expression    String         Matched?
a{2,3}        abc dat        No match
              abc daat       1 match (aa in daat)
              aabc daaat     2 matches (aa in aabc, aaa in daaat)
              aabc daaaat    2 matches (aa in aabc, aaa in daaaat)


Let's try one more example. The RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits.

Expression    String           Matched?
[0-9]{2,4}    ab123csde        1 match (123)
              12 and 345673    3 matches (12, 3456 and 73)
              1 and 2          No match
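
A quick way to check such counts is re.findall, which returns every non-overlapping match from left to right. The sketch below only illustrates the table above; the variable names are chosen for the example.

```python
import re

pattern = r"[0-9]{2,4}"
texts = ["ab123csde", "12 and 345673", "1 and 2"]

# The quantifier is greedy: it takes up to 4 digits before moving on.
found = {text: re.findall(pattern, text) for text in texts}

for text, matches in found.items():
    print(repr(text), "->", matches)
```

Because the quantifier takes as many digits as it can (up to 4), the run 345673 is split into 3456 and 73.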

| - Alternation: The vertical bar | is used for alternation (the "or" operator).

Expression    String    Matched?
a|b           cde       No match
              ade       1 match (a)
              acdbea    3 matches (a, b and a)


() - Group: Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that
contains either a or b or c followed by xz.

Expression    String       Matched?
(a|b|c)xz     ab xz        No match
              abxz         1 match (bxz)
              axz cabxz    2 matches (axz and bxz)
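
One detail worth seeing in code: when a pattern contains a group, re.findall returns the text captured by the group rather than the whole match. The sketch below shows this together with two ways of getting the full matches; (?:...) is the standard non-capturing group syntax of the re module.

```python
import re

text = "axz cabxz"

# With one capturing group, findall returns only what the group captured.
captured = re.findall(r"(a|b|c)xz", text)

# finditer yields match objects; group(0) is the whole match.
full = [m.group(0) for m in re.finditer(r"(a|b|c)xz", text)]

# A non-capturing group (?:...) groups without capturing.
non_capturing = re.findall(r"(?:a|b|c)xz", text)

print(captured, full, non_capturing)
```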


\ - Backslash: Backslash \ is used to escape various characters including all metacharacters. For example, \$a
matches if a string contains $ followed by a. Here, $ is not interpreted by a RegEx engine in a special way. If you
are unsure if a character has special meaning or not, you can put \ in front of it. This makes sure the
character is not treated in a special way.
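
The sketch below illustrates escaping. The sample strings are made up for the example; re.escape is the standard library helper that puts a backslash in front of the special characters of a string for you.

```python
import re

price_text = "cost: $a or $5.00"
# \$ matches a literal dollar sign instead of "end of string".
dollar_a = re.findall(r"\$a", price_text)

literal = "1+1=2"
sentence = "so 1+1=2 holds"
# Unescaped, + means "one or more of the preceding 1", so this does not match.
unescaped = re.search(literal, sentence)
# Escaped, every character is taken literally.
escaped = re.search(re.escape(literal), sentence)

print(dollar_a, unescaped, escaped.group())
```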

Special Sequences: Special sequences make commonly used patterns easier to write. Here's a list of special
sequences:

\A - Matches if the specified characters are at the start of a string.

Expression    String        Matched?
\Athe         the sun       Match
              In the sun    No match

\b - Matches if the specified characters are at the beginning or end of a word.

Expression    String           Matched?
\bfoo         football         Match
              a football       Match
              afootball        No match
foo\b         the foo          Match
              the afoo test    Match
              the afootest     No match


\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

Expression    String           Matched?
\Bfoo         football         No match
              a football       No match
              afootball        Match
foo\B         the foo          No match
              the afoo test    No match
              the afootest     Match
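
The \b and \B tables can be checked with re.search. In this sketch the patterns are written as raw strings (r"..."), because in an ordinary Python string '\b' would mean the backspace character.

```python
import re

words = ["football", "a football", "afootball"]

# \bfoo: "foo" must start at a word boundary.
at_word_start = {w: bool(re.search(r"\bfoo", w)) for w in words}
# \Bfoo: "foo" must NOT start at a word boundary.
inside_word = {w: bool(re.search(r"\Bfoo", w)) for w in words}

print(at_word_start)
print(inside_word)
```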


\d - Matches any decimal digit. Equivalent to [0-9]

Expression    String      Matched?
\d            12abc3      3 matches (1, 2 and 3)
              Python      No match

\D - Matches any character that is not a decimal digit. Equivalent to [^0-9]

Expression    String      Matched?
\D            1ab34"50    3 matches (a, b and ")
              1345        No match

\s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v].

Expression    String          Matched?
\s            Python RegEx    1 match
              PythonRegEx     No match

\S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v].

Expression    String           Matched?
\S            a b              2 matches (a and b)
              (spaces only)    No match

\w - Matches any alphanumeric character (digits and alphabets). Equivalent to [a-zA-Z0-9_]. By the way,
underscore _ is also considered an alphanumeric character.

Expression    String      Matched?
\w            12&": ;c    3 matches (1, 2 and c)
              %"> !       No match

\W - Matches any non-alphanumeric character. Equivalent to [^a-zA-Z0-9_]

Expression    String    Matched?
\W            1a2%c     1 match (%)
              Python    No match
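
The character-class sequences above can be tried out together on one string. The sample string below is made up for the example.

```python
import re

sample = "Order 12: 3 items_ok!"

digits = re.findall(r"\d", sample)      # each decimal digit
non_word = re.findall(r"\W", sample)    # anything that is not a letter, digit or underscore
words = re.findall(r"\w+", sample)      # runs of letters, digits and underscores
space_count = len(re.findall(r"\s", sample))

print(digits, non_word, words, space_count)
```

Note that items_ok comes back as a single word, since the underscore counts as a word character.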

\Z - Matches if the specified characters are at the end of a string.

Expression    String                       Matched?
Python\Z      I like Python                1 match
              I like Python Programming    No match
              Python is fun.               No match

Tip: To build and test regular expressions, you can use RegEx tester tools such as regex101. This tool not only
helps you in creating regular expressions, but it also helps you learn them. Now that you understand the basics of
RegEx, let's discuss how to use RegEx in your Python code.

Python RegEx
Python has a module named re to work with regular expressions. To use it, we need to import the module:
import re
The module defines several functions and constants to work with RegEx.

re.findall(): The re.findall() function returns a list of strings containing all matches.

re.split(): The re.split() function splits the string where there is a match and returns a list of strings where the
splits have occurred.
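
A short sketch of both functions on the same input; the sample string is an assumption made for the example.

```python
import re

text = "Twelve:12 Eighty nine:89."

# findall collects every run of digits.
numbers = re.findall(r"\d+", text)

# split cuts the string at every run of digits and keeps what lies between.
parts = re.split(r"\d+", text)

print(numbers)
print(parts)
```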

re.sub(): The syntax of re.sub() is:

    re.sub(pattern, replace, string)

The function returns a string where matched occurrences are replaced with the content of the replace variable.

re.subn(): re.subn() is similar to re.sub() except that it returns a tuple of 2 items containing the new string
and the number of substitutions made.
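
The sketch below removes all whitespace from a made-up string, first with re.sub() and then with re.subn(), to show the difference in what is returned.

```python
import re

text = "abc 12 de 23 f45 6"

# sub returns only the new string.
no_spaces = re.sub(r"\s+", "", text)

# subn returns (new string, number of substitutions made).
new_text, count = re.subn(r"\s+", "", text)

print(no_spaces)
print(new_text, count)
```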

re.search(): The re.search() function takes two arguments: a pattern and a string. It looks for the
first location where the RegEx pattern produces a match with the string. If the search is successful,
re.search() returns a match object; if not, it returns None.

    match = re.search(pattern, string)

Match object
You can get methods and attributes of a match object using dir() function. Some of the commonly used
methods and attributes of match objects are:

match.group(): The group() method returns the part of the string where there is a match.

match.start(), match.end() and match.span(): The start() method returns the index of the start of the
matched substring. Similarly, end() returns the end index of the matched substring, and span() returns a tuple
containing both indexes.
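
The sketch below shows re.search() and the match object methods on a sample string. The pattern and the string are assumptions made for the example: three digits, a space, then two digits, with each part in its own group.

```python
import re

text = "39801 356, 2102 1111"
match = re.search(r"(\d{3}) (\d{2})", text)

if match:
    whole = match.group()        # the whole matched substring
    first = match.group(1)       # text captured by the first group
    second = match.group(2)      # text captured by the second group
    start, end = match.start(), match.end()
    span = match.span()          # (start, end) as one tuple
    print(whole, first, second, start, end, span)
else:
    print("pattern not found")
```

Checking the result with if match before calling its methods avoids an AttributeError when the search returns None.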

match.re and match.string: The re attribute of a match object returns a regular expression object.
Similarly, the string attribute returns the passed string.

Using r prefix before RegEx: When the r or R prefix is used before a regular expression, it means raw string.
For example, '\n' is a new line whereas r'\n' means two characters: a backslash \ followed by n. Backslash \ is
used to escape various characters including all metacharacters. However, using the r prefix makes \ be treated
as a normal character.

Text Processing:
Comma Separated values
In Python, we use the csv.reader() function from the csv module to read a CSV file. Here, we will show you how
to read different types of CSV files with different delimiters like quotes (""), pipe (|) and comma (,). We have a
CSV file called [Link] with the default delimiter comma (,) and the following data:
SN, Name, City
1, John, Washington
2, Eric, Los Angeles
3, Brad, Texas
Example: Read [Link] file, where delimiter is comma (,)
import csv
with open('[Link]', 'r', newline='') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        print(row)
# no explicit close is needed: the with block closes the file
When we run the above program, the output will be
['SN', ' Name', ' City']
['1', ' John', ' Washington']
['2', ' Eric', ' Los Angeles']
['3', ' Brad', ' Texas']
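
Notice the leading spaces in the output above (' Name', ' John'): the reader keeps the blank that follows each comma. The sketch below shows two standard csv.reader options, skipinitialspace and delimiter. To keep the example self-contained it reads from in-memory text through io.StringIO instead of a file; the pipe-separated data is made up for the example.

```python
import csv
import io

comma_data = "SN, Name, City\n1, John, Washington\n2, Eric, Los Angeles\n"
pipe_data = "SN|Name|City\n1|John|Washington\n"

# skipinitialspace=True drops the blank that follows each delimiter.
comma_rows = list(csv.reader(io.StringIO(comma_data), skipinitialspace=True))

# delimiter tells the reader which character separates the fields.
pipe_rows = list(csv.reader(io.StringIO(pipe_data), delimiter="|"))

print(comma_rows)
print(pipe_rows)
```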
In Python we use the csv.writer() function to write data into CSV files. It is the counterpart of csv.reader().
Writing on Existing File: We have a [Link] file with the following data.
SN, Name, City
1, John, Washington
2, Eric, Los Angeles
3, Brad, Texas
Now, we are going to modify the [Link] file.
Example 1: Modifying existing rows of [Link]
import csv
row = ['2', ' Marie', ' California']
# read all rows into a list and replace the third line
with open('[Link]', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    lines = list(reader)
    lines[2] = row
# write the modified rows back to the same file
with open('[Link]', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
When we open the [Link] file with text editor, then it will show:
SN, Name, City
1, John, Washington
2, Marie, California
3, Brad, Texas

JavaScript Object Notation (JSON)
JSON (JavaScript Object Notation) is a popular data format used for representing structured data. It's
common to transmit and receive data between a server and web application in JSON format. In Python,
JSON exists as a string. For example:
p = '{"name": "Bob", "languages": ["Python", "Java"]}'
It's also common to store a JSON object in a file.
Import json Module: To work with JSON (a string, or a file containing a JSON object), you can use Python's json
module. You need to import the module before you can use it.
import json
Parse JSON in Python: The json module makes it easy to parse JSON strings and files containing JSON object.
Example 1: Python JSON to dict
You can parse a JSON string using the json.loads() function. It returns a dictionary.
import json
person = '{"name": "Bob", "languages": ["English", "French"]}'
person_dict = json.loads(person)
# Output: {'name': 'Bob', 'languages': ['English', 'French']}
print(person_dict)
# Output: ['English', 'French']
print(person_dict['languages'])
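
For a file containing a JSON object, the counterpart is json.load(), which takes an open file instead of a string. The sketch below is self-contained: it first writes a small file into a temporary directory and then reads it back. The file name person.json is an assumption made for the example.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person.json")   # hypothetical file name
    # Create the file so the example can run on its own.
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"name": "Bob", "languages": ["English", "French"]}')
    # json.load parses the content of an open file.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

print(data["name"], data["languages"])
```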

Python Convert to JSON string


You can convert a dictionary to a JSON string using the json.dumps() function.
Example: Convert dict to JSON
import json
person_dict = {'name': 'Bob', 'age': 12, 'children': None}
person_json = json.dumps(person_dict)
# Output: {"name": "Bob", "age": 12, "children": null}
print(person_json)
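
json.dumps() also accepts formatting options. The sketch below uses two standard ones, indent and sort_keys, and then parses the result back to show that the round trip preserves the data (None becomes null and back again).

```python
import json

person_dict = {'name': 'Bob', 'age': 12, 'children': None}

# indent pretty-prints the output; sort_keys orders the keys alphabetically.
pretty = json.dumps(person_dict, indent=4, sort_keys=True)
print(pretty)

# Parsing the string again gives back an equal dictionary.
round_trip = json.loads(pretty)
```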

Writing JSON to a file


To write JSON to a file in Python, we can use the json.dump() function.
Example: Writing JSON to a file
import json
person_dict = {"name": "Bob", "languages": ["English", "French"],
               "married": True, "age": 32}
with open('[Link]', 'w') as json_file:
    json.dump(person_dict, json_file)
In the above program, we have opened a file named [Link] in writing mode using 'w'. If the file doesn't
already exist, it will be created. Then, json.dump() transforms person_dict to a JSON string which will be
saved in the [Link] file.

Python and XML


XML is a tag-based language for defining structured information, commonly used to define documents and
data shipped over the Web and other applications. To get information from XML, developers need to parse
the XML. There are different ways to parse an XML file in Python:
1. Parse XML using re Pattern Module.
2. Parse XML using DOM parser.
3. Parse XML using SAX parser.
4. Parse XML using ElementTree from etree package.


Content of the XML file ([Link]) that we will be parsing:
<Emp>
    <firstName>Rahul</firstName>
    <lastName>Anand</lastName>
    <dob>20/10/1990</dob>
    <gender>Male</gender>
    <department>Finance</department>
    <department>Admin</department>
</Emp>
Parse XML using re Pattern Module: Python has built-in support for pattern matching, available in the
re module. One can parse an XML file using pattern matching as follows:
import re
with open("[Link]") as xml_file:
    content = xml_file.read()
# get all departments
departments = re.findall('<department>(.*)</department>', content)
for department in departments:
    print(department)
# get firstName
firstName = re.findall('<firstName>(.*)</firstName>', content)
print(firstName)
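
Regular expressions work for a small, regular file like this one, but they know nothing about XML structure. As a sketch of the fourth option in the list above, the same data can be read with ElementTree from the standard library. To keep the example self-contained, the XML is held in a string and parsed with ET.fromstring(); for a file on disk, ET.parse(filename).getroot() would be used instead.

```python
import xml.etree.ElementTree as ET

xml_text = """<Emp>
    <firstName>Rahul</firstName>
    <lastName>Anand</lastName>
    <dob>20/10/1990</dob>
    <gender>Male</gender>
    <department>Finance</department>
    <department>Admin</department>
</Emp>"""

root = ET.fromstring(xml_text)                 # root is the <Emp> element

first_name = root.findtext("firstName")        # text of the first matching child
departments = [d.text for d in root.findall("department")]   # all matching children

print(root.tag, first_name, departments)
```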

Mailmerge
When we want to send the same invitations to many people, the body of the mail does not change. Only the
name (and maybe address) needs to be changed. Mail merge is a process of doing this. Instead of writing
each mail separately, we have a template for body of the mail and a list of names that we merge together to
form all the mails.
Source Code to Merge Mails:
# Python program to mail merger
# Names are in the file names.txt
# Body of the mail is in body.txt
# open names.txt for reading
with open("names.txt",'r',encoding = 'utf-8') as names_file:
    # open body.txt for reading
    with open("body.txt",'r',encoding = 'utf-8') as body_file:
        # read entire content of the body
        body = body_file.read()
        # iterate over names
        for name in names_file:
            mail = "Hello "+name+body
            # write the mails to individual files
            with open(name.strip()+".txt",'w',encoding = 'utf-8') as mail_file:
                mail_file.write(mail)
For this program, we have written all the names in separate lines in the file "[Link]". The body is in the
"[Link]" file. We open both the files in reading mode and iterate over each name using a for loop. A new
file with the name "[name].txt" is created, where name is the name of that person. We use strip() method to
clean up leading and trailing whitespaces (reading a line from the file also reads the newline '\n' character).
Finally, we write the content of the mail into this file using the write() method.
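
The same idea can be sketched with string.Template from the standard library, which replaces a placeholder such as $name inside the body. This is an alternative illustration, not the program above: the names and the body text are made up, and the mails are kept in a dictionary instead of being written to files.

```python
from string import Template

# Lines as they would come from a names file, newline included (hypothetical data).
name_lines = ["Asha\n", "Ravi\n"]

# Body with a placeholder for the name (hypothetical text).
body = Template("Hello $name,\nYou are invited to the annual meet.\n")

mails = {}
for line in name_lines:
    name = line.strip()                       # remove the trailing newline
    mails[name] = body.substitute(name=name)  # fill in the placeholder

for name, mail in mails.items():
    print(mail)
```

A placeholder lets the name appear anywhere in the body, not only at the start.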
Case Study: Create Regular Expressions (Custom), Process Telephone Numbers, Generate Log Data, HTML
Generators, Tweet Scrub, Amazon Screen Scraper.

CS-33: Programming in Python

Unit-5
Python and Data Analytics
Understand the Problem by Understanding the Data
This chapter has two purposes. One is to familiarize you with data sets that will be used later as examples of
different types of problems to be solved using the algorithms covered under “Penalized Linear Regression” and
“Ensemble Methods.” The other purpose is to demonstrate some of the tools available in Python for data exploration.

The Anatomy of a New Problem


The algorithms covered in this book start with a matrix (or table) full of numbers and perhaps some
character variables. The example in Table establishes some nomenclature and represents a small machine
learning data set in a two-dimensional table. The table will give you a mental image of a data set so that
references to “columns corresponding to attributes” or “rows corresponding to individual examples” will be
familiar. In this example, the predictive analytics problem is to predict how much money individuals will
spend buying books online over the next year.

The data are arranged into rows and columns. Each row represents an individual case (also called an
instance, example, or observation). The columns in Table are given designations that indicate the roles they
will play in the machine learning problem. The columns designated as attributes will be used to make
predictions of the dollars spent on books. In the column designated as labels, you’ll see how much each
customer spent last year on books.

Different Types of Attributes and Labels Drive Modeling Choices


The attributes shown in Table come in two different types: numeric variables and categorical (or factor)
variables. Attribute 1 (height) is a numeric variable and is the most usual type of attribute. Attribute 2 is
gender and is indicated by the entry Male or Female. This type of attribute is called a categorical or factor
variable. Categorical variables have the property that there’s no order relation between the various values.
There’s no sense to Male < Female (despite centuries of squabbling). Categorical variables can be
two-valued, like Male/Female, or multivalued, like states (AL, AK, AR . . . WY). Other distinctions can be
drawn regarding attributes (integer versus float, for example), but they do not have the same impact on
machine learning algorithms. The reason for this is that many machine learning algorithms take numeric
attributes only; they cannot handle categorical or factor variables. Penalized regression algorithms deal only
with numeric attributes. The same is true for support vector machines, kernel methods, and K‐nearest
neighbors. Chapter will cover methods for converting categorical variables to numeric variables. The nature
of the variables will shape your algorithm choices and the direction you take in developing a predictive
model, so it’s one of the things you need to pay attention to when you face a new problem.

Things to Notice about Your New Data Set


You’ll want to ascertain a number of other features of the data set as part of your initial inspection of the
data. The following is a checklist and a sequence of things to learn about your data set to familiarize yourself
with the data and to formulate the predictive model development steps that you want to follow. These are
simple things to check and directly impact your next steps. In addition, the process gets you moving around
the data and learning its properties.
Items to Check
• Number of rows and columns
• Number of categorical variables and number of unique values for each
• Missing values
• Summary statistics for attributes and labels
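
The checklist above can be sketched with the standard library alone. The tiny table below (height, gender, dollars spent) is made-up data for the example, and None is assumed to mark a missing value.

```python
import statistics
from collections import Counter

# Hypothetical toy data: [height, gender, dollars spent on books]
rows = [
    [6.5, "Male", 100.0],
    [5.1, "Female", 225.0],
    [None, "Female", 75.0],
    [5.9, "Male", 150.0],
]

# Number of rows and columns
n_rows, n_cols = len(rows), len(rows[0])

# Unique values of the categorical column and how often each occurs
gender_counts = Counter(row[1] for row in rows)

# Missing values
missing = sum(value is None for row in rows for value in row)

# Summary statistics for the label column
spend = [row[2] for row in rows]
summary = {
    "mean": statistics.mean(spend),
    "stdev": statistics.stdev(spend),
    "min": min(spend),
    "max": max(spend),
}

print(n_rows, n_cols, dict(gender_counts), missing, summary)
```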

Classification Problems: Detecting Unexploded Mines Using Sonar
This section steps through several checks that you might make on a classification problem as you begin
digging into it. It starts with simple measurements of size and shape, reporting data types, counting missing
values, and so forth. Then it moves on to statistical properties of the data and interrelationships between
attributes and between attributes and the labels. The data set comes from the UC Irvine Data Repository
[Ref 1.]. The data result from some experiments to determine if sonar can be used to detect unexploded
mines left in harbors subsequent to military actions. The sonar signal is what’s called a chirped signal. That
means that the signal rises (or falls) in frequency over the duration of the pulse.

Physical Characteristics of the Rocks versus Mines Data Set


The first thing to do with a new data set is to determine its size and shape. Listing 2-1 shows code for
determining the size and shape of the “Rocks versus Mines” data set from the UC Irvine Data Repository.
Later in this chapter, you’ll learn more about this data set, and the book will use it
for example purposes as the algorithms are introduced. The process for determining the number of rows and
columns is pretty simple in this case. The file is comma delimited, with the data for one experiment
occupying one line of text. This makes it a simple matter to read a line, split it on the comma delimiters, and
stack the resulting lists into an outer list containing the whole data set.
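
The read, split and stack steps just described can be sketched as follows. This is not the listing referred to above: to stay self-contained it reads three made-up lines through io.StringIO instead of the real data file, and the helper is_number is a hypothetical name.

```python
import io

# Stand-in for the opened data file: comma-delimited lines, one experiment per line.
data_file = io.StringIO("0.02,0.04,R\n0.05,0.01,M\n0.03,0.08,R\n")

x_list = []
for line in data_file:
    row = line.strip().split(",")   # split the line on the comma delimiters
    x_list.append(row)              # stack the row into the outer list

n_rows = len(x_list)
n_cols = len(x_list[0])

def is_number(text):
    """Return True if text can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# A column counts as numeric only if every entry in it parses as a number.
column_types = [
    "numeric" if all(is_number(row[j]) for row in x_list) else "categorical"
    for j in range(n_cols)
]

print(n_rows, n_cols, column_types)
```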

Statistical Summaries of the Rocks versus Mines Data Set


After determining which attributes are categorical and which are numeric, you’ll want some descriptive
statistics for the numeric variables and a count of the unique categories in each categorical attribute.

Visualization of Outliers Using Quantile-Quantile Plot


One way to study outliers in more detail is to plot the distribution of the data in question relative to some
reasonable distributions to see whether the relative numbers match up. The resulting plot shows how the
boundaries associated with empirical percentiles in the data compare to the boundaries for the same
percentiles of a Gaussian distribution. If the data being analyzed come from a Gaussian distribution, the
points being plotted will lie on a straight line. For the rocks versus mines data the points depart from this line
at the extremes, which means that the tails of the rocks versus mines data contain more examples than the
tails of a Gaussian density.
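
The numbers behind such a plot can be computed without any plotting library. The sketch below is an illustration on made-up data: it draws a Gaussian sample with random.gauss, takes its deciles with statistics.quantiles, and pairs them with the deciles of a Gaussian that has the same mean and standard deviation (statistics.NormalDist, available from Python 3.8). Plotting one list against the other would give the quantile-quantile plot.

```python
import random
from statistics import NormalDist, mean, stdev, quantiles

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]   # hypothetical sample

n = 10
empirical = quantiles(data, n=n)             # 9 cut points: 10%, 20%, ..., 90%

# Gaussian with the same mean and standard deviation as the data
gaussian = NormalDist(mean(data), stdev(data))
theoretical = [gaussian.inv_cdf(k / n) for k in range(1, n)]

# Each pair is one point of the quantile-quantile plot.
pairs = list(zip(theoretical, empirical))
largest_gap = max(abs(t - e) for t, e in pairs)

for t, e in pairs:
    print(round(t, 3), round(e, 3))
```

For data that really are Gaussian the two columns stay close to each other; heavy tails would show up as growing gaps at the first and last percentiles.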

Statistical Characterization of Categorical Attributes


The process just described applies to numeric attributes. But what about categorical attributes? You want to
check to see how many categories they have and how many examples there are from each category. You
want to learn these things for a couple of reasons. The gender attribute has two possible values (Male and
Female), but if the attribute had been the state of the United States, there would have been 50 possible
categories. As the number of attributes grows, the complexity of dealing with them mounts. Most binary
tree algorithms, which are the basis for ensemble methods, have a cutoff on how many categories they can
handle. The popular Random Forests package written by Breiman and Cutler (the inventors of the algorithm)
has a cutoff of 32 categories. If an attribute has more than 32 categories, you’ll need to aggregate them.
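
Counting categories and aggregating the rare ones can be sketched with collections.Counter. The state codes and the limit of 3 categories are made up for the example; the limit stands in for a package cutoff such as the 32 mentioned above, and "Other" is a hypothetical name for the merged category.

```python
from collections import Counter

# Hypothetical categorical attribute
states = ["CA", "NY", "CA", "TX", "WY", "CA", "NY", "AK"]

counts = Counter(states)          # examples per category
n_categories = len(counts)

max_categories = 3                # stand-in for a package limit
# Keep the most frequent categories and merge the rest into "Other".
keep = {category for category, _ in counts.most_common(max_categories - 1)}
aggregated = [s if s in keep else "Other" for s in states]

print(counts)
print(aggregated)
```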

How to Use Python Pandas to Summarize the Rocks versus Mines Data Set
The Python package Pandas can help automate the process of data inspection and handling. It proves
particularly useful for the early stages of data inspection and preprocessing. The Pandas package makes it
possible to read data into a specialized data structure called a data frame. The data frame is modeled after
the CRAN-R data structure of the same name. You can think of a data frame as a table or matrix-like structure
as in Table. The data frame is oriented with a row representing a single case (experiment, example,
measurement) and columns representing particular attributes. The structure is matrix‐like, but not a matrix
because the elements in various columns may be of different types. Formally, a matrix is defined over a field
(like the real numbers, binary numbers, complex numbers), and all the entries in a matrix are elements from
that field. For statistical problems, the matrix is too confining because statistical samples typically have a mix
of different types.

Predictive Model Building: Balancing Performance, Complexity, and Big Data
The goal of selecting and fitting a predictive algorithm is to achieve the best possible performance. Achieving
performance goals involves three factors: complexity of the problem, complexity of the algorithmic model
employed, and the amount and richness of the data available. The chapter includes some visual examples
that demonstrate the relationship between problem and model complexity and then provides technical
guidelines for use in design and development.

The Basic Problem: Understanding Function Approximation


The algorithms covered in this book address a specific class of predictive problem. The problem statement
for these problems has two types of variables:
• The variable that you are attempting to predict (for example, whether a visitor to a website will click
an ad)
• Other variables (for example, the visitor’s demographics or past behavior on the site) that you can
use to make the prediction
Problems of this type are referred to as function approximation problems because the goal is to construct a
model generating predictions of the first of these as a function of the second. In a function approximation
problem, the designer starts with a collection of historical examples for which the correct answer is known.
For example, historical web log files will indicate whether a visitor clicked an ad when shown the ad. The
data scientist next has to find other data that can be used to build a predictive model. For example, to
predict whether a site visitor will click an ad, the data scientist might try using other pages that the visitor
viewed before seeing the ad. If the user is registered with the site, data on past purchases or pages viewed
might be available for making a prediction.

Working with Training Data


The data scientist starts algorithm development with a training set. The training set consists of outcome
examples and the assemblage of features chosen by the data scientist. The training set comprises two types
of data:
• The outcomes you want to predict
• The features available for making the prediction
Table provides an example of a training set. The leftmost column contains outcomes (whether a site visitor
clicked a link) and features to be used to make predictions about whether visitors will click the link in the
future.

Assessing Performance of Predictive Models


Good performance means using the attributes xi to generate a prediction that is close to yi, but close has
different meanings for different problems. For a regression problem where yi is a real number, performance
is measured in terms like the mean squared error (MSE) or the mean absolute error (MAE).
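
Both measures are short enough to write out. In the sketch below the function names and the sample values are assumptions made for the example: MSE averages the squared differences between labels and predictions, and MAE averages the absolute differences.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between labels and predictions."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

Because the errors are squared, MSE penalizes one large miss more heavily than several small ones; MAE treats all misses in proportion to their size.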

Factors Driving Algorithm Choices and Performance: Complexity and Data


Several factors affect the overall performance of a predictive algorithm. Among these factors are the
complexity of the problem, the complexity of the model used, and the amount of training data available. The
following sections describe how these factors interrelate to determine performance.

Contrast between a Simple Problem and a Complex Problem


The preceding section of this chapter described several ways to quantify performance and highlighted the
importance of performance on new data. The goal of designing a predictive model is to make accurate
predictions on new examples (such as new visitors to your site). As a practicing data scientist, you will want an
estimate of an algorithm’s performance so that you can set expectations with your customer and compare
algorithms with one another. Best practice in predictive modeling requires that you hold out some data from
the training set. These held-out examples have labels associated with them and can be compared to
predictions produced by models trained on the remaining data. The error measured this way is called
out-of-sample error because it is an error on data not used in training. (The section “Measuring Performance of
Predictive Models” later in this chapter goes into more detail about the mechanics of this process.) The
important thing is that the only performance that counts is the performance of the model when it is run
against new examples.
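
The hold-out idea can be sketched as follows. This is a minimal illustration, not the method used in the chapter's listings: the data are made up, the helper names fit_line and mean_squared_error are hypothetical, and the rule of holding out every third example is just one simple way to split.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between labels and the line's predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: a straight line plus a small deterministic wiggle.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + 0.3 * ((i % 4) - 1.5) for i, x in enumerate(xs)]

# Hold out every third example; the model never sees these during fitting.
test_idx = [i for i in range(len(xs)) if i % 3 == 0]
train_idx = [i for i in range(len(xs)) if i % 3 != 0]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

slope, intercept = fit_line(train_x, train_y)
in_sample_mse = mean_squared_error(train_x, train_y, slope, intercept)
out_of_sample_mse = mean_squared_error(test_x, test_y, slope, intercept)

print("in-sample MSE:    ", in_sample_mse)
print("out-of-sample MSE:", out_of_sample_mse)
```

The out-of-sample figure is the one to report, because it is measured on examples the model did not use while fitting.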

Contrast between a Simple Model and a Complex Model


The previous section showed visual comparisons between simple and complex problems. This section
describes how the various models available to solve these problems differ from one another. Intuitively, it
seems that a complex model should be fit to a complex problem, but the visual example from the last
section demonstrates that data set size may dictate that a simple model fits a complex problem better than a
complex model.

Another important concept is that modern machine learning algorithms generate families of models, not just
single models. The algorithms covered in this chapter each generate hundreds or even thousands of
different models.

Factors Driving Predictive Algorithm Performance


These results explain the excitement over large volumes of data. Accurate predictions for complicated
problems require large volumes of data. But the size isn’t quite a precise enough measure. The shape of the
data also matters. Predictor data can be portrayed as a matrix having a number of rows (height) and a number
of columns (width). The number of entries in the matrix is the product of the number of rows and the number
of columns. An important difference exists between the number of rows and the number of columns when
the data are being used for predictive modeling. Adding a column means adding a new attribute. Adding a
new row means getting an additional historical example of the existing attributes. To understand how the
effects of a new row differ from the effects of a new column, consider a linear model relating the attributes
from Equation to the labels of Equation.
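
The difference between rows and columns can be made concrete with a small sketch. The matrix values, the coefficients, and the helper name predict are all made up for illustration. The sketch assumes the usual form of a linear model: an intercept plus one coefficient per attribute.

```python
# Hypothetical predictor matrix: each row is one historical example,
# each column is one attribute. All values are invented.
X = [
    [1.0, 0.5, 3.0],
    [2.0, 0.1, 1.0],
    [0.5, 0.9, 2.0],
    [1.5, 0.3, 4.0],
]

n_rows = len(X)               # height: number of examples
n_cols = len(X[0])            # width: number of attributes
n_entries = n_rows * n_cols   # entries = rows * columns


def predict(row, coefficients, intercept):
    """Linear model: intercept plus one coefficient times each attribute."""
    return intercept + sum(b * x for b, x in zip(coefficients, row))


# A linear model needs exactly one coefficient per column (made-up values here).
coefficients = [0.5, -1.0, 2.0]
intercept = 1.0

print("rows:", n_rows, "columns:", n_cols, "entries:", n_entries)
print("prediction for first row:", predict(X[0], coefficients, intercept))
```

Adding a column adds another coefficient that has to be estimated, while adding a row adds another example to estimate the existing coefficients from. That is why the two do not affect a linear model in the same way.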

Choosing an Algorithm: Linear or Nonlinear?


The visual examples you have just seen give some idea of the performance tradeoffs between linear and
nonlinear predictive models. Linear models are preferable when the data set has more columns than rows or
when the underlying problem is simple. Nonlinear models are preferable for complex problems with
many more rows than columns of data. An additional factor is training time. Fast linear techniques train much
faster than nonlinear techniques.

Choosing a nonlinear model (say an ensemble method) entails training a number of different models of
differing complexity. For example, the ensemble model that generated the decision boundary in Figure was
one of roughly a thousand different models generated during the training process. These models had a
variety of different complexities. Some of them would have given a much cruder approximation to the
boundaries that are visually apparent in Figure. The model that generated the decision boundary in Figure 3-6
was chosen because it performed the best on out-of-sample data. This process holds for many modern
machine learning algorithms. Examples will be covered in the section “Choosing a Model to
Balance Problem Complexity, Model Complexity, and Data Set Size.”

Measuring the Performance of Predictive Models


This section covers two broad areas relating to performance measures for predictive models. The first one is
the different metrics that you can use for different types of problems (for example, using MSE for a
regression problem and misclassification error for a classification problem). In the literature (and in machine
learning competitions), you will also see measures like receiver operating characteristic (ROC) curves and area
under the curve (AUC). Besides that, these ideas are useful for optimizing performance. The second broad area
consists of techniques for gathering out-of-sample error estimates. Recall that out-of-sample errors are
meant to simulate errors on new data. It’s an important part of design practice to use these techniques
to compare different algorithms and to select the best model complexity for a given problem complexity and
data set size. That process is discussed in detail later in this chapter and is then used in examples.
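
Two of the classification measures named above can be sketched in plain Python. The labels and scores are made up, the function names are hypothetical, and the 0.5 threshold is an assumption. The AUC is computed here through its pairwise interpretation (the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half), which is one of several equivalent ways to obtain it.

```python
def misclassification_error(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded prediction disagrees with the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


def area_under_curve(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = positive class) and model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.3, 0.2]

error_rate = misclassification_error(labels, scores)
auc = area_under_curve(labels, scores)
print("misclassification error:", error_rate)
print("AUC:", auc)
```

Misclassification error depends on the chosen threshold, whereas AUC summarizes how well the scores rank positives above negatives across all thresholds.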

Performance Measures for Different Types of Problems


Performance measures for regression problems are relatively straightforward. In a regression problem, both
the target and the prediction are real numbers. Error is naturally defined as the difference between the
target and the prediction. It is useful to generate statistical summaries of the errors for comparisons and
for diagnostics. The most frequently used summaries are the mean squared error (MSE) and the mean
absolute error (MAE). Listing 3-1 compares the calculation of the MSE, MAE, and root MSE (RMSE, which is
the square root of MSE).
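
The three summaries can be computed in a few lines. This is a small sketch with made-up targets and predictions, not a reproduction of Listing 3-1.

```python
import math

# Made-up targets and predictions for a regression problem.
targets = [3.0, -1.0, 2.0, 7.0]
predictions = [2.0, 0.0, 2.0, 9.0]

# Error is the difference between the target and the prediction.
errors = [t - p for t, p in zip(targets, predictions)]

mse = sum(e * e for e in errors) / len(errors)    # mean squared error
mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
rmse = math.sqrt(mse)                             # root mean squared error

print("errors:", errors)
print("MSE:", mse, "MAE:", mae, "RMSE:", rmse)
```

MSE is in squared units of the target, so RMSE and MAE are usually easier to read next to the targets themselves. MSE and RMSE also penalize large errors more heavily than MAE does.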

Achieving Harmony between Model and Data


This section uses ordinary least squares (OLS) regression to illustrate several things. First, it illustrates how
OLS can sometimes overfit a problem. Overfitting means that there’s a significant discrepancy between
errors on the training data and errors on the test data, such as you saw in the previous section where
OLS was used to solve the rocks-versus-mines classification problem. Second, it introduces two methods for
overcoming the overfit problem with OLS. In addition, the methods for overcoming overfitting have a
property that is common to most modern machine learning algorithms. Modern algorithms generate a
number of models of varying complexity and then use out-of-sample performance to balance model
complexity, problem complexity, and data set richness and thus determine which model to deploy. This
process will be used repeatedly.

Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
A couple of examples will illustrate how modern machine learning techniques can be tuned to best fit a given
problem and data set. The first example is a modification to ordinary least squares regression called forward
stepwise regression. Here’s how it works. Recall Equations 3-1 and 3-2, which define the problem being
solved (see Equations 3-10 and 3-11 here, which repeat those equations). The vector Y contains the labels.
And the matrix X contains the attributes available to predict the labels.
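
Forward stepwise regression can be sketched in plain Python as follows. This is a hedged illustration of one common variant, not the chapter's own listing: at each step it adds the attribute (column of X) that gives the lowest error on held-out data, so it produces a sequence of models of growing complexity. The data are made up, and the helper names solve_linear_system, fit_ols, rmse, and forward_stepwise are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(rows, labels, cols):
    """Least-squares coefficients (intercept first) using only the columns in cols."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, labels)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def rmse(rows, labels, cols, beta):
    """Root mean squared error of the fitted model on the given examples."""
    errors = [
        y - (beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)))
        for row, y in zip(rows, labels)
    ]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def forward_stepwise(train_x, train_y, test_x, test_y):
    """Return [(columns used, out-of-sample RMSE)] for models of growing size."""
    chosen, remaining, path = [], list(range(len(train_x[0]))), []
    while remaining:
        # Try each unused column; keep the one with the lowest held-out error.
        scored = []
        for c in remaining:
            cols = chosen + [c]
            beta = fit_ols(train_x, train_y, cols)
            scored.append((rmse(test_x, test_y, cols, beta), c))
        best_error, best_col = min(scored)
        chosen.append(best_col)
        remaining.remove(best_col)
        path.append((list(chosen), best_error))
    return path


# Made-up data: the labels depend on column 0 only; columns 1 and 2 are distractors.
X = [[float(i), float((i * 7) % 5), float((i * 3) % 4)] for i in range(30)]
Y = [2.0 * row[0] + 1.0 + 0.1 * (((i * i) % 7) - 3) for i, row in enumerate(X)]

# Hold out every third example for out-of-sample error.
train_x = [r for i, r in enumerate(X) if i % 3 != 0]
train_y = [y for i, y in enumerate(Y) if i % 3 != 0]
test_x = [r for i, r in enumerate(X) if i % 3 == 0]
test_y = [y for i, y in enumerate(Y) if i % 3 == 0]

path = forward_stepwise(train_x, train_y, test_x, test_y)
for cols, err in path:
    print("columns", cols, "out-of-sample RMSE", err)
```

The model to deploy would be the entry in path with the lowest out-of-sample RMSE. Solving the normal equations directly is acceptable for a small sketch like this, but a numerical library is the safer choice for real data.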
