
Python

The document provides an overview of Python programming, covering key concepts such as data types, variables, comments, and operators. It explains how to create and manipulate variables, use Python libraries, and implement basic syntax for coding. Additionally, it highlights the importance of comments and identifiers in programming for better code documentation and readability.

Data Science is a branch of computer science in which we study how to
store, use and analyze data in order to derive information from it.
❖ A Python library is a collection of related modules. It contains bundles of code that can be used repeatedly in different programs.

❖ Libraries make Python programming simpler and more convenient, because we don't need to write the same code again and again for different programs.

❖ Python libraries are used to create applications and models in a variety of fields, for instance machine learning, data science, data visualization, image and data manipulation, and many more.
PYTHON BASIC SYNTAX
INTRODUCTION
COMMENTS…
Python Comments
• Python comments begin with the # (hash/pound sign) and run to the end of the line.

• They are used to document code and to help other programmers understand it.

• You can use Python comments inline, on independent lines, or on multiple lines to
include larger documentation.

• These comments are statements that are not part of your program.

• For this reason, comment statements are skipped while executing your program.

• Usually we use comments for making brief notes about a chunk of code.
Example
>>> # this is a Python comment.
>>> # print("I will not be executed")
>>> print("I will be executed")

Output:
I will be executed
Different Types of Python Comments

• In Python there are two types of comments:

• Single-line comments
• Multi-line comments
Single line comments
• Single line commenting is commonly used for a brief
and quick comment (or to debug a program).

• Example :
Multiple Lines Comments
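• Multi-line comments are used to note down something in more detail or to block out an entire chunk of code.
• Multi-line comments work slightly differently.
• Either start every line with #, or put 3 single quotes before and after the part you want commented out. Strictly speaking, the triple-quoted form is a string literal that Python evaluates and discards, not a true comment.
• Example: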
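The comment styles described above can be sketched as follows. The function name area_of_circle and its values are illustrative assumptions, not part of the original slides.

```python
def area_of_circle(radius):
    # Single-line comment: explains the statement that follows.
    pi = 3.14159  # Inline comment: shares the line with code.

    '''
    A triple-quoted string used as a multi-line note.
    Python evaluates it as a string literal and discards it,
    so it has no effect on the result.
    '''
    return pi * radius * radius


# print("I will not be executed")
print(area_of_circle(2))
```

Only the last line produces output; the commented-out print() call is skipped.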
VARIABLES…
PYTHON VARIABLES
• Variables are names that refer to values stored in memory.
• This means that when you create a variable, Python reserves some space in memory for its value.
• Based on the data type of the value, the interpreter allocates memory and decides what can be stored in it.
• Therefore, by assigning values of different data types to variables, you can store integers, decimals or characters in these variables.
Valid and Invalid Identifiers
• Variables are one example of identifiers.
• An identifier is a name used to identify a variable, function, class, or other object in a program.

• The rules to name an identifier are given below:

• The first character of the variable must be a letter or an underscore ( _ ).
• All the characters except the first may be lower-case letters (a-z), upper-case letters (A-Z), underscores, or digits (0-9).
• An identifier name must not contain any white-space or special characters (!, @, #, %, ^, &, *).
• An identifier name must not be the same as any keyword defined in the language.
• Identifier names are case sensitive; for example, myname and MyName are not the same.

• Examples of valid identifiers: a123, _n, n_9, etc.

• Examples of invalid identifiers: 1a, n%4, n 9, etc.
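The rules above can be checked from code. This is a minimal sketch using the built-in str.isidentifier() method and the standard keyword module; the helper name is_valid_identifier is an assumption made for illustration.

```python
import keyword


def is_valid_identifier(name):
    """Return True if name follows the naming rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# The valid and invalid examples from the list above, plus one keyword.
for candidate in ["a123", "_n", "n_9", "1a", "n%4", "n 9", "for"]:
    print(repr(candidate), is_valid_identifier(candidate))
```

Note that isidentifier() alone accepts keywords such as for, which is why the keyword check is added.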
Assigning Values to Variables
• Python variables do not need explicit declaration to
reserve memory space.
• The declaration happens automatically when you
assign a value to a variable.
• The equal sign (=) is used to assign values to
variables.
• The operand to the left of the = operator is the name
of the variable and the operand to the right of the =
operator is the value stored in the variable.
Example
>>> count = 780  # An integer assignment
>>> miles = 5300.0  # A floating point
>>> name = "Simplilearn"  # A string
>>> print(count)
>>> print(miles)
>>> print(name)
Output :
780
5300.0
Simplilearn
Examples
# No need to declare a variable before using it.
a = b = c = 15  # Multiple assignment

# Two integers and one string
a, b, c = 1, 2, "nielit"

# To delete a variable
Syntax : del var
Example : del var_a, var_b
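A short sketch of what del does in practice: once a name is deleted, using it raises a NameError. The variable still_defined is an illustrative helper, not part of the original example.

```python
var_a = 10
var_b = "nielit"

del var_a, var_b  # remove both names

try:
    print(var_a)
    still_defined = True
except NameError as error:
    # The name no longer exists, so Python raises NameError.
    still_defined = False
    print("var_a was deleted:", error)
```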
Variables
Variables are containers for storing data values.

Creating Variables
Python has no command for declaring a variable.
A variable is created the moment you first assign a value to it.
Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume). Rules for Python variables:
•A variable name must start with a letter or the underscore character
•A variable name cannot start with a number
•A variable name can only contain alpha-numeric characters and underscores (A-z, 0-9, and _ )
•Variable names are case-sensitive (age, Age and AGE are three different variables)

Example
Legal variable names:
myvar = "John"
my_var = "John"
_my_var = "John"
myVar = "John"
MYVAR = "John"
myvar2 = "John"

Example
Illegal variable names:
2myvar = "John"
my-var = "John"
my var = "John"
Python Variables - Assign Multiple Values
Many Values to Multiple Variables

Example
x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)

One Value to Multiple Variables

x = y = z = "Orange"
print(x)
print(y)
print(z)

Unpack a Collection

fruits = ["apple", "banana", "cherry"]
x, y, z = fruits
print(x)
print(y)
print(z)
Python - Output Variables
The Python print() function is often used to output variables.

x = "Python is awesome"
print(x)

print() also accepts several variables, separated by commas:

x = "Python"
y = "is"
z = "awesome"
print(x, y, z)

For numbers, the + character works as a mathematical operator:

x = 5
y = 10
print(x + y)

In the print() function, when you try to combine a string and a number with the + operator, Python will give you an error:

x = 5
y = "John"
print(x + y)
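A minimal sketch of the error and two common ways around it: converting the number with str(), or passing both values to print() separated by a comma. The names combined and as_text are illustrative.

```python
x = 5
y = "John"

try:
    combined = x + y  # int + str is not defined
except TypeError as error:
    combined = None
    print("TypeError:", error)

# Option 1: convert the number to a string first.
as_text = str(x) + " " + y
print(as_text)

# Option 2: let print() handle mixed types.
print(x, y)
```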
Global variables

Variables that are created outside of a function are known as global variables.

Global variables can be used by everyone, both inside of functions and outside.

Create a variable outside of a function, and use it inside the function:

x = "awesome"

def myfunc():
    print("Python is " + x)

myfunc()
Python Data Types
In programming, data type is an important concept.
Variables can store data of different types, and different types can do different things.

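Python has the following data types built-in by default, in these categories:

Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType

Print the data type of the variable x:

x = 5
print(type(x))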
Python Numbers
There are three numeric types in Python:
•int
•float
•complex

x = 1    # int
y = 2.8  # float
z = 1j   # complex
print(type(x))
print(type(y))
print(type(z))
Int
Int, or integer, is a whole number, positive or negative, without decimals, of unlimited length.

x = 1
y = 35656222554887711
z = -3255522
print(type(x))
print(type(y))
print(type(z))
Float
Float, or "floating point number" is a number, positive or negative, containing one or more decimals.

x = 1.10
y = 1.0
z = -35.59
print(type(x))
print(type(y))
print(type(z))
Complex
Complex numbers are written with a "j" as the imaginary part:

x = 3+5j
y = 5j
z = -5j
print(type(x))
print(type(y))
print(type(z))

Import the random module, and display a random number between 1 and 9:

import random

print(random.randrange(1, 10))
Python Strings
Strings in Python are surrounded by either single quotation marks, or double quotation marks.
'hello' is the same as "hello".
You can display a string literal with the print() function:

print("Hello")
print('Hello')

Assign String to a Variable
Assigning a string to a variable is done with the variable name followed by an equal sign and the string:

a = "Hello"
print(a)
Multiline Strings
You can assign a multiline string to a variable by using three quotes:

a = """360 TMG."""
print(a)

a = '''data science.'''
print(a)
String Length
To get the length of a string, use the len() function.

a = "Hello, World!"
print(len(a))
Check String
To check if a certain phrase or character is present in a string, we can use the keyword in.

txt = "The best things in life are free!"


print("free" in txt)
Check if NOT
To check if a certain phrase or character is NOT present in a string, we can use the keyword not in.

txt = "The best things in life are free!"


print("expensive" not in txt)

Use it in an if statement:
txt = "The best things in life are free!"
if "expensive" not in txt:
    print("No, 'expensive' is NOT present.")
Python - Slicing Strings
You can return a range of characters by using the slice syntax.
Specify the start index and the end index, separated by a colon, to return a part of the string.

Get the characters from position 2 to position 5 (not included):
b = "Hello, World!"
print(b[2:5])
Get the characters from the start to position 5 (not
included):
b = "Hello, World!"
print(b[:5])
Get the characters from position 2, and all the way
to the end:
b = "Hello, World!"
print(b[2:])
Negative Indexing
Use negative indexes to start the slice from the end of the string:

From: "o" in "World!" (position -5)
To, but not included: "d" in "World!" (position -2):
b = "Hello, World!"
print(b[-5:-2])
Python - Modify Strings
Upper Case
a = "Hello, World!"
print(a.upper())

Lower Case
a = "Hello, World!"
print(a.lower())

Remove Whitespace
a = " Hello, World! "
print(a.strip())

Replace String
a = "Hello, World!"
print(a.replace("H", "J"))

Split String
a = "Hello, World!"
print(a.split(","))
Python - Format - Strings
As we learned in the Python Variables chapter, we cannot
combine strings and numbers like this:

age = 36
txt = "My name is John, I am " + age
print(txt)

Use the format() method to insert numbers into strings:

age = 36
txt = "My name is John, and I am {}"
print(txt.format(age))

quantity = 3
item = 567
price = 49.95
myorder = "I want {} pieces of item {} for {} dollars."
print(myorder.format(quantity, item, price))
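As a hedged alternative to positional {} placeholders, the same message can be built with numbered placeholders or with an f-string (available since Python 3.6). The variable names numbered and f_string are illustrative, and the :.2f format specifier is an added assumption that fixes the price to two decimals.

```python
quantity = 3
item = 567
price = 49.95

# Numbered placeholders make the argument order explicit.
numbered = "I want {0} pieces of item {1} for {2:.2f} dollars.".format(
    quantity, item, price
)

# An f-string reads the variables directly from the surrounding scope.
f_string = f"I want {quantity} pieces of item {item} for {price:.2f} dollars."

print(numbered)
print(f_string)
```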
Python Booleans
Boolean Values
Booleans represent one of two values: True or False.

In programming you often need to know if an expression is True or False.

You can evaluate any expression in Python, and get one of two answers, True or False.

When you compare two values, the expression is evaluated and Python returns the Boolean answer:
print(10 > 9)
print(10 == 9)
print(10 < 9)

The same kind of comparison can be used as the condition of an if statement:

a = 200
b = 33
if b > a:
    print("b is greater than a")
else:
    print("b is not greater than a")

Evaluate Values and Variables
The bool() function allows you to evaluate any value, and give you True or False in return.

1. print(bool("Hello"))
print(bool(15))

2. x = "Hello"
y = 15

print(bool(x))
print(bool(y))

3. bool("abc")
bool(123)
bool(["apple", "cherry", "banana"])

Some Values are False
In fact, there are not many values that evaluate to False, except empty values, such as (), [], {}, "", the number 0, and the value None. And of course the value False evaluates to False.

bool(False)
bool(None)
bool(0)
bool("")
bool(())
bool([])
bool({})
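The truthiness rules above can be verified in one pass. This is a small sketch; the list names and the basket variable are illustrative assumptions.

```python
falsy_values = [False, None, 0, "", (), [], {}]
truthy_values = ["abc", 123, ["apple", "cherry", "banana"]]

falsy_results = [bool(value) for value in falsy_values]
truthy_results = [bool(value) for value in truthy_values]

print(falsy_results)
print(truthy_results)

# Truthiness is what an if statement tests, so an empty list counts as False.
basket = []
if not basket:
    print("The basket is empty")
```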
Python Operators
Operators are used to perform operations on variables and values.

Python divides the operators in the following groups:

•Arithmetic operators

•Assignment operators

•Comparison operators

•Logical operators

•Identity operators

•Membership operators

•Bitwise operators
Python Arithmetic Operators
Arithmetic operators are used with numeric values to perform common mathematical operations:
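A minimal sketch of the arithmetic operators, using two illustrative values (a and b are assumptions, not taken from the slides):

```python
a = 7
b = 2

results = {
    "a + b": a + b,    # addition
    "a - b": a - b,    # subtraction
    "a * b": a * b,    # multiplication
    "a / b": a / b,    # division (always returns a float)
    "a % b": a % b,    # modulus (remainder)
    "a ** b": a ** b,  # exponentiation
    "a // b": a // b,  # floor division
}

for expression, value in results.items():
    print(expression, "=", value)
```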
Python Comparison Operators
Comparison operators are used to compare two values:
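A minimal sketch of the comparison operators; each expression evaluates to True or False. The values of x and y are illustrative assumptions.

```python
x = 5
y = 3

comparisons = {
    "x == y": x == y,  # equal
    "x != y": x != y,  # not equal
    "x > y": x > y,    # greater than
    "x < y": x < y,    # less than
    "x >= y": x >= y,  # greater than or equal to
    "x <= y": x <= y,  # less than or equal to
}

for expression, value in comparisons.items():
    print(expression, "->", value)
```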
Python Lists
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data, the other 3
are Tuple, Set, and Dictionary, all with different qualities and usage.
Lists are created using square brackets:

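1. t = ["apple", "banana", "cherry"]
print(t)

2. List items are indexed, and the first item has index 0. Access an item by referring to its index number:

thislist = ["apple", "banana", "cherry"]
print(thislist[1])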

Negative Indexing
Negative indexing means start from the end
-1 refers to the last item, -2 refers to the second last item etc.

thislist = ["apple", "banana", "cherry"]
print(thislist[-1])

thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[:4])

thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[2:])

thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[-4:-1])

thislist = ["apple", "banana", "cherry"]
if "apple" in thislist:
    print("Yes, 'apple' is in the fruits list")

Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the range.
When specifying a range, the return value will be a new list with the specified items.

thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]


print(thislist[2:5])
Python - Change List Items
To change the value of a specific item, refer to the index number:

thislist = ["apple", "banana", "cherry"]


thislist[1] = "blackcurrant"
print(thislist)

thislist = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]


thislist[1:3] = ["blackcurrant", "watermelon"]
print(thislist)
Python - Add List Items
To add an item to the end of the list, use the append() method:

thislist = ["apple", "banana", "cherry"]
thislist.append("orange")
print(thislist)

To insert a list item at a specified index, use the insert() method. The insert() method inserts an item at the
specified index:

Insert an item at the second position:

thislist = ["apple", "banana", "cherry"]
thislist.insert(1, "orange")
print(thislist)
Extend List
To append elements from another list to the current list, use the extend() method.

thislist = ["apple", "banana", "cherry"]
new = ["mango", "pineapple", "papaya"]
thislist.extend(new)
print(thislist)
Python - Remove List Items
Remove Specified Item
The remove() method removes the specified item.

thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")
print(thislist)

Remove Specified Index
The pop() method removes the specified index.

thislist = ["apple", "banana", "cherry"]
thislist.pop(1)
print(thislist)

The del keyword also removes the specified index:

thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)

Clear the List
The clear() method empties the list.

thislist = ["apple", "banana", "cherry"]
thislist.clear()
print(thislist)
Python - Loop Lists
You can loop through the list items by using a for loop.
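A minimal sketch of looping through a list, first by item and then by index and item together using the built-in enumerate(). The lengths list is an illustrative addition.

```python
thislist = ["apple", "banana", "cherry"]

# Visit each item in order.
for item in thislist:
    print(item)

# enumerate() yields the index alongside each item.
for index, item in enumerate(thislist):
    print(index, item)

# A list comprehension is a compact loop that builds a new list.
lengths = [len(item) for item in thislist]
print(lengths)
```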
Matplotlib
What is Matplotlib?

Matplotlib is a low level graph plotting library in python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.


Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under the plt alias:

import matplotlib.pyplot as plt


Draw a line in a diagram from position (0,0) to position (6,250):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
Markers
You can use the keyword argument marker to emphasize each point with a specified marker:

Mark each point with a circle:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')
plt.show()
Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, linestyle = 'dotted')
plt.show()

Use a dashed line:

plt.plot(ypoints, linestyle = 'dashed')

Shorter Syntax
The line style can be written in a shorter syntax:
linestyle can be written as ls.
dotted can be written as :.
dashed can be written as --.
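The shorter syntax can be sketched as follows. Selecting the Agg backend and calling get_linestyle() are assumptions added so the sketch runs without a display and can confirm which style was applied; in an interactive session you would call plt.show() instead.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# ls is the short form of linestyle; ':' means dotted and '--' means dashed.
(dotted_line,) = plt.plot(ypoints, ls=":")
(dashed_line,) = plt.plot(ypoints + 2, ls="--")

print(dotted_line.get_linestyle(), dashed_line.get_linestyle())
plt.close()
```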
Create Labels for a Plot
With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()
Create a Title for a Plot
With Pyplot, you can use the title() function to set a title for the plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()
Specify Which Grid Lines to Display
You can use the axis parameter in the grid() function to specify which grid lines to display.
Legal values are: 'x', 'y', and 'both'. Default value is 'both'.
Display only grid lines for the x-axis:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'x')

plt.show()
Display only grid lines for the y-axis:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'y')

plt.show()
Matplotlib Scatter
Creating Scatter Plots
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length, one for the values of the x-axis,
and one for values on the y-axis:

annot – an array of the same shape as data which is used to annotate the heatmap.
A simple scatter plot:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()
Compare Plots
In the example above, there seems to be a relationship between speed and age, but what if we plot
the observations from another day as well? Will the scatter plot tell us something else?

import matplotlib.pyplot as plt
import numpy as np

#day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)

plt.show()
Colors
You can set your own color for each scatter plot with the color or the c argument:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color = 'hotpink')

x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color = '#88c999')

plt.show()
Color Each Dot
You can even set a specific color for each dot by using an array of colors as value for the c argument:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])

plt.scatter(x, y, c=colors)

plt.show()
Color Map
-The Matplotlib module has a number of available colormaps.
-A colormap is like a list of colors, where each color has a value that ranges from 0 to 100.
-This colormap is called 'viridis' and as you can see it ranges from 0, which is a purple color, and
up to 100, which is a yellow color.

How to Use the Color Map


-You can specify the colormap with the keyword argument cmap with the value of the colormap, in this
case 'viridis' which is one of the built-in colormaps available in Matplotlib.
-In addition you have to create an array with values (from 0 to 100), one value for each point in the scatter plot:
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])

plt.scatter(x, y, c=colors, cmap='viridis')

plt.colorbar()

plt.show()
Size
You can change the size of the dots with the s argument.
Just like colors, make sure the array for sizes has the same length as the arrays for the x- and y-axis:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes)

plt.show()
Alpha
You can adjust the transparency of the dots with the alpha argument.
Just like colors, make sure the array for sizes has the same length as the arrays for the x- and y-axis:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes, alpha=0.5)

plt.show()
Combine Color Size and Alpha
You can combine a colormap with different sizes on the dots. This is best visualized if the dots are
transparent:
Create random arrays with 100 values for x-points, y-points, colors and sizes:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))

plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')

plt.colorbar()
plt.show()
Matplotlib Histograms
Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
Example: Say you ask for the height of 250 people, you might end up with a histogram like this:

Create Histogram
-In Matplotlib, we use the hist() function to create histograms.
-The hist() function will use an array of numbers to create a histogram, the array is sent into the
function as an argument.
-For simplicity we use NumPy to randomly generate an array with 250 values, where the values will
concentrate around 170, and the standard deviation is 10.

1. import numpy as np

x = np.random.normal(170, 10, 250)
print(x)

2. import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
Matplotlib Bars
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:

1. import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x, y)
plt.show()

2. x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)
plt.show()

Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use the barh() function:

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.barh(x, y)
plt.show()
Bar Color
The bar() and barh() functions take the keyword argument color to set the color of the bars:

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x, y, color = "red")
plt.show()

Color Names

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x, y, color = "hotpink")
plt.show()


Color Hex
import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x, y, color = "#4CAF50")
plt.show()

Bar Width
The bar() function takes the keyword argument width to set the width of the bars:

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x, y, width = 0.1)
plt.show()


Bar Height
The barh() function takes the keyword argument height to set the height of the bars:

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.barh(x, y, height = 0.1)
plt.show()


Matplotlib Pie Charts
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:

import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show()

Labels
Add labels to the pie chart with the labels parameter. The labels parameter must be an array with one label for each wedge:

import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)
plt.show()
Explode
Maybe you want one of the wedges to stand out? The explode parameter allows you to do that.
The explode parameter, if specified, and not None, must be an array with one value for each wedge.
Each value represents how far from the center each wedge is displayed:

import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode)
plt.show()
Pandas
What is Pandas?
-Pandas is a Python library used for working with data sets.
-It has functions for analyzing, cleaning, exploring, and manipulating data.
-The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was
created by Wes McKinney in 2008.
Why Use Pandas?
-Pandas allows us to analyze big data and make conclusions based on statistical theories.
-Pandas can clean messy data sets, and make them readable and relevant.
-Relevant data is very important in data science.

What Can Pandas Do?


Pandas gives you answers about the data. Like:
•Is there a correlation between two or more columns?
•What is average value?
•Max value? Min value?
Pandas is also able to delete rows that are not relevant or that contain wrong values, like empty or
NULL values. This is called cleaning the data.
Pandas Series
What is a Series?
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.

Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

The type int64 tells us that Python is storing each value within this column as a 64 bit integer.
Labels
❖ If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second value has index 1, etc.

❖ This label can be used to access a specified value.

print(myvar[0])
Create Labels
With the index argument, you can name your own labels.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

When you have created labels, you can access an item by referring to the label.
Return the value of "y":
print(myvar["y"])
Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when creating a Series.

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)
Pandas DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
Series is like a column, a DataFrame is the whole table.
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with
rows and columns.

Create a simple Pandas DataFrame:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
Locate Row
The DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified row(s):

print(df.loc[0])

print(df.loc[[0, 1]])

Named Indexes
With the index argument, you can name your own indexes.

import pandas as pd

data = {"calories": [420, 380, 390],"duration": [50, 40, 45]}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).

print(df.loc["day2"])
Pandas Read CSV
-A simple way to store big data sets is to use CSV files (comma separated files).
-CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
-The examples use a CSV file called '[Link]'.

Load the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.to_string())

Use to_string() to print the entire DataFrame.

Print the DataFrame without the to_string() method:

import pandas as pd

df = pd.read_csv('[Link]')

print(df)


If you have a large DataFrame with many rows, Pandas will only return the first 5 rows, and the last 5 rows:
max_rows
The number of rows returned is defined in Pandas option settings.
You can check your system's maximum rows with the pd.options.display.max_rows statement.

Check the number of maximum returned rows:

import pandas as pd
print(pd.options.display.max_rows)

In my system the number is 60, which means that if the DataFrame contains more than 60 rows, the print(df) statement will return only the headers and the first and last 5 rows.
You can change the maximum rows number with the same statement.

Increase the maximum number of rows to display the entire DataFrame:

import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('[Link]')
print(df)
Pandas Read JSON
-Big data sets are often stored, or extracted as JSON.
-JSON is plain text, but has the format of an object, and is well known in the world of programming,
including Pandas.
-In our examples we will be using a JSON file called '[Link]'.

Data: [Link]

If your JSON code is not in a file, but in a Python Dictionary, you can load it into a DataFrame directly:
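A minimal sketch of loading a dictionary into a DataFrame. The column names and values below are made-up sample data for illustration, not the contents of the JSON file used in the slides.

```python
import pandas as pd

# Hypothetical JSON-style data: one nested dict per column, keyed by row label.
data = {
    "Duration": {"0": 60, "1": 60, "2": 45},
    "Pulse": {"0": 110, "1": 117, "2": 103},
    "Calories": {"0": 409.1, "1": 479.0, "2": 340.0},
}

# The outer keys become columns and the inner keys become the row index.
df = pd.DataFrame(data)

print(df)
```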
Pandas - Analyzing DataFrames
Viewing the Data
One of the most used methods for getting a quick overview of the DataFrame is the head() method.
The head() method returns the headers and a specified number of rows, starting from the top.

There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from the bottom.

Info About the Data


The DataFrame object has a method called info() that gives you more information about the data set.
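The three methods can be sketched on a small DataFrame. The data here is invented for illustration; with a real data set you would load it with read_csv() first.

```python
import pandas as pd

# Made-up sample data with ten rows.
df = pd.DataFrame({
    "Duration": list(range(10, 110, 10)),
    "Pulse": [100, 102, 104, 106, 108, 110, 112, 114, 116, 118],
})

first_rows = df.head(3)  # headers plus the first 3 rows
last_rows = df.tail(2)   # headers plus the last 2 rows

print(first_rows)
print(last_rows)

# info() prints the column names, non-null counts and data types.
df.info()
```

Called without an argument, head() and tail() return 5 rows.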
Pandas - Cleaning Data
Data cleaning means fixing bad data in your data set.

Bad data could be:

•1. Empty cells

•2. Data in wrong format

•3. Wrong data

•4. Duplicates
The data set contains some empty cells ("Date" in row 22, and "Calories" in row 18 and 28).

The data set contains wrong format ("Date" in row 26).

The data set contains wrong data ("Duration" in row 7).

The data set contains duplicates (row 11 and 12).


1. Pandas - Cleaning Empty Cells
Empty Cells
Empty cells can potentially give you a wrong result when you analyze data.

Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not have a big impact on the result.

import pandas as pd

df = pd.read_csv('[Link]')

new_df = df.dropna()

print(new_df.to_string())

By default, the dropna() method returns a new DataFrame, and will not change the original.
If you want to change the original DataFrame, use the inplace = True argument:

Remove all rows with NULL values:

import pandas as pd

df = pd.read_csv('[Link]')

df.dropna(inplace = True)

print(df.to_string())

dropna(inplace = True) does NOT return a new DataFrame; it removes all rows containing NULL values from the original DataFrame.
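A self-contained sketch of the same behaviour that does not need a CSV file. The DataFrame below is made-up sample data with two empty cells.

```python
import pandas as pd

# Made-up data: rows 1 and 3 have an empty "Calories" cell.
df = pd.DataFrame({
    "Duration": [60, 45, 60, 30],
    "Calories": [409.1, None, 340.0, None],
})

cleaned = df.dropna()  # returns a new DataFrame; df itself is unchanged

print(len(df), "rows before,", len(cleaned), "rows after")
print(cleaned)
```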
Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.

This way you do not have to delete entire rows just because of some empty cells.

The fillna() method allows us to replace empty cells with a value:

Replace NULL values with the number 130:


import pandas as pd

df = pd.read_csv('[Link]')

df.fillna(130, inplace = True)


Replace Only For Specified Columns
The example above replaces all empty cells in the whole DataFrame.
To replace empty values in only one column, specify the column name:

Replace NULL values in the "Calories" column with the number 130:
import pandas as pd

df = pd.read_csv('[Link]')

# Passing a {column: value} dictionary updates only that column of df itself.
df.fillna({"Calories": 130}, inplace = True)


Replace Using Mean, Median, or Mode
A common way to replace empty cells is to use the mean, median, or mode value of the column.
Pandas provides the mean(), median(), and mode() methods to calculate the respective values for a specified column:

Calculate the MEAN, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('[Link]')

x = df["Calories"].mean()

df.fillna({"Calories": x}, inplace = True)

Calculate the MEDIAN, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('[Link]')

x = df["Calories"].median()

df.fillna({"Calories": x}, inplace = True)


Calculate the MODE, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('[Link]')

x = df["Calories"].mode()[0]

df.fillna({"Calories": x}, inplace = True)
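To see how the three statistics differ, here is a small sketch on an invented "Calories" column with one empty cell. The numbers are chosen so that the results are easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one empty cell (NaN).
calories = pd.Series([300.0, 300.0, 420.0, np.nan, 500.0], name="Calories")

mean_value = calories.mean()      # NaN is skipped: (300 + 300 + 420 + 500) / 4 = 380.0
median_value = calories.median()  # middle of 300, 300, 420, 500 = 360.0
mode_value = calories.mode()[0]   # most frequent value = 300.0

# Fill the empty cell with the median; fillna() returns a new Series.
filled = calories.fillna(median_value)

print(mean_value, median_value, mode_value)
print(filled.tolist())
```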


2. Pandas - Cleaning Data of Wrong Format

Data of Wrong Format


Cells with data of wrong format can make it difficult, or even impossible, to analyze data.
To fix it, you have two options: remove the rows, or convert all cells in the columns into the same format.

Convert Into a Correct Format


In our DataFrame, two cells have the wrong format: see rows 22 and 26. Every cell in the 'Date' column should be a string that represents a date.
To convert all cells in the 'Date' column into dates, Pandas has a to_datetime() method:

Convert to date:
import pandas as pd

df = pd.read_csv('[Link]')

df['Date'] = pd.to_datetime(df['Date'])

print(df.to_string())

As the result shows, the date in row 26 was fixed, but the empty date in row 22 got a NaT (Not a Time) value, in other words an empty value. One way
to deal with empty values is simply to remove the entire row.
Removing Rows
The conversion in the example above gave us a NaT value, which can be handled as a NULL
value, so we can remove the row by using the dropna() method.

Remove rows with a NULL value in the "Date" column:

df.dropna(subset=['Date'], inplace = True)
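The convert-then-drop sequence can be sketched on invented data as follows. All date strings here share one style; depending on the pandas version, a column that mixes several date styles may need an explicit format argument to to_datetime(), which this sketch does not cover:

```python
import pandas as pd

# Hypothetical data: the third row has no date at all.
df = pd.DataFrame({
    "Date": ["2020/12/01", "2020/12/02", None],
    "Duration": [60, 45, 30],
})

# Convert the strings to datetime values; the missing date becomes NaT.
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].isna().sum())

# Remove only the rows whose "Date" is empty.
df.dropna(subset=["Date"], inplace=True)
print(len(df))
```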


3. Pandas - Fixing Wrong Data
Wrong Data
-"Wrong data" does not have to be "empty cells" or "wrong format"; it can just be wrong, like if someone registered "199" instead of "1.99".

-Sometimes you can spot wrong data by looking at the data set, because you have an expectation of what it should be.

-If you take a look at our data set, you can see that in row 7 the duration is 450, but for all the other rows the duration is between 30 and 60.

-It doesn't have to be wrong, but taking into consideration that this is the data set of someone's workout sessions, we can conclude that this person did not work out for 450 minutes.
How can we fix wrong values, like the one for "Duration" in row 7?

Replacing Values
-One way to fix wrong values is to replace them with something else.
-In our example, it is most likely a typo, and the value should be "45" instead of "450", and we could
just insert "45" in row 7:

Set "Duration" = 45 in row 7:


df.loc[7, 'Duration'] = 45

Loop through all values in the "Duration" column. If the value is higher than 120, set it to 120:

for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.loc[x, "Duration"] = 120
Removing Rows
-Another way of handling wrong data is to remove the rows that contain wrong data.
-This way you do not have to find out what to replace them with, and there is a good chance you do not need them for your analyses.

Delete rows where "Duration" is higher than 120:


for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.drop(x, inplace = True)
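The same two fixes can also be written without an explicit loop by using a Boolean mask. This is an alternative sketch on invented data, not the method shown on the slides:

```python
import pandas as pd

# Hypothetical durations; 450 and 130 are above the 120-minute limit.
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Option 1: cap every value above 120 at 120 (done on a copy here).
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120

# Option 2: keep only the rows whose duration is at most 120.
kept = df[df["Duration"] <= 120]

print(capped["Duration"].tolist())
print(kept["Duration"].tolist())
```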
4. Pandas - Removing Duplicates

Discovering Duplicates
Duplicate rows are rows that have been registered more than one time.

By taking a look at our test data set, we can assume that row 11 and 12 are duplicates.

To discover duplicates, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row:

Returns True for every row that is a duplicate, otherwise False:

print(df.duplicated())
Removing Duplicates
To remove duplicates, use the drop_duplicates() method

Remove all duplicates:


df.drop_duplicates(inplace = True)
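A short sketch of both steps on invented data, where the second row repeats the first:

```python
import pandas as pd

# Hypothetical data: row 1 is an exact copy of row 0.
df = pd.DataFrame({
    "Duration": [60, 60, 45, 60],
    "Pulse": [100, 100, 109, 120],
})

# True marks a row that repeats an earlier row.
flags = df.duplicated()
print(flags.tolist())

# Drop the repeated rows from the original DataFrame.
df.drop_duplicates(inplace=True)
print(len(df))
```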
Important Methods Pandas Packages
NumPy
What is NumPy?
NumPy is a Python library used for working with arrays.

It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

It is mainly used for multidimensional (2-D, 3-D) arrays and is often used as an alternative to MATLAB.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

NumPy stands for Numerical Python.

Why Use NumPy?


In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray, it provides a lot of supporting functions that make working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.
Why is NumPy Faster Than Lists?
❖ NumPy arrays are stored at one contiguous place in memory, unlike lists, so processes can access and manipulate them very efficiently.

❖ This behavior is called locality of reference in computer science.

❖ This is the main reason why NumPy is faster than lists. It is also optimized to work with the latest CPU architectures.
NumPy Creating Arrays
Create a NumPy ndarray Object
NumPy is used to work with arrays. The array object in NumPy is called ndarray.

We can create a NumPy ndarray object by using the array() function.

Example

import numpy as np

arr = np.array((1, 2, 3, 4, 5))

print(arr)
A Python array is a data structure whose elements all have the same data type.

It is used to store collections of data. In Python programming, arrays are handled by the "array" module.

If you create arrays using the array module, the elements of the array must all be of the same numeric type.

In a Python list, by contrast, you can insert different types of data, such as integers, floats, lists, tuples, strings, etc.
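The contrast between the array module and a plain list can be sketched as follows. The variable names are illustrative only:

```python
from array import array

# Type code "i" means the array accepts signed integers only.
numbers = array("i", [1, 2, 3])
numbers.append(4)

try:
    numbers.append(2.5)  # a float does not fit into an integer array
except TypeError as error:
    print("array rejected the value:", error)

# A list accepts elements of different types side by side.
mixed = [1, 2.5, "three", (4, 5)]
print(mixed)
```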
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).

0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.

Example
Create a 0-D array with value 42
import numpy as np

arr = np.array(42)

print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.
These are the most common and basic arrays.
Example

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd-order tensors.

Example
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D Arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
These are often used to represent a 3rd order tensor.

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)
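To check how many dimensions an array has, NumPy arrays expose the ndim and shape attributes. A brief sketch using the same four arrays as above:

```python
import numpy as np

a = np.array(42)                                   # 0-D
b = np.array([1, 2, 3, 4, 5])                      # 1-D
c = np.array([[1, 2, 3], [4, 5, 6]])               # 2-D
d = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])             # 3-D

# ndim is the number of dimensions; shape is the size along each dimension.
for arr in (a, b, c, d):
    print(arr.ndim, arr.shape)
```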
Scikit-learn
What is scikit-learn used for?
Scikit-learn is probably the most useful library for machine learning in Python. The sklearn library contains a lot of
efficient tools for machine learning and statistical modeling including classification, regression, clustering and
dimensionality reduction.

Important features of scikit-learn:

•Simple and efficient tools for data mining and data analysis. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, etc.

•Accessible to everybody and reusable in various contexts.

•Built on top of NumPy, SciPy, and matplotlib.


DATA TYPES…
Python Data Types
• Data types are the classification or categorization of data items.
• A data type represents the kind of value and determines which operations can be performed on particular data.
• Since everything is an object in Python programming, data types are actually classes, and variables are instances (objects) of these classes.
• A variable can hold different types of values.
• For example, a person's name must be stored as a string, whereas their id must be stored as an integer.
• Python provides various standard data types that define the storage method for each of them.
Standard Data Types
1. Python Numeric Data Type
• Python numeric data types are used to hold numeric values:
• int - holds signed integers of unlimited length.
• long - holds long integers (exists in Python 2.x only; removed in Python 3.x, where int covers it).
• float - holds floating-point numbers, accurate to about 15 significant decimal digits.
• complex - holds complex numbers.
• In Python, unlike C or C++, we need not declare a data type while declaring a variable.
• We can simply assign a value to a variable.
• To see what type of numeric value a variable currently holds, we can use type().
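A small sketch of type() applied to the numeric types that exist in Python 3. The variable names are illustrative:

```python
a = 10          # int
b = 15.20       # float
c = 3 + 4j      # complex
big = 2 ** 100  # still an int: Python 3 integers have no fixed size limit

for value in (a, b, c, big):
    print(value, type(value))
```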
Examples
int      long                     float        complex
10       51924361L                0.0          3.14j
100      -0x19323L                15.20        45.j
-786     0122L                    -21.9        9.322e-36j
080      0xDEFABCECBDAECBFBAEl    32.3+e18     .876j
-0490    535633629843L            -90.         -.6545+0J
-0x260   -052318172735L           -32.54e100   3e+26J
0x69     -4721885298529L          70.2-E12     4.53e-7j


Boolean
• The Boolean type provides two built-in values, True and False.
• These values are used to determine whether a given statement is true or false.
• It is denoted by the class bool.
• Any non-zero number or non-empty object evaluates as true, whereas 0, None, and empty objects evaluate as false. Note that a non-empty string such as 'F' still evaluates as true.
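How Python decides whether a value counts as true or false can be checked with the built-in bool() function. A brief sketch with arbitrary sample values:

```python
print(type(True))  # the class of both Boolean values is bool

# Sample values: zero and empty objects are false, everything else is true.
values = [0, 5, -1, "", "F", [], [0]]
truthiness = [bool(v) for v in values]

for value, result in zip(values, truthiness):
    print(repr(value), "->", result)
```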

Set
• A set is an unordered collection of unique items.
• A set is defined by values separated by commas inside braces { }.
• It is iterable, mutable (it can be modified after creation), and has unique elements.
• The order of the elements in a set is undefined; iterating over it may return the elements in a different sequence than they were written.
• It can contain various types of values.
Set – unique values
• We can perform set operations like union, intersection
on two sets.
• Sets have unique values.
• They eliminate duplicates.
Set - indexing

• Since sets are unordered collections, indexing has no meaning.


• Hence, the indexing and slicing operator [] does not work.
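The set properties above (uniqueness, set operations, no indexing) can be sketched as follows, using arbitrary sample values:

```python
a = {1, 2, 3, 3, 2}   # duplicates are dropped when the set is created
b = {3, 4, 5}

union = a | b         # every element that appears in either set
common = a & b        # only the elements that appear in both sets

print(a, union, common)

try:
    a[0]              # sets have no positions, so indexing fails
except TypeError as error:
    print("indexing failed:", error)
```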
5. Sequence types
i. STRINGS

• A string can be defined as a sequence of characters enclosed in quotation marks.
• In Python, we can use single, double, or triple quotes to define a string.
• String handling in Python is a straightforward task since Python provides built-in functions and operators to perform operations on strings.
• In string handling, the operator + is used to concatenate two strings: the operation "hello" + " python" returns "hello python".
• The operator * is known as the repetition operator: the operation "Python" * 2 returns 'PythonPython'.
Some Methods
strip()
✓The strip() method removes any whitespace from the beginning or the end:
>>> a = " Hello"
>>> print(a.strip())
Some Methods
len()
✓The len() function returns the length of a string:
>>> a = "Hello, World!"
>>> print(len(a))

lower()
✓The lower() method returns the string in lower case, and upper() returns it in upper case:
>>> a = "Hello, World!"
>>> print(a.lower())
>>> print(a.upper())

replace()
• The replace() method replaces a string with another string:
>>> print(a.replace("H", "J"))
Some Methods
split()
✓The split() method splits the string into substrings wherever it finds the separator:
>>> a = "Hello,World!"
>>> print(a.split(","))
# returns ['Hello', 'World!']
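The string operators and methods from this section can be tried together in one short script. This is a sketch with an arbitrary sample string:

```python
a = "  Hello, World!  "

stripped = a.strip()                 # "Hello, World!"
print(len(stripped))                 # 13 characters
print(stripped.lower())              # "hello, world!"
print(stripped.upper())              # "HELLO, WORLD!"
print(stripped.replace("H", "J"))    # "Jello, World!"

parts = stripped.split(",")          # the space after the comma is kept
print(parts)                         # ['Hello', ' World!']

print("hello" + " python")           # concatenation: "hello python"
print("Python" * 2)                  # repetition: "PythonPython"
```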
Array
Python Comments
Machine Learning
Data everywhere!
1. Google: processes 24 petabytes of data per day.

2. Facebook: 10 million photos uploaded every hour.

3. YouTube: 1 hour of video uploaded every second.

4. Twitter: 400 million tweets per day.

5. Astronomy: Satellite data is in hundreds of PB.

6. ...

7. “By 2020 the digital universe will reach 44 zettabytes..."

That's 44 trillion gigabytes!


Data types
• Data comes in different sizes and also flavors (types):
• Texts
• Numbers
• Clickstreams
• Graphs
• Tables
• Images
• Transactions
• Videos
• Some or all of the above!
Smile, we are 'DATAFIED'!

• Wherever we go, we are "datafied".

• Smartphones are tracking our locations.

• We leave a data trail in our web browsing.

• Interaction in social networks.

• Privacy is an important issue in Data Science.
The Data Science process
Why Machine Learning
If we stored the data generated in a day on Blu-ray disks and stacked them up, the stack would be as tall as four Eiffel Towers. Machine learning helps analyze this data easily and quickly.

Machine learning
Purpose of Machine Learning
Machine learning is a great tool to analyze data, find hidden data patterns and relationships, and extract information to enable information-driven decisions and provide insights.

Identify patterns and relationships

Data
Gain insights into unknown data

Take information-driven decisions


Applications of ML
• Spam filtering

• Credit card fraud detection

• Digit recognition on checks, zip codes

• Detecting faces in images

• MRI image analysis

• Recommendation system

• Search engines

• Handwriting recognition

• Scene classification

• etc...
Raw Mango vs. Ripe Mango
SUPERVISED LEARNING
Supervised Learning
UNSUPERVISED LEARNING
Unsupervised Learning
TYPES OF SUPERVISED LEARNING
BINARY CLASSIFICATION
MULTICLASS CLASSIFICATION
REGRESSION
