
Python Installation for Machine Learning

The document provides a comprehensive guide on installing Python and its ecosystem for machine learning, detailing methods such as individual installation and using Anaconda. It highlights key libraries like NumPy, Pandas, and Scikit-learn, explaining their functionalities and installation processes. Additionally, it introduces Jupyter Notebook as an essential tool for data science applications, outlining its features and types of cells.

Uploaded by

Kalighat Okira
Copyright
© All Rights Reserved

Machine Learning with Python

Python EcoSystem

Prof. Shibdas Dutta,


Associate Professor,
DCG DATA CORE SYSTEMS INDIA PVT LTD
Kolkata

Company Confidential: Data-Core Systems, Inc. | [Link]


Installing Python
To work in Python, we must first install it. You can install Python
in either of the following two ways:
• Installing Python individually
• Using a pre-packaged Python distribution: Anaconda
Let us discuss each of these in detail.
Installing Python Individually
To install Python on your computer, you need to download only the
binary code applicable to your platform. Python distributions are
available for Windows, Linux and Mac platforms.
On Windows platform

We can install Python on the Windows platform with the following steps:

First, go to [Link]
Next, click the link for the Windows installer [Link] file.
Here XYZ is the version we wish to install.
Now, run the downloaded file. It opens the Python install wizard,
which is easy to use. Accept the default settings and wait until
the installation finishes.



Using Pre-packaged Python Distribution: Anaconda
Anaconda is a packaged distribution of Python that includes the libraries most widely used in data science. We can
set up a Python environment with Anaconda using the following steps:

Step 1: First, download the required installation package from the Anaconda distribution. The link for the
same is [Link] You can choose from Windows, Mac and Linux as per your
requirement.

Step 2: Next, select the Python version you want to install on your machine. At the time these slides were written,
the latest Python version was 3.7. Both 64-bit and 32-bit graphical installers are offered.

Step 3: After you select the OS and Python version, the Anaconda installer is downloaded to your computer.
Double-click the file and the installer will install the Anaconda package.

Step 4: To check whether the installation succeeded, open a command prompt and type python as follows:
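
The check in Step 4 can also be made from inside the interpreter. The following is a minimal sketch using only the standard library; the helper name describe_python is hypothetical and is not part of the original slides.

```python
import platform
import sys


def describe_python():
    """Return a short description of the running Python interpreter."""
    version = platform.python_version()   # version string of this interpreter
    executable = sys.executable           # path of the interpreter in use
    return f"Python {version} at {executable}"


print(describe_python())
```

If the path printed points inside the Anaconda installation folder, the Anaconda interpreter is the one being used.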



Why Python for Data Science?
Extensive set of packages
Python has an extensive and powerful set of packages that are ready to use in
various domains. These include packages such as NumPy, SciPy, pandas and scikit-learn,
which are required for machine learning and data science.



Components of Python ML Ecosystem
In this section, let us discuss some core data science libraries that form the components of
the Python machine learning ecosystem. These components make Python an important
language for data science. Though there are many such components, let us discuss some of
the important ones here:

Jupyter Notebook
Jupyter notebooks provide an interactive computational environment for developing
Python-based data science applications. They were formerly known as IPython notebooks. The
following features make Jupyter notebooks one of the best components of the Python ML
ecosystem:
• Jupyter notebooks can illustrate the analysis process step by step by arranging code,
images, text, output and so on in sequence.
• They help a data scientist document the thought process while developing the analysis.
• The results can be captured as part of the notebook.
• Notebooks make it easy to share our work with peers.



Installation and Execution
If you are using the Anaconda distribution, you need not install Jupyter Notebook separately,
as it is already included. Just open the Anaconda Prompt and type the following command:
C:\>jupyter notebook

After you press Enter, a notebook server starts at localhost:8888 on your computer. This is
shown in the following screenshot:



Now, after you click New, you will get a list of options. Select Python 3 and it will open
a new notebook in which you can start working. You will get a glimpse of it in the following
screenshots:



On the other hand, if you are using a standard Python distribution, Jupyter Notebook
can be installed using the popular Python package installer, pip.

pip install jupyter
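
After an installation with pip or Anaconda, it can be useful to confirm which packages Python can actually find. The following is a minimal sketch; the helper is_installed is hypothetical, and the import names are assumptions (scikit-learn is imported as sklearn, and the classic Jupyter Notebook package as notebook).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import names are assumptions: scikit-learn is imported as "sklearn",
# and the classic Jupyter Notebook package as "notebook".
for name in ["notebook", "numpy", "pandas", "sklearn"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```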



Types of Cells in Jupyter Notebook
The following are the three types of cells in a Jupyter notebook:

Code cells: As the name suggests, we use these cells to write code. When a code cell is run,
its contents are sent to the kernel associated with the notebook.

Markdown cells: We use these cells to annotate the computation process. They can
contain text, images, LaTeX equations, HTML tags and so on.

Raw cells: The text written in them is displayed as it is. These cells are used for
text that we do not want the automatic conversion mechanism of Jupyter Notebook
to convert.
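
A notebook file (.ipynb) is stored as JSON, and each cell records its type. The following is a simplified, hand-written sketch of that layout for notebook format version 4, with one cell of each type; it was not produced by Jupyter, and the cell contents are made up for illustration.

```python
import json

# Hand-written sketch of the JSON layout used by notebook format version 4.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Analysis notes"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "print(1 + 1)"},
        {"cell_type": "raw", "metadata": {}, "source": "left unconverted"},
    ],
}

# List the type of every cell in order.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# Show how the first cell looks once serialised to JSON.
print(json.dumps(notebook["cells"][0]))
```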



NumPy
NumPy is another useful component that makes Python one of the favorite languages for data
science. The name stands for Numerical Python, and the library is built around multidimensional
array objects. Using NumPy, we can perform the following important operations:
• Mathematical and logical operations on arrays
• Fourier transforms
• Operations associated with linear algebra

NumPy can also be seen as a replacement for MATLAB, because it is mostly used along with SciPy (Scientific Python) and Matplotlib (a plotting library).
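
The three kinds of operations listed for NumPy can be sketched as follows. This is a small illustration that assumes NumPy is installed; the arrays are made-up values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])

# Mathematical and logical operations work element by element.
total = a + b            # array of element-wise sums
mask = a > b             # boolean array

# Discrete Fourier transform of a constant signal: everything lands in bin 0.
spectrum = np.fft.fft(np.ones(4))

# Linear algebra: solve the system M @ x = y.
M = np.array([[2.0, 0.0], [0.0, 4.0]])
y = np.array([2.0, 8.0])
x = np.linalg.solve(M, y)

print(total, mask, spectrum.real, x)
```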

Installation and Execution

If you are using the Anaconda distribution, there is no need to install NumPy separately, as it is already included. You
just need to import the package into your Python script as follows:

import numpy as np

On the other hand, if you are using a standard Python distribution, NumPy can be
installed using the popular Python package installer, pip.

pip install numpy

After installing NumPy, you can import it into your Python script as shown above.



Pandas
Pandas is another useful Python library that makes Python one of the favorite languages for data
science. It is used for data manipulation, wrangling and analysis, and was
developed by Wes McKinney in 2008. With the help of Pandas, we can accomplish the
following five steps in data processing:
• Load
• Prepare
• Manipulate
• Model
• Analyze
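
The five steps can be sketched on a tiny dataset. This is only an illustration under stated assumptions: the inline CSV text and column names are made up, and a straight-line fit with NumPy stands in for the modelling step, which in practice would usually be done with a dedicated library.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline data standing in for a real CSV file.
csv_text = "hours,score\n1,52\n2,54\n3,\n4,58\n5,60\n"

# Load
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop the row with a missing score
df = df.dropna()

# Manipulate: add a derived column
df["passed"] = df["score"] >= 55

# Model: fit a straight line score = slope * hours + intercept
slope, intercept = np.polyfit(df["hours"].to_numpy(), df["score"].to_numpy(), deg=1)

# Analyze: summarise the prepared data and the fitted line
print(df.describe())
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```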
Data representation in Pandas
Data in Pandas has traditionally been described in terms of the following three data
structures:
Series: A one-dimensional ndarray with axis labels, which means it is like a
simple array with homogeneous data. For example, the following series is a collection of
the integers 1, 5, 10, 15, 24, 25, …

1 5 10 15 24 25 28 36 40 89
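
The series of integers above can be built directly from a Python list. This is a minimal sketch that assumes pandas is installed; the variable names are illustrative.

```python
import pandas as pd

values = [1, 5, 10, 15, 24, 25, 28, 36, 40, 89]
s = pd.Series(values)

print(s.index.tolist())   # default axis labels, one per element
print(s.iloc[2])          # third element, selected by position
```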



Data frame: It is the most useful data structure and is used for
almost all kinds of data representation and manipulation in
pandas.
It is a two-dimensional data structure that can
contain heterogeneous data.
Generally, tabular data is represented using data frames.

For example, the following table shows data on students, with
their names, roll numbers, ages and genders:

Name      Rollnumber   Age   Gender
Aarav     1            15    Male
Harshit   2            14    Male
Kanika    3            16    Female
Mayank    4            15    Male
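
The student table can be expressed as a DataFrame as follows. This is a minimal sketch that assumes pandas is installed; the variable name students is illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column; columns may hold different types.
students = pd.DataFrame(
    {
        "Name": ["Aarav", "Harshit", "Kanika", "Mayank"],
        "Rollnumber": [1, 2, 3, 4],
        "Age": [15, 14, 16, 15],
        "Gender": ["Male", "Male", "Female", "Male"],
    }
)

print(students)
print(students["Age"].mean())            # average age
print(students[students["Age"] > 14])    # rows filtered by a condition
```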



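Panel: It is a 3-dimensional data structure containing heterogeneous data. It is very
difficult to represent a panel graphically, but it can be thought of as a
container of DataFrames. Note that Panel was deprecated in pandas 0.20 and is no longer
available in current pandas releases, so it is of historical interest only; three-dimensional
data is now usually handled with a DataFrame that has a MultiIndex.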
The following table gives the dimension and description of the above-mentioned data
structures used in Pandas:

Data structure   Dimension   Description
Series           1-D         Size-immutable, 1-D homogeneous data
DataFrame        2-D         Size-mutable, heterogeneous data in tabular form
Panel            3-D         Size-mutable array, container of DataFrames

We can understand these data structures as follows: a higher-dimensional data structure
is a container of the lower-dimensional one.



Installation and Execution
If you are using the Anaconda distribution, there is no need to install Pandas separately, as it is
already included. You just need to import the package into your Python script
as follows:

import pandas as pd

On the other hand, if you are using a standard Python distribution, Pandas can be
installed using the popular Python package installer, pip.
pip install pandas

After installing Pandas, you can import it into your Python script as shown above.

Example

The following is an example of creating a Series from an ndarray using Pandas:



In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: data = np.array(['g','a','u','r','a','v'])

In [4]: s = pd.Series(data)

In [5]: print(s)
0    g
1    a
2    u
3    r
4    a
5    v
dtype: object



Scikit-learn
Scikit-learn is another useful and very important Python library for data science and machine
learning. The following are some features that make Scikit-learn so useful:

• It is built on NumPy, SciPy and Matplotlib.
• It is open source and can be reused under the Berkeley Software Distribution (BSD) license.
• It is accessible to everybody and can be reused in various contexts.
• It implements a wide range of machine learning algorithms covering major areas of ML, such as
classification, clustering, regression, dimensionality reduction and model selection.

Installation and Execution

If you are using the Anaconda distribution, there is no need to install Scikit-learn separately, as it is
already included. You just need to use the package in your Python script. For
example, the following line imports the function that loads the breast cancer dataset from
Scikit-learn:



from sklearn.datasets import load_breast_cancer

On the other hand, if you are using a standard Python distribution and already have
NumPy and SciPy, Scikit-learn can be installed using the popular Python package installer, pip.

pip install scikit-learn

After installing Scikit-learn, you can use it in your Python script as shown above.
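
Once the dataset can be imported, a classifier can be trained on it in a few lines. The following is a minimal sketch that assumes scikit-learn is installed; the choice of logistic regression with feature scaling, the 25% test split and the random seed are assumptions made for illustration, not part of the original slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the features (X) and labels (y) as NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The same load, split, fit and score pattern applies to the other algorithm families mentioned above, such as clustering and regression, with a different estimator in place of LogisticRegression.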



Thank You
