INTERNSHIP REPORT
A report submitted in partial fulfillment of the requirements for the award of the degree of
MASTER OF SCIENCE IN MATHEMATICS
By
KEERTHI R M
Reg. No.: 23083076511012007
(Duration: 12th June to 18th June, 2024)
DEPARTMENT OF MATHEMATICS
GOVERNMENT ARTS AND SCIENCE COLLEGE
KANYAKUMARI – 629 401
DATA SCIENCE WITH PYTHON
Submitted by
KEERTHI R M
Reg. No.: 23083076511012007
OCTOBER 2024
ABSTRACT
This report presents an account of my internship experience at
AK INFOPARK PRIVATE LIMITED, PARVATHIPURAM, a leading firm in the
sector. The primary focus of the internship was to understand and learn
various software packages through a hands-on approach. I worked with
Python for data science.
This internship provided an invaluable opportunity to apply theoretical
knowledge acquired in academic studies to real-world scenarios. I took
part in several learning modules, notably Python, Python operators,
working with NumPy and pandas, data science in real-time applications,
data visualization and data science components. I used various analytical
tools that are very useful for job opportunities.
The internship underscored the importance of adaptability, teamwork
and continuous learning in keeping up with the latest developments in
Python and data science. This report details the work undertaken, the
skills developed and the lessons learned throughout the internship, which
helped me to build my skills in Python and data science. It concludes
with observations based on the insights gained.
TABLE OF CONTENTS
CHAPTER   TITLE
          INTRODUCTION
1         PYTHON
2         PYTHON OPERATORS
3         WORKING WITH NUMPY AND PANDAS
4         DATA SCIENCE IN REAL TIME APPLICATION
5         DATA VISUALIZATION
6         DATA SCIENCE COMPONENTS
          CONCLUSION
INTRODUCTION
A program is a sequence of instructions that specifies how to
perform a computation. The computation might be something
mathematical, such as solving a system of equations or finding the roots
of a polynomial, but it can also be a symbolic computation such as
searching and replacing text in a document or something graphical, like
processing an image or playing a video. Python is a powerful and
versatile programming language that has become increasingly popular in
the field of data science. With its simple syntax and vast array of libraries
and tools, Python has made it easier for data scientists to manipulate and
analyze data, build predictive models and make data-driven decisions. In
this report, we will explore how python is used in data science, as well as
some of the key libraries and tools that data scientists use to perform
their work.
Python is favored by data scientists for its flexibility and ease of use.
Python is a high-level programming language that is both easy to learn
and easy to read, making it ideal for data scientists who may not have a
strong background in programming. Python also offers a wide range of
libraries and tools that are specifically designed for data analysis and
machine learning, such as NumPy, pandas, Matplotlib and scikit-learn.
These libraries allow data scientists to easily manipulate and visualize
data, as well as build and evaluate predictive models.
NumPy is a fundamental package for scientific computing with
Python, providing support for large, multi-dimensional arrays and
matrices, as well as a variety of mathematical functions to operate on
these arrays. Pandas is a powerful data manipulation library that offers
data structures like data frames and series, which allow data scientists to
easily work with structured data. Matplotlib is a plotting library that
enables data scientists to create a wide variety of visualizations, such as
line plots, scatter plots and histograms. Scikit-learn is a machine
learning library that provides a wide range of algorithms for
classification, regression, clustering and more.
As the field of data science continues to grow and evolve, Python
will likely remain a key programming language for data scientists
around the world.
CHAPTER 1
PYTHON
Python is a popular programming language. It was created by
Guido van Rossum and released in 1991. It is used for web development
(server-side), software development, mathematics and system scripting.
Python’s popularity in data science is largely attributed to its readability,
ease of learning, and the powerful libraries it provides. These libraries
enable data manipulation, statistical analysis and machine learning,
making Python an invaluable tool for data scientists.
Key Libraries and Tools
Pandas: A library providing high-performance data manipulation
and analysis. It introduces data structures like DataFrames that simplify
data handling and preparation.
NumPy: This library offers support for arrays and matrices, along
with a collection of mathematical functions to operate on these arrays. It
forms the backbone for many scientific computations in Python.
In Python, lists serve the purpose of arrays, but they are slow to
process. NumPy aims to provide an array object that is up to 50x faster
than traditional Python lists. The array object in NumPy is called
ndarray, and it provides many supporting functions that make working
with ndarray very easy. Arrays are used very frequently in data science,
where speed and resources are very important.
Matplotlib and Seaborn: These libraries are used for data
visualization. Matplotlib offers a wide range of plotting options, while
Seaborn provides a high-level interface for drawing attractive and
informative statistical graphics.
Scikit-learn: A machine learning library that includes simple and
efficient tools for data mining and data analysis. It supports various
algorithms for classification, regression, clustering and dimensionality
reduction.
TensorFlow and PyTorch: These libraries are used for deep
learning. TensorFlow, developed by Google, and PyTorch, developed by
Facebook, are popular for building and training neural networks.
Workflow in Python
Data Collection: Data can be collected from various sources,
including databases, APIs and web scraping. Libraries like requests and
Beautiful Soup are commonly used for these tasks.
Data Cleaning and Preparation: Data often needs to be cleaned
and transformed before analysis. Pandas is particularly useful for
handling missing values, filtering data and merging datasets.
Exploratory Data Analysis (EDA): EDA involves summarizing
the main characteristics of a dataset. This step helps in understanding the
data distribution and uncovering patterns.
Model Building: Using libraries like scikit-learn or TensorFlow,
data scientists build and train models to make predictions or classify
data. This involves selecting algorithms, training the model and tuning
hyperparameters.
PYTHON BASICS
Python is an interpreted, high-level programming language known for its
simplicity and readability. Python uses indentation to define code blocks,
making it easy to read and understand.
VARIABLES AND DATA TYPES
Variables store data in memory and are assigned using the
assignment operator "=". Common data types in Python include integers,
floats, strings, lists, tuples, dictionaries and sets.
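As a minimal sketch of the data types listed above, the snippet below assigns one value of each type and checks it with type(). All variable names and values are invented for illustration.

```python
# Each assignment binds a name to a value; type() reports the data type.
count = 10                              # int
price = 99.5                            # float
name = "Python"                         # str
marks = [72, 85, 90]                    # list: ordered and mutable
point = (3, 4)                          # tuple: ordered and immutable
student = {"name": "Asha", "age": 21}   # dict: key-value pairs
unique_ids = {1, 2, 2, 3}               # set: the duplicate 2 is dropped

values = (count, price, name, marks, point, student, unique_ids)
type_names = [type(v).__name__ for v in values]
print(type_names)
```

The same name can later be bound to a value of another type, because Python checks types at run time rather than at declaration.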
CONTROL STRUCTURES
Conditional statements like 'if', 'elif' and 'else' make it possible
to take decisions based on conditions. Loops like 'for' and 'while' can
be used for iteration and repetitive tasks.
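The control structures described above can be sketched as follows. The grade boundaries and the helper name grade are assumptions chosen only for the example.

```python
def grade(mark):
    """Return a letter grade using if / elif / else (boundaries are illustrative)."""
    if mark >= 80:
        return "A"
    elif mark >= 60:
        return "B"
    else:
        return "C"

# for loop: visit every item of a list
marks = [92, 67, 45]
grades = []
for m in marks:
    grades.append(grade(m))

# while loop: repeat until the condition becomes false
n, steps = 16, 0
while n > 1:
    n //= 2        # halve n using floor division
    steps += 1

print(grades, steps)
```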
FUNCTIONS
Functions are blocks of reusable code that perform a specific
task. Functions can take arguments as input and return values as output.
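A short sketch of defining and calling functions is given below. The function names mean and scale, and the default argument, are hypothetical examples.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def scale(values, factor=2):
    """Return a new list with every value multiplied by factor (default 2)."""
    return [v * factor for v in values]

average = mean([2, 4, 6])
doubled = scale([1, 2, 3])               # uses the default factor
tenfold = scale([1, 2, 3], factor=10)    # overrides it with a keyword argument
print(average, doubled, tenfold)
```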
MODULES AND PACKAGES
Python modules are files that contain Python code. Modules are
brought into a program with the import statement. Packages are
directories containing multiple modules and a special file called
__init__.py.
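The two common forms of the import statement can be illustrated with modules from the standard library. The choice of math and statistics here is only an example.

```python
import math                      # import a whole module, used as math.<name>
from statistics import median    # import a single name from a module

root = math.sqrt(16)             # qualified access through the module name
middle = median([3, 1, 2])       # direct access to the imported function
print(root, middle)
```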
FILE I/O
Python provides built-in functions for reading from and writing
to files. Use open() to open a file and read() or write() to
manipulate file contents.
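A minimal sketch of writing to and reading from a file is shown below. The file name notes.txt is hypothetical, and a temporary directory is used so that no existing file is touched.

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" opens the file for writing; the with block closes it automatically
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# The default mode "r" opens the file for reading
with open(path) as f:
    content = f.read()

lines = content.splitlines()
print(lines)
```

Using with is a common convention because the file is closed even if an error occurs inside the block.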
OBJECT-ORIENTED PROGRAMMING
Python supports OOP principles like encapsulation, inheritance
and polymorphism. Classes are blueprints for creating objects, while
objects are instances of classes.
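The three principles named above can be sketched with a small class hierarchy. The class names Shape, Rectangle and Square are invented for illustration.

```python
class Shape:
    def __init__(self, name):
        # Leading underscore marks the attribute as internal by convention
        # (encapsulation in Python relies on convention, not enforcement).
        self._name = name

    def area(self):
        raise NotImplementedError("subclasses must provide area()")

    def describe(self):
        return f"{self._name} with area {self.area()}"


class Rectangle(Shape):                  # inheritance: Rectangle is a Shape
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):                      # overrides Shape.area
        return self.width * self.height


class Square(Rectangle):                 # a Square is a Rectangle with equal sides
    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"


# Polymorphism: the same describe() call works on every kind of shape
shapes = [Rectangle(2, 3), Square(4)]
descriptions = [s.describe() for s in shapes]
print(descriptions)
```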
CHAPTER 2
PYTHON OPERATORS
Operators are standard symbols used to perform logical and arithmetic
operations on variables and values. Examples: +, -, *, /. The value to
which an operator is applied is called an operand. Python groups its
operators into arithmetic operators, assignment operators, comparison
operators, logical operators, identity operators, membership operators
and bitwise operators.
Python Arithmetic Operators
Arithmetic operators are used with numeric values to perform common
mathematical operations.
Operator   Name             Example
+          Addition         x + y
-          Subtraction      x - y
*          Multiplication   x * y
/          Division         x / y
%          Modulus          x % y
**         Exponentiation   x ** y
//         Floor division   x // y
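As a quick check of the arithmetic operators, the sketch below applies each one to the same pair of numbers. The values 7 and 2 are arbitrary.

```python
x, y = 7, 2

results = {
    "x + y": x + y,     # 9
    "x - y": x - y,     # 5
    "x * y": x * y,     # 14
    "x / y": x / y,     # 3.5  (division always returns a float)
    "x % y": x % y,     # 1    (remainder of 7 divided by 2)
    "x ** y": x ** y,   # 49   (7 squared)
    "x // y": x // y,   # 3    (quotient rounded down)
}

for expression, value in results.items():
    print(expression, "=", value)
```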
Python Assignment Operators
Assignment operators are used to assign values to variables.
Operator   Example   Same As
=          x = 5     x = 5
+=         x += 3    x = x + 3
-=         x -= 3    x = x - 3
*=         x *= 3    x = x * 3
/=         x /= 3    x = x / 3
%=         x %= 3    x = x % 3
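The assignment operators can be traced step by step on a single variable. The history list is an addition used only to record each intermediate value.

```python
history = []

x = 5
history.append(x)    # 5
x += 3
history.append(x)    # 8
x -= 3
history.append(x)    # 5
x *= 3
history.append(x)    # 15
x /= 3
history.append(x)    # 5.0 (division turns the value into a float)
x %= 3
history.append(x)    # 2.0

print(history)
```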
Python Comparison Operators
Comparison operators are used to compare two values.
Operator   Name                       Example
==         Equal                      x == y
!=         Not equal                  x != y
>          Greater than               x > y
<          Less than                  x < y
>=         Greater than or equal to   x >= y
Python Logical Operators
Logical operators are used to combine conditional statements.
Operator   Description                                                Example
and        Returns True if both statements are true                   x < 5 and x < 10
or         Returns True if one of the statements is true              x < 5 or x < 4
not        Reverses the result: returns False if the result is true   not (x < 5 and x < 10)
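The comparison and logical operators can be tried together in a short sketch. The values of x and y are arbitrary, and the logical expressions are the ones used as examples in the table.

```python
x, y = 3, 8

# Comparison operators always produce a Boolean value
comparisons = [x == y, x != y, x > y, x < y, x >= y]

# Logical operators combine Boolean conditions
both = x < 5 and x < 10              # True: both conditions hold for x = 3
either = x < 5 or x < 4              # True: at least one condition holds
negated = not (x < 5 and x < 10)     # False: reverses the result of 'both'

print(comparisons, both, either, negated)
```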
Python Identity Operators
Identity operators are used to compare the objects, not if they are
equal, but if they are actually the same object, with the same memory
location.
Operator   Description                                              Example
is         Returns True if both variables are the same object       x is y
is not     Returns True if both variables are not the same object   x is not y
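The difference between identity and equality is easiest to see with two lists that hold the same values. The variable names below are illustrative.

```python
a = [1, 2, 3]
b = a            # b refers to the very same list object as a
c = [1, 2, 3]    # c is a separate list with equal contents

same_object = a is b        # True: one object, two names
equal_values = a == c       # True: the contents match
distinct = a is not c       # True: two different objects in memory

b.append(4)                 # a change through b is visible through a
print(same_object, equal_values, distinct, a)
```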
Python Membership Operators
Membership operators are used to test whether a sequence is present in
an object.
Operator   Description                                              Example
in         Returns True if a sequence with the specified value is   x in y
           present in the object
not in     Returns True if a sequence with the specified value is   x not in y
           not present in the object
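Membership tests work on lists, strings and dictionaries alike, as the following sketch with invented values suggests.

```python
fruits = ["apple", "banana"]
student = {"name": "Asha", "age": 21}

in_list = "banana" in fruits           # True: the item is in the list
not_in_list = "cherry" not in fruits   # True: the item is absent
in_string = "py" in "python"           # True: substring test
in_dict = "age" in student             # True: dictionaries are tested by key

print(in_list, not_in_list, in_string, in_dict)
```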
Python Bitwise Operators
Bitwise operators are used to compare (binary) numbers.
Operator   Name                   Description                                       Example
&          AND                    Sets each bit to 1 if both bits are 1             x & y
|          OR                     Sets each bit to 1 if one of two bits is 1        x | y
^          XOR                    Sets each bit to 1 if only one of two bits is 1   x ^ y
~          NOT                    Inverts all the bits                              ~x
<<         Zero fill left shift   Shifts left by pushing zeros in from the right    x << 2
                                  and letting the leftmost bits fall off
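The bitwise operators can be checked on two small numbers written in binary. The values are arbitrary. One point worth noting is that Python integers have no fixed width, so in practice x << 2 simply multiplies x by 4 and ~x equals -(x + 1).

```python
x, y = 0b1100, 0b1010      # 12 and 10 in decimal

and_bits = x & y           # 0b1000 = 8
or_bits = x | y            # 0b1110 = 14
xor_bits = x ^ y           # 0b0110 = 6
inverted = ~x              # -13, because ~x equals -(x + 1)
shifted = x << 2           # 0b110000 = 48, the same as x * 4

print(bin(and_bits), bin(or_bits), bin(xor_bits), inverted, shifted)
```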
Examples
CHAPTER 3
WORKING WITH NUMPY AND PANDAS
NumPy and pandas are popular Python libraries that are
commonly used for data manipulation and data analysis.
NumPy provides support for multidimensional arrays and
mathematical functions.
Basic operations include importing NumPy, creating arrays, array
indexing and array slicing.
Array operations include basic math operations, element-wise
operations, matrix multiplication, array reshaping and array transposition.
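The NumPy operations listed above can be sketched on a small array. The array values are invented, and the snippet assumes NumPy is installed.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2 x 3 array

first_row = a[0]                   # indexing: first row
last_column = a[:, -1]             # slicing: last column of every row

doubled = a * 2                    # element-wise multiplication
summed = a + a                     # element-wise addition

transposed = a.T                   # transpose: shape becomes 3 x 2
product = a @ transposed           # matrix multiplication: (2 x 3) @ (3 x 2)
reshaped = a.reshape(3, 2)         # same six values laid out as 3 x 2

print(product)
```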
Pandas offers data structures like data frames and series that make
it easy to work with structured data.
In pandas, the basic operations are importing pandas, creating data
frames, creating series, data selection and data filtering. Merging,
joining, pivoting, reshaping, grouping and statistics are used for data
manipulation.
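A few of the pandas operations named above are sketched below. The tables, column names and marks are invented for illustration, and the snippet assumes pandas is installed.

```python
import pandas as pd

# Two small data frames built from dictionaries (hypothetical data)
marks = pd.DataFrame({
    "student": ["Asha", "Ravi", "Mala", "John"],
    "dept": ["Maths", "Maths", "Physics", "Physics"],
    "mark": [82, 67, 91, 59],
})
depts = pd.DataFrame({"dept": ["Maths", "Physics"], "block": ["A", "B"]})

names = marks["student"]                      # selection: one column is a Series
passed = marks[marks["mark"] >= 60]           # filtering with a Boolean condition
merged = marks.merge(depts, on="dept")        # merging on a shared column
mean_by_dept = merged.groupby("dept")["mark"].mean()   # grouping and statistics

print(mean_by_dept)
```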
NumPy and pandas are indispensable tools. Best practices
include using vectorized operations, optimizing data structures
and visualizing data effectively.
Screenshots for NumPy & Pandas
CHAPTER 4
DATA SCIENCE IN REAL TIME APPLICATION
Data science is the in-depth study of large quantities of data, which
involves extracting meaning from raw, structured and unstructured
data. Extracting meaningful information from large amounts of data
requires processing the data with algorithms, statistical and scientific
techniques, and a range of tools and technologies.
Data science is applied in finance (stock market prediction,
credit scoring), healthcare (patient monitoring, disease diagnosis),
marketing (customer segmentation) and IoT (sensor data analysis,
predictive maintenance).
Among Python libraries, NumPy and pandas are used for data
manipulation, scikit-learn for machine learning, TensorFlow or
PyTorch for deep learning, and Matplotlib and Seaborn for visualization.
Real-time data sources include streaming data (Twitter, sensor
data), API calls (weather, stock prices) and web scraping. Data
processing involves data cleaning and preprocessing, feature
extraction and selection, and data transformation.
Example:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import
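The final import in the example above is cut off, so the model class is not known. The sketch below assumes LinearRegression purely for illustration and uses synthetic data with invented column names to show how train_test_split is typically combined with a scikit-learn model.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression   # assumed model class

# Synthetic data that follows y = 3x + 2 exactly (not real measurements)
df = pd.DataFrame({"x": np.arange(20, dtype=float)})
df["y"] = 3 * df["x"] + 2

# Hold back 25% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    df[["x"]], df["y"], test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)                  # learn slope and intercept
score = model.score(X_test, y_test)          # R^2 on the unseen rows

print(model.coef_[0], model.intercept_, score)
```

Because the synthetic data is perfectly linear, the fitted slope and intercept should come out close to 3 and 2. Real data would not fit this cleanly.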
CHAPTER 5
DATA VISUALIZATION
Data visualization represents data graphically to make it easier to
understand and to identify trends, patterns and correlations. Among the
popular libraries, Matplotlib is used for 2D/3D plotting, Seaborn for
statistical visualization, Plotly for interactive visualization, Bokeh
for web-based visualization and pandas for data manipulation and
visualization.
BASIC PLOTS
Line plots (plt.plot())
Scatter plots (plt.scatter())
Bar charts (plt.bar())
Histograms (plt.hist())
In the real world, data visualization is applied in business
intelligence, scientific research, machine learning and web analytics.
df = pd.read_csv("[Link]") loads data from a CSV file.
BASIC PLOT
plt.plot(df["column"])
plt.show()
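The four basic plot types can be drawn side by side in one figure. The sketch below builds a small data frame in memory instead of reading a file, so the column names and values are invented. It also selects the non-interactive Agg backend, an assumption made so that the code runs without a display. In an interactive session plt.show() would be called at the end.

```python
import matplotlib
matplotlib.use("Agg")                  # assumption: no display is available
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for a CSV file
df = pd.DataFrame({"day": [1, 2, 3, 4, 5], "sales": [10, 12, 9, 15, 14]})

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(df["day"], df["sales"])        # line plot
axes[0, 0].set_title("Line plot")

axes[0, 1].scatter(df["day"], df["sales"])     # scatter plot
axes[0, 1].set_title("Scatter plot")

axes[1, 0].bar(df["day"], df["sales"])         # bar chart
axes[1, 0].set_title("Bar chart")

axes[1, 1].hist(df["sales"], bins=3)           # histogram with three bins
axes[1, 1].set_title("Histogram")

fig.tight_layout()                             # avoid overlapping titles
```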
Data Visualization is a powerful tool for exploring and
communicating insights from data. Python provides a rich set of libraries
for creating a wide range of visualizations, from basic static plots to
interactive plots.
INPUT
OUTPUT
CHAPTER 6
DATA SCIENCE COMPONENTS
Data components are essential elements used for storing,
organizing and manipulating data. These components are crucial for
development and programming tasks, as they allow developers to work
with various types of data efficiently.
Data science is an interdisciplinary field that uses scientific
techniques, procedures, algorithms and structures to extract knowledge
and insights from structured and unstructured data.
VARIABLES
Variables are used to store data values in memory. Variables are
created simply by assigning a value to a name. For example, a=10 creates
a variable named ‘a’ with a value of 10. Variables can store different
types of data such as integers, floats, strings, lists and dictionaries.
SETS
Sets are unordered collections of unique elements in Python. Sets
do not allow duplicate values, and their elements are stored in no
particular order. Sets are defined using curly braces {}.
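The behaviour of sets described above can be seen in a short sketch with invented values. One detail worth knowing is that empty curly braces create a dictionary, so an empty set is written set().

```python
a = {1, 2, 2, 3}          # the duplicate 2 is stored only once
b = {3, 4}

union = a | b             # elements in either set
common = a & b            # elements in both sets
only_a = a - b            # elements in a but not in b

a.add(3)                  # adding an existing element changes nothing
empty = set()             # {} would create an empty dict, not a set

print(a, union, common, only_a)
```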
ARRAYS
Arrays in Python are data structures that can store multiple values
of the same type. Python has no built-in array type (lists are normally
used instead), but the NumPy library provides multidimensional arrays
that are widely used in scientific computing and data analysis.
Examples:
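As one possible illustration, the sketch below shows that a NumPy array keeps all of its values in a single data type and can have more than one dimension. The values are invented, and the snippet assumes NumPy is installed.

```python
import numpy as np

ints = np.array([1, 2, 3])         # every element is stored as an integer
mixed = np.array([1, 2.5, 3])      # mixed input is converted to one type: float
grid = np.zeros((2, 3))            # a two-dimensional array filled with zeros

print(ints.dtype, mixed.dtype, grid.shape, grid.ndim)
```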
DATA FRAMES
Data frames are data structures commonly used in data analysis
and manipulation tasks. Data frames are provided by libraries such as
pandas, which allow developers to work with tabular data in
a versatile and efficient manner.
PACKAGES
Packages are collections of Python modules that are organized in a
directory hierarchy. Packages allow developers to structure their code in
a more organized and maintainable way and provide a namespace for
organizing related functionality.
Each data component has its own characteristics and advantages,
allowing developers to choose the most suitable data structure.
ASSIGNMENTS
CONCLUSION
Python has emerged as the predominant language in the field of
data science due to its flexibility, extensive libraries and strong
community support. Its simplicity and readability make it accessible to
both beginners and experienced programmers, enabling data scientists to
efficiently manipulate, analyze and visualize data. Python’s libraries such
as NumPy, pandas and Matplotlib provide powerful tools for data
manipulation and visualization, while frameworks like scikit-learn and
TensorFlow enable the development of complex machine learning
models. The language’s versatility allows data scientists to seamlessly
integrate different tools and technologies, making it an invaluable asset
in tackling diverse data science problems. With Python, data scientists
can easily process large data sets, derive meaningful insights and build
predictive models, making it an essential tool for anyone working in the
field of data science.