
Data Science Notes Mcs

Data science is a multidisciplinary field that utilizes mathematics, statistics, computer science, and domain expertise to extract insights from data for various industries including healthcare, finance, and retail. It involves different types of data (structured, semi-structured, unstructured) and methods of analysis such as descriptive, inferential, and predictive analysis. The data science life cycle consists of phases like problem definition, data collection, analysis, modeling, and deployment to create data-driven solutions.

Uploaded by

2200813526.neha

Data science is a multidisciplinary field that combines principles and practices from mathematics, statistics, computer science, and domain expertise to extract valuable insights from data.

It involves collecting, cleaning, analyzing, and visualizing data to identify patterns, trends, and relationships, and to make predictions. These insights can be used to inform business decisions, solve complex problems, and improve operations.

Data science integrates the principles of computer science, mathematics and domain knowledge to create mathematical models that show relationships among data attributes. In addition, data science uses data to perform predictive analysis.

Data science is widely used across many industries:

1. Healthcare: Predicting disease outbreaks, personalized treatments, and
improving patient care.
2. Finance: Fraud detection, managing risks, algorithmic trading, and customer
segmentation.
3. Retail: Managing inventory, recommendation systems, and analyzing
shopping patterns.
4. Manufacturing: Predicting maintenance needs, quality control, and
improving supply chains.
5. Transportation: Optimizing routes, forecasting demand, and supporting
self-driving vehicles.

TYPES OF DATA
1. Structured Data
2. Semi-Structured Data
3. Unstructured Data
4. Data Streams

Structured Data: Structured data refers to data that is organized and designed in a specific way to make it easily readable and understandable by both humans and machines. This is typically achieved through a well-defined schema or data model, which provides a structure for the data.

Figure 2 shows the sample structure of data that may be stored in a relational database system. One of the key characteristics of structured data is that it can be associated with a schema. In addition, each schema element may be related to a specific data type.

Customer (custID, custName, custPhone, custAddress, custCategory, custPAN, custAadhar)
Account (AccountNumber, custIDoffirstaccountholder, AccountType, AccountBalance)
JointHolders (AccountNumber, custID)
Transaction (transDate, transType, AccountNumber, Amountoftransaction)
Figure 2: A sample schema of structured data
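To make the idea of a schema concrete, the Figure 2 relations can be declared in SQLite through Python's built-in sqlite3 module. This is a minimal sketch: the figure lists only attribute names, so the column data types (INTEGER, TEXT, REAL) and the key constraints below are assumptions chosen for illustration.

```python
import sqlite3

# In-memory database; column types and keys are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    custID       INTEGER PRIMARY KEY,
    custName     TEXT NOT NULL,
    custPhone    TEXT,
    custAddress  TEXT,
    custCategory TEXT,
    custPAN      TEXT,
    custAadhar   TEXT
);
CREATE TABLE Account (
    AccountNumber              INTEGER PRIMARY KEY,
    custIDoffirstaccountholder INTEGER REFERENCES Customer(custID),
    AccountType                TEXT,
    AccountBalance             REAL
);
CREATE TABLE JointHolders (
    AccountNumber INTEGER REFERENCES Account(AccountNumber),
    custID        INTEGER REFERENCES Customer(custID),
    PRIMARY KEY (AccountNumber, custID)
);
-- "Transaction" is quoted because TRANSACTION is an SQL keyword.
CREATE TABLE "Transaction" (
    transDate           TEXT,
    transType           TEXT,
    AccountNumber       INTEGER REFERENCES Account(AccountNumber),
    Amountoftransaction REAL
);
""")

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
# Every schema element therefore carries a declared data type.
account_columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(Account)")}
print(account_columns)
```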

Semi-structured Data: As the name suggests, semi-structured data has some structure in it. This structure comes from the use of tags or key/value pairs. Common forms of semi-structured data include XML, JSON objects, server logs and EDI data. An example of semi-structured data is shown in Figure 3.
<Book>
<title>Data Science and Big Data</title>
<author>R Raman</author>
<author>C V Shekhar</author>
<yearofpublication>2020</yearofpublication>
</Book>

{
  "Book": {
    "Title": "Data Science",
    "Price": 5000,
    "Year": 2020
  }
}
Figure 3: Sample semi-structured data
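The two Figure 3 records can be read with Python's standard library to show where their structure comes from: tags in XML, key/value pairs in JSON. This is a small sketch; the JSON record is written with an enclosing object so that it is valid JSON on its own.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<Book>
<title>Data Science and Big Data</title>
<author>R Raman</author>
<author>C V Shekhar</author>
<yearofpublication>2020</yearofpublication>
</Book>"""

json_text = '{"Book": {"Title": "Data Science", "Price": 5000, "Year": 2020}}'

# XML: structure comes from tags, and the same tag (author) may repeat.
book = ET.fromstring(xml_text)
title = book.findtext("title")
authors = [a.text for a in book.findall("author")]
year = int(book.findtext("yearofpublication"))  # XML text has no type; convert explicitly

# JSON: structure comes from key/value pairs, and numbers keep their numeric type.
record = json.loads(json_text)["Book"]

print(title, authors, year)
print(record["Title"], record["Price"], record["Year"])
```

Note that neither format forces every record to have the same fields. A second book could omit the price or list three authors, which is what separates semi-structured data from the fixed schema of Figure 2.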

Unstructured Data:
Unstructured data does not follow any schema definition. For example, written text such as the content of this unit is unstructured. You may add certain headings or metadata to unstructured data.
Data Streams
A data stream is a sequence of data generated over a period of time. Such data may be structured, semi-structured or unstructured, but it is generated repeatedly. For example, IoT devices such as weather sensors generate a data stream of pressure, temperature, wind direction, wind speed, humidity, etc. for the place where they are installed. Such data is huge, and many applications require it to be processed in real time. In general, not all stream data needs to be stored; instead, it is processed for a specific duration of time.
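One common way to process a stream for a specific duration without storing all of it is a sliding window. The sketch below keeps only the last few readings of a hypothetical temperature sensor and emits a rolling average as each value arrives. The function name `rolling_mean` and the window size are illustrative choices, not part of any particular streaming system.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` readings as each new reading arrives.

    Only `window` values are held in memory; older readings are discarded
    automatically, so the stream is never stored in full.
    """
    recent: deque[float] = deque(maxlen=window)
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)


# Hypothetical temperature readings arriving one at a time.
temperatures = [21.0, 22.0, 23.0, 24.0, 25.0]
for avg in rolling_mean(temperatures, window=3):
    print(f"rolling average: {avg:.1f}")
```

Because `rolling_mean` is a generator, it works the same way when `readings` is an endless source (for example, a socket or sensor feed) instead of a list.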

Statistical Data Types:

There are two distinct types of data that can be used in statistical analysis: categorical data and quantitative data.

Categorical Data:
Categorical data provides descriptive information about qualitative attributes, such as blood type or color, whereas quantitative data offers numerical values for measuring and analyzing quantities.

Quantitative Data: Quantitative data is numeric data, which can be measured on different scales. Quantitative data is of two basic types: discrete, which takes distinct, countable values such as 2, 3, 5, …, and continuous, which can take any value within a range of a given variable. For example, your height is measured on a continuous scale.

Measurement scale of data:

The measurement scales of data refer to how data values are categorized, ranked, or quantified. They are foundational in choosing the right analysis method.

1. Nominal Scale (Categorical - No Order)

● Definition: Labels or names without any numeric value or order.

● Examples: Gender (Male/Female), Blood Type (A, B, AB, O), Colors.

● Operations: Only counting or mode; no sorting or calculations.

● ✅ Used for classification only.

2. Ordinal Scale (Categorical - With Order)

● Definition: Data with a natural order, but intervals between values are
not known.

● Examples: Rank in a competition (1st, 2nd, 3rd), Satisfaction level (High, Medium, Low).

● Operations: Median and mode are valid; mean is not.

● ✅ Used for ranking or preferences.

3. Interval Scale (Quantitative - Equal Intervals, No True Zero)

● Definition: Numeric scale with equal intervals but no absolute zero.

● Examples: Temperature in Celsius or Fahrenheit, IQ scores.

● Operations: Addition/subtraction valid; ratios are not meaningful (e.g., 20°C is not twice as hot as 10°C).

● ✅ Used for comparison of differences.

4. Ratio Scale (Quantitative - Equal Intervals, True Zero)

● Definition: Same as interval scale but with a true zero point.

● Examples: Age, Weight, Height, Income, Distance.

● Operations: All arithmetic operations allowed.

● ✅ Used for true comparisons and ratios (e.g., 20 kg is twice 10 kg).
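The operations allowed on each scale can be illustrated with Python's statistics module. All values below are hypothetical, and the ordering Low < Medium < High for the ordinal example is an assumption stated in the code.

```python
import statistics

# Nominal: labels only, so counting and the mode are the meaningful summaries.
blood_types = ["A", "O", "B", "O", "AB", "O"]
print(statistics.mode(blood_types))  # O

# Ordinal: order matters but gaps do not, so take the median of the ranks.
levels = ["Low", "Medium", "High"]  # assumed order, lowest first
responses = ["High", "Low", "Medium", "High", "Medium"]
ranks = [levels.index(r) for r in responses]
print(levels[statistics.median_low(ranks)])  # Medium

# Interval: differences are meaningful, ratios are not (0 °C is not "no heat").
c1, c2 = 10.0, 20.0
print(c2 - c1)                        # 10.0 degrees warmer
print((c2 + 273.15) / (c1 + 273.15))  # about 1.035 on the kelvin scale, not 2

# Ratio: a true zero makes ratios meaningful.
w1, w2 = 10.0, 20.0
print(w2 / w1)  # 2.0, so 20 kg is twice 10 kg
```

`median_low` is used for the ordinal case because it always returns one of the actual ranks, never an average of two ranks that has no label.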

BASIC METHODS OF DATA ANALYSIS:

The data for data science is obtained from several data sources. This data is first cleaned of errors and duplicates, aggregated, and then presented in a form that can be analysed by various methods. In this section, we define some of the basic methods used for analysing data. These are descriptive analysis, inferential analysis, exploratory data analysis (EDA), predictive analysis, diagnostic analysis and prescriptive analysis.

Descriptive Analysis

● Summarizes and describes features of a dataset.

● Includes measures like mean, median, mode, standard deviation, and frequency distribution.

● Often visualized using charts, graphs, and tables.
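The descriptive measures listed above can be computed with Python's standard library. This is a minimal sketch on hypothetical exam scores, with a crude text bar chart standing in for a real plot.

```python
import statistics
from collections import Counter

# Hypothetical exam scores used only for illustration.
scores = [56, 72, 72, 65, 90, 81, 72, 65]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}
frequency = Counter(scores)  # frequency distribution: value -> count

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
for value, count in sorted(frequency.items()):
    print(value, "*" * count)  # one star per occurrence
```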

Inferential Analysis

● Draws conclusions about a population based on a sample.

● Uses statistical techniques like hypothesis testing, confidence intervals, and regression analysis.
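As one example of inference from a sample to a population, the sketch below computes an approximate confidence interval for a population mean. It uses the normal distribution (`statistics.NormalDist`), which is a large-sample approximation; for small samples a t-distribution critical value would normally be used instead. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes the sample is large enough for the normal approximation;
    small samples would call for a t-distribution critical value.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample of delivery times (days) drawn from a larger population.
sample = [10, 12, 11, 13, 9, 10, 12, 11] * 5
low, high = mean_confidence_interval(sample)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```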

Exploratory Data Analysis (EDA)

● Focuses on discovering patterns, trends, and relationships within data.

● Uses visualizations (e.g., scatter plots, histograms) and summary statistics.

● Often the first step in data analysis.

Predictive Analysis

● Uses historical data to make predictions about future outcomes.

● Involves machine learning and statistical models like linear regression, decision trees, etc.

Diagnostic Analysis

● Investigates why something happened in the data.

● Often includes drill-downs, data mining, and correlation analysis.
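A drill-down moves from an aggregate figure to finer levels of detail to find out why it changed. The sketch below uses hypothetical sales records. It first shows that total revenue fell between two quarters, then breaks the change down by region to locate the cause.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, revenue).
records = [
    ("Q1", "North", 120), ("Q1", "South", 100), ("Q1", "West", 90),
    ("Q2", "North", 118), ("Q2", "South", 60), ("Q2", "West", 92),
]


def totals(rows, key):
    """Sum revenue grouped by the value returned by `key`."""
    out = defaultdict(int)
    for row in rows:
        out[key(row)] += row[2]
    return dict(out)


# Level 1: total revenue by quarter shows that something went wrong in Q2.
by_quarter = totals(records, key=lambda r: r[0])

# Level 2: drill down by region to see where the drop came from.
by_region_q1 = totals([r for r in records if r[0] == "Q1"], key=lambda r: r[1])
by_region_q2 = totals([r for r in records if r[0] == "Q2"], key=lambda r: r[1])
change = {region: by_region_q2[region] - by_region_q1[region] for region in by_region_q1}

print(by_quarter)                   # {'Q1': 310, 'Q2': 270}
print(min(change, key=change.get))  # South
```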

Prescriptive Analysis

● Suggests actionable steps based on data insights.

● Often used in decision-making systems with optimization algorithms or simulations.
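To show how prescriptive analysis turns insight into an action, the sketch below picks a stock level that maximises expected profit under a few demand scenarios. This is a simplified "newsvendor"-style problem. The demand probabilities, cost and price are hypothetical, and the optimisation is a brute-force search over candidate actions. Real systems might use linear programming or simulation instead.

```python
# Hypothetical decision: how many units to stock for the coming period.
# Demand scenarios (units -> probability) are assumptions for illustration.
demand_scenarios = {80: 0.3, 100: 0.5, 120: 0.2}
unit_cost, unit_price = 6.0, 10.0


def expected_profit(order_qty: int) -> float:
    """Expected profit when `order_qty` units are stocked; unsold units are lost."""
    return sum(
        prob * (unit_price * min(order_qty, demand) - unit_cost * order_qty)
        for demand, prob in demand_scenarios.items()
    )


# Evaluate every candidate action and recommend the best one.
candidates = range(0, 151, 10)
best_qty = max(candidates, key=expected_profit)
print(best_qty, round(expected_profit(best_qty), 2))
```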

Common Misconceptions in Data Analysis

Data Analysis Is Only About Numbers
Misconception: Data analysis is solely a numerical or statistical process.
Clarification: While quantitative analysis is a significant component, data
analysis also involves understanding the context, patterns, and qualitative
aspects of data.

More Data Equals Better Results

Misconception: Having more data automatically leads to better insights.
Clarification: The quality of data is more important than quantity. Large
volumes of poor-quality or irrelevant data can lead to misleading
conclusions.

Correlation Implies Causation

Misconception: A correlation between two variables indicates that one
causes the other.
Clarification: Correlation does not establish causation. Two variables may be
correlated due to coincidence or the presence of a third influencing factor.
APPLICATIONS OF DATA SCIENCE:
Data science has widespread applications across many fields.
In healthcare, it is used for disease prediction, drug discovery, personalized
treatments, and medical imaging analysis.
In finance, it powers fraud detection, risk management, customer
segmentation, and algorithmic trading.
Retail and e-commerce use data science for recommendation systems,
inventory forecasting, and customer sentiment analysis.
Transportation and logistics apply it for route optimization, self-driving
vehicles, and demand prediction.
In entertainment, data science drives personalized content recommendation,
audience analysis, and ad targeting.
Manufacturing benefits through predictive maintenance, quality control, and
supply chain optimization.
Education uses data science for personalized learning, predicting student
performance, and curriculum development.
In agriculture, it enables precision farming, crop yield prediction, and early
pest/disease detection.

DATA SCIENCE LIFE CYCLE:

The data science life cycle is a structured approach to developing and
deploying data-driven solutions. It typically involves six key phases: problem
definition, data acquisition and exploration, research and development,
validation, delivery, and monitoring. Each phase has iterative steps, ensuring
a thorough and systematic process.

Data Science Project Requirements Analysis Phase

The first step in a data science project is to identify its objectives. This is coupled with a study of the project's benefits, resource requirements and cost. You also need to make a project plan, which includes the project deliverables and the associated time frame. In addition, the data required for the project is decided. This phase is similar to requirements study, project planning and scheduling.
Data Collection and Preparation Phase
In this phase, all the data sources are first identified, and then the data collection process is designed. Note that data collection may be a continuous process. Once the data sources are identified, the data is checked for duplication, consistency, missing values and the timeline of its availability. In addition, the data may be integrated, aggregated or transformed to produce data for the set of attributes identified in the requirements phase.
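The checks in this phase can be sketched on a few hypothetical records collected from two sources. The steps are to remove duplicates, detect missing values, and aggregate to the attribute the analysis needs. Dropping incomplete records is only one choice; filling them in (imputation) is a common alternative.

```python
from collections import defaultdict

# Hypothetical raw records from two sources: (custID, month, amount).
raw = [
    (1, "2024-01", 250.0),
    (1, "2024-01", 250.0),  # exact duplicate delivered by the second source
    (2, "2024-01", None),   # missing amount
    (2, "2024-02", 400.0),
    (3, "2024-02", 150.0),
]

# 1. Remove exact duplicates while keeping the original order.
deduplicated = list(dict.fromkeys(raw))

# 2. Detect records with missing values and drop them.
missing = [r for r in deduplicated if any(v is None for v in r)]
clean = [r for r in deduplicated if r not in missing]

# 3. Aggregate to the attribute required by the analysis: total amount per month.
monthly_total = defaultdict(float)
for _cust_id, month, amount in clean:
    monthly_total[month] += amount

print(len(raw), len(deduplicated), len(missing))  # 5 4 1
print(dict(monthly_total))  # {'2024-01': 250.0, '2024-02': 550.0}
```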
Descriptive Data Analysis
Next, the data is analysed using univariate and bivariate analysis techniques, which generate descriptive information about the data. This phase can also be used to establish the suitability and validity of the data for the intended analysis. It is a good time to review your project requirements vis-à-vis the characteristics of the collected data.
Data Modelling and Model Testing
Next, a number of data models are developed from the data. Each model is then tested for validity against test data. The accuracies of the models are compared and contrasted, and a final model is proposed for data analysis.
Model Deployment and Refinement
The best tested model is used to address the data science problem. However, this model must be refined continually, because the decision-making environment keeps changing, and data sets and attributes may change over time. The refinement process goes through all the previous steps again.
