Foundations of Data Science Concepts

The document outlines essential concepts in data science, including definitions and processes related to data science, machine learning, data mining, and data engineering. It covers various types of data, statistical measures, and tools used for data analysis and model building. Additionally, it introduces Python libraries and techniques for data wrangling and manipulation.

MUST KNOW CONCEPTS

MKC
CSE 2024-25

Course Code & Course Name : CS3352 & Foundations of Data Science
Year/Sem/Sec : II/III/A
Columns: S.No, Term, Notation (Symbol), Concept/Definition/Meaning/Units/Equation/Expression, Units
UNIT I- INTRODUCTION
1 Data Science: Data science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes.
2 Machine learning: Machine learning trains a software model on data so that it can perform tasks as a human expert would.
3 Facets of data: The upstream process: acquiring, cleaning, and integrating data. The downstream process: analysis, modelling, and prediction.
4 Data streaming: A continuous flow of data from a source to a destination, to be processed and analysed in near real time.
5 Data science process: Setting the research goal, retrieving data, data preparation, data exploration, data modelling, and presentation and automation.
6 Noisy data: Any data that has been received, stored, or changed in such a manner that it cannot be read or used by the program that originally created it can be described as noisy.
7 Cleansing data: Data cleansing is a subprocess of data preparation that focuses on removing errors from the data.
8 Outliers: An outlier is a data object that deviates significantly from the rest of the data objects and behaves in a different manner.
9 Data transformation: Data transformation is the process of converting data from one format to another. It converts raw data into a clean and usable form, for example by removing duplicates.
10 Dummy variables: A dummy variable is a numerical variable (typically coded 0 or 1) used in regression analysis to represent subgroups of the sample in the study.
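As a minimal sketch of dummy variables, assuming pandas is installed and using a made-up `city` column (the data and names are illustrative, not from the course material), `pandas.get_dummies` turns each category into its own 0/1 column:

```python
import pandas as pd

# Hypothetical sample: one categorical column with three subgroups.
df = pd.DataFrame({"city": ["Chennai", "Madurai", "Chennai", "Salem"]})

# One 0/1 indicator column per category; dtype=int keeps the values numeric.
dummies = pd.get_dummies(df["city"], dtype=int)
print(dummies)
```

In a regression one category is usually left out as the reference group, which `get_dummies` supports through its `drop_first=True` option.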
11 Euclidean distance: Notation: $D(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}$. It is the distance between two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem.
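The formula can be sketched in a few lines of standard-library Python. The function name `euclidean_distance` and the sample points are assumptions chosen for illustration:

```python
import math

def euclidean_distance(x, y):
    """Return the straight-line distance between two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0 (the 3-4-5 right triangle)
```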
12 EDA: Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.
13 Tools for model building: R and PL/R, Octave, WEKA, Python, SQL, MADlib.
14 Data warehousing: A data warehouse centralizes and consolidates large amounts of data from multiple sources. Its analytical capabilities allow organizations to derive valuable business insights from their data to improve decision-making. Over time, it builds a historical record that can be invaluable to data scientists and business analysts.
15 Data mining: Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.
16 Data mining applications: Market and stock analysis, fraud detection, risk management, analysing customer lifetime value.
17 Data mining tools: RapidMiner, WEKA, KNIME, Apache Mahout, Oracle Data Mining.
18 Data mining steps: Selection, pre-processing, transformation, data mining, interpretation, knowledge extraction.
19 Data science components: Statistics, domain expertise, data engineering, visualization, advanced computing, mathematics, machine learning.
20 Big data: Big data is data that contains greater variety, arriving in increasing volumes and with more velocity.
21 Data Engineering: Data engineering is the process of designing and building systems that let people collect and analyse raw data from multiple sources and formats. These systems empower people to find practical applications of the data, which businesses can use to thrive.
22 Confusion Matrix: A confusion matrix describes the performance of a classification algorithm. It is a table of actual versus predicted classes that visualizes and summarizes that performance.
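To make the confusion matrix concrete, here is a small sketch in plain Python. The helper name `confusion_matrix` and the spam/ham labels are invented for illustration, and rows are taken to be actual classes and columns predicted classes, which is one common convention:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical labels for a two-class spam filter.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

matrix = confusion_matrix(actual, predicted, labels=["spam", "ham"])
# Correct predictions sit on the diagonal of the matrix.
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)
print(matrix, accuracy)
```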
23 Project charter: It is a document that lays out the project vision, scope, objectives, project team, and their responsibilities.
24 Data lake: A data lake stores an organization's raw and processed data at both large and small scales.
25 Data mart: A data mart supplies the subject-oriented data necessary to support a specific business unit.
UNIT II: DESCRIBING DATA
26 Data: Data is a collection of discrete or continuous values that convey information, describing quantity, quality, facts, statistics, or other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.
27 Nominal data: Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way.
28 Ordinal data: Ordinal data classifies data while introducing an order, or ranking. For instance, measuring economic status using the hierarchy 'wealthy', 'middle income', or 'poor'. However, there is no clearly defined interval between these categories.
29 Variable: A variable is a characteristic or quantity whose value can change from one observation to another. In a program, a variable is a value that can change depending on conditions or on information passed to the program.
30 Frequency distribution: Frequency distributions are tables or visual displays that organise and present frequency counts so that the information can be interpreted more easily.
31 Outliers: In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement, an experimental error, or an indication of novel data.
32 Cumulative distribution: Notation: F(x) = P[X ≤ x]. The cumulative distribution function of a random variable X gives the probability that X is less than or equal to x, and is usually denoted F(x).
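For observed data, F(x) can be estimated by the fraction of observations that are less than or equal to x. The sketch below assumes a small made-up sample, and the helper name `empirical_cdf` is illustrative:

```python
def empirical_cdf(sample, x):
    """Estimate F(x) = P[X <= x] as the fraction of observations <= x."""
    return sum(1 for value in sample if value <= x) / len(sample)

sample = [2, 4, 4, 5, 7, 9]  # hypothetical observations

print(empirical_cdf(sample, 4))  # 0.5, because 3 of the 6 values are <= 4
```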
33 Graph: A graph can be defined as a pictorial representation or a diagram that represents data or values in an organized manner. The points on the graph often represent the relationship between two or more things.
34 Histogram: A histogram is a graphical representation of the distribution of data. It is drawn as a set of rectangles adjacent to each other, where each bar represents the frequency of the values that fall in one interval.
35 Table: A collection of data arranged in rows and columns.
36 Average: An average is a numeric value used in mathematics to represent a large amount of data. It uses a single number to represent all the other numbers that you might find in a large data set.
37 Variability: Variability is the extent to which data points in a statistical distribution or data set diverge from the average value and from each other.
38 Standard deviation: Standard deviation is considered a powerful tool for measuring dispersion. Dispersion here means the amount by which items differ from a reference value, in this case the arithmetic mean.
39 Mode: The mode is a measure of central tendency: it is the value that occurs most frequently in a set of data.
40 Median: The median is the middle value in a given set of numbers or data once the values are sorted.
41 Mean: The mean is the sum of all the values in a data set divided by the number of values.
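The three measures of central tendency can be compared on one small data set. This sketch uses Python's standard `statistics` module, and the `scores` list is made up for illustration:

```python
import statistics

scores = [3, 5, 5, 6, 8, 9]  # hypothetical data set

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # middle of the sorted data
mode_value = statistics.mode(scores)      # most frequent value

print(mean_value, median_value, mode_value)  # 6 5.5 5
```

With an even number of values, the median is the average of the two middle values (here 5 and 6).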
42 Range: The range of a data set is the difference between the greatest value and the lowest value within a collection of numbers.
43 Interquartile range: There are three quartiles. The first quartile, Q1, is known as the lower quartile; the second quartile, Q2, is the median; and the third quartile, Q3, is known as the upper quartile. The interquartile range is the difference Q3 - Q1.
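A short sketch of the range and interquartile range, assuming NumPy is installed and using made-up values. Note that several quartile conventions exist; `np.percentile` uses linear interpolation by default, so other tools may report slightly different quartiles for the same data:

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9, 11, 13, 15])  # hypothetical data set

data_range = data.max() - data.min()
q1, q2, q3 = np.percentile(data, [25, 50, 75])  # lower quartile, median, upper quartile
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```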
44 Curve: A curve is a shape or line drawn smoothly in a plane with one or more bends or turns in it. For example, a circle is a curved shape.
45 z-score: A z-score measures exactly how many standard deviations above or below the mean a data point is.
46 Standard deviation: The standard deviation is the average amount of variability in your dataset. It tells you, on average, how far each value lies from the mean.
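The link between the standard deviation and the z-score can be sketched as follows. The data are made up, and the population standard deviation (`statistics.pstdev`) is an assumption made for this example; a sample standard deviation would use `statistics.stdev` instead:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = statistics.mean(values)
sd = statistics.pstdev(values)  # population standard deviation

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(mean, sd, z_score(9))  # 5 2.0 2.0
```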
47 Degrees of freedom: The degrees of freedom in a statistical calculation represent how many values involved in the calculation have the freedom to vary.
48 Discrete variable: A discrete variable takes only a countable number of distinct values and nothing in between.
49 Continuous variable: A continuous variable is a variable that can take an uncountably infinite set of values, such as any value within an interval.
50 Proportion: A proportion expresses a relationship between two things, typically the fraction of a total that belongs to one category.
UNIT III: DESCRIBING RELATIONSHIPS
51 Computation of Standard Error of Estimate: Notation: SEE = SD / SQRT(number of measurements). The standard error is calculated by dividing the standard deviation by the square root of the sample size. It gives the precision of a sample mean by accounting for the sample-to-sample variability of sample means. Strictly, this formula is the standard error of the mean.
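A minimal sketch of the formula above, using only the standard library. The `measurements` list is made up, and the sample standard deviation (`statistics.stdev`) is assumed as the SD:

```python
import math
import statistics

measurements = [10, 12, 9, 11, 13, 11, 10, 12, 11]  # hypothetical sample

sd = statistics.stdev(measurements)  # sample standard deviation
standard_error = sd / math.sqrt(len(measurements))  # SD / sqrt(n)

print(standard_error)
```

Because the sample size appears under a square root, quadrupling the number of measurements halves the standard error.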
52 Interpretation of r²: r² is a statistical measure of the proportion of the variation in the dependent variable that can be explained by the independent variable.
53 Multiple regression equations: A method to predict the dependent variable with the help of two or more independent variables.
54 Regression towards the mean: It refers to the tendency for scores, particularly extreme scores, to shrink toward the mean.
55 Correlation matrix: It is a table that displays the correlation coefficients for different variables. The matrix shows the correlation between all possible pairs of variables in the table.
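As an illustration, pandas can build a correlation matrix directly from a table of variables. The column names and values below are invented for the example; `DataFrame.corr()` computes Pearson correlations by default:

```python
import pandas as pd

# Hypothetical data: three variables measured on five students.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_of_tv": [5, 4, 3, 2, 1],
})

corr_matrix = df.corr()  # one row and one column per variable
print(corr_matrix.round(2))
```

The diagonal is always 1 (each variable against itself), and the matrix is symmetric because the correlation of A with B equals that of B with A.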
56 Linear relationship: It is a term used to describe a straight-line relationship between two variables.
57 Multiple regression equation: Notation: Y = m1X1 + m2X2 + m3X3 + b. Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. Each explanatory variable has its own coefficient, and b is the intercept.
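One way to estimate the coefficients is ordinary least squares. The sketch below assumes NumPy and uses synthetic data generated from known coefficients (2, 3, -1 and intercept 5) so that the fit can be checked; it is an illustration, not the only fitting method:

```python
import numpy as np

# Hypothetical data generated exactly from Y = 2*X1 + 3*X2 - 1*X3 + 5.
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
    [3.0, 4.0, 2.0],
    [4.0, 3.0, 5.0],
    [5.0, 6.0, 1.0],
])
y = 2 * X[:, 0] + 3 * X[:, 1] - 1 * X[:, 2] + 5

# Append a column of ones so that the last coefficient is the intercept b.
design = np.column_stack([X, np.ones(len(X))])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
m1, m2, m3, b = coefficients

print(np.round(coefficients, 6))
```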
58 Homoscedasticity: Homoscedasticity, or homogeneity of variance, is the assumption of equal or similar variances in the different groups being compared.
59 Partial least square: Partial least squares regression is used to predict trends in data in much the same way as multiple regression analysis.
60 Simple linear regression: It is used to estimate the relationship between two quantitative variables.
61 Clusters: The data points in a scatter plot may form distinct groups. These groups are called clusters.
62 Correlation coefficients based on types of relationships: Pearson correlation, Kendall rank correlation, Spearman rank correlation, point-biserial correlation, Cramér's V.
63 Correlation: It measures the relationship between two variables.
64 Types of correlation:
• Positive correlation
• Negative correlation
• No correlation
65 Need for correlation:
• Prediction
• Validity
• Reliability
• Theory verification
66 Scatterplots: It is a graph containing a cluster of dots that represents all pairs of scores.
67 Causation: It indicates that one event is the result of the occurrence of another event, which is referred to as cause and effect.
68 Nonlinear relationship: A relationship between variables whose scatterplot does not resemble a straight line. It may resemble a curve or an inverted U.
69 Types of nonlinear relationship:
• Quadratic relationship
• Cubic relationship
• Exponential relationship
• Logarithmic relationship
• Cosine relationship
70 Outlier: A data point is called an outlier if it does not fit the pattern.
71 Regression: It is the relationship between the dependent variable and a series of other variables known as independent variables.
72 Types of regression models:
• Linear model
• Nonlinear model
73 Restricted range: It refers to a range of values that has been condensed or shortened.
74 Regression line: It shows the connection between the data points in a scatterplot and is the line that best fits the trend of a given dataset.
75 Regression fallacy: It occurs whenever regression towards the mean is interpreted as a real effect rather than as a chance effect.
UNIT IV: PYTHON LIBRARIES FOR DATA WRANGLING
76 NumPy array: NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can create a NumPy ndarray object by using the array() function.
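A minimal sketch of creating an ndarray with `array()`, assuming NumPy is installed (the values are arbitrary):

```python
import numpy as np

# Build a 2-D ndarray from a nested Python list.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (2, 3): two rows, three columns
print(arr.ndim)    # 2
```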
77 Library: A Python library is a collection of related modules. It contains bundles of code that can be used repeatedly in different programs.
78 Data wrangling: Data wrangling ensures data is reliable and complete before professionals analyse it and use it to create insights. Thanks to this process, those insights are based on accurate, high-quality data.
79 Dynamic data: Dynamic data, or transactional data, is information that is periodically updated, meaning it changes asynchronously over time as new information becomes available.
80 List: Python lists are like the dynamically sized arrays of other languages (vector in C++ and ArrayList in Java). In simple terms, a list is a collection of items enclosed in [ ] and separated by commas.
81 Replication: Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another, so that all users share the same level of information.
82 Joining: A data join combines two data sets side by side; therefore at least one column in each data set must be the same.
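A join on a shared column can be sketched with `pandas.merge`. The two tables and the key column `roll_no` are made up for illustration:

```python
import pandas as pd

students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Bala", "Chitra"]})
marks = pd.DataFrame({"roll_no": [2, 3, 4], "marks": [81, 67, 90]})

# Inner join: keep only the roll numbers present in both tables.
inner = pd.merge(students, marks, on="roll_no", how="inner")

# Left join: keep every student; missing marks become NaN.
left = pd.merge(students, marks, on="roll_no", how="left")

print(inner)
print(left)
```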
83 Aggregation: An aggregation is a collection, or the gathering of things together. In data analysis, aggregation summarises many values into a single value such as a sum, mean, or count.
84 Concatenation: Joining two or more things together into a larger one. In database parlance, the things being joined are generally two table fields, which may be from the same or different tables.
85 Scalar: Physical quantities that are specified by magnitude or size alone are scalar quantities, for example length, speed, work, mass, and density. In NumPy, a scalar is a single value, as opposed to an array.
86 Comparison: Comparison operators in Python, also called relational operators, are used to compare two operands. They return a Boolean True or False depending on whether the comparison condition is true or false.
87 Boolean Logic: Boolean logic takes two statements or expressions and applies a logical operator to generate a Boolean value that can be either true or false.
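The sketch below shows comparison operators and Boolean logic applied to a NumPy array. The `marks` values and the pass mark of 50 are assumptions made for the example:

```python
import numpy as np

marks = np.array([35, 72, 58, 91, 44])  # hypothetical marks

passed = marks >= 50                      # comparison gives an array of True/False
mid_band = (marks >= 50) & (marks < 80)   # Boolean AND, applied element by element

print(passed)
print(marks[mid_band])  # a Boolean array used as a mask selects 72 and 58
```

On arrays the element-wise operators `&`, `|`, and `~` are used in place of Python's `and`, `or`, and `not`, and each comparison needs its own parentheses.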
88 Indexing: Indexing is a way of selecting individual elements or subsets of an array, Series, or DataFrame by position or by label.
89 Structured array: Structured arrays are ndarrays whose data type is a composition of simpler data types organized as a sequence of named fields.
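A small sketch of a structured array, assuming NumPy. The field names (`name`, `age`, `cgpa`) and the records are invented for illustration:

```python
import numpy as np

# Each record has three named fields: a 10-character string, a 32-bit int, a 64-bit float.
students = np.array(
    [("Asha", 19, 8.4), ("Bala", 20, 7.9)],
    dtype=[("name", "U10"), ("age", "i4"), ("cgpa", "f8")],
)

print(students["name"])    # access one field across all records
print(students[0])         # access one whole record
print(students[0]["age"])  # combine both forms of indexing
```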
90 Data manipulation: Data manipulation is the process of arranging a set of data to make it more organized and easier to interpret.
91 Pandas: Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
92 Data Frame: A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.
93 Reindexing in Pandas: Reindexing conforms a DataFrame to a new index with optional filling logic, placing NA/NaN in locations that had no value in the previous index.
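Reindexing can be sketched on a small Series (the same idea applies to a DataFrame). The labels and values here are arbitrary:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# "d" did not exist in the old index, so it is filled with NaN.
reindexed = s.reindex(["a", "b", "c", "d"])

# Optional filling logic: supply a value for the new labels instead.
filled = s.reindex(["a", "b", "c", "d"], fill_value=0)

print(reindexed)
print(filled)
```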
94 Pandas Objects:
• Pandas Series
• Pandas DataFrame
• Index
95 Features of Pandas:
• Merge and join datasets
• Indexing and subsetting data
• Arrays into multidimensional data
96 Operations on Null Values:
• isnull()
• notnull()
• dropna()
• fillna()
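The four null-value operations can be tried on a small Series. The values below are made up; both `np.nan` and `None` are treated as missing:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None])

mask = s.isnull()      # True where a value is missing
present = s.notnull()  # the opposite of isnull()
dropped = s.dropna()   # remove the missing entries
filled = s.fillna(0)   # replace missing entries with a chosen value

print(mask.tolist(), dropped.tolist(), filled.tolist())
```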
97 Combining Datasets Methods:
• concat()
• append() (removed from DataFrame and Series in pandas 2.0; use concat() instead)
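A brief sketch of `pd.concat`, with two made-up tables that share the same columns:

```python
import pandas as pd

first = pd.DataFrame({"id": [1, 2], "score": [70, 85]})
second = pd.DataFrame({"id": [3, 4], "score": [90, 65]})

# Stack the rows; ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([first, second], ignore_index=True)

# axis=1 places the tables side by side instead.
side_by_side = pd.concat([first, second], axis=1)

print(stacked)
print(side_by_side)
```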
98 Relational Algebra: Relational algebra refers to a procedural query language that takes relation instances as input and returns relation instances as output.
99 Grouping: Grouping of data plays a significant role when we have to deal with large data: rows that share a key are collected into groups so that each group can be summarised. This information can also be displayed using a pictograph or a bar graph.
100 Pivot table: A pivot table is a powerful tool to calculate, summarize, and analyze data that lets you see comparisons, patterns, and trends in your data. In Excel, PivotTables work a little differently depending on the platform you are using to run Excel.
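In pandas, grouping and pivot tables are available through `groupby` and `pivot_table`. The sketch below uses an invented `sales` table, and summing revenue is just one possible aggregation:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 95],
})

# Grouping: split rows by region, then sum the revenue of each group.
by_region = sales.groupby("region")["revenue"].sum()

# Pivot table: regions as rows, quarters as columns, summed revenue in the cells.
pivot = sales.pivot_table(values="revenue", index="region", columns="quarter", aggfunc="sum")

print(by_region)
print(pivot)
```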
UNIT V: DATA VISUALIZATION
101 Matplotlib: It is a multiplatform data visualization library built on NumPy arrays.
102 Interfaces of Matplotlib: MATLAB-style state-based interface, object-oriented interface.
103 Line plots: A line plot is used to represent the relation between two variables X and Y, plotted on different axes.
104 Scatter plots: The scatter() method in the matplotlib library is used to draw a scatter plot. Scatter plots are used to visualize the relation among variables and how a change in one affects the other.
105 Continuous errors: Continuous error bands are a graphical representation of error or uncertainty as a shaded region around a main trace, rather than as discrete whisker-like error bars.
106 Contour plots: These are methods to show a three-dimensional surface on a two-dimensional plane.
107 Histograms: A histogram is a graph showing a frequency distribution. It shows the number of observations within each given interval.
108 Subplots: These are groups of smaller axes that can exist together within a single figure.
109 3D plotting: 3D plots are enabled by importing the mplot3d toolkit, included with the main matplotlib installation.
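110 Syntax for wireframe:
    import matplotlib.pyplot as plt
    from mpl_toolkits import mplot3d   # enables the '3d' projection
    # X, Y, Z are assumed to be 2-D coordinate arrays (for example from np.meshgrid)
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.plot_wireframe(X, Y, Z, color='red')
    ax.set_title('wireframe')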
112 Density plot: It is a visualization tool to measure data distributions. It can be considered a smoothed histogram.
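113 Code to draw sine and cosine waves:
    import numpy as np
    import matplotlib.pyplot as plt
    x = np.linspace(0, 10, 100)   # 100 evenly spaced points from 0 to 10
    fig = plt.figure()
    plt.plot(x, np.sin(x), '-')
    plt.plot(x, np.cos(x), '-')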
114 KDE: Kernel density estimation is one of the techniques used to smooth a histogram.
115 Pseudo-cylindrical projection: It relaxes the requirement that lines of constant longitude remain vertical, which gives better properties near the poles of the projection.
116 Conic projection: It projects the map onto a single cone, which is then unrolled. This can lead to very good local properties, but regions far from the focus point of the cone may become very distorted.
117 Cylindrical projections: Projections in which the lines of constant latitude and longitude are mapped to horizontal and vertical lines are called cylindrical projections.
118 Map projections: A map projection is used to project a spherical map, such as that of the Earth, onto a flat surface. This cannot be done without somehow distorting the map or breaking its continuity.
119 Customize matplotlib:
• Setting rcParams at runtime
• Using style sheets
• Changing your matplotlibrc file
120 Classes of color maps in scatter plot:
• Sequential
• Diverging
• Cyclic
• Qualitative
121 Syntax for scatter(): plt.scatter(x_axis_data, y_axis_data, s=None, c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None)
122 Error bars: They indicate the estimated error or uncertainty, showing how precise a measurement or analytical model is.
123 Bar plots: Used to aggregate categorical data according to some method, which by default is the mean.
124 Factor plots: Factor plots allow you to visualize the distribution of a parameter within bins defined by any other parameter.
125 Pair plots: Plot pairwise relationships in a dataset. This is a high-level interface for PairGrid that is intended to make it easy to draw a few common styles.
Placement Questions
126 Data Science: Data science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes.
127 Dummy variables: A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in the study.
128 Tools for model building: R and PL/R, Octave, WEKA, Python, SQL, MADlib.
129 Project charter: It is a document that lays out the project vision, scope, objectives, project team, and their responsibilities.
130 Big data: Big data is data that contains greater variety, arriving in increasing volumes and with more velocity.
131 Data: Data is a collection of discrete or continuous values that convey information, describing quantity, quality, facts, statistics, or other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.
132 Nominal data: Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way.
133 Ordinal data: Ordinal data classifies data while introducing an order, or ranking.
134 Histogram: A histogram is a graphical representation of the distribution of data. It is drawn as a set of rectangles adjacent to each other, where each bar represents the frequency of the values that fall in one interval.
135 Continuous variable: A continuous variable is a variable that can take an uncountably infinite set of values, such as any value within an interval.
136 Regression: It is the relationship between the dependent variable and a series of other variables known as independent variables.
137 Correlation: It measures the relationship between two variables.
138 Regression: It is the relationship between the dependent variable and a series of other variables known as independent variables.
139 Restricted range: It refers to a range of values that has been condensed or shortened.
140 Regression line: It shows the connection between the data points in a scatterplot and is the line that best fits the trend of a given dataset.
141 NumPy array: NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can create a NumPy ndarray object by using the array() function.
142 Library: A Python library is a collection of related modules. It contains bundles of code that can be used repeatedly in different programs.
143 Data wrangling: Data wrangling ensures data is reliable and complete before professionals analyse it and use it to create insights.
144 Pandas: Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool built on top of the Python programming language.
145 Data Frame: A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.
146 Matplotlib: It is a multiplatform data visualization library built on NumPy arrays.
147 Density plot: It is a visualization tool to measure data distributions. It can be considered a smoothed histogram.
148 Contour plots: These are methods to show a three-dimensional surface on a two-dimensional plane.
149 Seaborn: It provides an API on top of Matplotlib that offers sensible defaults for plot style and color, defines high-level functions for statistical plot types, and integrates with Pandas DataFrames.
150 Difference between Matplotlib and Seaborn in Visualization:
• Matplotlib is the graphics package that connects with NumPy and Pandas for visualization.
• Seaborn is more comfortable handling Pandas DataFrames.

Faculty Prepared HoD

Principal
