
Introduction to Data Modeling Basics

This document provides an introduction to statistics, focusing on data modeling, parameter estimation, and descriptive statistics. It explains key concepts such as point and interval estimation, methods for estimating parameters like Maximum Likelihood Estimation and Method of Moments, and various measures of central tendency, spread, and shape. Additionally, it discusses the importance of graphical representations in making data more comprehensible.

Uploaded by

monahire10_395249603


Unit 1 Introduction to Data Modeling

STATISTICS
BASICS OF STATISTICS
Statistics is a branch of mathematics that deals with collecting, organizing, and
analyzing numerical data to solve real-world problems. It is widely regarded as a distinct
scientific discipline due to its vast applications across various fields.
● Helps make sense of complex data through quantitative models.
● Plays a critical role in decision-making in fields like weather forecasting, stock
market analysis, insurance, and data science.
PARAMETER ESTIMATION
● Parameter estimation is all about figuring out the unknown
values in a mathematical model based on data we have
collected.
● In parameter estimation, we use sample data to estimate
the characteristics (parameters) of a larger population. A
parameter is a measurable characteristic of a population
(such as the population mean, variance, or proportion).
● Since we usually cannot collect data from the entire
population, we rely on samples to estimate these
parameters.
● The goal of parameter estimation is to provide the best
possible estimate of these values, given the data we have.
● Parameter estimation involves creating mathematical
models that represent real-world processes.
● The parameters are the variables within these models that
need to be estimated from data. For instance, in a simple
linear regression model, the slope and intercept are
parameters that can be estimated using data points.
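As an illustration of estimating the slope and intercept from data points, here is a minimal sketch using ordinary least squares. The slides do not say which estimation method is used for regression, so least squares is an assumption here, and the data points are hypothetical.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Estimate the intercept a and slope b of y = a + b*x by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_simple_linear_regression(xs, ys)
print(f"intercept = {a:.2f}, slope = {b:.2f}")  # intercept = 1.09, slope = 1.97
```

The estimated intercept and slope are the parameters of the model; with different sample data they would change, which is why they are estimates rather than known values.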
PARAMETER ESTIMATION
Types of Parameter Estimation
● Point Estimation
● Interval Estimation

Point Estimation

Point estimation provides a single best guess of a parameter's value. The result is one number that is considered the best approximation of the population parameter based on the sample data.

• For example, if we want to estimate the average height of students in a school, the sample mean (average) we calculate from a group of students serves as the point estimate of the population mean.
PARAMETER ESTIMATION

Interval Estimation

• Interval estimation provides a range of values within which the true parameter likely falls. This is more informative than point estimation because it includes a measure of uncertainty.
• For instance, suppose we estimate that the average height of students in a school is between 150 cm and 160 cm with 95% confidence. This means that if we repeated the sampling and built an interval in the same way many times, about 95% of those intervals would contain the true population mean.
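Point and interval estimation of a mean can be sketched together as below. This is a large-sample approximation that assumes the sample mean is roughly normally distributed and uses the z value from `statistics.NormalDist`; with a sample as small as this one, a t-based interval (slightly wider) would be more usual. The heights are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation interval for a population mean."""
    n = len(sample)
    x_bar = mean(sample)                            # point estimate
    std_err = stdev(sample) / math.sqrt(n)          # estimated standard error of x_bar
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar, (x_bar - z * std_err, x_bar + z * std_err)

# Hypothetical heights (cm) of 20 sampled students
heights_cm = [152, 158, 149, 161, 155, 157, 150, 163, 154, 156,
              151, 159, 153, 160, 148, 162, 155, 157, 152, 158]
estimate, (low, high) = mean_confidence_interval(heights_cm)
print(f"point estimate: {estimate:.1f} cm; 95% interval: ({low:.1f}, {high:.1f}) cm")
```

The point estimate is the single number in the middle; the interval adds the uncertainty around it, and it narrows as the sample size grows because the standard error shrinks with the square root of n.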
PARAMETER ESTIMATION
There are various methods for estimating parameters, each suited for different
types of data and situations. Here, we will discuss two common methods:
● Maximum Likelihood Estimation (MLE)
● Method of Moments

Maximum Likelihood Estimation (MLE)

● MLE is one of the most popular and widely used methods of parameter estimation. The idea behind MLE is to find the parameter values that maximize the likelihood function, which gives the probability (or probability density) of observing the given sample data, viewed as a function of the parameters.
● The parameter value that maximizes the likelihood function is considered the best estimate. MLE works well for large sample sizes and has desirable statistical properties, such as being consistent (the estimate gets closer to the true value as the sample size increases).
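A minimal MLE sketch, assuming the data come from an exponential distribution (an illustrative choice not made in the slides). It maximizes the log-likelihood by a crude grid search and compares the result with the closed-form estimate 1/x̄ obtained by setting the derivative of the log-likelihood to zero. The waiting times and the function names are hypothetical.

```python
import math

def exponential_log_likelihood(rate, data):
    """log L(rate) = n*log(rate) - rate*sum(x) for independent exponential data."""
    return len(data) * math.log(rate) - rate * sum(data)

def mle_by_grid_search(data, low=0.01, high=10.0, steps=20_000):
    """Crude maximizer: evaluate the log-likelihood on a grid, keep the best rate."""
    best_rate = low
    best_ll = exponential_log_likelihood(low, data)
    for i in range(1, steps + 1):
        rate = low + (high - low) * i / steps
        ll = exponential_log_likelihood(rate, data)
        if ll > best_ll:
            best_rate, best_ll = rate, ll
    return best_rate

# Hypothetical waiting times (minutes) between arrivals
waiting_times = [0.8, 1.5, 0.3, 2.2, 1.1, 0.6, 1.9, 0.9]
grid_estimate = mle_by_grid_search(waiting_times)
closed_form = len(waiting_times) / sum(waiting_times)  # n / sum(x) = 1 / x_bar
print(f"grid search: {grid_estimate:.4f}, closed form: {closed_form:.4f}")
```

The grid search only shows the idea of "pick the parameter value with the highest likelihood"; in practice a numerical optimizer or, as here, an analytic solution is used.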
PARAMETER ESTIMATION
Method of Moments

• The method of moments is a simpler, less computationally intensive approach than MLE. It is based on the idea that sample moments (such as the sample mean, variance, etc.) can be used to estimate the corresponding population moments (such as the population mean, variance, etc.). The parameters are found by setting the sample moments equal to their theoretical expressions and solving.
• Method-of-moments estimates are usually less precise (less efficient) than MLE estimates, but they are often easier to compute, frequently in closed form.
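A sketch of the method of moments, assuming a gamma distribution with shape k and scale θ (an illustrative choice; the slides name no distribution). Its mean is kθ and its variance is kθ², so matching these to the sample mean and the sample second central moment gives θ = variance/mean and k = mean²/variance. The rainfall values are hypothetical.

```python
from statistics import mean, pvariance

def gamma_method_of_moments(data):
    """Match mean = k*theta and variance = k*theta**2, then solve for k and theta."""
    m = mean(data)
    v = pvariance(data)  # second central sample moment (divides by n)
    theta = v / m        # scale
    k = m / theta        # shape, equivalently m**2 / v
    return k, theta

# Hypothetical daily rainfall amounts (mm)
rainfall_mm = [2.1, 5.4, 0.8, 3.3, 7.9, 1.6, 4.2, 2.7]
k_hat, theta_hat = gamma_method_of_moments(rainfall_mm)
print(f"shape k = {k_hat:.3f}, scale theta = {theta_hat:.3f}")
```

No optimization is needed: two moment equations, two unknowns, solved directly.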
Descriptive Statistics

❑ Methods of describing the characteristics of a data set.
❑ Useful because they allow you to make sense of the data.
❑ Help in exploring the data and drawing conclusions from it in order to make rational decisions.
❑ Include calculating things such as the average of the data, its spread and the shape it produces.
Descriptive Statistics

❑ Descriptive statistics involves describing, summarizing and organizing the data so it can be easily understood.
❑ Graphical displays are often used along with the quantitative measures to enable clarity of communication.
Describing Data
• Qualitative data: variables that yield non-numerical data.
– E.g. education, marital status, eye colour.
– Frequency: the number of observations falling into a particular class/category of the qualitative variable.
– Frequency distribution: a table listing all classes and their frequencies.
– Graphical representation: pie chart, bar graph.
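A frequency distribution for a qualitative variable is just a count per category. The sketch below builds one with `collections.Counter` from hypothetical marital-status records; the relative-frequency column is an extra we add for illustration.

```python
from collections import Counter

# Hypothetical responses for the qualitative variable "marital status"
marital_status = ["married", "single", "married", "divorced", "single",
                  "married", "widowed", "single", "married"]

frequency = Counter(marital_status)  # category -> number of observations
total = sum(frequency.values())
for category, count in frequency.most_common():
    print(f"{category:<10}{count:>4}{count / total:>8.1%}")
```

The resulting table (class, frequency) is exactly what a pie chart or bar graph of this variable would display.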

Describing Data
• Quantitative data:
– Can be presented by a frequency distribution.
– If a discrete variable has many different values, or if the variable is continuous, the data can be grouped into classes/categories.
– Class intervals (bins): the sub-ranges into which the range between the minimum and maximum values is divided.
– Class limits: the end points of a class interval.
– Class frequency: the number of observations in the data that belong to each class interval.
– Usually presented as a histogram or a bar graph.
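Grouping quantitative data into classes can be sketched as follows, assuming equal-width class intervals where each class includes its lower limit and excludes its upper limit (except the last class, which also includes its upper limit). The function name and the choice of classes are illustrative; the ten values are a small sample set.

```python
def frequency_table(values, lower, width, n_classes):
    """Group values into equal-width classes and count the class frequencies.

    Class i covers [lower + i*width, lower + (i+1)*width); the upper limit of
    the last class is included so that the maximum value is not lost.
    """
    counts = [0] * n_classes
    upper = lower + n_classes * width
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} lies outside [{lower}, {upper}]")
        i = min(int((v - lower) // width), n_classes - 1)
        counts[i] += 1
    return [((lower + i * width, lower + (i + 1) * width), c)
            for i, c in enumerate(counts)]

values = [23, 33, 34, 36, 38, 40, 41, 41, 44, 45]
for (low, high), freq in frequency_table(values, lower=20, width=10, n_classes=3):
    print(f"{low}-{high}: {freq}")
```

Each printed row gives the class limits and the class frequency, which become the base and height of one histogram bar.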

Frequency Distribution and Histogram
Descriptive Statistics
❑ When analyzing a graphical display, you can draw
conclusions based on several characteristics of the graph.
❑ You may ask questions such as:
• Where is the approximate middle, or center, of the graph?
• How spread out are the data values on the graph?
• What is the overall shape of the graph?
• Does it have any interesting patterns?
Descriptive Statistics
The following measures are used to describe a data set:
❑ Measures of position (also referred to as central tendency
or location measures).
❑ Measures of spread (also referred to as variability or
dispersion measures).
❑ Measures of shape.
Descriptive Statistics
❑ If assignable causes of variation are affecting the process,
we will see changes in:
• Position.
• Spread.
• Shape.
• Any combination of the three.
Properties of Numerical Data & Measures

Central tendency | Dispersion         | Shape
Mean             | Range              | Skewness
Median           | Standard deviation | Kurtosis
Mode             |                    |
Descriptive Statistics
Measures of Position:
❑ Position statistics measure the central tendency of the data.
❑ Central tendency refers to where the data is centered.
❑ You may have calculated an average of some kind.
❑ Despite the common use of average, there are different
statistics by which we can describe the average of a data
set:
• Mean.
• Median.
• Mode.
Measures of center
❑ Central tendency: in any distribution, most of the observations pile up, or cluster, in a particular region.
❑ Mean: the sum of the observed values in a data set divided by the number of observations.
❑ Median: the observation that divides the ordered data set into two halves.
❑ Mode: the value in the data set that occurs with the greatest frequency.
❑ The mean can be applied only to quantitative data; the median needs data that can be ordered (quantitative or ordinal).
❑ The mode can be used with either qualitative or quantitative data.
❑ Outlier: an observation that falls far from the rest of the data. The mean is strongly influenced by outliers.
Descriptive Statistics
Mean:
❑ The total of all the values divided by the size of the data set.
❑ It is the most commonly used statistic of position.
❑ It is easy to understand and calculate.
❑ It works well when the distribution is symmetric and there
are no outliers.
❑ The mean of a sample is denoted by ‘x-bar’.
❑ The mean of a population is denoted by ‘μ’.

[Figure: the mean marked on a 0 to 9 axis]
Descriptive Statistics
Median:
❑ The middle value: exactly half of the data values are above it and half are below it.
❑ Less widely used than the mean.
❑ A useful statistic due to its robustness.
❑ It can reduce the effect of outliers.
❑ Often used when the data is nonsymmetrical.
❑ Ensure that the values are ordered before calculation.
❑ With an even number of values, the median is the mean of the two middle values.
[Figure: the median and the mean marked on a 0 to 9 axis]
Descriptive Statistics
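Median Calculation Example

Data set A (n = 10, ordered): 23, 33, 34, 36, 38, 40, 41, 41, 44, 45
Median = (38 + 40) / 2 = 39

Data set B (n = 9, ordered): 12, 30, 31, 37, 38, 40, 41, 41, 44
Median = 38 (the 5th value)

Data: 1, 2, 1, 1, 3, 4, 100
Mean = 16, median = 2, mode = 1

Assume 100 is an outlier and remove it (1, 2, 1, 1, 3, 4):
Mean = 2, median = 1.5, mode = 1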
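The worked example can be checked with Python's standard `statistics` module. The sketch recomputes the mean, median and mode with and without the value 100, which shows how much the outlier pulls the mean while barely moving the median.

```python
from statistics import mean, median, mode

data = [1, 2, 1, 1, 3, 4, 100]
without_outlier = [v for v in data if v != 100]  # assume 100 is the outlier

for label, values in [("with 100", data), ("without 100", without_outlier)]:
    print(f"{label}: mean = {mean(values)}, median = {median(values)}, mode = {mode(values)}")
# with 100: mean = 16, median = 2, mode = 1
# without 100: mean = 2, median = 1.5, mode = 1

# Even number of ordered values: the median averages the two middle values
print(median([23, 33, 34, 36, 38, 40, 41, 41, 44, 45]))  # 39.0
```

`median` sorts the values internally, so the ordering step is handled for you; by hand it must be done first.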
Descriptive Statistics
❑ Why can the mean and median be different?

[Figure: the median and the mean marked at different points on a 0 to 9 axis]
Descriptive Statistics
Mode:
❑ The value that occurs most often in a data set.
❑ It is rarely used as a measure of central tendency.
❑ It is more useful for distinguishing between unimodal and multimodal distributions.
• Multimodal: the data has more than one peak.
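A crude way to flag a data set as unimodal or multimodal is to count how many values tie for the highest frequency, using `statistics.multimode`. This is only a sketch: counting tied most-frequent values is not the same as finding separate peaks of unequal height in a histogram, and the function name `modality` is hypothetical.

```python
from statistics import multimode

def modality(values):
    """Crude check: counts how many values tie for the highest frequency."""
    modes = multimode(values)
    return ("unimodal" if len(modes) == 1 else "multimodal"), modes

print(modality([1, 2, 1, 1, 3, 4, 100]))  # ('unimodal', [1])
print(modality([2, 2, 3, 7, 7, 9]))       # ('multimodal', [2, 7])
```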
Normal distribution
❑ Bell shaped symmetric distribution.
❑ Why is it important?
❑ Many things are normally distributed, or very close to it.
❑ It is easy to work with mathematically.
❑ Most inferential statistical methods make use of properties of
the normal distribution.

❑ Mean = Median = Mode

Descriptive Statistics
Measures of Spread:
❑ Spread refers to how much the data deviates from the position measure.
❑ It gives an indication of the amount of variation in the
process.
• An important indicator of quality.
• Used to control process variability and improve quality.
❑ All manufacturing and transactional processes are variable to some degree.
❑ There are different statistics by which
we can describe the spread of a data
set:
• Range.
• Standard deviation.
❑ Range: the difference between the largest and the smallest observed values in the data set.
❑ Because the range uses only these two values, a great deal of information is ignored.
❑ Standard deviation: roughly, the typical distance of the observed values from the mean, computed as the square root of the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n − 1).
❑ It is computed from the sample mean, and its value can be strongly affected by a few extreme observations.
❑ Variance: the square of the standard deviation.

Descriptive Statistics
Standard Deviation:
❑ Roughly, the typical distance of the data points from their own mean.
❑ A low standard deviation indicates that the data points
are clustered around the mean.
❑ A large standard deviation indicates that they are
widely scattered around the mean.
❑ The standard deviation of a sample
is denoted by ‘s’.
❑ The standard deviation of a population is denoted by ‘σ’.
Descriptive Statistics
Standard Deviation:
❑ Perceived as difficult to understand because it is not easy to picture what it is.
❑ It is, however, a more informative measure of variability than the range, because it uses every data value.
❑ Standard deviation is computed as follows:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

s = standard deviation
x̄ = mean (x-bar)
x = values of the data set
n = size of the data set
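The sample standard deviation formula, with n − 1 in the denominator, translates directly into code. The sketch below implements it by hand and checks it against `statistics.stdev`; the data values are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def sample_std(values):
    """s = sqrt( sum((x - x_bar)**2) / (n - 1) )"""
    x_bar = mean(values)
    squared_dev = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_dev / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(sample_std(data), stdev(data))  # both compute the same quantity
print(variance(data))                 # s squared: 32 / 7
```

Dividing by n − 1 rather than n is the usual choice for a sample, because it corrects the tendency of the sample mean to sit closer to the data than the true population mean does.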
Descriptive Statistics
Range:
❑ The difference between the highest and the lowest values.
❑ The simplest measure of variability.
❑ Often denoted by ‘R’.
❑ It is good enough in many practical cases.
❑ It does not make full use of the available data.
❑ It can be misleading when the data is skewed or in the
presence of outliers.
• Just one outlier will increase the range dramatically.
[Figure: the range marked on a 0 to 9 axis]
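The sensitivity of the range to a single outlier is easy to see numerically. This sketch reuses the data 1, 2, 1, 1, 3, 4, 100 from the median example; the helper name `data_range` is hypothetical.

```python
def data_range(values):
    """Range R = largest value - smallest value."""
    return max(values) - min(values)

data = [1, 2, 1, 1, 3, 4, 100]
print(data_range(data))       # 99
print(data_range(data[:-1]))  # 3, once the outlier 100 is dropped
```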
Descriptive Statistics
Measures of Shape:
❑ Data can be plotted as a histogram to get a general idea of its shape, or distribution.
❑ The shape can reveal a lot of information about the data.
❑ Skewness: lack of symmetry in a distribution. It can be judged from the frequency distribution.
❑ Properties:
❑ Mean, median and mode fall at different points.
❑ The curve is not symmetrical but stretched more to one side.
❑ A distribution may be positively or negatively skewed. Pearson's coefficient of skewness lies between −3 and +3.
❑ Kurtosis: the peakedness of a curve.
❑ Gives an idea of the flatness/peakedness of the curve.
❑ Gives an idea of how much weight lies in the tails of the distribution.
Descriptive Statistics
Measures of Shape:
❑ It may be symmetrical or nonsymmetrical.
❑ In a symmetrical distribution, the two sides of the
distribution are a mirror image of each other.
❑ Examples of symmetrical distributions include:
• Uniform.
• Normal.
• Camel-back.
Descriptive Statistics
Measures of Shape:
❑ The shape helps identify which descriptive statistic is more appropriate in a given situation.
❑ If the data is symmetrical, we may use the mean or the median to measure central tendency, as they are almost equal.
❑ If the data is skewed, the median is a more appropriate measure of central tendency.
❑ Two common statistics that measure the shape of the data:
• Skewness.
• Kurtosis.
Descriptive Statistics
Skewness:
❑ Describes whether the data is distributed symmetrically
around the mean.
❑ A skewness value of zero indicates perfect symmetry.
❑ A negative value implies left-skewed data.
❑ A positive value implies right-skewed data.
[Figure: positively skewed (SK > 0) and negatively skewed (SK < 0) distributions]
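Two common skewness measures can be sketched as follows: the moment coefficient m₃/m₂^1.5 (population form, one of several conventions) and Pearson's second coefficient 3(mean − median)/σ, which always lies between −3 and +3 when σ is the population standard deviation. Which coefficient the slides intend is not stated, so both are shown; the function names are hypothetical.

```python
from statistics import mean, median, pstdev

def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n, m = len(values), mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def pearson_skewness(values):
    """Pearson's second coefficient 3*(mean - median)/sigma; lies in [-3, 3]."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

right_skewed = [1, 2, 1, 1, 3, 4, 100]
symmetric = [1, 2, 3, 4, 5]
for name, values in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(name, round(moment_skewness(values), 3), round(pearson_skewness(values), 3))
```

A positive value means a longer right tail (mean pulled above the median), a negative value a longer left tail, and zero a symmetric data set.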
Descriptive Statistics
Kurtosis:
❑ Measures the degree of flatness (or peakedness) of the shape.
❑ When the data values are clustered around the middle, the distribution is more peaked.
• A greater kurtosis value.
❑ When the data values are spread more evenly, the distribution is flatter.
• A smaller kurtosis value.

[Figure: platykurtic (-), mesokurtic (0) and leptokurtic (+) distributions]
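The sign convention in the platykurtic/mesokurtic/leptokurtic labels matches excess kurtosis, m₄/m₂² − 3, which is 0 for a normal distribution. The sketch below computes it in the population form (an assumption; sample-adjusted versions also exist). The function name is hypothetical.

```python
from statistics import mean

def excess_kurtosis(values):
    """m4 / m2**2 - 3 (population form); 0 for a normal distribution."""
    n, m = len(values), mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

evenly_spread = [1, 2, 3, 4, 5]           # flat: negative (platykurtic)
heavy_tailed = [1, 2, 1, 1, 3, 4, 100]    # one extreme value: positive (leptokurtic)
print(round(excess_kurtosis(evenly_spread), 3), round(excess_kurtosis(heavy_tailed), 3))
```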
Descriptive Statistics
Further Information:
❑ Variance is a measure of the variation around the mean.
❑ It measures how far a set of data points are spread out
from their mean.
❑ The units are the square of the units used for the original data.
• For example, a variable measured in meters will have a variance measured in square meters.
❑ Variance = s²; it is the square of the standard deviation.
PRESENTATION OF DATA

• Classification and tabulation reduce the complexity of vast and complicated statistical data, but tabulated data can still be hard to interpret.
• Diagrams and graphs catch the eye more easily than tables, which present arrays of figures.
• A glance at a graph or diagram lets any layperson (without statistical knowledge) grasp the essential characteristics of the tabulated data without much strain or effort.
FUNCTIONS OF DIAGRAMS AND GRAPHS

• They attract the attention of a large number of people.
• They give a “bird's-eye view” impression in the human mind.
• Presenting data as suitable charts and graphs, instead of pages of numerical figures, saves a lot of valuable time.
• They facilitate comparison between two or more sets of data.
• Prediction equations can be represented by graphs, which is very helpful in forecasting.
LIMITATIONS OF DIAGRAMS AND
GRAPHS
• They are approximate indicators.
• Exact and accurate information can be obtained only from the original tabular data.
• They cannot substitute for tabular information.
• They fail to reveal small differences when large figures are involved.
GRAPHICAL REPRESENTATION OF
DATA
Graphical representation is done when the data
are classified in the form of a frequency
distribution. The different graphs are
• Histogram
• Frequency Polygon
• Frequency Curve
• Ogive
• Lorenz Curve
Histogram
• It is a vertical bar diagram without gap between the
bars.
• It consists of bars erected over the true class interval,
their areas being proportional to the frequencies of the
respective classes.
• When the intervals are of equal width, the height of each bar serves as a measure of the corresponding frequency.
• To locate the mode graphically, take the tallest (modal) bar: join its top-right corner to the top-right corner of the preceding bar, and its top-left corner to the top-left corner of the following bar. The x-coordinate of the point where these two diagonals intersect is the mode.
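The diagonal construction has an algebraic equivalent. With modal class lower limit L, class width h, modal frequency f₁, and neighbouring frequencies f₀ (before) and f₂ (after), intersecting the two lines gives Mode = L + (f₁ − f₀) / (2f₁ − f₀ − f₂) × h. The sketch assumes equal-width classes and treats a missing neighbour as frequency 0; the frequency table is hypothetical.

```python
def grouped_mode(lower, width, f_prev, f_modal, f_next):
    """Mode of grouped data; same point as the histogram diagonals construction."""
    return lower + (f_modal - f_prev) / (2 * f_modal - f_prev - f_next) * width

# Hypothetical frequency table with equal-width classes
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 12, 8, 3]

i = freqs.index(max(freqs))                        # modal class: the tallest bar
f_prev = freqs[i - 1] if i > 0 else 0              # assume 0 if no preceding class
f_next = freqs[i + 1] if i + 1 < len(freqs) else 0 # assume 0 if no following class
lower, upper = classes[i]
mode_estimate = grouped_mode(lower, upper - lower, f_prev, freqs[i], f_next)
print(round(mode_estimate, 2))
```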
Frequency Polygon
• If points are plotted with the x-coordinate equal to the mid-value of each class interval and the y-coordinate equal to the corresponding frequency, and these points are joined by straight line segments, we obtain a frequency polygon.
• These points are the midpoints of the tops of the bars in the histogram.
Frequency Curve
• If points are plotted with the x-coordinate equal to the mid-value of each class interval and the y-coordinate equal to the corresponding frequency, and these points are joined by a smooth curve, we get a frequency curve.

A frequency curve is a smooth curve that corresponds to the limiting case of a histogram computed for a continuous distribution as the number of data points becomes very large.
Difference Between Frequency Curve
And Frequency Polygon

• The main difference between a frequency polygon and a frequency curve is that a frequency polygon is drawn by connecting the points with straight lines, whereas a frequency curve is drawn by connecting them with a smooth curve.
DIAGRAMMATIC REPRESENTATION
OF DATA
• Points to be followed in drawing a diagram:
• For each diagram, a suitable short heading should be given.
• It should be drawn to exhibit the statistical matter clearly, using a suitable scale (chosen according to the space available) so that its significant features stand out.
• The diagram should be drawn accurately with the help of drawing instruments.
• Colouring and different markings should be done with pencil or with colours.
• Different colours, marks or dottings are used to show different items. In such cases a legend should state which column or item each one refers to, taking care that the visual impression conveyed by the diagram is not affected in any way.
• The original data on which the diagram is based should be given, if necessary facing the diagram, as this helps the observer see the details clearly.
• A reference to the source of the table should be provided.
Types of diagrams

• One-dimensional diagrams
– Line diagram
– Bar diagram
• Two-dimensional (or area) diagrams
– Pie diagram
– Square and rectangle diagrams
• Three-dimensional (or volume) diagrams
– Cubes
– Spheres, cylinders, etc.
• Pictograms
– Actual pictures
ONE DIMENSIONAL DIAGRAM
One-dimensional diagrams include line diagrams and bar diagrams.
• Line diagram
– Vertical lines are drawn at equal intervals, each of length proportional to the magnitude of the variable for the different items.
– The lines have no width, so the visual effect is poor.
– It makes comparison easy, although it is less attractive.
• Bar diagram
– It is the simplest of all statistical diagrams.
– It consists of bars of equal width (all horizontal or all vertical) standing on a common base line at equal intervals, the length of each bar being proportional to the magnitude of the variable for the corresponding item.
[Figures: line diagram; bar diagram]
BAR DIAGRAM

❖ Sub-divided bar diagram or component bar diagram
• Sometimes the variable can be sub-divided into two or more component parts, each representing a sub-variable.
• In this case, all the bars are subdivided by lines in the same order, so that each subdivision represents the magnitude of its component on the same scale.
• The subdivisions are coloured or marked differently for visual guidance.
[Figure: sub-divided (component) bar diagram]
Superimposed or Multiple bar diagram

• Bars may sometimes be superimposed for comparative purposes.
• A multiple bar graph shows the relationship between different values of data.
• Each data value is represented by a column; for each category, a multiple bar graph places several columns side by side, one per data series.
• They are also used for two or more sets of interrelated data.
[Figure: superimposed or multiple bar diagram]
❖ Percentage bar diagram

• When the component parts are expressed as percentages of the whole, the resulting bar diagram is called a percentage bar diagram.
• In this case, all the bars are of equal length.
• To draw a percentage bar chart, first draw a bar of length 100 for each class, then sub-divide each bar in proportion to the percentages of its components.
• The diagram so obtained is called a percentage component bar chart or percentage stacked bar chart.
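The arithmetic behind a percentage bar diagram is converting each component to a share of its bar's total, so every bar sums to 100. A minimal sketch with hypothetical household-spending figures:

```python
def percentage_components(components):
    """Express each component as a percentage of the bar's total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

# Hypothetical monthly spending (same currency unit) for two families
families = {
    "Family A": {"food": 400, "rent": 500, "other": 100},
    "Family B": {"food": 300, "rent": 900, "other": 300},
}
for family, parts in families.items():
    print(family, percentage_components(parts))
```

Because each bar is rescaled to 100, the diagram compares the composition of the families rather than their total spending.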
TWO DIMENSIONAL (OR) AREA
DIAGRAM
• Two-dimensional diagrams have two dimensions, such as length and width. They are represented by squares, rectangles and circles; a circle divided into sectors is known as a pie diagram. Each category is represented by an area proportional to its value.
• Pie diagram
• Square diagram and rectangle diagrams
Pie Chart
• The “pie chart”, also known as a “circle chart”, divides a circular statistical graphic into sectors to illustrate numerical proportions. Each sector denotes a proportionate part of the whole, so a pie chart works best for showing the composition of something.
• The pie chart is an important type of data representation. Each sector forms a specific portion (percentage) of the total, and the angles of all the sectors add up to 360°.
• The total value of the pie is always 100%.
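Each sector's angle is its share of the total multiplied by 360°. A short sketch with a hypothetical budget:

```python
def sector_angles(values):
    """Angle (degrees) of each pie sector: value / total * 360."""
    total = sum(values.values())
    return {name: 360 * value / total for name, value in values.items()}

budget = {"food": 400, "rent": 500, "other": 100}  # hypothetical
angles = sector_angles(budget)
print(angles)  # {'food': 144.0, 'rent': 180.0, 'other': 36.0}
```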
Square diagram

• Their areas should be proportional to the magnitudes of the data.
• For square diagrams, take the square root of each given figure to obtain the length of the side of the corresponding square; then, adopting a suitable scale, draw the squares.
• A square is a quadrilateral with four equal sides and four interior angles of 90°. Many objects around us are square in shape.
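Computing square sides so that areas are proportional to the data is a one-line square root per value. The sketch uses hypothetical production figures and a hypothetical drawing scale.

```python
import math

def square_sides(values, scale=1.0):
    """Side length per item: scale * sqrt(value), so area is proportional to value."""
    return {name: scale * math.sqrt(v) for name, v in values.items()}

production = {"2021": 400, "2022": 900, "2023": 1600}  # hypothetical figures
sides = square_sides(production, scale=0.1)            # hypothetical drawing scale
for year, side in sides.items():
    print(year, round(side, 2))
```

Taking the square root matters: drawing sides proportional to the raw values would make the areas grow with the square of the data and exaggerate the differences.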
Rectangle diagrams
• In the case of rectangle diagrams, if we take equal breadth (width) for the rectangles, then the areas will be proportional to the lengths, and hence the lengths will be proportional to the magnitudes of the given variables.
PICTOGRAM
• Tabular data can also be represented by pictograms, cartograms, maps and pictures, as these devices help attract attention to statistical matter that is very often ignored when presented in the ordinary diagrammatic form.
• Pictograms are diagrams of a pictorial or semi-pictorial nature, drawn in different sizes according to scale. Though they are useful in attracting people's attention, readers very often rely on the tables and ignore the pictorial diagrams.
• They cannot be used for certain complicated data.
