Data Analysis and Statistics Overview

Uploaded by

rawaddy2

Chapter 2: Data and Statistics

1. Data :
Elements, Variables, and Observations

a) Elements are the entities on which data are collected.

b) A variable is a characteristic of interest for the elements.

c) Measurements collected on each variable for every element in a study provide the data. The set of measurements obtained for a particular element is called an observation.
Example
• The data set in Table 1.1 includes the following five
variables: Fund Type, Net Asset Value ($), 5-Year
Average Return (%), Expense Ratio, and Morningstar
Rank.
• The set of measurements for the first observation
(American Century Intl. Disc) is IE, 14.37, 30.53,
1.41, and 3-Star.
• The set of measurements for the second observation
(American Century Tax-Free Bond) is FI, 10.73, 3.34,
0.49, and 4-Star, and so on.
• A data set with 25 elements contains 25 observations.
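
As a minimal sketch of how these terms map onto a data structure, the two observations quoted above can be stored with one entry per element. The dictionary layout and the names `variables`, `data_set`, and `observation` are illustrative assumptions, not part of Table 1.1.

```python
# The five variables named in the example (order follows the text).
variables = [
    "Fund Type",
    "Net Asset Value ($)",
    "5-Year Average Return (%)",
    "Expense Ratio",
    "Morningstar Rank",
]

# One entry per element (a fund); each list is that element's observation.
# Only the two observations quoted in the text are included here.
data_set = {
    "American Century Intl. Disc": ["IE", 14.37, 30.53, 1.41, "3-Star"],
    "American Century Tax-Free Bond": ["FI", 10.73, 3.34, 0.49, "4-Star"],
}

# Pair each measurement with its variable to read a single observation.
observation = dict(zip(variables, data_set["American Century Intl. Disc"]))
print(observation["Net Asset Value ($)"])

# The number of observations equals the number of elements.
print(len(data_set))
```

Each observation holds exactly one measurement per variable, which is why a data set with 25 elements contains 25 observations.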

Scales of Measurement

• Data collection requires one of the following scales of measurement: nominal, ordinal, interval, or ratio.

• The scale of measurement determines the amount of information contained in the data and indicates the most appropriate data summarization and statistical analyses.

• Nominal scale: the data for a variable consist of labels or names used to identify an attribute of the element.

• Ordinal scale: the data have the properties of nominal data, and the order or rank of the data is meaningful.

• Interval scale: the data have all the properties of ordinal data, and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.

• Ratio scale: the data have all the properties of interval data, and the ratio of two values is meaningful.

Categorical and Quantitative Data

• Data that can be grouped by specific categories are referred to as categorical data.
Categorical data use either the nominal or ordinal scale of measurement.

• Data that use numeric values to indicate how much or how many are referred to as
quantitative data.

• Quantitative data are obtained using either the interval or ratio scale of measurement.

• A categorical variable is a variable with categorical data, and a quantitative variable is a variable with quantitative data.
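
The link between the scale of measurement and the type of data can be written as a small lookup. This is only a sketch: the `Scale` enum and the `data_type` helper are hypothetical names, and the scale assigned to each mutual fund variable below is an assumed classification for illustration.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def data_type(scale: Scale) -> str:
    """Nominal and ordinal scales give categorical data;
    interval and ratio scales give quantitative data."""
    if scale in (Scale.NOMINAL, Scale.ORDINAL):
        return "categorical"
    return "quantitative"


# Assumed classification of three of the mutual fund variables.
fund_variables = {
    "Fund Type": Scale.NOMINAL,           # labels such as IE or FI
    "Morningstar Rank": Scale.ORDINAL,    # star ratings can be ordered
    "Net Asset Value ($)": Scale.RATIO,   # ratios of dollar amounts are meaningful
}

for name, scale in fund_variables.items():
    print(f"{name}: {scale.value} scale, {data_type(scale)} variable")
```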

Cross-Sectional and Time Series Data

• Cross-sectional data are data collected at the same or approximately the same point in time.

• Time series data are data collected over several time periods.


Statistical Inference

• A population is the set of all elements of interest in a particular study.

• A sample is a subset of the population.

• The process of conducting a survey to collect data for a sample is called a sample
survey.

• Statistics uses data from a sample to make estimates and test hypotheses about the
characteristics of a population through a process referred to as statistical inference.

1. Descriptive Statistics :

Summarizing Categorical Data

• A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of
several nonoverlapping classes.

Example : To develop a frequency distribution for these data, we count the number of times each soft drink
appears in Table 2.1. Coke Classic appears 19 times, Diet Coke appears 8 times, Dr. Pepper appears 5 times,
Pepsi appears 13 times, and Sprite appears 5 times.

• A percent frequency distribution summarizes the percent frequency of the data for each class.

• A bar chart is a graphical device for depicting categorical data summarized in a frequency, relative
frequency, or percent frequency distribution.
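
A minimal sketch of the three distributions for the soft drink example, starting from the counts quoted above. It takes the relative frequency of a class to be its frequency divided by the number of observations n; the variable names are illustrative.

```python
# Frequencies quoted in the soft drink example.
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

n = sum(frequency.values())  # total number of observations

# Relative frequency: proportion of the observations in each class.
relative_frequency = {drink: f / n for drink, f in frequency.items()}

# Percent frequency: the same proportion expressed out of 100.
percent_frequency = {drink: 100 * f / n for drink, f in frequency.items()}

for drink in frequency:
    print(f"{drink:<13}{frequency[drink]:>4}"
          f"{relative_frequency[drink]:>7.2f}{percent_frequency[drink]:>6.0f}%")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check on any table built this way.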

• The pie chart provides another graphical device for presenting relative frequency and percent frequency
distributions for categorical data.

Summarizing Quantitative Data

• The three steps necessary to define the classes for a frequency distribution with quantitative
data are:

1. Determine the number of nonoverlapping classes.

2. Determine the width of each class.

3. Determine the class limits.
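
The three steps can be sketched on the 12 monthly starting salaries that this chapter uses in its percentile examples. Several choices here are assumptions made for illustration, not taken from the slides: five classes, the common guideline that the approximate class width is (largest value - smallest value) / number of classes, rounding the width up to a multiple of 50, and starting the first class at a multiple of 100.

```python
import math

# Monthly starting salaries from the chapter's percentile examples.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

# Step 1: number of nonoverlapping classes (5 is an assumed choice).
num_classes = 5

# Step 2: approximate width, rounded up to a convenient multiple of 50.
approx_width = (max(salaries) - min(salaries)) / num_classes
width = math.ceil(approx_width / 50) * 50

# Step 3: class limits. Values are whole numbers, so each upper limit
# is one less than the next lower limit and the classes do not overlap.
first_lower = (min(salaries) // 100) * 100
classes = [(first_lower + k * width, first_lower + (k + 1) * width - 1)
           for k in range(num_classes)]

# Count the observations that fall in each class.
frequencies = [sum(lo <= x <= hi for x in salaries) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

Every observation lands in exactly one class, so the frequencies add up to the number of observations.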


• We define the relative frequency and percent frequency distributions for quantitative data in
the same manner as for categorical data. Recall that the relative frequency is the
proportion of the observations belonging to a class. With n observations,

Relative frequency of a class = Frequency of the class / n

• The percent frequency of a class is the relative frequency multiplied by 100.


• One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the
range for the data. Each data value is represented by a dot placed above the axis.

• A common graphical presentation of quantitative data is a histogram. A histogram is
constructed by placing the variable of interest on the horizontal axis and the frequency,
relative frequency, or percent frequency on the vertical axis.

• A variation of the frequency distribution that provides another tabular summary of quantitative
data is the cumulative frequency distribution.

• The cumulative frequency distribution uses the number of classes, class widths, and class
limits developed for the frequency distribution.

• A cumulative percent frequency distribution shows the percentage of data items with values
less than or equal to the upper limit of each class.
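
A short sketch of a cumulative frequency distribution and a cumulative percent frequency distribution, using the chapter's 12 starting salaries. The upper class limits are hypothetical (classes of width 150 starting at 3300) and are chosen only to illustrate the counting rule.

```python
# Monthly starting salaries from the chapter's percentile examples.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

# Hypothetical upper class limits (assumed for illustration).
upper_limits = [3449, 3599, 3749, 3899, 4049]

n = len(salaries)

# Cumulative frequency: number of values less than or equal to each upper limit.
cumulative_frequency = [sum(x <= u for x in salaries) for u in upper_limits]

# Cumulative percent frequency: the same counts as a percentage of n.
cumulative_percent = [round(100 * c / n, 1) for c in cumulative_frequency]

for u, c, pct in zip(upper_limits, cumulative_frequency, cumulative_percent):
    print(f"<= {u}: {c:>2}  {pct:>5}%")
```

The last cumulative frequency always equals n, and the last cumulative percent frequency is always 100.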

• A graph of a cumulative distribution, called an ogive, shows data values on the horizontal axis
and either the cumulative frequencies, the cumulative relative frequencies, or the cumulative
percent frequencies on the vertical axis.
1. Descriptive Statistics: Numerical Measures
Measures of Location

• The mean provides a measure of central location for the data. If the data are
for a sample, the mean is denoted by x̄; if the data are for a population, the
mean is denoted by the Greek letter μ.

• The median is another measure of central location.

• MEDIAN

• Arrange the data in ascending order (smallest value to largest value).

a) For an odd number of observations, the median is the middle value.

b) For an even number of observations, the median is the average of the two middle values.
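
The mean (the sum of the values divided by the number of observations) and the median rule above can be sketched as two small functions. The data are the 12 monthly starting salaries used in this chapter's percentile examples; the function names are illustrative.

```python
def mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value for an odd count, average of the two middle values for an even count."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # (a) odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # (b) even number


salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

print(mean(salaries))     # 3540.0
print(median(salaries))   # 3505.0, the average of 3490 and 3520
```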

• A third measure of location is the mode.

• The mode is the value that occurs with greatest frequency.

• Percentile : the pth percentile is a value such that at least p percent of the observations
are less than or equal to this value and at least (100 - p) percent of the observations are
greater than or equal to this value.
• Calculating the pth percentile
• Step 1. Arrange the data in ascending order (smallest value to largest value).
• Step 2. Compute an index i

i = (p/100) n

where p is the percentile of interest and n is the number of observations.

• Step 3.
a) If i is not an integer, round up. The next integer greater than i denotes the position of the
pth percentile.
b) If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
Example. As an illustration of this procedure, let us determine the 85th percentile for the starting salary data.
Step 1. Arrange the data in ascending order.
3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925
Step 2. i = (p/100) n = (85/100)(12) = 10.2

Step 3. Because i is not an integer, round up. The position of the 85th percentile is the next integer greater than
10.2, the 11th position.
Returning to the data, we see that the 85th percentile is the data value in the 11th position, or 3730.

As another illustration of this procedure, let us consider the calculation of the 50th percentile
for the starting salary data. Applying step 2, we obtain

i = (50/100)(12) = 6

Because i is an integer, step 3(b) states that the 50th percentile is the average of the sixth and seventh data values;
thus the 50th percentile is (3490 + 3520)/2 = 3505. Note that the 50th percentile is also the median.
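
The three-step procedure can be sketched as a function. This is one possible implementation of the rule described above, not a library routine: the name `percentile` is illustrative, and it assumes a non-empty list and an integer p strictly between 0 and 100 so that the test "is i an integer" can be done with exact integer arithmetic.

```python
def percentile(data, p):
    """pth percentile by the three-step rule (integer p, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                      # Step 1: ascending order
    n = len(values)
    # Step 2: i = (p/100) n, split into whole part and remainder.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # Step 3(a): i is not an integer, so round up to position whole + 1
        # (index whole in 0-based Python indexing).
        return values[whole]
    # Step 3(b): i is an integer, so average positions i and i + 1.
    return (values[whole - 1] + values[whole]) / 2


salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

print(percentile(salaries, 85))   # 3730
print(percentile(salaries, 50))   # 3505.0
```

Other percentile conventions exist and may interpolate between neighboring values, so results from statistical software can differ slightly from this rule.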

• Quartiles: The division points are referred to as the quartiles and are defined as
• Q1 = first quartile, or 25th percentile

• Q2 = second quartile, or 50th percentile (also the median)

• Q3 = third quartile, or 75th percentile.

• Figure 3.1 shows a data distribution divided into four parts.

Example. The starting salary data are again arranged in ascending order. We already identified Q2, the second
quartile (median), as 3505.
3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925
The computations of quartiles Q1 and Q3 require the use of the rule for finding the 25th and 75th percentiles.
For Q1,

i = (25/100)(12) = 3

Because i is an integer, step 3(b) indicates that the first quartile, or 25th percentile, is the average of the third and
fourth data values; thus, Q1 = (3450 + 3480)/2 = 3465.
For Q3,

i = (75/100)(12) = 9

Again, because i is an integer, step 3(b) indicates that the third quartile, or 75th percentile, is the average of the
ninth and tenth data values; thus, Q3 = (3550 + 3650)/2 = 3600. The quartiles divide the starting salary data into
four parts, with each part containing 25% of the observations.

Measures of Variability
• Range: the simplest measure of variability is the range.

Range = Largest value - Smallest value

• Interquartile Range: This measure of variability is the difference between the third
quartile, Q3, and the first quartile, Q1. In other words, the interquartile range is the range
for the middle 50% of the data.

IQR = Q3 - Q1

For the data on monthly starting salaries, the quartiles are Q3 = 3600 and Q1 = 3465. Thus,
the interquartile range is 3600 - 3465 = 135.
• The variance is a measure of variability that utilizes all the data. The variance is based on
the difference between the value of each observation (xi) and the mean.
• The standard deviation is defined to be the positive square root of the variance.
• Coefficient of variation: indicates how large the standard deviation is relative to the mean.
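
A minimal sketch of these three measures on the chapter's 12 starting salaries. It assumes the data are a sample, so the sum of squared deviations from the mean is divided by n - 1 (for a population the divisor would be the population size), and it expresses the coefficient of variation as a percentage of the mean. Variable names are illustrative.

```python
import math

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

n = len(salaries)
mean = sum(salaries) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in salaries]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean, in percent.
coefficient_of_variation = std_dev / mean * 100

print(round(variance, 2))                   # 27440.91
print(round(std_dev, 2))                    # 165.65
print(round(coefficient_of_variation, 1))   # 4.7
```

Because the standard deviation is in the same units as the data, it is usually easier to interpret than the variance.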

Exploratory Data Analysis

• In a five-number summary, the following five numbers are used to summarize the data:
1. Smallest value

2. First quartile (Q1)

3. Median (Q2)

4. Third quartile (Q3)

5. Largest value

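Example: The monthly starting salaries shown in Table 3.1 for a sample of 12 business school graduates are
repeated here in ascending order.
3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925
With Q1 = 3465, the median 3505, and Q3 = 3600 from the quartile example, the five-number summary is
3310, 3465, 3505, 3600, 3925.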

• A box plot is a graphical summary of data that is based on a five-number summary. A key to the
development of a box plot is the computation of the median and the quartiles, Q1 and Q3. The
interquartile range, IQR = Q3 - Q1, is also used.

• The steps used to construct the box plot follow.

o A box is drawn with the ends of the box located at the first and third quartiles. For the salary
data, Q1 = 3465 and Q3 = 3600. The box contains the middle 50% of the data.

o A vertical line is drawn in the box at the location of the median (3505 for the salary data).

o By using the interquartile range, IQR = Q3 - Q1, limits are located. The limits for the box plot
are 1.5(IQR) below Q1 and 1.5(IQR) above Q3.
o For the salary data, IQR = Q3 - Q1 = 3600 - 3465 = 135. Thus, the limits are 3465 - 1.5(135)
= 3262.5 and 3600 + 1.5(135) = 3802.5. Data outside these limits are considered outliers.

o The dashed lines in Figure 3.5 are called whiskers. The whiskers are drawn from the ends of
the box to the smallest and largest values inside the limits computed in step 3. Thus, the
whiskers end at salary values of 3310 and 3730.

o Finally, the location of each outlier is shown with the symbol *. In Figure 3.5 we see one
outlier, 3925.
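
The numbers behind the box plot can be reproduced with a few lines. This sketch takes Q1 = 3465 and Q3 = 3600 as given from the salary example and only computes the limits, whisker ends, and outliers; it does not draw the plot, and the variable names are illustrative.

```python
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

# Quartiles taken from the chapter's salary example.
q1, q3 = 3465, 3600
iqr = q3 - q1

# Limits sit 1.5 IQRs below Q1 and 1.5 IQRs above Q3.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers reach the smallest and largest values inside the limits.
inside = [x for x in salaries if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Values outside the limits are flagged as outliers.
outliers = [x for x in salaries if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit)    # 3262.5 3802.5
print(whisker_low, whisker_high)   # 3310 3730
print(outliers)                    # [3925]
```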

Measures of Association Between Two Variables

• Covariance is a descriptive measure of the linear association between two variables.

• For a sample of size n with the observations (x1, y1), (x2, y2), and so on, the sample covariance
is defined as follows:

sxy = Σ(xi - x̄)(yi - ȳ) / (n - 1)
Correlation Coefficient
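
A minimal sketch of both measures of association. It uses the usual sample definitions: the sample covariance is the sum of the products of the deviations (xi - x̄)(yi - ȳ) divided by n - 1, and the sample correlation coefficient is the covariance divided by the product of the two sample standard deviations. The x and y values are hypothetical and exist only to make the example runnable.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical paired observations (not from the source document).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))              # 1.5
print(round(sample_correlation(x, y), 4))   # 0.7746
```

The covariance depends on the units of x and y, while the correlation coefficient always lies between -1 and +1, which makes it easier to compare across data sets.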
