Data
Two types of data encountered:
1. Categorical Data = data that can be grouped by specific, non-numerical categories
2. Quantitative data = data that use numeric values to indicate how much or how many
Only limited analysis can be performed on categorical data.
Therefore, we will concentrate primarily
on the analysis of quantitative data.
3. Descriptive Statistics = summaries of data in a form that is easier for the reader to
understand. This includes tabular, graphical, or numerical
presentations of the data
Where does the data come from?
4. Population = the set of all elements of interest in a particular study
5. Sample = a subset of the population
How is the data used?--Introduction to Statistical Inference
6. Statistical Inference = the use of data from a sample to make estimates
and predictions about the characteristics of
a population
Descriptive Statistics:
Using tables and graphical
displays
7. Frequency Distribution = a tabular summary of data showing the number (or frequency)
of observations in each of several non-overlapping categories
(or classes)
8. Relative Frequency = the fraction or percentage of observations belonging to a class
FORMULA: $\text{relative frequency of a class} = \frac{\text{frequency of the class}}{n}$
Where: n = the number of observations
Steps for creating a Frequency Distribution for Quantitative Data:
A. Determine the number of non-overlapping classes, called k
Rule: Use the first (smallest) k such that $2^k \ge n$
B. Determine the width of each class
Rule: $\text{approximate class width} = \frac{\text{largest data value} - \text{smallest data value}}{k}$
C. Determine the class limits
Rule: Class limits must be chosen so that each data item belongs to one and only one class.
Use both LOWER CLASS LIMITS and UPPER CLASS LIMITS
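A minimal Python sketch of Steps A through C, for illustration only. The function name frequency_distribution and the data set are hypothetical. The sketch assumes whole-number data, starts the first class at the smallest value, and moves the class width up to the next whole number so that k classes cover every observation; other rounding choices are equally valid.

```python
def frequency_distribution(data):
    """Return (lower limit, upper limit, frequency, relative frequency) per class."""
    n = len(data)

    # Step A: use the first k such that 2^k >= n.
    k = 1
    while 2 ** k < n:
        k += 1

    # Step B: approximate width = (largest - smallest) / k, then moved up to the
    # next whole number so the largest value falls inside the last class
    # (an assumption of this sketch).
    smallest, largest = min(data), max(data)
    width = (largest - smallest) // k + 1

    # Step C: non-overlapping lower and upper class limits.
    rows = []
    for i in range(k):
        lower = smallest + i * width
        upper = lower + width - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, frequency, frequency / n))
    return rows


# Made-up data set with n = 16 observations.
data = [3, 7, 8, 12, 13, 14, 15, 18, 21, 22, 22, 25, 27, 30, 31, 34]
for lower, upper, frequency, relative in frequency_distribution(data):
    print(f"{lower}-{upper}: frequency={frequency}, relative frequency={relative:.4f}")
```

For these 16 values the rule gives k = 4 classes of width 8 (3-10, 11-18, 19-26, 27-34), and the relative frequencies sum to 1.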
Work Chapter 2, Example #1 on Excel
Using Histograms to summarize data:
8. Histogram = a bar graph in which the classes are marked on the horizontal axis
and the class frequencies on the vertical axis. The class frequencies
are represented by the heights of the bars with the bars being drawn
adjacent to each other.
SPECIAL NOTE:
One of the most important uses of a histogram is to provide
information about the shape, or form, of a distribution.
Create Histogram using Chapter 2, Exercise #1 data
9. Skewness = the tendency of a histogram to be “off center” (asymmetric)
A. Skewed to the right = the histogram’s “tail” extends farther to the right
B. Skewed to the left = the histogram’s tail extends farther to the left
C. Symmetric = the histogram is neither skewed left nor right
Summarizing Data for Two Variables Using Graphical Displays: Scatter Diagram and Trendline
10. Scatter diagram = a graphical display of the relationship between
two quantitative variables
11. Trendline = a line that provides an approximation of the relationship
between two variables
Types of Relationships Depicted by Scatter Diagrams
12. Positive relationship = the scatter diagram seems to suggest an “upward”
pattern. Positive relationship indicates that the two
variables move in the SAME DIRECTION.
13. Negative relationship = the scatter diagram seems to suggest a
“downward” pattern. Negative relationship
indicates that the two variables move in
OPPOSITE DIRECTIONS.
14. No apparent relationship = the scatter diagram seems to suggest a
“random” pattern.
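These notes do not say how a trendline is computed. One common choice, assumed here, is the least-squares line. The hypothetical helper below returns its slope and intercept: a positive slope matches an upward pattern and a negative slope a downward one. A slope near zero is consistent with no apparent relationship, but the scatter diagram itself should still be inspected.

```python
def least_squares_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data: y tends to rise as x rises.
slope, intercept = least_squares_trendline([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
direction = "positive" if slope > 0 else "negative" if slope < 0 else "flat"
print(f"trendline: y = {intercept:.2f} + {slope:.2f}x ({direction} slope)")
```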
Descriptive Statistics:
Using Numerical Measurements
Measures of Location
15. Mean = the average value for a variable. It provides a measure of the
central location for the data.
Two different means:
16. Population mean = the average value for all the observations from the POPULATION.
It is denoted by the Greek letter µ.
17. Sample mean = the average value for all the observations from the SAMPLE.
It is denoted by $\bar{x}$ (read “x-bar”).
SPECIAL NOTE:
18. Parameter = a characteristic of a population
Remember the “P’s” go together
19. Statistic = a characteristic of a sample
Remember the “S’s” go together
FORMULAS for Mean:
Population mean: $\mu = \frac{\sum x_i}{N}$
where: Σ = the summation operator
N = total number of observations in the population
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
where: Σ = the summation operator
n = total number of observations in the sample
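A short Python sketch of the mean calculation, using made-up values. The arithmetic is the same for a population and for a sample (sum the observations, then divide by how many there are); only the interpretation of the data differs. The function name mean is illustrative.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


# Made-up sample of n = 6 observations.
sample = [4, 8, 15, 16, 23, 42]
print(mean(sample))  # 108 / 6 = 18.0
```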
Work Sample Mean Problem using Excel
20. Median = another measure of central location. It is the value in the middle when
the data are arranged in ascending order (smallest value to largest value).
How To Find the Median of a Data Set:
A. Arrange the data in ascending order (smallest value to largest value)
B. Determine ODD or EVEN data:
1. For an ODD number of observations, the median is the middle value
2. For an EVEN number of observations, the median is the average of the two middle values
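The two-step procedure above can be sketched in Python as follows. The function name median and the data are illustrative only.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)          # Step A: ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # Step B.1: odd number of observations
        return ordered[middle]
    # Step B.2: even number of observations
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))      # odd n: sorted data is 1, 5, 7
print(median([8, 2, 6, 4]))   # even n: sorted data is 2, 4, 6, 8
```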
21. Mode = another measure of central location. It is the value that occurs with the greatest
frequency
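A small Python sketch for finding the mode, using made-up data. Because more than one value can share the greatest frequency, the hypothetical helper modes returns a list of all such values.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([2, 3, 3, 5, 7, 3, 5]))  # 3 occurs three times
print(modes([1, 1, 2, 2, 3]))        # 1 and 2 each occur twice
```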
Measures of Variability
22. Range = largest value – smallest value
23. Variance = a measure of variability that utilizes all the data.
Two Different Variance Formulas:
POPULATION VARIANCE: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
SAMPLE VARIANCE: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
24. Standard deviation = the positive square root of the variance
Two Different Standard Deviation Formulas:
POPULATION STANDARD DEVIATION: $\sigma = \sqrt{\sigma^2}$
SAMPLE STANDARD DEVIATION: $s = \sqrt{s^2}$
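A Python sketch of the range, variance, and standard deviation calculations, with made-up data. The population variance divides the sum of squared deviations by N, while the sample variance divides by n − 1. All function names are illustrative.

```python
import math


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Made-up data: mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("range:", data_range(data))
print("population variance:", population_variance(data))
print("population standard deviation:", math.sqrt(population_variance(data)))
print("sample variance:", sample_variance(data))
print("sample standard deviation:", math.sqrt(sample_variance(data)))
```

Treating the same eight values as a population gives a variance of 32 / 8 = 4; treating them as a sample gives 32 / 7, which is slightly larger.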
Work Range, Var. and St. Dev. Problems on Excel
Measures of Relative Location
25. z-Scores = often called the standardized value. It is the number of standard
deviations a data value (x) is from the sample mean
FORMULA FOR CALCULATING A z-Score:
$z = \frac{x - \bar{x}}{s}$
where: $\bar{x}$ = the sample mean
s = the sample standard deviation
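A minimal Python sketch of the z-score calculation. It uses the standard library's statistics.mean and statistics.stdev (the sample standard deviation, which divides by n − 1); the function name z_score and the data are made up.

```python
import statistics


def z_score(x, sample):
    """Number of sample standard deviations that x lies from the sample mean."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (x - x_bar) / s


# Made-up sample: mean is 4 and sample standard deviation is 2.
sample = [2, 4, 6]
print(z_score(6, sample))  # one standard deviation above the mean
print(z_score(2, sample))  # one standard deviation below the mean
```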
26. Chebyshev’s Theorem = enables us to make statements about the
percentage of data values that must be
within a specified number of standard
deviations of the mean
FORMULA FOR CHEBYSHEV’S THEOREM
At least $\left(1 - \frac{1}{c^2}\right)$ of the data values must be within c standard
deviations of the mean, where c is any value greater than 1
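A short Python sketch that evaluates the Chebyshev bound 1 − 1/c² for a few values of c. The function name is hypothetical.

```python
def chebyshev_minimum_fraction(c):
    """Minimum fraction of data values within c standard deviations of the mean."""
    if c <= 1:
        raise ValueError("c must be greater than 1")
    return 1 - 1 / c ** 2


for c in (2, 3, 4):
    print(f"c = {c}: at least {chebyshev_minimum_fraction(c):.1%} of the data values")
```

For c = 2 the bound is 1 − 1/4 = 75%, and for c = 4 it is 1 − 1/16 = 93.75%. The bound holds for any data set, whatever the shape of its distribution.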
27. Empirical Rule = used to determine the approximate percentage of data
values that lie within a specified number
of standard deviations of the mean for
data that have a symmetric, bell-shaped
distribution.
FORMULA FOR EMPIRICAL RULE
A. ~68% of the data values will be within ONE standard deviation of the mean
B. ~95% of the data values will be within TWO standard deviations of the mean
C. ~99.7% (almost ALL) of the data values will be within THREE standard deviations of the mean
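A Python sketch that turns the Empirical Rule into intervals around the mean. The mean of 100 and standard deviation of 15 are made-up values, the function name is hypothetical, and the percentages apply only if the data are roughly symmetric and bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Map 1, 2, 3 standard deviations to the (low, high) interval around the mean."""
    return {c: (mean - c * std_dev, mean + c * std_dev) for c in (1, 2, 3)}


approximate_share = {1: "~68%", 2: "~95%", 3: "~99.7%"}
for c, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"{approximate_share[c]} of values between {low} and {high}")
```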
END OF TEST #1 NOTES