
Understanding Data Types and Analysis

The document outlines two types of data: categorical and quantitative, emphasizing the focus on quantitative data analysis. It introduces descriptive statistics, statistical inference, and various methods for summarizing data, including frequency distributions and histograms. Additionally, it discusses measures of central tendency, variability, and relationships between variables using scatter diagrams and trendlines.

Uploaded by

snrpd2pqwh
Copyright
© All Rights Reserved

Data

Two types of data encountered:

1. Categorical Data = data that can be grouped by specific, non-numerical categories

2. Quantitative data = data that use numeric values to indicate how much or how many

There are limitations as to the extent of analysis that can be performed on categorical data. Therefore, we will concentrate our efforts primarily on the analysis of quantitative data.

3. Descriptive Statistics = summaries of data in a form that is easier for the reader to
understand; these include tabular, graphical, or numerical
presentations of the data

Where does the data come from?

4. Population = the set of all elements of interest in a particular study

5. Sample = a subset of the population

How is the data used?--Introduction to Statistical Inference

6. Statistical Inference = the use of data from a sample to make estimates
and predictions about the characteristics of a population

Descriptive Statistics: Using Tables and Graphical Displays
7. Frequency Distribution = a tabular summary of data showing the number (or frequency)
of observations in each of several non-overlapping categories
(or classes)

8. Relative Frequency = the fraction or percentage of observations belonging to a class

FORMULA:

$\text{Relative frequency of a class} = \dfrac{\text{frequency of the class}}{n}$

Where: n = the number of observations


Steps for creating a Frequency Distribution for Quantitative Data:
A. Determine the number of non-overlapping classes, called k

Rule: Use the first k such that $2^k \geq n$

B. Determine the width of each class

Rule: $\text{class width} \approx \dfrac{\text{largest data value} - \text{smallest data value}}{k}$

C. Determine the class limits

Rule: Class limits must be chosen so that each data item belongs to one and only one class.
Use both LOWER CLASS LIMITS and UPPER CLASS LIMITS
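
The steps above can be sketched in Python as follows. This is a minimal illustration, not the Excel procedure the notes use: the function name `frequency_distribution` and the data values are hypothetical, the data are assumed to be whole numbers, and the class width is assumed to be (largest value - smallest value) / k taken up to the next whole number so that every observation falls in exactly one class.

```python
def frequency_distribution(data):
    """Return (lower limit, upper limit, frequency, relative frequency) per class."""
    n = len(data)

    # Step A: use the first k such that 2**k >= n
    k = 1
    while 2 ** k < n:
        k += 1

    # Step B: class width, taken up to the next whole number above range / k
    # (an assumption, so that k classes always cover the largest value)
    low, high = min(data), max(data)
    width = (high - low) // k + 1

    # Step C: class limits chosen so each item belongs to one and only one class
    rows = []
    for i in range(k):
        lower = low + i * width
        upper = lower + width - 1  # valid for whole-number data only
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq, freq / n))
    return rows


# Hypothetical data set with n = 16 observations
data = [12, 15, 17, 18, 20, 21, 22, 22, 24, 25, 27, 28, 30, 31, 33, 35]
for lower, upper, freq, rel in frequency_distribution(data):
    print(f"{lower}-{upper}: frequency = {freq}, relative frequency = {rel:.4f}")
```

With n = 16 the rule gives k = 4 classes, and the relative frequencies sum to 1 because every observation is counted exactly once.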
Work Chapter 2, Example #1 on Excel
Using Histograms to summarize data:
8. Histogram = a bar graph in which the classes are marked on the horizontal axis
and the class frequencies on the vertical axis. The class frequencies
are represented by the heights of the bars with the bars being drawn
adjacent to each other.
SPECIAL NOTE:

One of the most important uses of a histogram is to provide information about the shape, or form, of a distribution.

Create Histogram using Chapter 2, Exercise #1 data


9. Skewness = the tendency of a histogram to be “off centered”

A. Skewed to the right = the histogram’s “tail” extends farther to the right
B. Skewed to the left = the histogram’s tail extends farther to the left
C. Symmetric = the histogram is neither skewed left nor right
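
As a rough way to see the shape of a distribution without a charting tool, the sketch below prints a text histogram. The class labels and frequencies are hypothetical, and the bars are drawn sideways (classes down the page, frequencies as bar lengths), so a distribution that is skewed to the right shows its tail running down the page rather than to the right.

```python
def text_histogram(labels, frequencies):
    """Return one text bar per class; bar length equals the class frequency."""
    lines = []
    for label, freq in zip(labels, frequencies):
        lines.append(f"{label:>7} | {'#' * freq} ({freq})")
    return "\n".join(lines)


# Hypothetical frequency distribution with a long tail in the higher classes
labels = ["10-19", "20-29", "30-39", "40-49", "50-59"]
frequencies = [6, 9, 4, 2, 1]
print(text_histogram(labels, frequencies))
```

Most observations sit in the two lowest classes and the frequencies trail off through the higher classes, which is the pattern the notes call skewed to the right.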
Summarizing Data for Two Variables Using Graphical Displays: Scatter Diagram and Trendline

10. Scatter diagram = a graphical display of the relationship between
two quantitative variables

11. Trendline = a line that provides an approximation of the relationship
between two variables
Types of Relationships Depicted by Scatter Diagrams

12. Positive relationship = the scatter diagram seems to suggest an “upward”
pattern. Positive relationship indicates that the two
variables move in the SAME DIRECTION.

13. Negative relationship = the scatter diagram seems to suggest a
“downward” pattern. Negative relationship
indicates that the two variables move in
OPPOSITE DIRECTIONS.

14. No apparent relationship = the scatter diagram seems to suggest a
“random” pattern.
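
One way to make the trendline and the direction of a relationship concrete is to fit a straight line and look at the sign of its slope. The sketch below assumes a least-squares line, which the notes do not specify; the function name `trendline` and the data are hypothetical.

```python
def trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]    # tends to rise as x rises
ys_down = [5, 4, 5, 4, 2]  # tends to fall as x rises

slope_up, intercept_up = trendline(xs, ys_up)
slope_down, intercept_down = trendline(xs, ys_down)
print(f"upward pattern:   y = {intercept_up:.1f} + {slope_up:.1f}x")
print(f"downward pattern: y = {intercept_down:.1f} + {slope_down:.1f}x")
```

A slope above zero matches the "upward" pattern of a positive relationship and a slope below zero matches the "downward" pattern of a negative one. A slope near zero is consistent with no apparent relationship, but the scatter diagram should still be inspected, since a single number cannot show a random pattern.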
Descriptive Statistics: Using Numerical Measurements
Measures of Location
15. Mean = the average value for a variable. It provides a measure of the
central location for the data.

Two different means:


16. Population mean = the average value for all the observations from the POPULATION.

It is denoted by the Greek letter µ.

17. Sample mean = the average value for all the observations from the SAMPLE.

It is denoted by $\bar{x}$ (read “x-bar”).


SPECIAL NOTE:

18. Parameter = a characteristic of a population

Remember the “P’s” go together

19. Statistic = a characteristic of a sample

Remember the “S’s” go together


FORMULAS for Mean:

Population mean: $\mu = \dfrac{\sum x}{N}$

where: Σ = the summation operator
N = total number of observations in the population

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

where: Σ = the summation operator
n = total number of observations in the sample
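
As a quick check on the sample mean formula, the sketch below sums the observations and divides by n. The function name `sample_mean` and the five data values are hypothetical.

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


# Hypothetical sample of n = 5 observations
sample = [46, 54, 42, 46, 32]
x_bar = sample_mean(sample)
print(x_bar)  # 220 / 5 = 44.0
```

The population mean is computed the same way; the only difference is that the values are all N observations in the population rather than a sample of n.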

Work Sample Mean Problem using Excel


20. Median = another measure of central location. It is the value in the middle when
the data are arranged in ascending order (smallest value to largest value).

How To Find the Median of a Data Set:

A. Arrange the data in ascending order (smallest value to largest value)

B. Determine ODD or EVEN data:

1. For an ODD number of observations, the median is the middle value


2. For an EVEN number of observations, the median is the average of the two middle values

21. Mode = another measure of central location. It is the value that occurs with the greatest
frequency
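
The median steps and the mode definition can be sketched as follows. The function names and data are hypothetical. The notes describe the mode as a single value; returning every value tied for the greatest frequency is an assumption made here so that ties are not hidden.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)  # Step A: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # Step B.1: odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # Step B.2: even number


def modes(values):
    """All values that occur with the greatest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


odd_sample = [46, 54, 42, 46, 32]
even_sample = [32, 42, 46, 54]
print(median(odd_sample))   # 46
print(median(even_sample))  # 44.0
print(modes(odd_sample))    # [46]
```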
Measures of Variability
22. Range = largest value – smallest value

23. Variance = a measure of variability that utilizes all the data.

Two Different Variance Formulas:

POPULATION VARIANCE: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

SAMPLE VARIANCE: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

24. Standard deviation = the positive square root of the variance

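Two Different Standard Deviation Formulas:

POPULATION STANDARD DEVIATION: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

SAMPLE STANDARD DEVIATION: $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$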
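
The difference between the population and sample versions is only the divisor: N for a population, n - 1 for a sample. The sketch below works the range, variance, and standard deviation for one hypothetical data set; the function names are illustrative.

```python
import math


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


values = [46, 54, 42, 46, 32]  # hypothetical data; mean is 44
print(data_range(values))                      # 54 - 32 = 22
print(sample_variance(values))                 # 256 / 4 = 64.0
print(math.sqrt(sample_variance(values)))      # sample standard deviation = 8.0
print(population_variance(values))             # 256 / 5
print(math.sqrt(population_variance(values)))  # population standard deviation
```

The squared deviations here are 4, 100, 4, 4, and 144, which sum to 256. Treating the same five values as a sample gives the larger variance because the divisor is smaller.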

Work Range, Var. and St. Dev. Problems on Excel


Measures of Relative Location
25. z-Scores = often called the standardized value. It is the number of standard
deviations a data value (x) is from the sample mean

FORMULA FOR CALCULATING A z-Score:

$z = \dfrac{x - \bar{x}}{s}$

where: $\bar{x}$ = the sample mean
s = the sample standard deviation
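
A z-score can be computed for every observation once the sample mean and sample standard deviation are known. The sketch below is illustrative only; the function name `z_scores` and the data are hypothetical.

```python
import math


def z_scores(sample):
    """Number of sample standard deviations each value lies from the sample mean."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return [(x - x_bar) / s for x in sample]


sample = [46, 54, 42, 46, 32]  # hypothetical data: mean 44, standard deviation 8
print(z_scores(sample))        # [0.25, 1.25, -0.25, 0.25, -1.5]
```

A negative z-score means the value lies below the sample mean, and a positive z-score means it lies above. Here the value 32 is 1.5 standard deviations below the mean.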
26. Chebyshev’s Theorem = enables us to make statements about the
percentage of data values that must be
within a specified number of standard
deviations of the mean

FORMULA FOR CHEBYSHEV’S THEOREM

At least $\left(1 - \dfrac{1}{c^2}\right)$ of the data values must be within c standard
deviations of the mean, where c is any value greater than 1
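
To see what the theorem guarantees for particular values of c, the bound 1 - 1/c² can be evaluated directly. The function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(c):
    """Minimum fraction of data values within c standard deviations of the mean."""
    if c <= 1:
        raise ValueError("c must be greater than 1")
    return 1 - 1 / c ** 2


for c in (2, 3, 4):
    print(f"c = {c}: at least {chebyshev_lower_bound(c):.1%} of the data values")
```

For c = 2 the bound is 75%, for c = 3 it is about 88.9%, and for c = 4 it is 93.75%. These are minimums that hold for any data set, whatever the shape of its distribution.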
27. Empirical Rule = used to determine the approximate percentage of data
values that will be within a specified number
of standard deviations of the mean for
data that have a symmetric, bell-shaped
distribution.

FORMULA FOR EMPIRICAL RULE

A. ~68% of the data values will be within ONE standard deviation of the mean
B. ~95% of the data values will be within TWO standard deviations of the mean
C. ~99.7% (almost ALL) of the data values will be within THREE standard deviations of the mean
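
In practice the empirical rule is applied by building the intervals mean ± 1, 2, and 3 standard deviations. The sketch below does that for a hypothetical bell-shaped data set with mean 100 and standard deviation 15; the function name and the numbers are illustrative.

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals within 1, 2, and 3 standard deviations of the mean."""
    return {k: (mean - k * std_dev, mean + k * std_dev) for k in (1, 2, 3)}


approx_percent = {1: 68, 2: 95, 3: 99.7}
intervals = empirical_rule_intervals(100, 15)
for k, (low, high) in intervals.items():
    print(f"~{approx_percent[k]}% of values between {low} and {high}")
```

These percentages are approximations that apply only when the distribution is symmetric and bell-shaped. For other shapes, Chebyshev's theorem gives the guaranteed minimum instead.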
END OF TEST #1 NOTES
