Understanding Descriptive Statistics

Uploaded by jacobsimiyu2004

TOPIC 2: DESCRIPTIVE STATISTICS

Descriptive statistics are used to examine the basic features of data. They provide simple
summaries about the sample. The focus is on the immediate data (the sample) and does not extend
beyond it. This is the difference between descriptive and inferential statistics: inferential
statistics extend beyond the immediate data (the sample) to draw conclusions about the population.

As the name states, descriptive statistics are about describing the data: they use quantitative
descriptions to depict characteristics, demographics, sizes and other facts about the data.

Descriptive statistics are therefore tabular, graphical and numerical summaries of data that
serve not only to provide a basic understanding but also to prepare the data for further analysis.
Their purpose is to facilitate the presentation and interpretation of data.

Univariate methods of descriptive statistics use data to enhance the understanding of a single
variable; multivariate methods focus on using statistics to understand the relationships among
two or more variables.

There are three major categories of descriptive statistics:

• Frequency distribution

• Measures of central tendency

• Measures of dispersion/variability

Frequency Distribution

In statistics, the frequency (or absolute frequency) of an event is the number of times the
observation occurred or was recorded. These frequencies are often depicted graphically or in
tabular form.

The cumulative frequency is the running total of the absolute frequencies of all events at or
below a certain point in an ordered list of events.

The relative frequency (or empirical probability) of an event is its absolute frequency
divided by the total number of events.
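The three kinds of frequency defined above can be computed together. The sketch below is a minimal illustration in Python using the standard library only; the list `observations` is hypothetical data chosen just to make the arithmetic easy to follow.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical observations (e.g. number of children per household)
observations = [2, 1, 3, 2, 2, 0, 1, 3, 2, 1]

# Absolute frequency: how many times each value occurs, ordered by value
absolute = dict(sorted(Counter(observations).items()))

# Relative frequency: absolute frequency divided by the total number of observations
n = len(observations)
relative = {value: count / n for value, count in absolute.items()}

# Cumulative frequency: running total of absolute frequencies at or below each value
cumulative = dict(zip(absolute, accumulate(absolute.values())))

for value in absolute:
    print(value, absolute[value], relative[value], cumulative[value])
```

Sorting the values first matters for the cumulative column, because "at or below a certain point" only makes sense in an ordered list. The last cumulative frequency always equals the number of observations, and the relative frequencies always add up to 1.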

A frequency distribution shows us a summarized grouping of data divided into mutually
exclusive classes and the number of occurrences in each class.

a. Frequency Distribution Table

The most commonly used tabular summary of data for a single variable is a frequency
distribution.

A frequency distribution shows the number of data values in each of several nonoverlapping
classes.

Another tabular summary, called a relative frequency distribution, shows the fraction,
or percentage, of data values in each class.

The most common tabular summary of data for two variables is a cross tabulation, a two-
variable analogue of a frequency distribution.

For a qualitative variable, a frequency distribution shows the number of data values in each
qualitative category.

For instance, the variable gender has two categories, male and female, so a frequency
distribution for gender would have two nonoverlapping classes showing the number of males
and the number of females.

A relative frequency distribution for this variable would show the fraction of individuals that are
male and the fraction of individuals that are female.

Constructing a frequency distribution for a quantitative variable requires more care in defining
the classes and the division points between adjacent classes.

For instance, if age data ranges from 22 to 78 years, the following six nonoverlapping classes
could be used: 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79.

A frequency distribution would show the number of data values in each of these classes, and a
relative frequency distribution would show the fraction of data values in each.
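As a minimal sketch, the age classes from the example can be filled in with Python. The list `ages` is hypothetical data within the stated 22 to 78 year range; the six classes are the ones given above.

```python
# Hypothetical ages (years) between 22 and 78
ages = [22, 25, 31, 38, 44, 47, 49, 53, 58, 61, 67, 72, 78]

# The six nonoverlapping classes from the text, with inclusive limits
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]

frequency = {}
for lower, upper in classes:
    # Count the ages that fall inside this class
    frequency[f"{lower}-{upper}"] = sum(lower <= age <= upper for age in ages)

# Relative frequency: fraction of all ages that fall in each class
relative = {label: count / len(ages) for label, count in frequency.items()}

for label in frequency:
    print(f"{label}: {frequency[label]:>2}  {relative[label]:.2f}")
```

Inclusive limits such as 20–29 work here because ages are recorded in whole years, so no value can fall between 29 and 30. With fractional measurements, half-open classes (for example, 20 up to but not including 30) avoid gaps between adjacent classes.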

Constructing a frequency distribution table

1. Decide the number of classes. With too many or too few classes, the frequency distribution
may not reveal the basic shape of the data set and will be difficult to interpret.

2. Calculate the range of the data (Range = Max – Min) by finding the minimum and
maximum data values. The range is used to determine the class interval, or class width.

3. Decide the width of the classes. A common starting point is to divide the range by the
number of classes and round up to a convenient value.

4. Generally the class interval, or class width, is the same for all classes. Taken together,
the classes must cover at least the distance from the lowest value (minimum) in the data to
the highest value (maximum).

5. Equal class intervals are preferred in a frequency distribution, but unequal class intervals
(for example, logarithmic intervals) may be necessary in certain situations to produce a
good spread of observations between the classes and to avoid a large number of empty, or
almost empty, classes.
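The steps above can be sketched in Python for whole-number data. This is one possible implementation, not a prescribed method: the function names `build_classes` and `count_in_classes` and the list `scores` are hypothetical, and the width rule (range divided by the number of classes, rounded up) is a common convention rather than the only one.

```python
import math

def build_classes(data, num_classes):
    """Equal-width (lower, upper) class limits, both inclusive, for whole-number data."""
    smallest, largest = min(data), max(data)
    data_range = largest - smallest                    # Step 2: Range = Max - Min
    # Step 3: width is roughly range / number of classes, rounded up. The +1 keeps the
    # largest value inside the last class when the division happens to be exact.
    width = math.ceil((data_range + 1) / num_classes)
    # Step 4: every class has the same width, starting at the minimum
    return [(smallest + i * width, smallest + (i + 1) * width - 1)
            for i in range(num_classes)]

def count_in_classes(data, classes):
    """Number of data values falling in each (lower, upper) class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in classes]

scores = [12, 15, 21, 22, 28, 30, 33, 35, 41, 47]   # hypothetical test scores
classes = build_classes(scores, 4)                   # Step 1: choose 4 classes
print(classes)                                       # [(12, 20), (21, 29), (30, 38), (39, 47)]
print(count_in_classes(scores, classes))             # [2, 3, 3, 2]
```

In practice the lower limit of the first class is often moved down to a round number (for example 10 instead of 12) so the class limits are easier to read; the same width rule still applies.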

Example 1

A frequency distribution table is an arrangement of the values that one or more variables
take in a sample. Each entry in the table contains the frequency, or count, of the occurrences
of values within a particular group or interval; in this way, the table summarizes the
distribution of values in the sample. The table below is an example of a univariate
(single-variable) frequency table.

Rank   Degree of agreement   Number
1      Strongly agree        22
2      Agree somewhat        30
3      Not sure              20
4      Disagree somewhat     15
5      Strongly disagree     15
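A relative frequency distribution for this table can be sketched in Python by dividing each count by the total number of respondents. The counts are copied from the table; the variable names are illustrative.

```python
# Counts from the degree-of-agreement table above
responses = {
    "Strongly agree": 22,
    "Agree somewhat": 30,
    "Not sure": 20,
    "Disagree somewhat": 15,
    "Strongly disagree": 15,
}

total = sum(responses.values())          # 102 respondents in all

# Relative frequency expressed as a percentage of all respondents
percents = {category: 100 * count / total for category, count in responses.items()}

for category, count in responses.items():
    print(f"{category:<18} {count:>3} {percents[category]:5.1f}%")
```

For example, "Agree somewhat" accounts for 30 of 102 responses, about 29.4%.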

Example 2

A grouped frequency distribution table aggregates values into bins, such that each bin
encompasses a range of values. For example, the heights of a sample of maize plants could be
organized into the following frequency table.

Height range         Number of plants   Cumulative number
Less than 5.0 feet   25                 25
5.0–5.5 feet         35                 60
5.5–6.0 feet         20                 80
6.0–6.5 feet         20                 100
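The "Cumulative number" column of the maize table is a running total of the "Number of plants" column. A minimal Python sketch, with the counts copied from the table:

```python
from itertools import accumulate

# Class labels and counts from the maize height table above
height_classes = ["< 5.0 ft", "5.0-5.5 ft", "5.5-6.0 ft", "6.0-6.5 ft"]
plants = [25, 35, 20, 20]

# Each cumulative number is the running total of counts up to and including that class
cumulative = list(accumulate(plants))

for label, count, running in zip(height_classes, plants, cumulative):
    print(f"{label:<11} {count:>3} {running:>4}")
```

The final cumulative number (100) equals the total number of plants. The table does not say which class a plant of exactly 5.5 feet belongs to; a common convention is to include the lower limit of each class and exclude the upper one, so that every plant is counted exactly once.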

Cross tabulation

A cross tabulation is a two-way table with the rows of the table representing the classes of one
variable and the columns of the table representing the classes of another variable.

To construct a cross tabulation using the variables gender and age, gender could be shown with
two rows, male and female, and age could be shown with six columns corresponding to the age
classes 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79.

The entry in each cell of the table would specify the number of data values with the gender given
by the row heading and the age given by the column heading. Such a cross tabulation could be
helpful in understanding the relationship between gender and age. (Assignment: complete a
cross tabulation as described)
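As an illustration of the layout only (the assignment still asks for your own table), here is a minimal Python sketch of a gender by age cross tabulation. The `records` list is entirely hypothetical, and `age_class` is a helper introduced just for this example.

```python
from collections import Counter

# Hypothetical (gender, age) records, purely for illustration
records = [("male", 24), ("female", 31), ("female", 45), ("male", 52),
           ("male", 38), ("female", 27), ("female", 63), ("male", 71)]

age_classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
labels = [f"{lo}-{hi}" for lo, hi in age_classes]

def age_class(age):
    """Return the 'lower-upper' label of the class containing age."""
    for lower, upper in age_classes:
        if lower <= age <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"age {age} is outside the defined classes")

# Each cell counts the records with that gender (row) and age class (column)
cells = Counter((gender, age_class(age)) for gender, age in records)

print(f"{'gender':<7} " + " ".join(f"{lab:>6}" for lab in labels))
for gender in ("male", "female"):
    row = [cells[(gender, lab)] for lab in labels]   # Counter gives 0 for empty cells
    print(f"{gender:<7} " + " ".join(f"{count:>6}" for count in row))
```

Each row total gives the frequency distribution of age for one gender, and each column total gives the gender counts for one age class.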
Graphical methods

Some of the graphs that can be used with frequency distributions are histograms, line charts,
bar charts and pie charts. Frequency distributions are used for both qualitative and quantitative
data.

Bar graph

A bar graph is a graphical device for depicting qualitative data that have been summarized in a
frequency distribution. Labels for the categories of the qualitative variable are shown on the
horizontal axis of the graph. A bar above each label is constructed such that the height of each
bar is proportional to the number of data values in the category.
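A plotting library would normally draw the bars, but the key rule (bar length proportional to the count) can be shown with a plain-text sketch in Python. The function `text_bar_chart` is hypothetical, and the bars are drawn sideways for simplicity; the counts are the degree-of-agreement figures from Example 1.

```python
def text_bar_chart(counts, symbol="#"):
    """Return one text line per category; the bar has one symbol per data value."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * count} {count}"
            for label, count in counts.items()]

agreement = {"Strongly agree": 22, "Agree somewhat": 30, "Not sure": 20,
             "Disagree somewhat": 15, "Strongly disagree": 15}

lines = text_bar_chart(agreement)
for line in lines:
    print(line)
```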
Pie chart

A pie chart is another graphical device for summarizing qualitative data.

The size of each slice of the pie is proportional to the number of data values in the
corresponding class.
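Because a full circle is 360 degrees, the angle of each slice is 360 multiplied by the relative frequency of its class. A minimal Python sketch, reusing the degree-of-agreement counts from Example 1:

```python
counts = {"Strongly agree": 22, "Agree somewhat": 30, "Not sure": 20,
          "Disagree somewhat": 15, "Strongly disagree": 15}

total = sum(counts.values())

# Each slice's angle is its class's share of the 360 degrees in the circle
angles = {label: 360 * count / total for label, count in counts.items()}

for label, angle in angles.items():
    print(f"{label:<18} {angle:6.1f} degrees")
```

For example, "Agree somewhat" (30 of 102 responses) gets a slice of about 105.9 degrees.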
Histogram

A histogram is the most common graphical presentation of quantitative data that have been
summarized in a frequency distribution.

The values of the quantitative variable are shown on the horizontal axis. A rectangle is drawn
above each class such that the base of the rectangle is equal to the width of the class interval
and its height is proportional to the number of data values in the class.
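Heights proportional to frequency give a faithful picture when all classes have the same width. When class widths differ (which the construction steps allow), a common convention is to use frequency density, frequency ÷ class width, as the height, so that each rectangle's area rather than its height is proportional to the class frequency. A minimal Python sketch with hypothetical classes:

```python
# Hypothetical classes with unequal widths: (lower limit, upper limit, frequency)
classes = [(0, 10, 4), (10, 20, 6), (20, 40, 8)]

heights = []
for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width      # frequency density = frequency per unit of class width
    heights.append(density)
    # The rectangle's area (height x width) equals the class frequency
    print(f"{lower}-{upper}: frequency {freq}, width {width}, height {density:.2f}")
```

The 20 to 40 class holds the most values, yet its rectangle is no taller than the first one because those values are spread over twice the width.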
Measures of Central Tendency

Measures of central tendency seek to find the single "middle" or "average" value in a dataset.

These measures include:

a. Mean

The mean, often called the average, is a measure of the central location for the data. It is
computed by adding all the data values for a variable and dividing the sum by the number of
data values. The mean is sensitive to outliers: a single very large or very small value far from
the rest can push it upwards or downwards, so in such cases the mean might not correctly
indicate the "middle" of the data.
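The effect of an outlier on the mean is easy to see numerically. A minimal Python sketch with hypothetical values, using the standard-library `statistics.mean` alongside the sum-divided-by-count definition:

```python
from statistics import mean

values = [30, 32, 35, 36, 37]        # hypothetical data without outliers
with_outlier = values + [300]        # the same data plus one extreme value

# Mean = sum of all values / number of values
manual_mean = sum(values) / len(values)

print(manual_mean)          # 34.0
print(mean(with_outlier))   # about 78.3, far above every value except the outlier
```

One extreme value more than doubles the mean, even though five of the six values lie between 30 and 37.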

b. Median

The median is the middle value in a dataset that is ordered from least to greatest. The median is
another measure of central location but, unlike the mean, the median is not affected by extremely
large or extremely small data values. When determining the median, the data values are first
ranked in order from the smallest value to the largest value. If there is an odd number of data
values, the median is the middle value; if there is an even number of data values, the median is
the average of the two middle values.
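The odd and even cases described above translate directly into code. This is a minimal Python sketch; the function name `median` is chosen for illustration (the standard library's `statistics.median` follows the same rule).

```python
def median(values):
    """Median: middle of the ordered data, or mean of the two middle values."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)              # rank from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd number of values: the middle one
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of the two middle values

print(median([7, 3, 5]))                      # 5
print(median([30, 32, 35, 36, 37, 300]))      # 35.5, barely affected by the outlier 300
```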

c. Mode

The mode, the third measure of central tendency, is the data value that occurs with the
greatest frequency in the dataset.
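A dataset can have one mode or several. Python's standard-library `statistics.multimode` (Python 3.8 and later) returns every value that occurs with the greatest frequency; the data below are hypothetical.

```python
from statistics import multimode

shoe_sizes = [7, 8, 8, 9, 10, 8, 9, 7, 8]   # hypothetical data
print(multimode(shoe_sizes))                # [8]: size 8 occurs four times

print(multimode([1, 1, 2, 2, 3]))           # [1, 2]: two modes (bimodal data)
```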

d. Percentiles

Percentiles provide an indication of how the data values are spread over the interval from the
smallest value to the largest value.

Approximately p percent of the data values fall below the pth percentile, and roughly 100
− p percent of the data values are above the pth percentile. Percentiles are reported, for
example, on most standardized tests.
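Several slightly different rules exist for computing percentiles. The sketch below uses linear interpolation between ranks, which is the default rule of NumPy's `numpy.percentile`; the function name `percentile` and the scores 1 to 100 are illustrative assumptions.

```python
import math

def percentile(values, p):
    """p-th percentile (0 <= p <= 100), interpolating linearly between ranked values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100   # fractional position in the ordered list
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

scores = list(range(1, 101))            # hypothetical scores 1, 2, ..., 100
p90 = percentile(scores, 90)
print(p90)                              # about 90.1
print(sum(s < p90 for s in scores))     # 90 of the 100 scores (90%) fall below it
```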

e. Quartiles

Quartiles divide the data values into four parts; the first quartile is the 25th percentile, the
second quartile is the 50th percentile (also the median), and the third quartile is the 75th
percentile.
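Python's standard-library `statistics.quantiles` returns the cut points that divide data into n equal parts, so `n=4` gives the three quartiles. A minimal sketch with hypothetical data; `method="inclusive"` is one of the two interpolation rules the function offers, and other tools may give slightly different values.

```python
from statistics import median, quantiles

data = [12, 15, 17, 20, 22, 25, 28, 30, 35]   # hypothetical data, already sorted

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)      # 17.0 22.0 28.0
print(median(data))    # 22, the same as the second quartile
```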
Measures of Variability (or Dispersion)

Measures of variability describe "how spread out or scattered" the data points are around the
center.

Measures of variability include:

a. Range

The range, the difference between the largest and the smallest values in the dataset, is the
simplest measure of variability in the data. Because it is determined by only the two extreme
data values, it is sensitive to outliers.
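A minimal Python sketch of the range, using hypothetical daily temperatures. It also shows that one extreme value is enough to change the range sharply.

```python
temperatures = [18, 21, 19, 25, 23, 17, 22]   # hypothetical daily highs in °C

# Range = largest value - smallest value; only the two extremes matter
data_range = max(temperatures) - min(temperatures)
print(data_range)   # 25 - 17 = 8

# A single extreme reading changes the range a lot
print(max(temperatures + [40]) - min(temperatures + [40]))   # 40 - 17 = 23
```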

b. Variance

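Variance is essentially the average of the squared differences from the mean. The deviation
(difference) of each data value from the sample mean is computed and squared. The squared
deviations are then summed and divided by n − 1, where n is the number of data values, to give
the sample variance. (Dividing by the number of values instead gives the population variance,
used when the data cover the whole population.)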
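The steps for the sample variance can be written out in Python and checked against the standard-library `statistics.variance` (sample, n − 1) and `statistics.pvariance` (population, n). The function name `sample_variance` and the data are illustrative.

```python
from statistics import pvariance, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n                            # sample mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # square each deviation
    return sum(squared_devs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
print(sample_variance(data))      # 32 / 7, about 4.571
print(variance(data))             # standard-library sample variance, same value
print(pvariance(data))            # population variance: 32 / 8 = 4
```

Here the squared deviations (9, 1, 1, 1, 0, 0, 4, 16) sum to 32, which is divided by n − 1 = 7.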

c. Standard deviation

A measure of the typical amount that data points deviate from the mean. The standard
deviation is the square root of the variance. Because the unit of measure for the standard
deviation is the same as the unit of measure for the data, many individuals prefer to use the
standard deviation as the descriptive measure of variability.
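A minimal Python sketch showing why the standard deviation is easier to interpret: the variance of heights in centimetres is in square centimetres, while its square root is back in centimetres. The heights are hypothetical; `statistics.stdev` is the standard-library sample standard deviation.

```python
import math
from statistics import stdev

heights_cm = [160, 165, 170, 175, 180]      # hypothetical heights in centimetres

n = len(heights_cm)
mean_cm = sum(heights_cm) / n
sample_var = sum((h - mean_cm) ** 2 for h in heights_cm) / (n - 1)   # cm squared
sd_cm = math.sqrt(sample_var)                                       # back in cm

print(sample_var)         # 62.5 (square centimetres)
print(sd_cm)              # about 7.91 (centimetres, the same unit as the data)
print(stdev(heights_cm))  # standard-library sample standard deviation, same value
```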
TOPIC 3: MULTIVARIATE PARAMETRIC STATISTICAL TECHNIQUES

Parametric tests refer to a pool of tests that impose certain requirements on the data in order
to produce reliable results (maximize prediction/minimize error). Parametric tests are applied
only to continuous data (interval and ratio scale). The tests require that the data are
approximately normally distributed (with no significant outliers) and that the groups being
compared have similar variances (homogeneity of variance). The observations or data points
should also be independent of each other (independence).

Parametric tests have greater statistical power than non-parametric tests when their
assumptions are met. They are also in wide use and are hence widely accepted. Their greater
power means they can detect true differences even with smaller samples.

Multivariate parametric tests focus on two or more variables, while univariate parametric tests
focus on a single variable. The following are examples of multivariate parametric tests:

a. Correlation Analysis

Correlation analysis examines the association between two variables. Where the test statistic
(the coefficient of correlation, r) differs from 0, there is some degree of relationship.

Besides confirming the existence of a relationship, correlation analysis also indicates the
type of relationship:

i. Positive or negative relationship/correlation

ii. Linear or non-linear

Positive correlation

• If the values of the two variables move in the same direction, i.e. an increase (or
decrease) in the values of one variable results, on average, in a corresponding increase (or
decrease) in the values of the other variable, the correlation is said to be positive.

• Example 1: An increase in taxes correlates with an increase in the cost of goods.

• Example 2: Heights and weights tend to be positively correlated.

• Example 3: In a normally growing child, age in months and weight are positively
correlated.

• Example 4: Amount of rainfall and crop yield.

• Example 5: Demand and supply of commodities (see table below).

Month      Demand   Supply
January    0        0
February   4        4
March      8        8
April      12       12

• Demand and supply are expected to follow a positive correlation.

Line of fit

• One way of depicting a correlation is to use a line of best fit. Each pair of corresponding
values is plotted as a point, with one variable on the x-axis and the other on the y-axis.

• A straight line is then drawn through the middle of the data points so that it lies as close
as possible to all of them overall; it does not need to pass through every point.

• A positive correlation is shown by data points that begin in the bottom left corner
and head to the top right corner, as shown in the figure below.

• In a perfect positive correlation, all the data points lie exactly on a straight line sloping
upwards, and no point deviates from it. In the demand and supply table above, for example,
every point lies on the line where supply equals demand.
Negative correlation

• Correlation between variables is said to be negative when the two move in opposite
directions.

• This means that an increase in the value of one variable results, on average, in a
corresponding decrease in the value of the other variable. In the same manner, if one
variable decreases, the other increases.

• There are many examples of negatively correlated variables:

Taxes and job creation

Food production levels and inflation

Linear correlation

• Linear correlation means that y changes by a constant amount for every unit increase in x.
In other words, the rate of change of y with respect to x is the same, and therefore
predictable, over the entire range of the data.

• Ideally, the relationship follows the equation:

$y = a + bx$, where $a$ is the intercept, $b$ is the gradient (slope) and $x$ is the independent variable.

Assignment: Read about the interpretation of the gradient and intercept.
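The notes do not say how a and b are obtained from data. One standard approach, ordinary least squares, chooses the line that minimises the sum of squared vertical distances from the points to the line. A minimal Python sketch of that technique; the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises squared vertical errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # gradient: change in y for a one-unit increase in x
    a = y_bar - b * x_bar    # intercept: value of y when x = 0
    return a, b

xs = [1, 2, 3, 4, 5]          # hypothetical x values
ys = [3, 5, 7, 9, 11]         # hypothetical y values lying exactly on y = 1 + 2x
a, b = least_squares_line(xs, ys)
print(a, b)                   # 1.0 2.0
```

With these points the gradient b = 2 means y rises by 2 for every one-unit increase in x, and the intercept a = 1 is the value of y when x = 0.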

Non-linear correlation

A relationship in which y changes as x changes, but not at a constant rate, is deemed to be
non-linear.

An example of such an equation is $y = a + bx + cx^2$.
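The difference between linear and non-linear relationships shows up in the steps between consecutive y values for equally spaced x values. A minimal Python sketch; the coefficients a = 1, b = 2 and c = 3 are hypothetical, and `first_differences` is a helper introduced for illustration.

```python
def first_differences(values):
    """Change in y between consecutive, equally spaced x values."""
    return [curr - prev for prev, curr in zip(values, values[1:])]

xs = [0, 1, 2, 3, 4]
linear = [1 + 2 * x for x in xs]                  # y = a + bx with a = 1, b = 2
quadratic = [1 + 2 * x + 3 * x ** 2 for x in xs]  # y = a + bx + cx^2 with c = 3

print(first_differences(linear))     # [2, 2, 2, 2]: constant rate of change
print(first_differences(quadratic))  # [5, 11, 17, 23]: the rate keeps changing
```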

Coefficient of Correlation (r)

• The test statistic for correlation analysis is the coefficient of correlation (r).

• The coefficient of correlation is widely used in data analysis to estimate the strength
and direction of the relationship between two variables.

• r thus quantifies the degree of the relationship besides indicating its direction.

• r takes values between -1 and +1.

• If r is negative, there is a negative relationship.

• If r is positive, there is a positive relationship.

• Where the absolute value of r is lower than 0.5, the relationship is said to be weak, while
where it is equal to or greater than 0.6, it is deemed strong. These cut-offs are rules of
thumb and vary between texts.
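The notes do not give a formula for r. The most common version is Pearson's correlation coefficient: the sum of the products of the deviations of x and y from their means, divided by the square root of the product of their sums of squared deviations. A minimal Python sketch applied to the demand and supply table; the function name `pearson_r` is illustrative (Python 3.10 and later also provide `statistics.correlation`).

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)   # always between -1 and +1

demand = [0, 4, 8, 12]   # January to April, from the demand and supply table
supply = [0, 4, 8, 12]
print(pearson_r(demand, supply))          # 1.0: perfect positive correlation
print(pearson_r(demand, supply[::-1]))    # -1.0: reversing one series makes it perfectly negative
```

If either variable is constant, its sum of squared deviations is 0 and r is undefined (the code raises `ZeroDivisionError`).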
