
Understanding Descriptive Statistics

Descriptive Statistics

Research Methodology

Faculty of Dentistry
University of Malaya
Semester 1 Session 2023/2024

‘Abqariyah Yahya, PhD


abqariyahyahya@[Link]
Learning outcomes
• Methods of summarizing data

Dr 'Abqariyah Yahya, Semester 1 Session 2023/2024 2


Type of Statistics

Descriptive Statistics
• Describes information of the sample
• Statistics presented should be interpreted with regards to the sample
• Limited to measures of location, central tendency and dispersion
• Extrapolation is judgmental

Inferential Statistics
• Information about the sample is used to infer to the population
• E.g. of statistics used: p value, 95% CI (hypothesis testing and interval estimation)
• Extrapolation is part & parcel of the process

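(a) Measures of central tendency

Measure | Definition                                                                | Example
--------+---------------------------------------------------------------------------+-----------------------------------------------------------------------
Mean    | The sum of the values divided by the number of cases                      | $\bar{X} = \sum X_i / n$; (4+7+5+9+5)/5 = 6
Median  | The middle value of the sorted data                                       | Ordered: 4, 5, 5, 7, 9; position (n+1)/2 = (5+1)/2 = 3; third value, median = 5
Mode    | The value with the largest frequency                                      | Mode = 5 (appears twice)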
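The three measures can be checked on the slide's example data with a few lines of Python. This is a minimal sketch using the standard-library statistics module; the variable names are illustrative and not part of the slides.

```python
import statistics

# Example data from the slide: 4, 7, 5, 9, 5
data = [4, 7, 5, 9, 5]

mean = statistics.mean(data)      # sum of the values / number of cases = 30 / 5
median = statistics.median(data)  # middle value of the sorted data 4, 5, 5, 7, 9
mode = statistics.mode(data)      # most frequent value (5 appears twice)

print(f"mean={mean}, median={median}, mode={mode}")
```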



Using Measures of Central Tendency
• Which measure is the best to use?
• Depends on two factors:
i. The scale of the measurement (ordinal or numerical)
ii. The shape of the distribution
• If outlying observations occur in only one direction, the distribution is skewed



Using Measures of Central Tendency
• Some guidelines:
• The mean is used for numerical data with a symmetric (not skewed) distribution.
• The median is used for ordinal data or for numerical data if the distribution is
skewed.
• The mode is used primarily for bimodal distributions.
• The geometric mean is used primarily for observations measured on a
logarithmic scale.
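A small sketch of why these guidelines matter, using the standard-library statistics module. The data sets are hypothetical: the first is the earlier slide example, the second replaces its largest value with an extreme one, and the third is evenly spaced on a log10 scale.

```python
import statistics

original = [4, 5, 5, 7, 9]
with_outlier = [4, 5, 5, 7, 90]  # hypothetical: one extreme value on the right

# The mean is pulled towards the extreme value, the median is not.
mean_original = statistics.mean(original)
mean_outlier = statistics.mean(with_outlier)
median_original = statistics.median(original)
median_outlier = statistics.median(with_outlier)

# Hypothetical measurements evenly spaced on a logarithmic scale.
log_scale_values = [1, 10, 100]
geo_mean = statistics.geometric_mean(log_scale_values)  # close to 10
arith_mean = statistics.mean(log_scale_values)          # 37

print(mean_original, mean_outlier, median_original, median_outlier)
print(geo_mean, arith_mean)
```

In this sketch the median stays at 5 for both data sets, while the mean moves from 6 to 22.2 once the extreme value is introduced.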



(b) Measures of variation/ spread
• Do all the observations tend to be quite similar and therefore lie close
to the center, or are they spread out across a broad range of values?



Measures of Dispersion
• The extent to which values in a distribution deviate from their central
tendency.



Two distributions with identical means,
medians and modes.



Measures of dispersion

Measure                     | Definition | Example
----------------------------+------------+--------
1. Range                    | Difference between the maximum and the minimum values | Data 4, 5, 5, 7, 9: max - min = 9 - 4 = 5
2. Variance                 | The average of the squared deviations from the mean (divisor n - 1 for a sample) | $S^2 = \sum (X_i - \bar{X})^2 / (n - 1)$
3. Standard deviation       | Square root of the variance; measures the variability around the mean in the same measurement unit as the data | $S = \sqrt{S^2}$
4. Coefficient of variation | The ratio of the standard deviation to the mean; describes the dispersion of the variable in a way that does not depend on the measurement unit | $CV = S / \bar{X}$
5. Inter-quartile range     | The spread of the “middle fifty” percent of a data set | Q3 - Q1
6. Percentile               | The value at or below which a given percentage of the distribution falls | 95th percentile of weight = 12 kg: in the data, 95% weigh 12 kg or less
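The measures of dispersion can be computed for the same example data (4, 5, 5, 7, 9) as a quick check. This is a sketch using the standard-library statistics module. Quartile conventions differ between software packages, so the IQR shown here depends on the "exclusive" method chosen in the code, which is an assumption and not something the slides specify.

```python
import statistics

data = [4, 5, 5, 7, 9]

data_range = max(data) - min(data)    # 9 - 4
variance = statistics.variance(data)  # sample variance, divisor n - 1
sd = statistics.stdev(data)           # square root of the sample variance
cv = sd / statistics.mean(data)       # unit-free ratio of SD to mean

# Quartiles (three cut points); other packages may use other interpolation rules.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, variance, sd, round(cv, 3), iqr)
```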



Summary of quantiles

No. of intervals | Quantile name | No. of quantiles | Description in ordered set
-----------------+---------------+------------------+---------------------------
2                | Median        | 1                | 50% of observations both above and below the median
4                | Quartiles     | 3                | 25% of observations below 1st, above 3rd and between successive quartiles
5                | Quintiles     | 4                | 20% of observations below 1st, above 4th and between successive quintiles
10               | Deciles       | 9                | 10% of observations below 1st, above 9th and between successive deciles
100              | Percentiles   | 99               | 1% of observations below 1st, above 99th and between successive percentiles
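The relationship between the number of intervals and the number of quantiles (always one fewer cut point than intervals) can be illustrated with the standard-library function statistics.quantiles. The data set here, the integers 1 to 100, is hypothetical and chosen only for convenience.

```python
import statistics

# Hypothetical ordered data set: the integers 1 to 100
values = list(range(1, 101))

names = {2: "median", 4: "quartiles", 5: "quintiles", 10: "deciles", 100: "percentiles"}

# For n intervals, statistics.quantiles returns n - 1 cut points.
cut_points = {n: statistics.quantiles(values, n=n) for n in names}

for n, name in names.items():
    print(f"{n} intervals -> {len(cut_points[n])} cut point(s): {name}")
```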
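Measures of skewness

Pearson's second skewness coefficient: $SK = 3(\bar{X} - Md) / S$, where Md is the median and S is the standard deviation.

SK value    | Interpretation
------------+--------------------
SK = 0      | Symmetric
0 < SK < 3  | Skewed to the right
-3 < SK < 0 | Skewed to the left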
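As a worked instance of the coefficient, the sketch below applies it to the earlier example data (4, 5, 5, 7, 9), where the mean is 6, the median is 5 and the sample SD is 2. The function name is illustrative.

```python
import statistics

def pearson_skewness_2(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    return 3 * (mean - median) / sd

sk = pearson_skewness_2([4, 5, 5, 7, 9])  # 3 * (6 - 5) / 2
print(sk)
```

A positive value such as this one falls in the "skewed to the right" band of the table.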



Using different measures of dispersion
• Some guidelines:
• The SD is used when the mean is used, i.e. with symmetric numerical data.
• The range is used with numerical data when the purpose
is to emphasize extreme values
• The CV is used when the intent is to compare numerical
distributions measured on different scales.
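To illustrate why the CV suits comparisons across measurement scales, the sketch below expresses the same hypothetical weights in kilograms and in grams. The SD changes with the unit, while the CV does not. The data and function name are assumptions made for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean (sample SD, divisor n - 1)."""
    return statistics.stdev(values) / statistics.mean(values)

weights_kg = [52.0, 60.5, 71.0, 65.5, 80.0]  # hypothetical weights
weights_g = [w * 1000 for w in weights_kg]   # same weights in grams

sd_kg = statistics.stdev(weights_kg)
sd_g = statistics.stdev(weights_g)           # 1000 times larger than sd_kg
cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)   # same as cv_kg

print(sd_kg, sd_g, cv_kg, cv_g)
```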



Using different measures of dispersion
• Some guidelines:
• Percentiles and the IQR are used in two situations:
i. When the median is used, e.g. median(IQR);
ii. When the mean is used but the objective is to
compare individual observations with a set of norms



2. Summarizing/ displaying nominal & ordinal
data with numbers
a) Meaningful statistic:
i. Proportion
ii. Percentage
iii. Ratio
iv. Rate



i. Proportion
• A proportion is the number a (with a given characteristic) divided by
the total number of observations, a+b
$\text{Proportion} = \frac{a}{a+b}$
• A part divided by the whole
• Useful for ordinal, nominal and numerical data



ii. Percentage
• Percentage is the proportion multiplied by 100%
$\text{Percentage} = \frac{a}{a+b} \times 100\%$
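A minimal sketch of the proportion and percentage calculations, with a and b as defined on the slides. The counts used (30 with the characteristic, 70 without) are hypothetical.

```python
def proportion(a, b):
    """Part divided by whole: a / (a + b)."""
    return a / (a + b)

# Hypothetical counts: a observations with the characteristic, b without
a, b = 30, 70

p = proportion(a, b)  # 30 / 100
pct = p * 100         # the same quantity expressed as a percentage

print(f"proportion = {p:.2f}, percentage = {pct:.1f}%")
```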



iii. Ratio
• The quantitative relation between two amounts, showing the number
of times one value contains or is contained within the other.
• A ratio says how much of one thing there is compared to another
thing.



• There are 3 blue squares to 1 yellow square
• Ratios can be shown in different ways:
• Using the ":" to separate the values: 3 : 1
• Instead of the ":" we can use the word "to": 3 to 1
• Or write it like a fraction: $\frac{3}{1}$
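The squares example also shows how a ratio differs from a proportion: the ratio compares one part with the other part, while the proportion compares one part with the whole. A short sketch using the standard-library fractions module, with illustrative variable names:

```python
from fractions import Fraction

blue, yellow = 3, 1  # 3 blue squares to 1 yellow square

ratio_blue_to_yellow = Fraction(blue, yellow)     # 3 : 1, part compared with part
proportion_blue = Fraction(blue, blue + yellow)   # 3/4, part compared with whole

print(f"ratio = {ratio_blue_to_yellow.numerator} : {ratio_blue_to_yellow.denominator}")
print(f"proportion of blue squares = {proportion_blue}")
```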



iv. Rate
• Rates are used by people every day, such as when they work 40 hours
a week or earn interest every year at a bank.
• When a rate is expressed per single unit of the second quantity, such as 2 feet per
second or 5 miles per hour, it is called a unit rate.



• Similar to proportions except that a multiplier (e.g. 1000, 10,000 or
100,000) is used, and they are computed over a specified period of
time
• The multiplier is called the base
$\text{Rate} = \frac{a}{a+b} \times \text{Base}$



Example of rate
• If the aspirin study among physicians had lasted exactly 1 year, the
rate of MI per 10,000 physicians taking aspirin per year would be
(139/11,037) × 10,000 = 0.0126 × 10,000 ≈ 126 per 10,000
physicians per year
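The arithmetic in this example can be reproduced directly. In the sketch below the function name and argument names are illustrative; the counts (139 MIs among 11,037 physicians) and the base of 10,000 come from the slide.

```python
def rate(events, population, base):
    """Rate = events / population * base, for a stated period of time."""
    return events / population * base

# 139 MIs among 11,037 physicians taking aspirin, expressed per 10,000 per year
mi_rate = rate(139, 11_037, 10_000)

print(f"{mi_rate:.1f} per 10,000 physicians per year")
```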



2. Summarizing/ displaying nominal & ordinal
data with numbers
b) Tables & Graphs
i. Simple frequency table
ii. Contingency table
iii. Bar chart
iv. Pie chart



a. Frequency table
• Summarizing data using tables

sbp interval |  Freq.   Percent     Cum.
-------------+---------------------------
94-113.2     |     21     21.00    21.00
113.2-132.4  |     36     36.00    57.00
132.4-151.6  |     29     29.00    86.00
151.6-170.8  |     12     12.00    98.00
170.8-190    |      2      2.00   100.00
-------------+---------------------------
Total        |    100    100.00
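The Percent and Cum. columns follow mechanically from the frequencies. The sketch below rebuilds them from the counts in the table; only the upper bound of each interval is used as a label, and the variable names are illustrative.

```python
from itertools import accumulate

# Upper bound of each sbp interval and its frequency, taken from the table
upper_bounds = [113.2, 132.4, 151.6, 170.8, 190.0]
freqs = [21, 36, 29, 12, 2]

total = sum(freqs)
percents = [100 * f / total for f in freqs]  # relative frequency in %
cum_percents = list(accumulate(percents))    # running total of the percentages

for ub, f, p, c in zip(upper_bounds, freqs, percents, cum_percents):
    print(f"up to {ub:>6} | {f:>4} {p:>8.2f} {c:>8.2f}")
print(f"{'Total':>12} | {total:>4} {sum(percents):>8.2f}")
```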



b. Contingency table

Table #: Favorite way to eat ice cream, by gender

gender    cup   cone   sundae   sandwich   other   total
male      592    300      204         24      80    1200
female    410    335      180         20      55    1000
total    1002    635      384         44     135    2200



b. Contingency table

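Table #: Birth weight by medical condition of mother

Medical condition of mother | Birth weight less than 2.5 kg | Birth weight equal to or greater than 2.5 kg
----------------------------+-------------------------------+---------------------------------------------
Yes                         | 26 (44.8%)                    | 32 (55.2%)
No                          | 103 (10.0%)                   | 927 (90.0%)
Total                       | 129 (11.9%)                   | 959 (88.1%)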
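The percentages in this table are row percentages: each count divided by its row total. The sketch below recomputes them from the counts, rounded to one decimal place. The dictionary layout and names are illustrative choices, not part of the slides.

```python
# Counts from the table: (less than 2.5 kg, 2.5 kg or more)
counts = {
    "Yes": (26, 32),
    "No": (103, 927),
}
# Column-wise sums give the Total row: (129, 959)
counts["Total"] = tuple(sum(col) for col in zip(*counts.values()))

def row_percentages(row):
    """Each cell as a percentage of its row total, to one decimal place."""
    row_total = sum(row)
    return tuple(round(100 * cell / row_total, 1) for cell in row)

row_pct = {label: row_percentages(row) for label, row in counts.items()}

for label, row in counts.items():
    cells = ", ".join(f"{c} ({p}%)" for c, p in zip(row, row_pct[label]))
    print(f"{label:<6} {cells}")
```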



Summarizing data using graphs
Displaying Qualitative Data
Bar Chart

[Figure: bar chart “Birth Order of Spring 1998 Stat 250 Students” (n=92 students); horizontal axis: Birth Order (Middle, Oldest, Only, Youngest); vertical axis: Percent]



Bar Chart
• Summarizes categorical data.
• Horizontal axis represents categories, while vertical axis represents
either counts (“frequencies”) or percentages (“relative frequencies”).
• Used to illustrate the differences in percentages (or counts) between
categories.



Pie Chart



Displaying Quantitative Data
Histogram
• Divide the measurement scale into equal-width categories (bins).
• Determine number (or percentage) of measurements falling into each
category.
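The two steps can be sketched in plain Python: split the range of the data into equal-width bins, then count the values in each bin. The blood pressure values are hypothetical, and the rule that the maximum value falls in the last bin is one common convention, assumed here.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land on the upper edge, so keep it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical systolic blood pressure readings
sbp = [94, 101, 110, 118, 121, 125, 130, 133, 140, 147, 155, 162, 190]

counts_5 = histogram_counts(sbp, 5)  # five bins of width 19.2
counts_2 = histogram_counts(sbp, 2)  # too few bins hide the shape

print(counts_5, counts_2)
```

Changing n_bins changes the picture of the same data, which is why the choice of bin width affects interpretation.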



Too few categories



Too many categories



Stem-and-Leaf Plot
Stem-and-leaf of Shoes N = 139 Leaf Unit = 1.0

12 0 223334444444
63 0 555555555555566666666677777778888888888888999999999
(33) 1 000000000000011112222233333333444
43 1 555555556667777888
25 2 0000000000023
12 2 5557
8 3 0023
4 3
4 4 00
2 4
2 5 0
1 5
1 6
1 6
1 7
1 7 5
Stem-and-Leaf Plot
• Summarizes measurement data.
• Each data point is broken down into a “stem” and a “leaf.”
• First, “stems” are aligned in a column.
• Then, “leaves” are attached to the stems.
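The construction can be sketched in a few lines of Python. This version assumes a leaf unit of 1, so the tens digit is the stem and the units digit is the leaf, and it prints one line per stem, which is simpler than a display that splits each stem into two lines. The data values and function name are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values by stem; each leaf is the last digit in units of leaf_unit."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)
        stem, leaf = divmod(scaled, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical numbers of pairs of shoes owned
shoes = [2, 5, 7, 10, 12, 15, 20, 23, 30, 75]
plot = stem_and_leaf(shoes)

# Stems are aligned in a column, then leaves are attached to each stem.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```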



Box Plot



Box Plot
• Summarizes measurement data.
• Vertical (or horizontal) axis represents measurement scale.
• Lines in box represent the 25th percentile (“first quartile”), the 50th
percentile (“median”), and the 75th percentile (“third quartile”),
respectively.

Box Plot
• Roughly speaking:
• The “25th percentile” is the number such that 25% of the data points fall
below the number.
• The “median” or “50th percentile” is the number such that half of the data
points fall below the number.
• The “75th percentile” is the number such that 75% of the data points fall
below the number.



Using Box Plots to Compare

Quantile-Quantile Plots (QQ Plots)
• Numerical data
• Visually compare collected data with a known distribution – normal
distribution (Normal QQ plots)
• We check to see whether the sample follows a normal distribution
• Note:
• In a normal QQ plot, if your scatter plot “hugs” the line, there is good reason
to believe that your data is normally distributed.
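The points of a normal QQ plot can be computed without any plotting library: pair each ordered observation with the quantile that a fitted normal distribution would place at the same position. The sketch below uses the standard-library NormalDist class. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, as are the sample values and the function name.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    n = len(sample)
    ordered = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 7.0]  # hypothetical measurements
points = normal_qq_points(sample)

for theo, obs in points:
    print(f"{theo:6.2f}  {obs:6.2f}")
```

If the sample is close to normal, the two columns are close to each other, which is what "hugging the line" means on the plot.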

What if your data is not normally distributed?
• Perform transformation
• For right / positively skewed data: log or square root
• For left / negatively skewed data: exponential or square
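The effect of a log transformation on right-skewed data can be sketched with Pearson's second skewness coefficient, 3 × (mean − median) / SD. The data set is hypothetical and the function name is illustrative. The transformation reduces the skewness in this example; it does not guarantee normality.

```python
import math
import statistics

def pearson_skewness_2(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    return (3 * (statistics.mean(values) - statistics.median(values))
            / statistics.stdev(values))

# Hypothetical right-skewed data: most values small, a few very large
raw = [1, 2, 2, 3, 4, 6, 10, 20, 50]
logged = [math.log(x) for x in raw]  # natural log; requires positive values

sk_raw = pearson_skewness_2(raw)
sk_log = pearson_skewness_2(logged)

print(f"skewness before: {sk_raw:.2f}, after log transform: {sk_log:.2f}")
```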

Histograms vs Boxplots vs QQ plots
Advantages
• Histograms: with properly-sized bins, histograms can summarize any shape of the data (modes, skew, quantiles, outliers)
• Boxplots: don’t have to tweak “graphical” parameters (e.g. bin size in histograms); summarize skew, quantiles, and outliers; can compare several measurements side-by-side
• QQ plots: can identify whether the data came from a certain distribution; don’t have to tweak “graphical” parameters (e.g. bin size in histograms); summarize quantiles
Histograms vs Boxplots vs QQ plots
Disadvantages
• Histograms: difficult to compare side-by-side (takes up too much space in a plot); depending on the size of the bins, interpretation may be different
• Boxplots: cannot distinguish modes
• QQ plots: difficult to compare side-by-side; difficult to distinguish skews, modes, and outliers



Dotplot
• A dotplot is a type of graphic display used to compare
frequency counts within categories or groups.



Dotplot
• Each dot represents a specific number of observations from a set of
data. (Unless otherwise indicated, assume that each dot represents
one observation. If a dot represents more than one observation, that
should be explicitly noted on the plot.)



Which graph to use when?
• Stem-and-leaf plots and dotplots are good for small data sets, while
histograms and box plots are good for large data sets.
• Boxplots and dotplots are good for comparing two groups.



Which graph to use when?
• Boxplots are good for identifying outliers.
• Histograms, boxplots and dotplots are good for identifying the “shape” of
data.



Scatter Plots



Scatter Plots
• Summarizes the relationship between two measurement variables.
• Horizontal axis represents one variable and vertical axis represents
second variable.
• Plot one point for each pair of measurements.



No relationship



Which graph to use?
• Depends on type of data
• Depends on what you want to illustrate
• Boxplots are good for identifying outliers.
• Histograms and boxplots are good for identifying “shape” of data.

• Depends on statistical software



Descriptive Statistics using SPSS



Introduction
• All statistical analyses will be conducted using the same dataset, called
[Link]. A brief description of the dataset follows:
• This dataset contains 5,811 observations. It comes from a study conducted in a
university on the pattern of anxiety and resilience among
university students during the COVID-19 pandemic. The main aim of the
study is to measure the association between anxiety and resilience
among university students during the COVID-19 pandemic.



Summarizing Data
Frequencies

Saving syntax

We can customize the syntax file by adding
more commands, starting with the command that opens the
dataset.



You can also save the syntax file for later use.
Give it a meaningful name and save it in an
appropriate folder.



Frequencies

Descriptives

Explore

Normality Test



Thank you
