
Data Management in Modern Mathematics

The document covers key concepts in data management and statistics, including data collection methods (qualitative and quantitative), scales of measurement (nominal, ordinal, interval, and ratio), and organizing data through frequency distribution tables. It also discusses measures of central tendency (mean, median, mode) and measures of dispersion (range, interquartile range, variance, standard deviation). Additionally, it explains measures of relative position, including percentiles and z-scores, with examples for clarity.

Uploaded by

Hbugvfyc

MATHEMATICS IN THE MODERN WORLD (Midterms)

Lesson 1: DATA MANAGEMENT

➢ STATISTICS
→ A branch of science that deals with the development of methods for collecting, organizing, presenting, and analyzing data more effectively.
➢ DATA COLLECTION
→ The process of gathering and measuring information about the variables under study through an established, systematic procedure, which makes it possible to answer the relevant questions and evaluate outcomes.

TWO TYPES OF DATA COLLECTION METHODS

1. QUALITATIVE DATA COLLECTION METHOD
→ Data are collected through direct interaction with individuals, either one-to-one or in a group setting, which makes this method time-consuming and expensive. It uses individual interviews, focus groups, and observations.
2. QUANTITATIVE DATA COLLECTION METHOD
→ It produces numerical results that are easy to manipulate for interpretation.

SCALES OF MEASUREMENT

1. NOMINAL – this scale is used for classifying and labeling variables without quantitative value.
2. ORDINAL – easy to remember because it sounds like "order," and order is what matters in the ordinal scale. In this scale, the differences between values do not have meaning.
3. INTERVAL – the order and the differences between values are meaningful, but the scale has one limitation: it does not have a "true zero."
4. RATIO – possesses the characteristics of the nominal, ordinal, and interval scales, and also has a true zero.

ORGANIZING DATA

➢ FREQUENCY DISTRIBUTION TABLE (FDT)
→ Grouping all the observations into intervals or classes, together with a count of the number of observations that fall in each interval or class.
→ Guidelines for Frequency Tables:
1. Class intervals should not overlap. Classes are mutually exclusive.
2. Classes should continue throughout the distribution with no gaps. Include all classes.
3. All classes should have the same width.
4. Class widths should be "convenient" numbers.
5. Use 5-20 classes.
6. Make lower or upper limits multiples of the width.
→ Example Data Set:
[Table: example data set with n = 54 observations, lowest value 54, highest value 108]
→ Solution:
1. Arrange the data in order.
2. Identify the range
$R = H_v - L_v$
$R = 108 - 54$
$R = 54$
3. Solve for k
$k = 1 + 3.322 \log_{10}(n)$
$k = 1 + 3.322 \log_{10}(54)$
$k = 6.76$ or $7$
→ note that n = 54 is the number of observations (do not confuse n with the range, which also happens to be 54 here)
→ k is the expected number of classes
4. Identify c (the width of the classes)
$\text{width of class} = \frac{\text{range}}{\text{classes}}$
$c = \frac{54}{6.76}$
$c = 7.99$ or $8$
5. Identify the intervals, starting with the lowest value
Intervals
LL     UL
54     61
62     69
70     77
78     85
86     93
94     101
102    109

→ Class Boundary
i. Subtract the upper class limit of the first class from the lower class limit of the second class.
ii. Divide the result by two.
iii. Subtract that result from the lower class limit and add it to the upper class limit of each class.
6. Compute the class midpoint
7. Tally and add the frequency
8. Solve for the less-than cumulative frequency (<cf), the greater-than cumulative frequency (>cf), and the relative frequency (rf)
9. Completed table
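The steps above can be sketched in Python. This is a minimal illustration, not the handout's own procedure: the function names (`sturges_k`, `class_limits`, `tally`) and the seven-value `sample` are hypothetical, the data are assumed to be whole numbers so that neighbouring classes are 1 apart, and only the figures 54, 108, and n = 54 come from the example.

```python
import math
from itertools import accumulate

def sturges_k(n):
    """Expected number of classes: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_limits(lowest, highest, n):
    """Build (lower limit, upper limit) pairs, starting at the lowest value."""
    width = round((highest - lowest) / sturges_k(n))  # 54 / 6.76 is about 7.99, so 8
    limits = []
    lower = lowest
    while lower <= highest:
        limits.append((lower, lower + width - 1))
        lower += width
    return limits

def tally(data, limits):
    """Frequency of each class: how many observations lie within its limits."""
    return [sum(1 for x in data if lo <= x <= hi) for lo, hi in limits]

limits = class_limits(54, 108, 54)

# Hypothetical observations, used only to show the tally step.
sample = [54, 60, 62, 75, 75, 90, 108]
freq = tally(sample, limits)
less_than_cf = list(accumulate(freq))            # <cf
relative_freq = [f / len(sample) for f in freq]  # rf

for (lo, hi), f, cf in zip(limits, freq, less_than_cf):
    # Boundaries assume integer data: half of the 1-unit gap on each side.
    print(lo, hi, (lo - 0.5, hi + 0.5), (lo + hi) / 2, f, cf)
```

With the example's lowest value, highest value, and n, `class_limits` reproduces the seven intervals from 54-61 to 102-109 listed in the table.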

Lesson 2: MEASURES OF CENTRAL TENDENCIES

➢ A central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution.
➢ The term central tendency refers to the "middle" value or a typical value of the data, and is measured using the mean, median, or mode.
➢ MEAN – the balance point of the observations.
→ e.g., find the mean of the set of values: 3, 4, 6, 8, 9, and 5
→ Solution:
$\bar{x} = \frac{3 + 4 + 6 + 8 + 9 + 5}{6} = \frac{35}{6} \approx 5.83$
➢ MEDIAN – the value separating the higher half of a data sample, a population, or a probability distribution from the lower half.
→ e.g., in the data set {1, 3, 3, 6, 7, 8, 9}, the median is 6, the fourth largest, and also the fourth smallest, number in the sample.
→ e.g., in the data set {2, 4, 7, 8, 9, 12}, the median is $md = \frac{7 + 8}{2} = 7.5$.
➢ MODE – the value that appears most often; the most commonly occurring value in a distribution. A data set may have one mode, several modes, or no mode at all.
→ e.g., given the data set {3, 2, 7, 3, 4, 5}, the mode is 3.
→ e.g., given the data set {4, 5, 3, 4, 7, 5, 6, 3, 8}, the modes are 4, 5, and 3.

MEASURES OF DISPERSION

➢ They measure how the various elements behave with regard to the central tendency.
➢ Includes:
→ RANGE – the difference between the highest value and the lowest value; it tells how far the lowest value is from the highest value.
• e.g., given the data set {5, 8, 7, 12, 12, 13, 18}
• $R = 18 - 5$
• $R = 13$
→ INTERQUARTILE RANGE – the difference between the 75th and the 25th percentile, that is, between the upper and lower quartiles; it is also called the midspread.
• e.g., when $Q_3 = 17.5$ and $Q_1 = 10.5$, $IQR = 17.5 - 10.5 = 7.0$
→ VARIANCE – the expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers is spread out from its average value.
• To determine the variance of ungrouped data,
1. Arrange the data from lowest to highest.
2. Calculate the mean and the deviation of each item from the mean.
3. Use the formula: $s^2 = \frac{\sum (x - \text{mean})^2}{n - 1}$
• e.g., consider the following scores of students in an achievement exam (15, 19, 11, 13, 17, 10, 20)

x      x − mean   (x − mean)²
10     -5         25
11     -4         16
13     -2         4
15     0          0
17     2          4
19     4          16
20     5          25

$n = 7$, $\text{mean} = 15$
$s^2 = \frac{90}{7 - 1} = 15$
→ STANDARD DEVIATION – the square root of the variance. A low standard deviation indicates that the data tend to be close to the mean. A high standard deviation indicates that the data points are spread over a wider range.
• e.g., consider the example in variance, where:
• $s^2 = 15$
• $s = \sqrt{15}$
• $s \approx 3.87$
→ ABSOLUTE DEVIATION – the average of the absolute deviations from a central point, that is, the average distance between each data value and the mean.
• To find the absolute deviation:
1. Arrange the values from lowest to highest.
2. Compute the mean and the absolute deviation of each value from the mean.
3. Compute using: $AD = \frac{\sum |x - \text{mean}|}{n}$
• e.g., consider the number of blender units sold by a store in 1 week: (5, 7, 2, 8, 9, 6, 12)

Units Sold   |x − mean|
2            5
5            2
6            1
7            0
8            1
9            2
12           5

$n = 7$, $\text{mean} = 7$
$AD = \frac{16}{7} \approx 2.29$

MEASURES OF RELATIVE POSITION

➢ Sometimes referred to as measures of location, they are considered an extension of the median.
➢ A measure of relative position describes the position or location of a value relative to the other values in the data set.
➢ Includes: percentiles, quartiles, deciles
➢ PERCENTILES
→ Positional measures used mainly in educational and health-related fields to indicate the position of an individual in a group.
→ For example, graphs and tables show the percentiles for various measures such as test scores, height, or weight.
→ Denoted as $P_x$, percentiles divide a set of data into 100 equal parts.
→ Percentiles are used as positional measures to indicate what percent of the data set has a value less than a specified value.
→ Percentiles are not the same as percentages. For example, if a student gets 72 correct answers out of 100 on a test, they earn 72%. If a score of 72 correct answers corresponds to the 64th percentile, then they did better than 64% of the students in the class, but still received a score of 72%.
→ How to compute percentile:
1. First, arrange the observations from lowest to highest.
2. Next, compute the position of $P_k$: the $\frac{k(n+1)}{100}$-th value.
3. If the result is an integer, the k-th percentile is the data value in that location.
4. If the result has a decimal part, the k-th percentile is the data value in the location given by the whole number part, plus the fractional part times the difference between the next data value and the data value in that location.
→ e.g., the following data show the fasting serum triglyceride levels (mg/dl) of 10 male medical students: (49, 72, 86, 101, 129, 54, 78, 88, 106, 151). Find $P_{50}$ and $P_{85}$.
1. Arrange the data in ascending order:
49, 54, 72, 78, 86, 88, 101, 106, 129, 151
2. Compute the location of $P_{50}$:
$P_{50}: \frac{50(10 + 1)}{100} = 5.5$
3. Thus,
$P_{50} = 86 + 0.5(88 - 86) = 87$
4. Compute the location of $P_{85}$:
$P_{85}: \frac{85(10 + 1)}{100} = 9.35$
5. Thus,
$P_{85} = 129 + 0.35(151 - 129) = 136.7 \approx 137$
➢ QUARTILE
→ This measure divides the observations into four equal parts. The first quartile, denoted as $Q_1$, is the middle value between the smallest value and the median.
→ The second quartile, denoted as $Q_2$, is the center value, that is, the median of the data set.
→ The third quartile, denoted as $Q_3$, is the middle value between the median and the highest value of the data set.
→ How to compute quartile:
1. First, arrange the observations from lowest to highest.
2. Next, compute the position of $Q_k$: the $\frac{k(n+1)}{4}$-th value.
3. If the result is an integer, the k-th quartile is the data value in that location.
4. If the result has a decimal part, the k-th quartile is the data value in the location given by the whole number part, plus the fractional part times the difference between the next data value and the data value in that location.
→ e.g., the following data show the fasting serum triglyceride levels (mg/dl) of 10 male medical students: (49, 72, 86, 101, 129, 54, 78, 88, 106, 151). Find $Q_1$ and $Q_3$.
1. Arrange the data in ascending order:
49, 54, 72, 78, 86, 88, 101, 106, 129, 151
2. Compute the location of $Q_1$:
$Q_1: \frac{1(10 + 1)}{4} = 2.75$
3. Thus,
$Q_1 = 54 + 0.75(72 - 54) = 67.5 \approx 68$
4. Compute the location of $Q_3$:
$Q_3: \frac{3(10 + 1)}{4} = 8.25$
5. Thus,
$Q_3 = 106 + 0.25(129 - 106) = 111.75 \approx 112$

STANDARD SCORES OR Z-SCORES

➢ Z-SCORE
→ Indicates how many standard deviations an element is from the mean.
→ Allows us to calculate the probability of a score occurring within a normal distribution.
→ It can also help in comparing scores that come from different normal distributions.
→ Computed using the formula:
$z = \frac{x - \bar{x}}{s}$
→ e.g., given a mean of 60 and a standard deviation of 5, find the z-score that corresponds to 65.
$z = \frac{65 - 60}{5} = 1$
→ This means that 65 is one standard deviation higher than the mean.
→ e.g., Cris's report card shows that his grade in Math is 98 and his grade in Science is 90. The mean grade in Math is 90 with a standard deviation of 10. In Science, the mean grade is 80 with a standard deviation of 5. In which subject does Cris perform better?
• for Math:
$z = \frac{98 - 90}{10} = 0.8$
• for Science:
$z = \frac{90 - 80}{5} = 2$
→ The values show that the Math grade is 0.8 standard deviation higher than the mean while the Science grade is 2 standard deviations higher than the mean; thus, Cris performs better in Science.

Lesson 3: PROBABILITY SAMPLING

➢ SAMPLE – the group of individuals who will actually participate in the research.
➢ SAMPLING METHOD – carefully deciding how one will select a sample that is representative of the group as a whole, in order to draw valid conclusions from the results.

PROBABILITY SAMPLING METHODS

➢ PROBABILITY SAMPLING – means that every member of the population has a chance of being selected. It is mainly used in quantitative research. There are four main types of probability sampling:
1. SIMPLE RANDOM SAMPLING
• In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population.
• To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.
2. SYSTEMATIC SAMPLING
• Similar to simple random sampling, but usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.
• If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample.
3. STRATIFIED SAMPLING
• Involves dividing the population into subpopulations that may differ in important ways.
• To use this sampling method, you divide the population into subgroups (called strata) based on relevant characteristics (e.g., gender identity, age range, income bracket, job role).
• Based on the overall proportions of the population, you calculate how many people should be sampled from each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.
4. CLUSTER SAMPLING
• Also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups.
• If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above. This is called multistage sampling.
• This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters.
➢ NON-PROBABILITY SAMPLING
1. Convenience Sampling
2. Purposive Sampling
3. Quota Sampling
4. Snowball Sampling

➢ PROBABILITY AND NORMAL DISTRIBUTION
→ Probability is the chance or likelihood that an event will happen.
→ It can be expressed as a proportion from 0 to 1 or as a percentage from 0% to 100%.
➢ UNCONDITIONAL PROBABILITY
→ $P(\text{characteristic}) = \frac{\#\ \text{persons with characteristic}}{N}$, where N is the size of the whole group.
➢ CONDITIONAL PROBABILITY
→ $P(\text{characteristic} \mid \text{subgroup}) = \frac{\#\ \text{persons in the subgroup with characteristic}}{\#\ \text{persons in the subgroup}}$
➢ e.g., consider the data below, which show the number of cases of gadget addiction among children 5-10 years of age who are seeking medical care.
[Table: counts by age and sex; 5290 children in total, 2560 boys and 2730 girls]
→ UNCONDITIONAL PROBABILITY
$P(\text{selecting a boy}) = \frac{2560}{5290}$
$P(\text{selecting a 7 year old}) = \frac{913}{5290}$
→ CONDITIONAL PROBABILITY
$P(\text{9 year old} \mid \text{girl}) = \frac{460}{2730} \approx 0.168$
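The difference between the two kinds of probability is only the denominator, which a short Python sketch can make explicit. The counts are the ones quoted in the example; the variable names and the helper `probability` are hypothetical, and `fractions.Fraction` from the standard library is used so that the ratios stay exact until they are rounded.

```python
from fractions import Fraction

# Counts quoted in the example above.
total_children = 5290
boys = 2560
girls = 2730
seven_year_olds = 913
nine_year_old_girls = 460

def probability(count, group_size):
    """Share of a group that has the characteristic."""
    return Fraction(count, group_size)

# Unconditional: the denominator is the whole group.
p_boy = probability(boys, total_children)
p_seven = probability(seven_year_olds, total_children)

# Conditional on being a girl: the denominator is the number of girls only.
p_nine_given_girl = probability(nine_year_old_girls, girls)

for label, p in [("boy", p_boy), ("7 year old", p_seven), ("9 year old | girl", p_nine_given_girl)]:
    print(label, round(float(p), 3))
```

Dividing the same 460 girls by all 5290 children instead would answer a different question: the probability that a randomly selected child is both a girl and 9 years old.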
NORMAL DISTRIBUTION

➢ Characteristics of a Normal Distribution:
1. Symmetric
2. Unimodal – one mode
3. Asymptotic
4. Equal – the mean, median, and mode are all equal
➢ e.g., you join a crab catching contest. The sizes of the crabs are normally distributed with a mean of 16 cm and a standard deviation of 4 cm. What is the chance of catching crabs that are less than 8 cm? What is the chance of winning a prize if the prize is offered for any crab over 24 cm? What is the chance of catching crabs between 16 cm and 24 cm?
➢ Solution:
1. Draw a picture of the normal distribution.
2. Locate the problems in the graph.
3. Translate each problem into probability notation.
• Problem 1: $P(x < 8)$
• Problem 2: $P(x > 24)$
• Problem 3: $P(16 < x < 24)$
4. Change the x values into z-scores.
• When x is 8
$z = \frac{8 - 16}{4} = -2$
• When x is 24
$z = \frac{24 - 16}{4} = 2$
• Thus,
• Problem 1: $P(x < 8) = P(z < -2)$
• Problem 2: $P(x > 24) = P(z > 2)$
• Problem 3: $P(16 < x < 24) = P(0 < z < 2)$
5. Calculate the probabilities (use the z-score table)
• Problem 1: $P(x < 8) = P(z < -2) = 0.0228$
o The probability of getting crabs smaller than 8 cm is 2.28%.
6. Calculate the probability of getting crabs larger than 24 cm.
• This probability is found by subtracting the "less than" part of the area from the whole area. The whole area is 1, so the answer is 1 minus the area to the left of 24 cm.
• In symbols, we have: $P(x > 24) = P(z > 2)$
• Note that: $P(z > 2) + P(z < 2) = 1$
o Therefore:
$P(z > 2) = 1 - P(z < 2) = 1 - 0.9772 = 0.0228$
o This means that the probability of getting crabs bigger than 24 cm, which is also the probability of winning a prize, is 2.28%.
7. Problem 3: $P(16 < x < 24) = P(0 < z < 2)$
o $P(0 < z < 2) = P(z < 2) - P(z < 0) = 0.9772 - 0.5000 = 0.4772$
• The probability of catching crabs with sizes between 16 cm and 24 cm is 47.72%.

OTHER EXAMPLES
1. $P(Z < -1.54) = 0.0618$
2. $P(Z > 2.13) = 1 - P(Z < 2.13) = 1 - 0.9834 = 0.0166$
3. $P(-1.14 < Z < 2.11) = P(Z < 2.11) - P(Z < -1.14) = 0.9826 - 0.1271 = 0.8555$

Lesson 4: REGRESSION ANALYSIS AND CORRELATION ANALYSIS

➢ LSRL
→ Given a bivariate quantitative data set, the least squares regression line, almost always abbreviated to LSRL, is the line for which the sum of the squares of the residuals is the smallest possible.

REGRESSION ANALYSIS

➢ REGRESSION ANALYSIS
→ A statistical method that makes use of the relationship between two or more quantitative variables.
➢ DEPENDENT OR RESPONSE VARIABLE (Y)
→ Can be explained with the knowledge of the values of the other variable, called the Independent or Explanatory Variable (X).
➢ In regression analysis, the variable that is being predicted or estimated is typically referred to as the dependent variable or the response variable. This is the variable whose values are being modeled or estimated based on one or more independent variables, also known as predictor variables or explanatory variables.
➢ The two main purposes of regression:
1. To measure the relationship between two or more variables.
2. To predict or estimate values of the dependent variable from the known values of the independent variables.
➢ The term "simple" in simple linear regression is used when we consider only one independent variable.
➢ The line of "best fit" is the line such that, when the differences between the actual values of Y and the predicted values of Y based on the regression line for each X are squared and summed, the sum is a minimum.
➢ The estimated regression line is defined by the equation:
$Y' = a + bX$, …
→ Where $Y'$ is the predicted dependent variable, $X$ is the independent variable, and $a$ and $b$ are the least squares estimates of the parameters of regression, which are calculated from the available sample points.

ESTIMATION OF PARAMETERS

$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$ and $a = \bar{y} - b\bar{x}$
➢ Where $b$ is the regression coefficient or the slope of the regression line, and $a$ is the constant of regression or the y-intercept of the regression line.
➢ Moreover, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ are the means of the sample values of y and x.
➢ e.g., last year, five randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has a question: what linear regression equation best predicts statistics performance, based on math aptitude scores? If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
➢ Solution:
[Table: aptitude test scores (x) and statistics grades (y) of the five students]
➢ Note that:
$\sum_{i=1}^{5} x_i y_i = 30500$
$\sum_{i=1}^{5} x_i = 390$
$\sum_{i=1}^{5} y_i = 385$
$\sum_{i=1}^{5} x_i^2 = 31150$
$\sum_{i=1}^{5} y_i^2 = 30275$
➢ Thus,
$\bar{y} = \frac{1}{5}(385) = 77$
$\bar{x} = \frac{1}{5}(390) = 78$
$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
$b = \frac{(5)(30500) - (390)(385)}{(5)(31150) - (390)^2} = \frac{2350}{3650} \approx 0.64$
$a = \bar{y} - b\bar{x}$
$a = 77 - (0.64)(78) = 27.08$
➢ The regression equation for the data is:
$\hat{y} = 27.08 + 0.64x$
➢ If a student made an 80 on the aptitude test, her expected grade in statistics would be:
$\hat{y} = 27.08 + 0.64(80) = 78.28 \approx 78$

CORRELATION ANALYSIS

➢ Correlation Analysis attempts to measure the strength of the relationship between two random variables by means of a single number called a correlation coefficient.
➢ Correlation Coefficient measures the strength of the linear relationship between two variables, X and Y.
➢ The formula for the correlation coefficient:
$r = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$
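The formula for r translates almost term by term into code. The sketch below is one possible rendering in Python: the function name `pearson_r` and the two small data sets are hypothetical and chosen only because their results are easy to check by hand.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient computed from the five sums in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations.
perfect = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # points lie exactly on a line
noisy = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])           # same upward trend, some scatter
print(perfect, noisy)
```

The function divides by zero when all x values or all y values are identical, since one bracket under the square root is then 0; a fuller version would check for that case first.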
[Figure: scatter plot examples]

➢ e.g., based on the previous example on students' aptitude test scores and statistics grades, the computed r would be:
$\sum_{i=1}^{5} x_i y_i = 30500$
$\sum_{i=1}^{5} x_i = 390$
$\sum_{i=1}^{5} y_i = 385$
$\sum_{i=1}^{5} x_i^2 = 31150$
$\sum_{i=1}^{5} y_i^2 = 30275$
$r = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$
$r = \frac{(5)(30500) - (390)(385)}{\sqrt{\left[(5)(31150) - (390)^2\right]\left[(5)(30275) - (385)^2\right]}} = \frac{2350}{\sqrt{(3650)(3150)}} \approx 0.69$
➢ The relationship between students' statistics grades and aptitude scores is moderate to good.

Note: $r^2 \approx 0.48$, or about 48%

[Figure: examples of approximate r-values]

SUGGESTED INTERPRETATION OF r

Correlation Coefficient r            Interpretation
−0.25 to 0.25                        no or weak relationship
−0.5 to −0.25 or 0.25 to 0.5         fair degree of relationship
−0.75 to −0.5 or 0.5 to 0.75         moderate to good relationship
−1.0 to −0.75 or 0.75 to 1.0         good to strong relationship

➢ If the correlation coefficient is a positive value, it indicates a positive linear relationship between the two variables. In other words, as one variable increases, the other variable tends to increase as well.
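The aptitude example can be re-run from its five sums alone, which gives a quick check on the slope, the intercept, the prediction, and r. The Python sketch below is only an illustration: the function names `regression_and_r` and `interpret` are hypothetical, and placing a value that falls exactly on a boundary (such as 0.25) in the lower band is an assumption, since the interpretation table lists each boundary in two rows.

```python
import math

def regression_and_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return intercept a, slope b, and correlation r from summary sums."""
    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

def interpret(r):
    """Band from the suggested interpretation table (boundary handling assumed)."""
    size = abs(r)
    if size <= 0.25:
        return "no or weak relationship"
    if size <= 0.5:
        return "fair degree of relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "good to strong relationship"

# Sums quoted in the aptitude test example (n = 5 students).
a, b, r = regression_and_r(5, 390, 385, 30500, 31150, 30275)
predicted = a + b * 80  # expected statistics grade for an aptitude score of 80
print(round(b, 2), round(r, 2), round(predicted), interpret(r))
```

Because the code keeps b unrounded, its intercept comes out near 26.78 rather than the 27.08 obtained by rounding b to 0.64 first; the predicted grade for a score of 80 still rounds to 78.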
