
Introduction to Basic Statistics Concepts

Copyright
© All Rights Reserved

Statistics

Manufacturing and Industrial Engineering


Lecture 1
Basic Concepts of Statistics

* Definition:
Statistics is the science of data. It involves
collecting, classifying, summarizing, organizing,
and interpreting numerical information.

* What is the Objective of Statistics?
The objective is to make inferences (predictions and decisions)
about a population based on information
contained in a sample.
• Fundamental Elements of Statistics:
Statistical methods are particularly useful for studying,
analyzing, and learning about populations.
A population is a set of units (usually people, objects,
transactions, or events) that we are interested in studying.
For example, a population may include:
1) All employed workers in a company
2) All registered voters in an election
3) All the cars produced last year by a particular
assembly line
In studying a population, we focus on one or more
characteristics or properties of the units in the
population. We call such characteristics variables.
A variable is a characteristic or property of an
individual population unit.
When we measure a variable for every unit of a
population, the result is called a census of the
population.
For large populations, conducting a census would be
prohibitively time-consuming and/or costly.
A reasonable alternative would be to select and study a
subset (or portion) of the units in the population.
A sample is a subset of the units of a population.
A statistical inference is an estimate, prediction, or
some other generalization about a population based on
information contained in a sample.
Statistics is the science of data, and data are obtained by
measuring the values of one or more variables on the units
in the sample (or population). All data (and the variables
we measure) can be classified as one of two general types:
1. Quantitative data are measurements that are recorded
on a naturally occurring numerical scale.
2. Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can only be
classified into one of a group of categories.
Ex:
Chemical and manufacturing plants sometimes discharge toxic-
waste materials such as DDT into nearby rivers and streams. These
toxins can adversely affect the plants and animals inhabiting the
river and river bank. The U.S. Army Corps of Engineers conducted a
study of fish in the Tennessee River (in Alabama) and its three
tributary creeks. A total of 144 fish were captured and the
following variables measured for each:
1. River/creek where fish was captured.
2. Species (catfish, largemouth bass, or smallmouth buffalo fish)
3. Length (centimeters)
4. Weight (grams)
5. DDT concentration (parts per million)
Classify each of the five variables measured as quantitative or
qualitative.
Solution:
The variables length, weight, and DDT are quantitative
because each is measured on a numerical scale:
Length in centimeters
Weight in grams
DDT in parts per million
In contrast, river/creek and species cannot be measured
quantitatively; they can only be classified into categories
(e.g., channel catfish, largemouth bass, and smallmouth
buffalo fish for species). Consequently, data on river/creek and
species are qualitative.
Collecting Data:
Once you have decided which type of data (quantitative or qualitative) is
appropriate for the problem at hand, you need to collect the data.
Generally, you can obtain data in four different ways:
1. Data from a published source, such as a book, journal, or newspaper.
2. Data from a designed experiment, in which the researcher exerts strict
control over the units (people, objects, or events) in the study. For
example, a recent medical study investigated the potential of aspirin
in preventing heart attacks.
3. Data from a survey, in which the researcher samples a group of people,
asks one or more questions, and records the responses. Probably the
most familiar type of survey is the political poll, conducted by any one
of a number of organizations and designed to predict the outcome of a
political election.
4. Data collected observationally. In an observational study, the
researcher observes the experimental units in their natural setting and
records the variable(s) of interest. For example, a company
psychologist might observe and record the level of “Type A” behavior
of a sample of assembly line workers.
Regardless of the data collection method employed, it is likely
that the data will be a sample from some population. If we
wish to apply inferential statistics, we must obtain a
representative sample.
A representative sample exhibits characteristics typical of
those possessed by the population of interest.
The most common way to satisfy the representative sample
requirement is to select a random sample. A random
sample ensures that every subset of a fixed size in the
population has the same chance of being selected as the
sample.
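The idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population of 500 car serial numbers, the sample size of 20, and the helper name `draw_random_sample` are assumptions made for the example, not part of the lecture.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Return a simple random sample of n units, drawn without replacement.

    Every subset of size n has the same chance of being selected.
    """
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical population: serial numbers of 500 cars from one assembly line.
population = list(range(1, 501))
sample = draw_random_sample(population, n=20, seed=1)
print(sorted(sample))
```

Measuring a variable on the 20 sampled cars instead of all 500 is what replaces the census.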
Frequency Distributions:
Definitions:
Raw Data:
Raw data are collected data that have not been organized numerically.
Ex: The masses of 10 students are:
50 kg, 55 kg, 45 kg, 40 kg, 47 kg, 60 kg, 61 kg, 62 kg, 46 kg, 51 kg

Arrays:
An array is an arrangement of raw numerical data in ascending or descending
order of magnitude.

Range:
It is the difference between the largest and the smallest numbers of data.
Ex: The range of the data set (35, 50, 55, 45, and 30) is:
55- 30= 25
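As a small sketch, the array and the range can be computed for the ten student masses listed under Raw Data and for the five numbers in the range example. The variable names are chosen only for this illustration.

```python
# Raw data: masses of 10 students, in kg (from the Raw Data example).
masses_kg = [50, 55, 45, 40, 47, 60, 61, 62, 46, 51]

# An array is the raw data arranged in ascending (or descending) order.
array_ascending = sorted(masses_kg)
array_descending = sorted(masses_kg, reverse=True)

# The range is the largest value minus the smallest value.
mass_range = max(masses_kg) - min(masses_kg)

print(array_ascending)  # [40, 45, 46, 47, 50, 51, 55, 60, 61, 62]
print(mass_range)       # 22

# The data set used in the range example.
example = [35, 50, 55, 45, 30]
example_range = max(example) - min(example)
print(example_range)    # 25
```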
When summarizing data into classes or categories, the number of individuals
belonging to each class is called the “class frequency”.

Frequency Distribution (or Frequency Table):
It is a tabular arrangement of data by classes, together with the
corresponding class frequencies; for example:

Class        Class frequency
60-62          5
63-65         18
66-68         42
69-71         27
72-74          8
Total        100


* Class Intervals and class Limits:
The class of 60- 62 is called the “class interval”. The end numbers 60 and 62 are called
“class limits”; the smaller number 60 is the “lower class limit” and the larger number 62 is
the “upper class limit”.
* Open Class Interval:
It is a class interval that has either no upper class limit or no lower class limit. For example,
the age group “65 years and above” is an open class interval.
* Class mark:
It is the “midpoint” of the class interval and it is obtained by adding the lower and upper
class limits and dividing by 2.
Ex: The class mark (class midpoint) of the interval 60-62 is
(60 + 62)/2 = 61.
* Histogram and Frequency Polygon:
The two graphical representations of frequency distributions are:
1. The Histogram (or Frequency Histogram) consists of a set of rectangles having bases on
a horizontal axis (x-axis), with centers at the class marks, widths equal to the class
interval sizes, and heights equal to the class frequencies (when all class intervals have
the same size).
2. The Frequency Polygon is a line graph of class frequency plotted against class mark.
• Relative Frequency:
It is the frequency of the class divided by the total frequency of all classes, and it is
generally expressed as a percentage.
Ex:
Class interval   Class frequency   Relative frequency
60-62              5                 5 %
63-65             18                18 %
66-68             42                42 %
69-71             27                27 %
72-74              8                 8 %
Total            100               100 %
• Cumulative Frequency Distribution (OGIVES):
The cumulative frequency of a class interval is the total frequency of all values up to and
including that class interval. A graph of cumulative frequencies is called an ogive. The
relative cumulative frequency (percentage ogive) is the cumulative frequency divided by
the total frequency.
Ex:
Class interval   Frequency   Cumulative frequency   Relative cumulative frequency
60-62              5            5                      5 %
63-65             18           23                     23 %
66-68             42           65                     65 %
69-71             27           92                     92 %
72-74              8          100                    100 %
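The relative and cumulative columns of the tables above follow mechanically from the class frequencies. A minimal sketch, with variable names that are assumptions of this example:

```python
from itertools import accumulate

classes = ["60-62", "63-65", "66-68", "69-71", "72-74"]
frequencies = [5, 18, 42, 27, 8]

total = sum(frequencies)

# Relative frequency: class frequency / total frequency, as a percentage.
relative = [100 * f / total for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: cumulative frequency / total frequency.
relative_cumulative = [100 * c / total for c in cumulative]

for row in zip(classes, frequencies, relative, cumulative, relative_cumulative):
    print("{:>6} {:>4} {:>6.1f}% {:>4} {:>6.1f}%".format(*row))
```

The last cumulative value always equals the total frequency, which is a quick check on the table.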
• Class Boundaries:
They are obtained by adding the upper limit of one class interval to the lower limit
of the next higher class interval and dividing by 2.
Ex:
For the class intervals 63-65 and 66-68, the class boundaries are
62.5-65.5 and 65.5-68.5.
• Class Width (or Class Size):
It is the difference between the upper and lower class boundaries.
Ex: The class widths (class sizes) of the above classes are
65.5 - 62.5 = 3 and 68.5 - 65.5 = 3.
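The class boundaries, class marks, and class widths of the frequency table can be derived from the class limits alone. The sketch below assumes equally spaced classes, so that the outermost boundaries (59.5 and 74.5) can be obtained with the same half-gap as the interior ones; the function name is hypothetical.

```python
limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

def class_boundaries(limits):
    """Boundaries lie halfway between the upper limit of one class and the
    lower limit of the next. Assumes the same gap between all adjacent classes."""
    gap = limits[1][0] - limits[0][1]  # e.g. 63 - 62 = 1
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

boundaries = class_boundaries(limits)
marks = [(lo + hi) / 2 for lo, hi in limits]        # class midpoints
widths = [upper - lower for lower, upper in boundaries]

print(boundaries[1], boundaries[2])  # (62.5, 65.5) (65.5, 68.5)
print(marks)                         # [61.0, 64.0, 67.0, 70.0, 73.0]
print(widths)                        # [3.0, 3.0, 3.0, 3.0, 3.0]
```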

• Mean, Median, Mode and Measures of Central Tendency:


An average is a value that is typical or representative of a set of data. Since such
typical values tend to lie centrally within a set of data arranged according to
magnitude, averages are also called “measures of central tendency”.
Several types of averages can be defined, the most common being:
* The arithmetic mean (or briefly the mean), the median, the mode, the geometric mean,
and the harmonic mean.
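• Arithmetic Mean (Mean):
For a set of N numbers X1, X2, …, XN it is denoted by $\bar{X}$ and defined by
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{j=1}^{N} X_j}{N}$$
Ex: The mean for the set of numbers 8, 3, 5, 12, 10 is
$$\bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$
If the numbers X1, X2, …, Xk occur with frequencies f1, f2, …, fk, the
arithmetic mean is:
$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_k X_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f X}{N}$$
where $N = \sum f$ = total frequency

Ex:
If the final examination in a course is weighted three times as much as a quiz,
and a student has a final examination grade of 85 and quiz grades of 70 and
90, the mean grade is
$$\bar{X} = \frac{(1)(70) + (1)(90) + (3)(85)}{1 + 1 + 3} = \frac{415}{5} = 83$$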
Note
The algebraic sum of the deviations of a set of numbers from their
arithmetic mean is zero.
Ex:
The deviations of the numbers 8, 3, 5, 12, and 10 from their mean 7.6 are
8-7.6, 3-7.6, 5-7.6 , 12-7.6, 10-7.6 so that the algebraic sum is equal to:
0.4-4.6-2.6+4.4+2.4 = 0
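The mean, the frequency-weighted mean, and the zero-sum property of the deviations can be checked numerically. This is a minimal sketch; the function names are chosen for the illustration, and the weights 1, 1, 3 encode the statement that the final examination counts three times as much as a quiz.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum(f * X) / sum(f)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

numbers = [8, 3, 5, 12, 10]
x_bar = mean(numbers)                            # 38 / 5 = 7.6

# Quiz grades 70 and 90 (weight 1 each), final examination 85 (weight 3).
grade = weighted_mean([70, 90, 85], [1, 1, 3])   # 415 / 5 = 83.0

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - x_bar for x in numbers)

print(x_bar, grade, round(deviation_sum, 12))
```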
• Median:
The median of a set of numbers arranged in order of magnitude (in an
array) is either the “middle value” or the arithmetic mean of the two middle values.
Ex:
The set of numbers 3, 4, 4, 5, 6, 8, 8, 8, 10 has median 6.
Ex:
The set of numbers 5, 5, 7, 9, 11, 12, 15, and 18 has median
(9 + 11)/2 = 10
• For grouped data, the median is obtained by:
$$\text{Median} = L_1 + \left( \frac{N/2 - (\sum f)_1}{f_{median}} \right) c$$
where
L1 = lower class boundary of the median class
N = number of items in the data (total frequency)
$(\sum f)_1$ = sum of frequencies of all classes lower than the median class
f median = frequency of median class
c = size of median class interval
Ex:
Find the median length of the 40 laurel leaves of the following table:
Length (mm)   Frequency
118-126         3
127-135         5
136-144         9
145-153        12
154-162         5
163-171         4
172-180         2
Total          40
Solutions
Method 1:
By Interpolation
The median is the value corresponding to half the total frequency (40/2 = 20).
The sum of the first three class frequencies is 3 + 5 + 9 = 17, so we require 3 more of the 12 cases
in the 4th class to reach the desired 20; the median therefore lies in the class interval 145-153.
The lower boundary of the 4th class is (144 + 145)/2 = 144.5, and its upper
boundary is (153 + 154)/2 = 153.5, so
Median = 144.5 + (3/12)(153.5 - 144.5) = 144.5 + (3/12)(9) = 146.75 ≈ 146.8 mm
Method 2
By using the formula
$$\text{Median} = L_1 + \left( \frac{N/2 - (\sum f)_1}{f_{median}} \right) c$$
It is clear that the median lies in the 4th class interval, since 3 + 5 + 9 = 17 while
3 + 5 + 9 + 12 = 29, and half of the total frequency is 40/2 = 20, so that
L1 = 144.5 = lower class boundary of median class
N = 40 = number of items in the data
$(\sum f)_1$ = sum of frequencies of all classes lower than the median class = 3 + 5 + 9 = 17
f median = frequency of median class = 12
c = size (width) of median class interval = 153.5 - 144.5 = 9, so that
Median = 144.5 + [(20 - 17)/12](9) = 146.75 ≈ 146.8 mm
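Both methods amount to the same computation, which can be sketched as a short function. The name `grouped_median` and the representation of classes as (lower boundary, upper boundary) pairs are assumptions of this example.

```python
def grouped_median(boundaries, frequencies):
    """Median of grouped data: L1 + ((N/2 - cum_below) / f_median) * c.

    boundaries:  (lower, upper) class boundaries in ascending order
    frequencies: class frequencies in the same order
    """
    half = sum(frequencies) / 2
    cum_below = 0  # sum of frequencies of all classes below the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cum_below + f >= half:
            return lower + (half - cum_below) / f * (upper - lower)
        cum_below += f
    raise ValueError("frequencies must contain at least one positive value")

# Laurel leaf lengths (mm): class boundaries and frequencies from the table.
boundaries = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
              (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
frequencies = [3, 5, 9, 12, 5, 4, 2]

median_length = grouped_median(boundaries, frequencies)
print(median_length)  # 146.75
```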
Note: Geometrically, the median is the value of X
corresponding to the vertical line which divides the histogram into two
parts having equal areas.
H.W: Find the median age of the heads of families in the following table (Ans. 45.1)

Age of head of family (years)   Number ×10^6 (frequency)
Under 25                          2.22
25-29                             4.05
30-34                             5.08
35-44                            10.45
45-54                             9.47
55-64                             6.63
65-74                             4.16
75 and over                       1.66
Total (N)                        43.72
• Mode:
It is the value of a set of numbers which occurs with the greatest frequency. The
mode may not exist, and if it does exist, it may not be unique.
Ex:
The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18 has mode 9.
The set 3, 5, 8, 10, 12, 15, 16 has no mode.
The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9 has two modes, 4 and 7, and is called “Bimodal”.
A distribution having only one mode is called “Unimodal”.
So the mode is the number occurring most frequently.
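The three example sets can be checked with a small counting routine. In this sketch, "no mode" is taken to mean that no value occurs more than once, which is an assumption that matches the second example; the function name is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) in ascending order.

    An empty list means there is no mode, taken here to mean that
    no value occurs more than once.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18]))  # [9]       unimodal
print(modes([3, 5, 8, 10, 12, 15, 16]))                   # []        no mode
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))           # [4, 7]    bimodal
```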

Note:
In the case of grouped data where a frequency curve has been constructed to fit the data,
the mode is the value of X corresponding to the maximum point (or points) on the
curve.
• The mode can be obtained for grouped data from a frequency distribution or histogram by
the formula:
$$\text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c$$
where
$L_1$ = lower class boundary of modal class (class containing the mode)
$\Delta_1$ = excess of modal frequency over frequency of next lower class
$\Delta_2$ = excess of modal frequency over frequency of next higher class
$c$ = size of modal class interval
Ex: The following table shows a frequency distribution of the monthly wages (in dollars) of
65 employees. Find the modal wage.
Wages ($)       Number of employees (f)
50-59.99          8
60-69.99         10
70-79.99         16
80-89.99         14
90-99.99         10
100-109.99        5
110-119.99        2
Total            65
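Sol.
The class containing the mode (modal class) is the 3rd class (70-79.99), which has the
largest frequency, 16.
Lower class boundary of modal class = 69.995
Frequency of next lower class = 10, so the excess of the modal frequency over it is 16 - 10 = 6
Frequency of next higher class = 14, so the excess of the modal frequency over it is 16 - 14 = 2
c = size of modal class = 79.995 - 69.995 = 10
So the
Mode = 69.995 + (6/(6 + 2))(10) = 77.495 ≈ $77.50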
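The grouped-data mode calculation can be sketched as follows. The function name is hypothetical, and two choices are assumptions of this sketch rather than part of the lecture: a missing neighbour of the modal class is treated as having frequency 0, and if several classes tie for the largest frequency only the first is used.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of grouped data: L1 + (d1 / (d1 + d2)) * c.

    d1 = modal frequency minus frequency of the next lower class
    d2 = modal frequency minus frequency of the next higher class
    Assumes d1 + d2 > 0, i.e. the modal class is not part of a flat plateau.
    """
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    f_mode = frequencies[i]
    f_lower = frequencies[i - 1] if i > 0 else 0                    # assumption
    f_higher = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # assumption
    d1 = f_mode - f_lower
    d2 = f_mode - f_higher
    lower, upper = boundaries[i]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Wage classes 50-59.99, ..., 110-119.99 with boundaries 49.995, 59.995, ...
boundaries = [(49.995 + 10 * k, 59.995 + 10 * k) for k in range(7)]
frequencies = [8, 10, 16, 14, 10, 5, 2]

modal_wage = grouped_mode(boundaries, frequencies)
print(round(modal_wage, 3))  # 77.495
```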
• An Empirical Relation between Mean, Median, and Mode:
For unimodal frequency curves that are moderately skewed (asymmetrical,
either to the right or to the left), we have the empirical relation:
Mean - Mode = 3(Mean - Median)
Ex:
Use the empirical relation between the mean, median, and mode to find the modal wage of
the 65 employees, and compare it with the previous result obtained from the mode formula.
Sol:
Since
Mean - Mode = 3(Mean - Median)
Mode = Mean - 3(Mean - Median)

Mean = $\frac{\sum f X}{N}$, where X = class mark and N = total frequency

Class          f     Class mark (X)
50-59.99        8     54.995
60-69.99       10     64.995
70-79.99       16     74.995
80-89.99       14     84.995
90-99.99       10     94.995
100-109.99      5    104.995
110-119.99      2    114.995
Total          65
Mean = ((8)(54.995) + (10)(64.995) + … + (2)(114.995))/65 = 79.764

N/2 = 65/2 = 32.5
8 + 10 + 16 = 34
So the median lies in the 3rd class interval, where
sum of frequencies of all lower classes = 8 + 10 = 18
L1 = 69.995, f median = 16, c = 10
Median = 69.995 + ((32.5 - 18)/16)(10) = 79.06
So the Mode = 79.764 - 3(79.764 - 79.06) = 77.652
The difference between the two results is
77.65 - 77.5 = 0.15,
so the empirical relation agrees well with the mode formula in this case.
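The whole calculation can be reproduced in a few lines. This is a sketch with variable names chosen for the illustration; it carries full precision throughout, so the empirical mode comes out as about 77.64 rather than 77.652, because the text rounds the median to 79.06 before substituting it.

```python
class_marks = [54.995, 64.995, 74.995, 84.995, 94.995, 104.995, 114.995]
frequencies = [8, 10, 16, 14, 10, 5, 2]
width = 10  # size of every class interval

n = sum(frequencies)

# Mean of grouped data: sum(f * X) / N, with X the class marks.
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Locate the median class: first class whose cumulative frequency reaches N/2.
half = n / 2
cum_below = 0
for i, f in enumerate(frequencies):
    if cum_below + f >= half:
        break
    cum_below += f
lower_boundary = class_marks[i] - width / 2
median = lower_boundary + (half - cum_below) / f * width

# Empirical relation: Mean - Mode = 3 (Mean - Median).
mode_empirical = mean - 3 * (mean - median)

print(round(mean, 3), round(median, 4), round(mode_empirical, 2))
```

Either way the result stays within about 0.15 of the 77.5 obtained from the mode formula.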
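Geometric Mean (G):
For a set of N numbers X1, X2, …, XN, G is the Nth root of the product of the numbers:
$$G = \sqrt[N]{X_1 X_2 \cdots X_N}$$
Ex: For the numbers 2, 4, 8
$$G = \sqrt[3]{(2)(4)(8)} = \sqrt[3]{64} = 4$$
• For grouped data X1, X2, …, Xk with frequencies f1, f2, …, fk,
where f1 + f2 + … + fk = N, the geometric mean is
$$G = \sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_k^{f_k}}$$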
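• Harmonic Mean (H):
For a set of N numbers X1, X2, …, XN, the harmonic mean is the reciprocal of the arithmetic mean
of the reciprocals of the numbers:
$$H = \frac{1}{\frac{1}{N} \sum_{j=1}^{N} \frac{1}{X_j}} = \frac{N}{\sum \frac{1}{X}}$$
Ex: The harmonic mean of 2, 4, 8 is
$$H = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{7/8} = \frac{24}{7} \approx 3.43$$
H.W: Find $\bar{X}$, G, and H for 5, 10, 18.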


• Root Mean Square (R.M.S):
The R.M.S, or “quadratic mean”, of a set of numbers X1, X2, …, XN is defined by:
$$\text{R.M.S} = \sqrt{\frac{\sum_{j=1}^{N} X_j^2}{N}}$$
This type of average is frequently used in physical applications.
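The geometric mean, harmonic mean, and root mean square of the numbers 2, 4, 8 can be computed directly from their definitions. A minimal sketch, with function names chosen for the illustration; the geometric mean is computed through logarithms, which assumes all values are positive.

```python
import math

def geometric_mean(values):
    """Nth root of the product, computed via logs (values must be positive)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals: N / sum(1/X)."""
    return len(values) / sum(1 / x for x in values)

def root_mean_square(values):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in values) / len(values))

numbers = [2, 4, 8]
g = geometric_mean(numbers)      # cube root of 64, about 4
h = harmonic_mean(numbers)       # 24/7, about 3.43
rms = root_mean_square(numbers)  # sqrt(84/3) = sqrt(28), about 5.29

print(round(g, 2), round(h, 2), round(rms, 2))
```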

• Quantiles: The values that divide a set of data, arranged in order of magnitude, into
equal parts, such as:

- The value that divides the data into two equal parts is the “Median”.
- The values that divide the data into four equal parts are called “Quartiles”.
- The values that divide the data into ten equal parts are called “Deciles”.
- The values that divide the data into one hundred equal parts are called “Percentiles”.
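As an illustration, Python's standard library (version 3.8 or later) provides `statistics.quantiles`, which returns the cut points that divide a data set into n equal parts. The sketch below applies it to the ten student masses from the Raw Data example. Different textbooks interpolate the cut points slightly differently, so the exact values depend on the method; the library's default is used here and is not necessarily the convention of this lecture.

```python
import statistics

# Masses of 10 students, in kg (from the Raw Data example).
masses_kg = [50, 55, 45, 40, 47, 60, 61, 62, 46, 51]

quartiles = statistics.quantiles(masses_kg, n=4)   # 3 cut points -> 4 equal parts
deciles = statistics.quantiles(masses_kg, n=10)    # 9 cut points -> 10 equal parts
median = statistics.median(masses_kg)

print(quartiles)
print(deciles)
print(median)
```

With this default method, the second quartile and the fifth decile both coincide with the median, consistent with the median dividing the data into two equal parts.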
