Chapters 1 and 2

Statistics is a branch of applied mathematics focused on the collection, organization, presentation, analysis, and interpretation of numerical data, classified into descriptive and inferential statistics. The statistical investigation involves five stages: data collection, organization, presentation, analysis, and inference. Additionally, the document outlines various methods of data collection, presentation techniques, and the definitions of key statistical terms and concepts.
Copyright © All Rights Reserved

Chapter One

Introduction

Definition and Classification of Statistics

Statistics is a branch of applied mathematics that deals with the collection, organization,
presentation, analysis and interpretation of numerical data.

Classification of Statistics

Depending on how the collected data are used, statistics is broadly divided into two main areas or branches.
1. Descriptive Statistics: concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics: a method used to generalize from a sample to a population.
For example, the average income of all families (the population) in Ethiopia can be estimated
from the incomes of a few thousand families (the sample).
• Inferential statistics is important because statistical data usually arise from samples.
1.1.2 Stages in statistical investigation.
There are five stages or steps in any statistical investigation.

i. Collection of data: the process of measuring, gathering and assembling the raw data upon which
the statistical investigation is to be based.

Data can be collected in a variety of ways; three of the most common methods are:
• Telephone survey
• Mailed questionnaire
• Personal interview
ii. Organization of data:

Summarization of data in some meaningful way, e.g. table form

iii. Presentation of the data:

The process of re-organization, classification, compilation, and summarization of data to present it in a meaningful form.
iv. Analysis of data: The process of extracting relevant information from the summarized data.

v. Inference (Interpretation) of data:


The interpretation of the various statistical measures obtained from the analysis of the data, using
the methods by which conclusions are drawn and inferences are made.

Definition of some terms

Population: the totality of all subjects under study.

Sample: a part/portion of the population selected to draw conclusions about the population. It
should be selected using a pre-defined sampling technique so that it represents the population well.

Parameter: a summary measure calculated from the population.
Statistic: a summary measure calculated from a sample.
Census: A survey in which there is complete coverage.
Sample survey: A survey in which there is partial coverage.
Sampling: The process or method of sample selection from the population.
Sample size: The number of elements or observations to be included in the sample.
Variable: a characteristic or attribute that can assume different values.
There are two types of variables.
Qualitative variables: nonnumeric variables that cannot be measured but can be placed into
distinct categories according to some characteristics or attributes.
Examples: gender, religious affiliation, political affiliation, etc.
Quantitative variables: variables that can be quantified, i.e. that have numerical values.
Examples: height, income, temperature, etc.
Quantitative variables can be further classified as
• Discrete variables, and
• Continuous variables

a) Discrete variables are variables whose values are counts.
Examples: number of students, family size, number of pages of a book.
b) Continuous variables are variables that can have any value within an interval.
Examples: weight, length, volume, etc.

Scales of Measurement

How a variable is classified depends on how it is categorized, counted or measured. This
classification uses measurement scales, of which four types are common: nominal, ordinal,
interval and ratio.
Nominal scale: the nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.

Examples: residence (rural, urban), political affiliation (Democrat, Republican)

Ordinal scale: the ordinal level of measurement classifies data into categories that can be
ranked; however, precise differences between the ranks do not exist.

Examples: letter grades (A, B, C, D and F), rating scale (poor, good, excellent)
Interval scale: interval scales are measurement systems that possess the properties of order and
distance, but not the property of a fixed (true) zero.
Examples: temperature (in °C or °F), IQ test scores of persons
• Addition and subtraction are meaningful, but multiplication and division (ratios) are not.
Ratio Scale: the ratio level of measurement possesses all the characteristics of interval
measurement and there exists a true zero. In addition, true ratios exist when the same variable is
measured on two different members of the population.

Example: Height, weight, time, salary etc
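The difference between the interval and ratio scales can be checked numerically: a ratio of two values is meaningful only if it stays the same when the unit of measurement changes. The minimal Python sketch below uses hypothetical readings (20 °C vs 10 °C and 180 cm vs 90 cm) chosen only for illustration, together with the standard unit conversions; the function names are made up for this example.

```python
# Interval vs ratio scale: a ratio is meaningful only if it survives a change of units.

def c_to_f(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit (interval scale, arbitrary zero)."""
    return c * 9 / 5 + 32

def cm_to_in(cm: float) -> float:
    """Convert centimetres to inches (ratio scale, true zero)."""
    return cm / 2.54

# Hypothetical readings, for illustration only.
temp_a, temp_b = 20.0, 10.0        # degrees Celsius
height_a, height_b = 180.0, 90.0   # centimetres

temp_ratio_c = temp_a / temp_b                    # 2.0 in Celsius
temp_ratio_f = c_to_f(temp_a) / c_to_f(temp_b)    # 68 / 50 = 1.36 in Fahrenheit
height_ratio_cm = height_a / height_b                      # 2.0
height_ratio_in = cm_to_in(height_a) / cm_to_in(height_b)  # still 2.0

print(f"temperature ratio: {temp_ratio_c:.2f} (C) vs {temp_ratio_f:.2f} (F)")
print(f"height ratio:      {height_ratio_cm:.2f} (cm) vs {height_ratio_in:.2f} (in)")
```

The temperature "ratio" changes from 2.00 to 1.36 when only the unit changes, so saying that 20 °C is "twice as hot" as 10 °C has no meaning; the height ratio stays at 2.00 because height has a true zero.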

Applications, uses and limitations of statistics

Applications of statistics:

• Almost everyone encounters numerical facts in daily life, e.g. prices.

• Statistics is applied in scientific processes, e.g. the development of certain drugs and measuring
the extent of environmental pollution.

• In industry, especially in quality control.

Uses of statistics:

The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:

• It presents facts in a definite and precise form.
• Data reduction.
• Measuring the magnitude of variation in data.
• Furnishing a technique of comparison.
• Estimating unknown population characteristics.
• Formulating and testing hypotheses.
• Studying the relationship between two or more variables.
• Forecasting future events.

Limitations of statistics

As a science, statistics has its own limitations. The following are some of them:

• It deals only with quantitative information.

• It deals only with aggregates of facts, not with individual data items.

• Statistical results are only approximately correct, not mathematically exact.

• Statistics can easily be misused and should therefore be used by experts.
Chapter Two

Methods of data collection and presentation

Statistical Data

Statistical data are data collected by survey statisticians, scientists and researchers to be edited,
imputed, analyzed and/or used in the compilation and production of statistical findings and/or
theories.
Raw data: are collected data, which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: is an arrangement of raw numerical data in ascending or descending order of
magnitude.
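Turning raw data into an array is simply a sort. A minimal Python sketch using the seven raw values from the example above:

```python
# Raw data from the example above, in the order they were collected.
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array is the same data arranged in order of magnitude.
ascending_array = sorted(raw_data)
descending_array = sorted(raw_data, reverse=True)

print("Ascending: ", ascending_array)   # [4, 6, 10, 18, 25, 32, 93]
print("Descending:", descending_array)  # [93, 32, 25, 18, 10, 6, 4]
```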
Sources of Data

Primary source: a source of data that supplies firsthand information for the immediate purpose.
• Primary data: data originally collected for the immediate purpose.
• Primary data are usually more expensive to obtain than secondary data.

The process of data collection from a primary source may be through:
• Field trials
• Laboratory experiments
• Surveys – census survey (personal interview, telephone interview and mailed questionnaire)
Secondary source: individuals or agencies that supply data originally collected, by themselves or
by others, for other purposes.
• Usually they are published or unpublished materials, records, reports, etc.
• Secondary data: data collected from a secondary source.

Methods of Data Presentation

Frequency Distributions

A frequency distribution is the organization of raw data in table form using classes and
frequencies.

Frequency: the number of times a certain value or set of values occurs in a specific group.

Generally, there are three basic types of frequency distributions: categorical (qualitative),
ungrouped and grouped frequency distributions.

i. Categorical Frequency Distribution:

This distribution is used for data that can be placed in categories, such as nominal- or ordinal-level data.

Example: A social worker collected data on marital status for 25 persons (M= married, S=single,
W=widowed, D=divorced)

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution: there are four types of marital status M, S, D and W. These categories will be used as
classes for the frequency distribution. The following procedures shall be followed while
constructing categorical frequency distribution.

Step 1: Make a table as shown.

Class   Frequency   Percent
(1)     (2)         (3)
M
S
D
W

Step 2: Count the categories and place the results in column (2).
Step 3: Find the percentage of values in each class by using $\% = \frac{f}{n} \times 100$, where f = frequency
of the class and n = total number of observations.
- Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of diagrams such as pie charts.

Step 4: Find the total for column (2) and (3).

Combining all the steps, one can construct the following frequency distribution.

Class   Frequency   Percent
(1)     (2)         (3)
M       6           6/25*100 = 24
S       7           28
D       7           28
W       5           20
Total   25          100
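Steps 2 to 4 can be reproduced with a few lines of Python. This is a sketch using the standard library's `collections.Counter`; the 25 observations are the marital-status data from the example, read row by row.

```python
from collections import Counter

# Marital status of 25 persons, read row by row from the example
# (M = married, S = single, W = widowed, D = divorced).
rows = [
    "M S D W D",
    "S S M M M",
    "W D S M M",
    "W D D S S",
    "S W W D D",
]
data = [status for row in rows for status in row.split()]

counts = Counter(data)   # Step 2: frequency of each category
n = len(data)            # total number of observations

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ["M", "S", "D", "W"]:       # class order used in the table
    f = counts[cls]
    percent = f / n * 100              # Step 3: % = f / n * 100
    print(f"{cls:<6}{f:>10}{percent:>9.0f}")
print(f"{'Total':<6}{n:>10}{100:>9}")  # Step 4: column totals
```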
ii. Ungrouped frequency distribution
An ungrouped frequency distribution is a table of all potential raw score values that could possibly
occur in the data, along with their corresponding frequencies. It is often constructed for small sets
of data.
– Raw data: data collected in original form.
– Array: data arranged, in ascending or descending order.
– Class: different, non-overlapping groups of data.
Constructing an ungrouped frequency distribution (UFD)
To construct an ungrouped frequency distribution,
• First find the smallest and the largest raw scores in the collected data and calculate the range.
A UFD is appropriate when the range is small.
• Arrange the data in order of magnitude.
• Then construct the UFD by listing the classes along with their corresponding frequencies.

Example: Given the following data construct an appropriate frequency distribution.


12 12 12 16 18
12 18 12 17 15
15 16 12 16 16
12 14 15 15 15
19 13 16 16 14

Solution:
STEP 1. Find the range of the data:
$\text{Range} = \text{Maximum observation} - \text{Minimum observation} = 19 - 12 = 7$. Since the range of the data
(7) is small, classes consisting of a single data value can be used. They are 12, 13, 14, 15, 16, 17,
18 and 19.
STEP 2. Arrange the data: 12, 12, 12, 12, 12, 12, 12, 13, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 17, 18, 18, 19
STEP 3. Then construct the UFD by putting the classes along with corresponding frequencies.
Finally,

Age     Frequency   Percent
12      7           28
13      1           4
14      2           8
15      5           20
16      6           24
17      1           4
18      2           8
19      1           4
Total   25          100
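The three steps of the UFD can be sketched in Python as follows, again using `collections.Counter`. The choice to list every integer between the minimum and maximum (so that a value with zero frequency would still get a row) is an assumption made for this sketch; in this data set every value from 12 to 19 occurs at least once.

```python
from collections import Counter

# The 25 observations from the example, read row by row.
data = [
    12, 12, 12, 16, 18,
    12, 18, 12, 17, 15,
    15, 16, 12, 16, 16,
    12, 14, 15, 15, 15,
    19, 13, 16, 16, 14,
]

data_range = max(data) - min(data)   # STEP 1: range = 19 - 12 = 7
array = sorted(data)                 # STEP 2: arrange in order of magnitude
counts = Counter(array)              # STEP 3: frequency of each single value
n = len(data)

# One class per integer value from the minimum to the maximum.
table = [(x, counts[x], counts[x] / n * 100) for x in range(min(data), max(data) + 1)]

for value, freq, pct in table:
    print(f"{value:<6}{freq:>5}{pct:>7.0f}")
print(f"{'Total':<6}{n:>5}{100:>7}")
```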

iii. Grouped Frequency Distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency
distribution is one in which several data values are grouped into each class.

Some Important Definitions

– Class limits: separate one class in a grouped frequency distribution from another.
o The limits could actually appear in the collected data, and there are gaps between the
upper limit of one class and the lower limit of the next class.
– Unit of measurement (U): the distance/difference between two consecutive possible values of the
variable being measured. It is usually taken as 1, 0.1, 0.01, 0.001, etc.
– Class boundaries: separate one class in a grouped frequency distribution from another.
o The boundaries have one more decimal place than the raw data and therefore do not
appear in the collected data.
o There is no gap between the upper boundary of one class and the lower boundary of
the next class.

o The lower class boundaries (LCBs) are found by subtracting 0.5 units of
measurement from the lower class limits (LCLs), and the upper class boundaries
(UCBs) are found by adding 0.5 units of measurement to the upper class limits
(UCLs). That is, $LCB_i = LCL_i - \frac{1}{2}U$ and $UCB_i = UCL_i + \frac{1}{2}U$.
– Class width (W): the difference between the upper and lower boundaries of any class
o or the difference between two lower limits of consecutive classes,
o or the difference between upper limits of two consecutive classes.
o or the difference between two lower boundaries of consecutive classes,
o or the difference between upper boundaries of two consecutive classes.
N.B: Class width is not equal to the difference between UCL and LCL of the same
class.
– Class mark (M): the midpoint of a class interval,
i.e. $M = \frac{UCL_i + LCL_i}{2}$ or $M = \frac{LCB_i + UCB_i}{2}$
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.

Cumulative frequency distribution: A tabular arrangement of class intervals together with their
corresponding cumulative frequency (either less than or more than type; as defined above).
Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all
frequencies); if multiplied by 100, it gives the percent of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total frequency}}$
Guidelines to construct a grouped frequency distribution

STEP 1. Determine the unit of measurement, U.
STEP 2. Find the maximum (Max) and the minimum (Min) observations, and then compute their
range, $R = \text{Max} - \text{Min}$.
STEP 3. Select the number of classes desired (K), usually between 5 and 20, or use Sturges'
formula: $K = 1 + 3.322\log_{10} n$, where n is the total frequency; round this value of K up
to an integer when it turns out to be a fraction.
STEP 4. Find the class width (W) by dividing the range by the number of classes and rounding the
result up to an integer value: $W = \frac{R}{K}$.
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point
is the lower limit of the first class. Continue to add the class width to this lower limit to
get the rest of the lower limits.
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit
of measurement from the lower limit of the second class. Then continue to add the class
width to this upper limit so as to get the rest of the upper limits.

STEP 7. Compute the class boundaries as $LCB_i = LCL_i - \frac{1}{2}U$ and $UCB_i = UCL_i + \frac{1}{2}U$.
STEP 8. Put the classes along with corresponding counts/ frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
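STEPS 2 to 7 can be sketched as a small Python function. This is an illustration under stated assumptions, not a fixed recipe: `build_classes` is a hypothetical name, the data are assumed to be integers (U = 1), the starting point is taken to be the minimum, and the loop that adds extra classes when the last class does not reach the maximum is a safeguard that is not part of the steps above.

```python
import math

def build_classes(minimum: int, maximum: int, n: int):
    """Sketch of STEPS 2-7 for integer data, assuming the unit of measurement U = 1."""
    unit = 1
    data_range = maximum - minimum                 # STEP 2: R = Max - Min
    k = math.ceil(1 + 3.322 * math.log10(n))       # STEP 3: Sturges' formula, rounded up
    width = math.ceil(data_range / k)              # STEP 4: W = R / K, rounded up
    lower_limits = [minimum + i * width for i in range(k)]  # STEP 5: start at Min
    # Safeguard (not one of the steps): add classes until Max is covered.
    while lower_limits[-1] + width - unit < maximum:
        lower_limits.append(lower_limits[-1] + width)
    # STEP 6: each upper limit is the next lower limit minus one unit.
    limits = [(lcl, lcl + width - unit) for lcl in lower_limits]
    # STEP 7: boundaries lie half a unit below/above the limits.
    boundaries = [(lcl - unit / 2, ucl + unit / 2) for lcl, ucl in limits]
    return k, width, limits, boundaries


# Usage with the temperature example that follows: Min = 100, Max = 134, n = 50.
k, width, limits, boundaries = build_classes(minimum=100, maximum=134, n=50)
print("K =", k, " W =", width)
for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl} - {ucl}    {lcb} - {ucb}")
```

For the temperature data this gives K = 7, W = 5, limits 100 - 104 up to 130 - 134 and boundaries 99.5 - 104.5 up to 129.5 - 134.5, the same values worked out by hand below.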
Example: The following data represent the record high temperatures (in °F) of each of the 50 US
states.
112 110 111 110 112 116 118 105 110 109
112 100 118 104 116 114 118 122 114 114
105 110 127 112 111 108 115 118 117 118
112 109 118 120 114 120 110 121 113 120
119 106 107 117 134 114 113 126 117 105

Construct a suitable frequency distribution.

STEP 1. Unit of measurement; U= 1


STEP 2. Max = 134, Min = 100, so that R = 134 - 100 = 34.

STEP 3. $K = 1 + 3.322\log_{10} 50 = 6.644 \approx 7$

STEP 4. Class width $W = \frac{34}{7} \approx 4.86$, which is rounded up to 5.
STEP 5. Let 100 (the smallest observation) be the starting point, i.e. the lower limit of the first class.
Hence the lower class limits are
100 105 110 115 120 125 130
STEP 6. Upper limit of the first class = 105 - 1 = 104. Hence the upper class limits are
104 109 114 119 124 129 134
STEP 7. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units
of measurement to the upper class limits, we can get lower and upper class boundaries as
follows. Hence, the LCBs are 99.5 104.5 109.5 114.5 119.5 124.5 129.5 and the UCBs
are 104.5 109.5 114.5 119.5 124.5 129.5 134.5

Steps 8 and 9 are displayed in the following table.

Class limits   Class boundaries   Frequency   Cumulative frequency   Cumulative frequency
                                              (less than type)       (more than type)
100 – 104      99.5 – 104.5       2           2                      50
105 – 109      104.5 – 109.5      8           10                     48
110 – 114      109.5 – 114.5      19          29                     40
115 – 119      114.5 – 119.5      13          42                     21
120 – 124      119.5 – 124.5      5           47                     8
125 – 129      124.5 – 129.5      2           49                     3
130 – 134      129.5 – 134.5      1           50                     1
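STEPS 8 and 9 can be checked by counting the 50 readings directly. The sketch below uses the values exactly as listed in the example, the class limits from STEPS 5 and 6, and `itertools.accumulate` for the running totals; counting these values gives the class frequencies 2, 8, 19, 13, 5, 2 and 1.

```python
from itertools import accumulate

# Record high temperatures (°F) of the 50 US states, read row by row from the example.
temperatures = [
    112, 110, 111, 110, 112, 116, 118, 105, 110, 109,
    112, 100, 118, 104, 116, 114, 118, 122, 114, 114,
    105, 110, 127, 112, 111, 108, 115, 118, 117, 118,
    112, 109, 118, 120, 114, 120, 110, 121, 113, 120,
    119, 106, 107, 117, 134, 114, 113, 126, 117, 105,
]

# Class limits from STEPS 5 and 6: width 5, starting at 100.
limits = [(lcl, lcl + 4) for lcl in range(100, 135, 5)]

# STEP 8: frequency = number of readings between the lower and upper limits.
freqs = [sum(lcl <= t <= ucl for t in temperatures) for lcl, ucl in limits]

# STEP 9 (less than type): running total of the frequencies from the first class.
less_than = list(accumulate(freqs))
# STEP 9 (more than type): readings in this class and all later classes.
n = len(temperatures)
more_than = [n - cf + f for cf, f in zip(less_than, freqs)]

print("Limits    Boundaries      f  CF(<)  CF(>)")
for (lcl, ucl), f, lt, mt in zip(limits, freqs, less_than, more_than):
    print(f"{lcl}-{ucl}   {lcl - 0.5}-{ucl + 0.5}  {f:>3} {lt:>6} {mt:>6}")
```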

Diagrammatic and/or graphical presentation of data:

The data that is presented by a frequency distribution can also be displayed diagrammatically or
graphically. There are techniques for presenting data in visual display.

Diagrams and graphs:
• have greater attraction than figures (numbers);
• facilitate comparison of data;
• are easily understood even by people with no statistical background.
a. Diagrammatic presentation of data: Bar charts, pie-chart, pictogram, Stem and leaf plot

Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.

There are three common diagrammatic presentations of data: bar diagrams/charts, pie-charts and
pictograms. Stem-and-leaf plots are also used sometimes.

i. Pie-charts
A pie-chart is a circle that is divided into sections according to the percentages of frequencies in
each category of the distribution. The angle of the sector of a class is obtained by multiplying the
ratio of the frequency of the class to the total frequency by 360°,
i.e. $\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note that pie-charts are usually used for depicting nominal level data.

Example: For the categorical frequency distribution of marital status below, construct a pie chart.
Class Frequency
M 6
S 7
D 7
W 5
Total 25

How to draw a pie-chart


- First find the percentages of each class
- Next calculate the degree measures for each class
- Finally, using a protractor, put each sector /degree measure/ in a circle and give a key for
explanation.
Class Frequency Percent Degree
M 6 6/25*100=24 86.4
S 7 28 100.8
D 7 28 100.8
W 5 20 72
Total 25 100 360
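The degree measures in the table can be computed with the sector-angle formula above. This Python sketch also prints where each sector starts and ends (measured from 0°), which is what one marks with the protractor; laying the sectors out in the table's class order is an assumption of the sketch.

```python
# Marital status frequencies from the categorical frequency distribution.
freqs = {"M": 6, "S": 7, "D": 7, "W": 5}
total = sum(freqs.values())

# sector angle = (frequency of the class / total frequency) * 360 degrees
angles = {cls: f / total * 360 for cls, f in freqs.items()}
percents = {cls: f / total * 100 for cls, f in freqs.items()}

start = 0.0
for cls, angle in angles.items():
    end = start + angle
    print(f"{cls}: {percents[cls]:.0f}%  {angle:.1f} deg  (from {start:.1f} to {end:.1f})")
    start = end
print(f"Total: {sum(angles.values()):.1f} deg")
```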

Now we can draw the pie-chart for the data.

[Figure: pie chart of marital status; M: 6 (24%), S: 7 (28%), D: 7 (28%), W: 5 (20%)]

ii. Bar-diagrams or Bar-charts

• A bar-diagram is a series of equally spaced bars of equal width, the height of each bar
representing the magnitude or frequency of observations in each group.
• They are useful for comparing aggregates over time or space.
• Bar-diagrams can be drawn either horizontally or vertically.

There are different types of bar charts, the most common being:
o Simple bar chart
o Component bar chart
o Multiple bar chart

a. Simple bar chart

– Used to display data on one variable.
– They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.
Example: The following data represent sales by product, 1957-1959, of a given company for three
products A, B and C.
Product   Sales in 1957 ($)   Sales in 1958 ($)   Sales in 1959 ($)
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
The horizontal bar chart for sales in 1957 is then as follows

b. Component bar chart

When it is desired to show how a total (an aggregate) is divided into component parts, we use a
component bar diagram. In this type of bar diagram, each bar represents the aggregate value of a
variable broken into its component parts, and different colors or designs are used to identify the
components.
Example: Represent the data given above using component bar-charts
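A component bar chart for the sales data has one bar per year whose height is that year's total, split into stacked segments for products A, B and C. The Python sketch below only computes where each segment starts and ends; stacking the products in the order A, B, C is an arbitrary choice made for the illustration.

```python
# Sales ($) by product and year, from the sales table above.
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# One bar per year: each product is a segment stacked on the previous one.
segments = {}
for year in years:
    bottom = 0
    segments[year] = []
    for product, by_year in sales.items():
        top = bottom + by_year[year]
        segments[year].append((product, bottom, top))  # segment drawn from bottom to top
        bottom = top

for year in years:
    total = segments[year][-1][2]  # top of the last segment = bar height
    parts = ", ".join(f"{p}: {b}-{t}" for p, b, t in segments[year])
    print(f"{year} (total {total}): {parts}")
```

The bar heights are the yearly totals 60, 70 and 90, and the segments show how each total is divided among the three products.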

c. Multiple bar-charts

Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.

Example: Depict the data given in the above example using multiple bar charts.
b. Graphical Presentation of Data
Graphs are used to present continuous data. The three common graphic presentations of data are:
histogram, frequency polygon, and cumulative frequency polygon (ogive).
i. Histogram
A histogram is another way of data presentation which is more suitable for frequency
distributions with continuous classes.

In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its
respective frequency on the vertical axis.

Example: Draw a histogram for the following data (temperature data of the 50 US states).

Class limits   Class boundaries   Frequency   Cum. freq (less than type)   Cum. freq (more than type)
100 – 104      99.5 – 104.5       2           2                            50
105 – 109      104.5 – 109.5      8           10                           48
110 – 114      109.5 – 114.5      19          29                           40
115 – 119      114.5 – 119.5      13          42                           21
120 – 124      119.5 – 124.5      5           47                           8
125 – 129      124.5 – 129.5      2           49                           3
130 – 134      129.5 – 134.5      1           50                           1
Solution:
Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is
always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each class.
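The histogram steps can be mimicked without a plotting library. This Python sketch counts the 50 readings (as listed in the example) by class boundaries and prints one row of asterisks per class, so the bars run across the page instead of up it. With matplotlib available, passing the eight class boundaries as the `bins` argument of `matplotlib.pyplot.hist` would draw the same bars graphically.

```python
# Record high temperatures (°F) of the 50 US states, as listed in the example.
temperatures = [
    112, 110, 111, 110, 112, 116, 118, 105, 110, 109,
    112, 100, 118, 104, 116, 114, 118, 122, 114, 114,
    105, 110, 127, 112, 111, 108, 115, 118, 117, 118,
    112, 109, 118, 120, 114, 120, 110, 121, 113, 120,
    119, 106, 107, 117, 134, 114, 113, 126, 117, 105,
]

# Class boundaries 99.5-104.5, 104.5-109.5, ..., 129.5-134.5 (class width 5).
boundaries = [(99.5 + 5 * i, 104.5 + 5 * i) for i in range(7)]

# Boundaries carry one more decimal place than the data, so no reading falls
# exactly on a boundary and each reading lands in exactly one class.
freqs = [sum(lcb < t < ucb for t in temperatures) for lcb, ucb in boundaries]

# Text histogram: class boundaries down the page, bar length = frequency.
for (lcb, ucb), f in zip(boundaries, freqs):
    print(f"{lcb:>5}-{ucb:<5} | {'*' * f} {f}")
```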

ii. Frequency Polygon

The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the heights
of the points.
Example: Present the data in the previous example using a frequency polygon.

Solution
Step 1 Find the midpoints of each class. Recall that midpoints are found by adding the upper and
lower boundaries and dividing by 2:

For the first class, $CM = \frac{99.5 + 104.5}{2} = 102$; the same procedure is followed for the other classes,
and the CMs are given in the table below.

Class limits   Class boundaries   Class midpoints   Frequency
100 – 104      99.5 – 104.5       102               2
105 – 109      104.5 – 109.5      107               8
110 – 114      109.5 – 114.5      112               19
115 – 119      114.5 – 119.5      117               13
120 – 124      119.5 – 124.5      122               5
125 – 129      124.5 – 129.5      127               2
130 – 134      129.5 – 134.5      132               1
Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph
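The points of the frequency polygon can be generated from the class boundaries and frequencies. In this Python sketch the frequencies are those obtained by counting the 50 readings as listed in the example, and it assumes that "back to the x axis" means the midpoints of the empty classes one class width before the first class and after the last class (97 and 137 here).

```python
# Class boundaries of the temperature distribution and the class frequencies
# obtained by counting the 50 readings as listed in the example.
boundaries = [(99.5 + 5 * i, 104.5 + 5 * i) for i in range(7)]
freqs = [2, 8, 19, 13, 5, 2, 1]

# Step 1: class midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lcb + ucb) / 2 for lcb, ucb in boundaries]
width = boundaries[0][1] - boundaries[0][0]

# Steps 3-4: plot (midpoint, frequency) and close the polygon on the x axis,
# assumed here to be one class width beyond the first and last midpoints.
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, freqs))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(f"({x:g}, {y})")
```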

iii. Cumulative Frequency Polygon (Ogive)

Cumulative frequency polygon can be traced on less than or more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than cumulative frequencies) along the vertical axis. Then
join the cross points by a free hand curve.

Example: the data in the previous example can be presented using either a less than or a more
than cumulative frequency polygon as given below (i) and (ii) respectively.

(i) Less than type cumulative frequency polygon

Construct an ogive (less than type) for the frequency distribution of the 50 US states.
Solution
Step 1 Find the cumulative frequency for each class.

Class boundaries   Cumulative frequency
Less than 99.5     0
Less than 104.5    2
Less than 109.5    10
Less than 114.5    29
Less than 119.5    42
Less than 124.5    47
Less than 129.5    49
Less than 134.5    50
Step 2 Draw the x and y axes. Label the x axis with the class boundaries. Use an appropriate
scale for the y axis to represent the cumulative frequencies.
Step 3 Plot the cumulative frequency at each upper class boundary

Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with line
segments
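Both ogives can be generated from the class frequencies with `itertools.accumulate`. In this Python sketch the frequencies are again those obtained by counting the 50 readings as listed in the example; the less than ogive is plotted at the upper class boundaries (starting from 0 at 99.5), and the more than ogive at the lower class boundaries (ending at 0 at 134.5).

```python
from itertools import accumulate

boundaries = [99.5 + 5 * i for i in range(8)]  # 99.5, 104.5, ..., 134.5
freqs = [2, 8, 19, 13, 5, 2, 1]                # class frequencies of the 50 readings

# Less than type: cumulative frequency at each upper class boundary,
# preceded by 0 at the lowest boundary.
less_than_points = list(zip(boundaries, [0] + list(accumulate(freqs))))

# More than type: number of observations at or above each boundary.
n = sum(freqs)
more_than_points = [(b, n - cf) for b, cf in less_than_points]

for (b, lt), (_, mt) in zip(less_than_points, more_than_points):
    print(f"{b:>6}: less than = {lt:>2}, more than = {mt:>2}")
```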
