Some Basic Concepts
Statistics
In the plural sense, a collection of numerical information is called statistics.
In the singular sense, statistics is a branch of applied mathematics dealing with the
collection, presentation, analysis, and interpretation of numerical data to assist in
making more effective decisions.
Why Study Statistics?
Students study statistics for several reasons:
1. Like professional people, you must be able to read and
understand the various statistical studies performed in your
fields. To have this understanding, you must be knowledgeable
about the vocabulary, symbols, concepts, and statistical
procedures used in these studies.
2. You may be called on to conduct research in your field, since
statistical procedures are basic to research. To accomplish this,
you must be able to design experiments; collect, organize,
analyze, and summarize data; and possibly make reliable
predictions or forecasts for future use. You must also be able to
communicate the results of the study in your own words.
3. You can also use the knowledge gained from studying statistics
to become a better consumer and citizen. For example, you can
make intelligent decisions about what products to purchase
based on consumer studies, about government spending based
on utilization studies, and so on.
Main Branches of Statistics
The study of statistics is divided into two branches:
i) Descriptive statistics
ii) Inferential statistics
Descriptive statistics
Descriptive statistics is the branch of statistics that summarizes, presents, and analyzes
large bodies of statistical data in order to describe their salient features, e.g.
Tabulation and classification of raw data
Histogram
Frequency polygon
Measures of central tendency and dispersion
These tools make the data easy to understand.
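As a minimal sketch of the descriptive tools listed above, the following Python snippet tabulates a small data set and computes measures of central tendency and dispersion. The data values are made up for illustration and use only the standard library.

```python
import statistics
from collections import Counter

# Hypothetical raw data: number of rooms in 10 houses (made up for illustration).
rooms = [3, 4, 4, 5, 3, 4, 6, 4, 5, 2]

# Tabulation: frequency of each distinct value, sorted by value.
frequency_table = sorted(Counter(rooms).items())

# Measures of central tendency.
mean_rooms = statistics.mean(rooms)
median_rooms = statistics.median(rooms)
mode_rooms = statistics.mode(rooms)

# Measure of dispersion: sample standard deviation (divides by n - 1).
stdev_rooms = statistics.stdev(rooms)

for value, count in frequency_table:
    print(f"{value} rooms: {count} houses")
print("mean:", mean_rooms, "median:", median_rooms, "mode:", mode_rooms)
print("standard deviation:", round(stdev_rooms, 3))
```

The frequency table is the numerical basis of a histogram or frequency polygon; the plotting step itself is omitted here.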
Inferential statistics
Inferential statistics is the set of methods used to make a generalization, estimation,
prediction, or decision about a population. It is also called inductive statistics.
Inferential statistics allows us to make inferences about a population based on data
collected from a sample.
This branch is based on probability theory, because inferences made from sample
observations cannot be absolutely certain.
Population
A population is the totality or collection of all possible observations, whether finite or
infinite, relevant to some characteristic of interest, e.g., the heights of all college students,
the wages of all the employees of a steel mill in a particular year, or the amount of
money spent by each student in GCUF in a given academic year. A population may be
finite or infinite. The number of observations in a finite population is called the size of
the population and is denoted by N.
Sample
A sample is a representative part of the population which is selected to obtain information
concerning the characteristics of the population. The number of observations in a sample
is called the size of the sample and is denoted by n.
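The relationship between a population of size N and a sample of size n can be sketched in Python as follows. The population here is simulated, and simple random sampling without replacement is assumed as the selection method; neither the numbers nor the sampling scheme comes from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: simulated heights (cm) of N = 1000 students.
population = [random.gauss(170, 8) for _ in range(1000)]
N = len(population)

# Simple random sample of n = 50 observations, drawn without replacement.
n = 50
sample = random.sample(population, n)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate computed from the sample

print(f"N = {N}, n = {n}")
print(f"population mean = {population_mean:.2f}, sample mean = {sample_mean:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean, which is why conclusions drawn from a sample carry some uncertainty.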
Variable
Any characteristic which varies either in quantity or in quality from one individual to
another is called a variable. For example: the height of an individual, the weight of a person,
family size, marital status, education level, etc. Variables are usually represented by
the last letters of the English alphabet, such as X, Y, and Z.
Quantitative variable
A characteristic which varies only in quantity from one individual to another is called a
quantitative variable, e.g., the income of a person, the age of a student, etc.
Qualitative variable
A characteristic which varies only in quality from one individual to another is called a
qualitative variable, e.g., gender, marital status, eye color, beauty, etc. It is also called
an attribute. It cannot be measured numerically.
Discrete Variable
A variable that can take only specific, separated values (typically whole numbers) is called
a discrete variable, e.g., the number of rooms in a house or the number of deaths in an
accident. Such a variable represents count data.
Continuous Variable
A variable that can take on any value within a specified interval is called a continuous
variable. A continuous variable represents measurement data such as the age of a person,
the height of a plant, the temperature at a place, or the speed of a car.
Statistical data are of two types:
i) Primary data
ii) Secondary data
Primary data
Data published or used by the organization that originally collected them are called
primary data. Thus, primary data are firsthand information collected, compiled, and
published by an organization for a certain purpose. The data in the population census
reports are primary because they are collected, compiled, and published by the
Population Census Organization. Primary data are most original in nature and have not
undergone any statistical treatment.
Primary data are reliable (since we have obtained them ourselves).
More time is required to collect primary data.
Primary data tend to be costly to collect.
There may be considerable delay.
Secondary Data
Data published or used by an organization other than the one which originally
collected them are known as secondary data. Primary data become secondary data
when they have been processed to a certain extent. For example, raw data that have been
arranged, classified, and presented in some suitable form, such as aggregates, averages,
or percentages, become secondary data. The distinction between primary and
secondary data is only one of stage. The data in the Economic Survey of Pakistan are
secondary because they were originally collected by the Federal Bureau of Statistics, the
State Bank of Pakistan, the Central Board of Revenue, etc.
Relatively inexpensive.
Quick to obtain.
May or may not be reliable.
Methods for Collection of Primary Data
Direct personal interviews
Indirect personal investigation
Investigation through questionnaires to be filled in by trained enumerators
Mailed questionnaire
Registration (births, deaths, marriages, divorces, etc. are registered with the municipal
committee in urban areas and with the union councils in rural areas)
Collection of Secondary Data
Official publications (Federal Bureau of Statistics, Population Census, ministries of
health, food, agriculture, etc.)
Semi-official sources (publications of NBP, SBP, WAPDA, district councils)
Private sources (publications of trade associations, chambers of commerce and
industry, etc.)
Publications of research organizations
Data in books, journals, newspapers, magazines, etc.
Measurement Scales
Data come in various forms, and it is important to recognize the form so that the
proper analysis can be applied to the data. Four scales of measurement are usually
considered:
i) Nominal Scale
It is the weakest of the four measurement scales. The nominal scale distinguishes one
object or event from another on the basis of a name.
o classification data, e.g. Male and Female, Defective and Non Defective
o no ordering, e.g. it makes no sense to state that Male > Female
o arbitrary labels, e.g., m/f, 0/1, etc
We frequently use arbitrary numbers instead of names to distinguish among objects or
events on the basis of some characteristic. For example, we may use 1 to denote a
defective item and 0 to denote a non-defective item.
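A short Python sketch can illustrate why nominal codes are only labels. The inspection results below are hypothetical; the 1/0 coding follows the defective/non-defective example in the text.

```python
from collections import Counter

# Coding from the text: 1 = defective, 0 = non-defective.
LABELS = {1: "defective", 0: "non-defective"}

# Hypothetical inspection results for 10 items (made up for illustration).
items = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]

# Counting category membership is meaningful on a nominal scale.
counts = Counter(LABELS[code] for code in items)
most_common_label = counts.most_common(1)[0][0]

# Swapping the arbitrary codes (0 <-> 1) leaves the counts unchanged,
# which shows that the numbers carry no order or magnitude.
SWAPPED_LABELS = {0: "defective", 1: "non-defective"}
swapped_items = [1 - code for code in items]
swapped_counts = Counter(SWAPPED_LABELS[code] for code in swapped_items)

print(counts)
print("most frequent category:", most_common_label)
print("same counts after recoding:", counts == swapped_counts)
```

Counts and the most frequent category survive any relabeling, whereas a quantity such as the average of the codes would change with the coding and so has no meaning for nominal data.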
ii) Ordinal Scale
An ordinal scale is next up the list in terms of power of measurement. The simplest
ordinal scale is a ranking. When a market researcher asks you to rank 5 types of ice
cream from most flavorful to least flavorful, he/she is asking you to create an ordinal
scale of preference.
o ordered but differences between values are not important
o e.g., political parties on left to right spectrum given labels 0, 1, 2
o e.g. rank on a scale of 1...5 your degree of satisfaction
o e.g., restaurant ratings
Comparisons of greater and less can be made, in addition to equality and inequality.
However, operations such as conventional addition and subtraction are still meaningless.
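The following Python sketch shows the operations that remain meaningful on an ordinal scale: comparison and order-based summaries such as the median. The five satisfaction labels and the responses are hypothetical and chosen only for illustration.

```python
# Hypothetical satisfaction scale, ordered from lowest to highest.
LEVELS = ["very dissatisfied", "dissatisfied", "neutral",
          "satisfied", "very satisfied"]

# Rank of each label (1 = lowest). Only the order of the ranks matters.
RANK = {label: position for position, label in enumerate(LEVELS, start=1)}

# Hypothetical survey responses (an odd number, so the median is one response).
responses = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]


def at_least(a, b):
    """Return True if response a is ranked at least as high as response b."""
    return RANK[a] >= RANK[b]


# The median depends only on the ordering, so it is valid for ordinal data.
ordered = sorted(responses, key=RANK.get)
median_response = ordered[len(ordered) // 2]

print("ordered responses:", ordered)
print("median response:", median_response)
print("satisfied >= neutral:", at_least("satisfied", "neutral"))
```

No step adds or subtracts the ranks, because the distance between neighbouring labels is not defined on this scale.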
iii) Interval Scale
The standard survey rating scale is commonly treated as an interval scale. The numbers assigned to objects
have all the features of ordinal measurements, and in addition equal differences between
measurements represent equivalent intervals. That is, differences between arbitrary pairs
of measurements can be meaningfully compared. Operations such as addition and
subtraction are therefore meaningful. The zero point on the scale is arbitrary; negative
values can be used. Ratios between numbers on the scale are not meaningful, so
operations such as multiplication and division cannot be carried out directly. But ratios of
differences can be expressed; for example, one difference can be twice another.
o ordered, constant scale, but no natural zero
o differences make sense, but ratios do not (e.g., 30° - 20° = 20° - 10°, but
20° is not twice as hot as 10°)
o e.g., temperature (°C, °F), dates
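The claim that ratios of differences are meaningful on an interval scale can be checked numerically. The sketch below uses the standard Celsius to Fahrenheit conversion; the four temperature values are arbitrary illustrations.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Two pairs of temperatures in Celsius: one difference is twice the other.
a, b = 10.0, 20.0   # difference of 10 degrees
c, d = 20.0, 40.0   # difference of 20 degrees

ratio_of_differences_c = (d - c) / (b - a)

# The same four temperatures expressed in Fahrenheit.
fa, fb, fc, fd = (celsius_to_fahrenheit(t) for t in (a, b, c, d))
ratio_of_differences_f = (fd - fc) / (fb - fa)

print("ratio of differences in Celsius:   ", ratio_of_differences_c)
print("ratio of differences in Fahrenheit:", ratio_of_differences_f)
```

Both units give the same ratio of differences, because the arbitrary zero point cancels when one measurement is subtracted from another.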
iv) Ratio Scale
A ratio scale is the top level of measurement and is not often available in social research.
The factor which clearly defines a ratio scale is that it has a true zero point. The numbers
assigned to objects have all the features of interval measurement and also have
meaningful ratios between arbitrary pairs of numbers. Operations such as multiplication
and division are therefore meaningful. The zero value on a ratio scale is non-arbitrary.
Variables measured at the ratio level are called ratio variables. Most physical quantities,
such as mass, length, or energy, are measured on ratio scales. Social variables measured
on a ratio scale include age, length of residence in a given place, number of organizations
belonged to, and number of church attendances in a particular period.
o ordered, constant scale, natural zero
o e.g., height, weight, age, length
Q. What is a natural zero?
Some scales of measurement have a natural zero and some do not. For example,
height and weight have a natural zero at no height or no weight. Consequently, it
makes sense to say that 2 m is twice as long as 1 m. Both variables are measured
on a ratio scale.
On the other hand, year and temperature (°C) do not have a natural zero. The year
0 is arbitrary and it is not sensible to say that the year 2000 is twice as old as the
year 1000. Similarly, 0 °C is arbitrary (why pick the freezing point of water?) and
it again does not make sense to say that 20 °C is twice as hot as 10 °C. Both
variables are measured on an interval scale.
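One way to see the effect of a natural zero is to change the unit of measurement and check whether a ratio survives. The sketch below does this for the 2 m versus 1 m and 20 °C versus 10 °C comparisons from the text; the helper function names are illustrative.

```python
def metres_to_centimetres(m):
    """Convert a length from metres to centimetres (zero stays at zero)."""
    return m * 100


def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit (the zero point shifts)."""
    return c * 9 / 5 + 32


# Ratio scale: the ratio 2 m / 1 m is the same in any unit of length.
length_ratio_m = 2.0 / 1.0
length_ratio_cm = metres_to_centimetres(2.0) / metres_to_centimetres(1.0)

# Interval scale: the ratio 20 C / 10 C changes when the unit changes.
temp_ratio_c = 20.0 / 10.0
temp_ratio_f = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

print("length ratio in m: ", length_ratio_m, " in cm:", length_ratio_cm)
print("temperature ratio in C:", temp_ratio_c, " in F:", temp_ratio_f)
```

The length ratio stays at 2 in both units, while the temperature ratio depends on the unit chosen, so "twice as hot" has no unit-independent meaning for Celsius or Fahrenheit readings.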