Introduction to Biostatistics Basics

The document provides an introduction to biostatistics, emphasizing its role in the design, execution, and analysis of scientific experiments involving living organisms. It distinguishes between descriptive and inferential statistics, explaining how data is collected, analyzed, and interpreted to make informed decisions. Additionally, it covers various types of variables, data sources, and sampling methods used in statistical research.

Uploaded by

kipkemboi.korir2023

INTRODUCTION TO BIOSTATISTICS

What does the word statistics bring to mind? To most people, it suggests
numerical facts or data, such as unemployment figures, farm prices, or the
number of marriages and divorces. Biostatistics deals with the design and
execution of scientific experiments on living creatures, the acquisition and
analysis of data from those experiments, and the interpretation and
presentation of the results of those analyses.

As such, Statistics is the art of learning from data. It is concerned with the
collection of data, their subsequent description, and their analysis, which often
leads to the drawing of conclusions.

Every day we make decisions that may be personal, business related, or of
some other kind. Usually these decisions are made under conditions of
uncertainty. Many times, the situations or problems we face in the real world
have no precise or definite solution. Statistical methods help us make scientific
and intelligent decisions in such situations. Decisions made by using statistical
methods are called educated guesses.
Decisions made without using statistical (or scientific) methods are pure
guesses and, hence, may prove to be unreliable. For example, whether or not the
need for a large store in an area is assessed before it opens may affect its
success.

Like almost all fields of study, statistics has two aspects: theoretical and
applied. Theoretical or mathematical statistics deals with the development,
derivation, and proof of statistical theorems, formulas, rules, and laws. Applied
statistics involves the applications of those theorems, formulas, rules, and laws
to solve real-world problems.

Types of Statistics
Broadly speaking, applied statistics can be divided into two areas: descriptive
statistics and inferential statistics.
Descriptive Statistics
Suppose we have information on the test scores of students enrolled in a
statistics class. In statistical terminology, the whole set of numbers that
represents the scores of students is called a data set, the name of each student
is called an element, and the score of each student is called an observation.

A data set in its original form is usually very large. Consequently, such a data
set is not very helpful in drawing conclusions or making decisions. It is easier
to draw conclusions from summary tables and diagrams than from the original
version of a data set. So, we reduce data to a manageable size by constructing
tables, drawing graphs, or calculating summary measures such as averages.
The portion of statistics that helps us do this type of statistical analysis is
called descriptive statistics.

Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.

Inferential Statistics
In statistics, the collection of all elements of interest is called a population.
The selection of a few elements from this population is called a sample.

A major portion of statistics deals with making decisions, inferences,
predictions, and forecasts about populations based on results obtained from
samples. For example, we may make some decisions about the political views of
all college and university students based on the political views of 1000
students selected from a few colleges and universities. As another example, we
may want to find the starting salary of a typical college graduate. To do so, we
may select 2000 recent college graduates, find their starting salaries, and make
a decision based on this information. The area of statistics that deals with such
decision-making procedures is referred to as inferential statistics. This
branch of statistics is also called inductive reasoning or inductive statistics.

Inferential statistics consists of methods that use sample results to help
make decisions or predictions about a population.
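
The idea of using a sample to learn about a population can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 100,000 starting salaries is made up (drawn from a normal distribution with an assumed mean of 50,000 and standard deviation of 10,000), and the sample size of 2,000 mirrors the graduate example above.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: starting salaries of 100,000 graduates (made-up numbers).
population = [random.gauss(50_000, 10_000) for _ in range(100_000)]

# Draw a sample of 2,000 graduates without replacement.
sample = random.sample(population, 2_000)

population_mean = statistics.fmean(population)  # usually unknown in practice
sample_mean = statistics.fmean(sample)          # what we can actually compute

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

In a real study only the sample mean would be available; the population mean is computed here solely to show how close the sample-based estimate tends to be.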

Basic Terms
In statistics, we are interested in obtaining information about a total collection
of elements, which we will refer to as the population. The population is often too
large for us to examine each of its members. For instance, we might have all
the residents of a given state, or all the television sets produced in the last year
by a particular manufacturer, or all the households in a given community. In
such cases, we try to learn about the population by choosing and then
examining a subgroup of its elements. This subgroup of a population is called a
sample.

Definition: The total collection of all the elements that we are interested in is
called a population.
A subgroup of the population that will be studied in detail is called a sample.

The next table (Table 1.1) gives information on the 2007 charitable givings (in
millions of KeS) by six retail companies. We can call this group of companies a
sample of six companies. Each company listed in this table is called an element
or a member of the sample. Table 1.1 contains information on six elements. Note
that elements are also called observational units.

Element or Member: An element or member of a sample or population is a
specific subject or object (for example, a person, firm, item, state, or country)
about which the information is collected.

The 2007 charitable givings in our example is called a variable. The 2007
charitable givings is a characteristic of companies that we are investigating or
studying.

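Table 1.1: Charitable Givings of Six Retailers in 2007

Company      2007 Charitable Givings (millions of KeS)
Uchumi         42
Quickmart      35.6
Carrefour     337.9
Naivas         31.8
Nakumatt      168.9
Tuskys         27.5

In this table, the column heading "2007 Charitable Givings" is the variable,
each company (for example, Carrefour) is an element or a member, and each value
(for example, 337.9) is an observation or measurement.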

Variable
A variable is a characteristic under study that assumes different values for
different elements. In contrast to a variable, the value of a constant is fixed.
Other examples of variables are the incomes of households, the number of
houses built in a city per month during the past year, the makes of cars owned
by people, the gross profits of companies, and the number of insurance policies
sold by a salesperson per day during the past month. In general, a variable
assumes different values for different elements, as does the 2007 charitable
givings of the six companies in Table 1.1. For some elements in a data set,
however, the values of the variable may be the same.

For example, if we collect information on incomes of households, these
households are expected to have different incomes, although some of them may
have the same income. Each of the values representing the 2007 charitable
givings of the six companies is called an observation or measurement.

Definition
Observation or Measurement The value of a variable for an element is called
an observation or measurement.

From the table, the 2007 charitable givings of Carrefour were KeS 337.9
million. The value KeS 337.9 million is an observation or a measurement. The table
contains six observations, one for each of the six retail companies. The
information given in the table on 2007 charitable givings of companies is called
the data or a data set.

Definition
Data Set A data set is a collection of observations on one or more variables.
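
As a minimal sketch of descriptive statistics applied to this data set, the six observations from the charitable-givings table can be reduced to a few summary measures. The variable names are illustrative choices, and Python's standard `statistics` module is assumed as the tool.

```python
import statistics

# The six observations from the charitable-givings table (millions of KeS).
givings = [42, 35.6, 337.9, 31.8, 168.9, 27.5]

n_observations = len(givings)
mean_giving = statistics.fmean(givings)     # arithmetic average
median_giving = statistics.median(givings)  # middle value of the sorted data
smallest, largest = min(givings), max(givings)

print(f"observations: {n_observations}")
print(f"mean:   {mean_giving:.1f} million KeS")
print(f"median: {median_giving:.1f} million KeS")
print(f"range:  {smallest} to {largest} million KeS")
```

The mean (about 107.3) lies well above the median (38.8) because two companies gave far more than the other four, which is the kind of pattern a summary measure makes visible at a glance.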

Types of Variables
A variable is a characteristic under investigation that assumes different
values for different elements. The incomes of families, heights of persons, gross
sales of companies, prices of college textbooks, makes of cars owned by
families, number of accidents, and status (freshman, sophomore, junior, or
senior) of students enrolled at a university are examples of variables.

A variable may be classified as quantitative or qualitative.

Quantitative Variables
Some variables (such as the price of a home) can be measured numerically,
whereas others (such as hair color) cannot. The first is an example of a
quantitative variable and the second that of a qualitative variable.
Definition
Quantitative Variable A variable that can be measured numerically is called a
quantitative variable. The data collected on a quantitative variable are called
quantitative data. Incomes, heights, gross sales, prices of homes, number of
cars owned, and number of accidents are examples of quantitative variables
because each of them can be expressed numerically.
For instance, the income of a family may be KeS 81,520.75 per year, the gross
sales for a company may be KeS 567 million for the past year, and so forth.
Such quantitative variables may be classified as either discrete variables or
continuous variables.

Discrete Variables
The values that a certain quantitative variable can assume may be countable or
noncountable. For example, we can count the number of cars owned by a
family, but we cannot count the height of a family member. A variable that
assumes countable values is called a discrete variable. Note that there are no
possible intermediate values between consecutive values of a discrete variable.

Definition
Discrete Variable A variable whose values are countable is called a discrete
variable. In other words, a discrete variable can assume only certain values
with no intermediate values.

For example, the number of cars sold on any day at a car dealership is a
discrete variable because the number of cars sold must be 0, 1, 2, 3,... and we
can count it. The number of cars sold cannot be between 0 and 1, or between 1
and 2. Other examples of discrete variables are the number of people visiting a
bank on any day, the number of cars in a parking lot, the number of cattle
owned by a farmer, and the number of students in a class.

Continuous Variables
Some variables cannot be counted, and they can assume any numerical value
between two numbers. Such variables are called continuous variables.
Definition
Continuous Variable: A variable that can assume any numerical value over a
certain interval or intervals is called a continuous variable.

The time taken to complete an examination is an example of a continuous
variable because it can assume any value, let us say, between 30 and 60
minutes. The time taken may be 42.6 minutes, 42.67 minutes, or 42.674
minutes. (Theoretically, we can measure time as precisely as we want.)
Similarly, the height of a person can be measured to the tenth of an inch or to
the hundredth of an inch. However, neither time nor height can be counted in
a discrete fashion.

Other examples of continuous variables are weights of people, amount of soda
in a 12-ounce can (note that a can does not contain exactly 12 ounces of soda),
and yield of potatoes (in pounds) per acre. Note that any variable that involves
money is considered a continuous variable.

Qualitative or Categorical Variables

Variables that cannot be measured numerically but can be divided into
different categories are called qualitative or categorical variables.

Definition
Qualitative or Categorical Variable: A variable that cannot assume a
numerical value but can be classified into two or more nonnumeric categories
is called a qualitative or categorical variable. The data collected on such a
variable are called qualitative data.

For example, the status of an undergraduate college student is a qualitative
variable because a student can fall into any one of four categories: freshman,
sophomore, junior, or senior. Other examples of qualitative variables are the
gender of a person, the brand of a computer, the opinions of people, and the
make of a car.

Sources of Data
There are two sources of data: primary and secondary.
Primary data: data freshly collected, i.e., for the first time. They are original in
character, i.e., they are first-hand information collected, compiled and
published for some purpose. They have not undergone any statistical treatment.

Secondary data: second-hand information mainly obtained from published
sources such as statistical abstracts, books, encyclopedias, periodicals, media
reports (e.g., census reports), CD-ROMs and other electronic devices, and the
internet. They are not original in character and have undergone some statistical
treatment at least once.

Experimental methods are so called because in them the investigator in a
laboratory tests the hypothesis about the cause and effect relationship by
manipulating the independent variables under controlled conditions.
Non-Experimental methods are so called because in them the investigator does
not control or change any aspect of the situation under study but simply
describes what naturally occurs at a certain point or period of time.
Non-Experimental methods are widely used in social sciences. Some of the
Non-Experimental methods used for data collection are outlined below.
a) Field study. A field study aims at testing hypotheses in natural life
situations. It differs from a field experiment in that the researcher does not
control or manipulate the independent variables, although both are carried out
in natural conditions.
Merits:
(i) The method is realistic, as it is carried out in natural conditions.
(ii) It is easy to obtain data on a large number of variables.

Demerits:
(i) Independent variables are not manipulated.
(ii) Co-operation of the organization is often difficult to obtain.
(iii) Data are likely to contain unknown sampling bias.
(iv) The dross rate (proportion of irrelevant data) may be high in such studies.
(v) Measurement is not as precise as in the laboratory because of the influence
of confounding variables.
b) Census. A census is a study that obtains data from every member of a
population (the totality of individuals or items pertaining to certain
characteristics). In most studies, a census is not practical, because of the
cost and/or time required.
c) Sample survey. A sample survey is a study that obtains data from a subset
of a population in order to estimate population attributes or characteristics.
Surveys of human populations and institutions are common in government,
health, social science and marketing research.
d) Case study. It is a method of intensively exploring and analyzing the life of a
single social unit, be it a family, a person, an institution, a cultural group or even
an entire community. In this method no attempt is made to exercise
experimental or statistical control, and phenomena related to the unit are
studied in their natural setting. The researcher has considerable discretion in
gathering information from a variety of sources such as diaries, letters,
autobiographies, office records, files or personal interviews.
Merits:
(i) The method is less expensive than other methods.
(ii) It is very intensive in nature: it aims at studying a few units rather than
several.
(iii) Data collection is flexible since the researcher is free to approach the
problem from any angle
(iv) Data is collected from natural settings.

Demerits
(i) It lacks internal validity which is basic to scientific evidence.
(ii) Only one unit of the defined population is studied. Hence the findings of
a case study cannot be used as a basis for generalization about a large
population. They lack external validity.
(iii) Case studies are more time consuming than other methods.

e) Experiment. An experiment is a controlled study in which the researcher
attempts to understand cause-and-effect relationships. The experiment is
actually carried out on the individuals or units about whom
information is drawn. The study is "controlled" in the sense that the researcher
controls how subjects are assigned to groups and which treatments each group
receives.

f) Observational study. Like experiments, observational studies attempt to
understand cause-and-effect relationships. However, unlike experiments, the
researcher is not able to control how subjects are assigned to groups and/or
which treatments each group receives. Under this method, information is
sought by direct observation by the investigator.

Population and Sample

Population: The entire set of individuals to which the findings of a survey
refer.
Sample: A subset of population selected for a study.
Sample Design: The scheme by which items are chosen for the sample.
Sample unit: The element of the sample selected from the population.
Unit of analysis: Unit at which analysis will be done for inferring about the
population. Consider that you want to examine the effect of health care
facilities in a community on prenatal care. What is the unit of analysis: health
facility or the individual woman?

Sampling Frames
For probability sampling, we must have a list of all the individuals (units) in
the population. This list or sampling frame is the basis for the selection process
of the sample. “A [sampling] frame is a clear and concise description of the
population under study, by virtue of which the population units can be
identified unambiguously and contacted, if desired, for the purpose of the
survey” (Hedayat and Sinha, 1991).
Based on the sampling frame, the sampling design could also be classified as:
(i) Individual surveys: used if a list of individuals is available or when the
size of the population is small.
(ii) Special population.
(iii) Household surveys: based on a census of households, used if
individual-level information is unlikely to be available. In practice, the frame
is limited to small geographical areas and is known as an “area sampling frame”.
Example: Demographic and Health Surveys (DHS).
(iv) Institutional surveys: based on a census of institutions, say hospital or
clinic lists, e.g. the 1990 National Hospital Discharge Survey and the National
Ambulatory Medical Care Survey.

Sampling
Sampling is a statistical process of selecting a representative sample. We have
probability sampling and non-probability sampling. Probability sampling
involves a known mathematical chance of selecting each respondent: every unit in
the population has a chance, greater than zero, of being selected in the sample,
which makes it possible to produce unbiased estimates. Probability sampling
methods include:
(i) Simple random sampling
(ii) Systematic sampling
(iii) Stratified sampling
(iv) Cluster sampling
(v) Multi-stage sampling
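
The first two methods in this list can be sketched in a few lines. This is an illustration only: the sampling frame of 1,000 numbered units and the sample size of 50 are hypothetical, and the systematic scheme shown (random start, then every k-th unit) is the common textbook version rather than anything specified in these notes.

```python
import random

random.seed(7)  # fixed seed for reproducibility

# Hypothetical sampling frame: ID numbers of 1,000 population units.
frame = list(range(1, 1001))
n = 50  # desired sample size

# Simple random sampling: every subset of size n is equally likely.
srs = random.sample(frame, n)

# Systematic sampling: pick a random start in the first interval,
# then take every k-th unit from the frame.
k = len(frame) // n          # sampling interval (here 20)
start = random.randrange(k)  # random start index in 0..k-1
systematic = frame[start::k]

print(sorted(srs)[:5])
print(systematic[:5])
```

Both approaches need a complete list of units, which is why the sampling frame is the starting point for probability sampling.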

Non-probability sampling is any sampling method where some elements of
the population have no chance of selection (also referred to as “out of
coverage” or “undercovered”), or where the probability of selection can't be
accurately determined. It yields a non-random sample, which makes it
difficult to extrapolate from the sample to the population. Non-probability
methods include:
(i) Judgement, purposive and convenience samples, which are subjective.
(ii) Snowball sampling, used for studies of rare groups or diseases.

Sampling Procedure
Sampling involves two tasks:
(i) How to select the elements?
(ii) How to estimate the population characteristics from the sampling units?

We employ some randomization process for sample selection so that there is no
preferential treatment in selection, which may introduce selectivity bias.

Reasons Behind sampling

(i) Cost: the sample can furnish data of sufficient accuracy at much lower cost.
(ii) Time: the sample provides information faster than a census, thus ensuring
timely decision making.
(iii) Accuracy: it is easier to control data collection errors in a sample survey
than in a census.
(iv) Risky or destructive tests call for a sample survey, not a census, e.g.,
testing a new drug.

Stratified Sampling
In stratified sampling the population is partitioned into groups, called strata,
and sampling is performed separately within each stratum.
This sampling technique is used when:
i) Population groups may have different values for the responses of interest.
ii) We want to improve our estimation for each group separately.
iii) We want to ensure an adequate sample size for each group.

In stratified sampling designs:

i) Strata are mutually exclusive (non-overlapping); typical stratification
variables are urban/rural areas, economic categories, geographic regions, race,
sex, etc. The principal objective of stratification is to reduce sampling errors.
ii) The population (elements) should be homogeneous within each stratum and
heterogeneous between the strata.

Advantages
(i) Provides an opportunity to study stratum variations: estimates can be
made for each stratum.
(ii) A disproportionate sample may be selected from each stratum.
(iii) Precision is likely to increase, as the variance may be smaller than in the
simple random case with the same sample size.
(iv) Field work can be organized using the strata (e.g., by geographical areas or
regions).
(v) Reduces survey costs.
Disadvantages
(i) A sampling frame is needed for each stratum.
(ii) The analysis method is complex.
(iii) Correct variance estimation is more involved.
(iv) Data analysis should take the sampling “weight” into account for
disproportionate sampling of strata.
(v) Sample size estimation is difficult in practice.
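
The point about sampling weights can be made concrete with a small sketch. Everything here is a made-up illustration: two hypothetical strata with constant response values are used so that the effect of weighting is easy to see, and the estimator shown is the usual stratified mean, in which each stratum's sample mean is weighted by that stratum's share of the population.

```python
import random
import statistics

random.seed(1)

# Hypothetical population split into two non-overlapping strata.
strata = {
    "urban": [10.0] * 80,  # 80 units, each with response value 10
    "rural": [20.0] * 20,  # 20 units, each with response value 20
}
sample_sizes = {"urban": 5, "rural": 5}  # disproportionate: rural is over-sampled

population_size = sum(len(units) for units in strata.values())

stratified_mean = 0.0
pooled_sample = []
for name, units in strata.items():
    sample = random.sample(units, sample_sizes[name])  # sample within the stratum
    weight = len(units) / population_size              # stratum share of the population
    stratified_mean += weight * statistics.fmean(sample)
    pooled_sample.extend(sample)

unweighted_mean = statistics.fmean(pooled_sample)

print(f"weighted (stratified) estimate: {stratified_mean:.1f}")
print(f"unweighted pooled mean:         {unweighted_mean:.1f}")
```

The population mean in this toy example is 12. The weighted estimate recovers it, while the unweighted pooled mean (15) is pulled toward the over-sampled rural stratum.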

Allocation of Stratified Sampling
The major task of stratified sampling design is the appropriate allocation of
samples to different strata.
Types of allocation methods:
(i) Equal allocation
(ii) Proportional to stratum size
(iii) Cost based sample allocation
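
The first two allocation methods amount to simple arithmetic, sketched below. The function names, the stratum sizes and the use of largest-remainder rounding are assumptions made for this illustration; cost-based allocation is not shown because it needs per-stratum cost information that these notes do not give.

```python
def equal_allocation(stratum_sizes, n):
    """Give every stratum the same sample size (n is assumed divisible here)."""
    per_stratum = n // len(stratum_sizes)
    return {name: per_stratum for name in stratum_sizes}


def proportional_allocation(stratum_sizes, n):
    """Allocate n in proportion to stratum size, using largest remainders
    so that the allocated sizes add up to exactly n."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical stratum sizes.
sizes = {"urban": 6000, "rural": 3000, "peri-urban": 1000}
print(equal_allocation(sizes, 90))
print(proportional_allocation(sizes, 100))
```

With these sizes, equal allocation of 90 gives 30 units per stratum, while proportional allocation of 100 gives 60, 30 and 10 units.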

c. Cluster Sampling
In many practical situations the population elements are grouped into a
number of clusters. A list of clusters can be constructed as the sampling frame
but a complete list of elements is often unavailable, or too expensive to
construct. In this case it is necessary to use cluster sampling where a random
sample of clusters is taken and some or all elements in the selected clusters
are observed. Cluster sampling is also preferable in terms of cost, because it is
much cheaper, easier and quicker to collect data from adjoining elements than
elements chosen at random. On the other hand, cluster sampling is less
informative and less efficient per element in the sample, due to similarities of
elements within the same cluster. The loss of efficiency, however, can often be
compensated by increasing the overall sample size. Thus, in terms of unit cost,
the cluster sampling plan is efficient.
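
A minimal sketch of one-stage cluster sampling follows, assuming a hypothetical frame of ten villages with twenty households each. Only the list of clusters is sampled from; every element of a selected cluster is then observed.

```python
import random

random.seed(3)

# Hypothetical frame of clusters (villages). In practice the households in a
# village would only be listed once that village has been selected.
clusters = {
    f"village-{i}": [f"household-{i}-{j}" for j in range(1, 21)]
    for i in range(1, 11)
}

# Step 1: simple random sample of clusters from the cluster list.
chosen_clusters = random.sample(sorted(clusters), 3)

# Step 2: observe every element in each chosen cluster.
sample = [unit for name in chosen_clusters for unit in clusters[name]]

print(chosen_clusters)
print(len(sample))
```

Observing only some elements within each chosen cluster, instead of all of them, would turn this into a two-stage design.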

e. Multi-Stage Samples
Here the respondents are chosen through a process of defined stages. For example,
residents within Kibera (Nairobi) may have been chosen for a survey through the
following process. Out of the whole country (Kenya), Nairobi is selected at random
(stage 1); within Nairobi, Langata (constituency) is selected, again at random
(stage 2); Kibera is then selected within Langata (stage 3), then polling stations
within Kibera (stage 4), and finally individuals from the electoral voters’ register
(stage 5). As demonstrated, five stages were gone through before the final
selection of respondents from the electoral voters’ register.
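
The staged selection described above can be sketched with a nested structure. The frame below is hypothetical and uses only three levels with generic labels (region, area, person) to keep the example short; the function name `multistage_sample` is likewise an illustrative choice.

```python
import random

random.seed(5)

# Hypothetical three-level frame: region -> area -> list of registered individuals.
frame = {
    f"region-{r}": {
        f"area-{r}.{a}": [f"person-{r}.{a}.{p}" for p in range(1, 31)]
        for a in range(1, 5)
    }
    for r in range(1, 4)
}


def multistage_sample(frame, people_per_area):
    """Pick one region at random, then one area within it, then individuals."""
    region = random.choice(sorted(frame))                          # stage 1
    area = random.choice(sorted(frame[region]))                    # stage 2
    people = random.sample(frame[region][area], people_per_area)   # final stage
    return region, area, people


region, area, people = multistage_sample(frame, 10)
print(region, area, people[:3])
```

Each stage only needs a list of the units inside the unit chosen at the previous stage, so no complete list of all individuals is required.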

Advantages of probability sample

(i) Provides a quantitative measure of the extent of variation due to random
effects
(ii) Provides data of known quality
(iii) Provides data in a timely fashion
(iv) Provides acceptable data at minimum cost
(v) Better control over nonsampling sources of errors
(vi) Mathematical statistics and probability can be applied to analyze and
interpret the data

Non-probability Sampling
Social research is often conducted in situations where a researcher cannot
select the kinds of probability samples used in large-scale social surveys. For
example, say you wanted to study homelessness – there is no list of homeless
individuals nor are you likely to create such a list. However, you need to get
some kind of a sample of respondents in order to conduct your research. To
gather such a sample, you would likely use some form of non-probability
sampling.
There are four primary types of non-probability sampling methods:

a) Convenience Sampling
It is a method of choosing subjects who are available or easy to find. This
method is also sometimes referred to as haphazard, accidental, or availability
sampling. The primary advantage of the method is that it is very easy to carry
out, relative to other methods.

Demerits
(i) One can never be certain what population the participants in the study
represent. The population is unknown.
(ii) The method is haphazard, and the cases studied probably don't represent
any population you could come up with. However, it is very useful for pilot
studies.
Advantages of convenience sample
(i) It’s very easy to carry out with few rules governing how the sample should
be collected.
(ii) The relative cost and time required to carry out a convenience sample are
small in comparison to probability sampling techniques. This enables you to
achieve the sample size you want in a relatively fast and inexpensive way.
(iii) The convenience sample may help you gather useful data and information
that would not have been possible using probability sampling techniques, which
require more formal access to lists of populations.

For example, imagine you were interested in understanding more about
employee satisfaction in a single, large organisation in the United States. You
intended to collect your data using a questionnaire. The manager who has
kindly given you access to conduct your research is unable to get permission to
get a list of all employees in the organisation, which you would need to use a
probability sampling technique such as simple random sampling or systematic
random sampling.
However, the manager has managed to secure permission for you to spend two
days in the organisation to collect as many questionnaire responses as
possible. You decide to spend the two days at the entrance of the organisation
where all employees have to pass through to get to their desks. Whilst a
probability sampling technique would have been preferred, the convenience
sample was the only sampling technique that you could use to collect data.
Irrespective of the disadvantages of convenience sampling, discussed below,
without the use of this sampling technique, you may not have been able to get
access to any data on employee satisfaction in the organisation.

Disadvantages of convenience sampling

(i) The convenience sample often suffers from a number of biases. This can be
seen in both of our examples, whether the 10,000 students we were studying or
the employees at the large organisation. In both cases, a convenience sample
can lead to the under-representation or over-representation of particular
groups within the sample. If we take the large organisation:

It may be that the organisation has multiple sites, with employee satisfaction
varying considerably between these sites. By conducting the survey at the
headquarters of the organisation, we may have missed the differences in
employee satisfaction amongst those at different sites, including non-office
workers. We also do not know why some employees agreed to take part in the
survey, whilst others did not. Was it because some employees were simply too
busy? Did they not trust the intentions of the survey? Did others take part out
of kindness or because they had a particular grievance with the organisation?
These types of biases are quite typical in convenience sampling.
(ii) Since the sampling frame is not known, and the sample is not chosen at
random, the inherent bias in convenience sampling means that the sample is
unlikely to be representative of the population being studied. This undermines
your ability to make generalisations from your sample to the population you are
studying.

b) Quota Sampling
Quota sampling is designed to overcome the most obvious flaw of availability
sampling. Rather than taking just anyone, you set quotas to ensure that the
sample you get represents certain characteristics in proportion to their
prevalence in the population. Note that for this method, you have to know
something about the characteristics of the population ahead of time. Say you
want to make sure you have a sample proportional to the population in terms
of gender - you have to know what percentage of the population is male and
female, then collect the sample until yours matches. Marketing studies are
particularly fond of this form of research design.
The primary problem with this form of sampling is that even when we know
that a quota sample is representative of the particular characteristics for which
quotas have been set, we have no way of knowing if the sample is representative in
terms of any other characteristics. If we set quotas for gender and age, we are
likely to attain a sample with good representativeness on age and gender, but
one that may not be very representative in terms of income and education or
other factors.

Moreover, because researchers can set quotas for only a small fraction of the
characteristics relevant to a study, quota sampling is really not much better
than availability sampling. To reiterate, you must know the characteristics of
the entire population to set quotas; otherwise there's not much point to setting
up quotas. Finally, interviewers often introduce bias when allowed to self-select
respondents, which is usually the case in this form of research. In choosing
males 18-25, interviewers are more likely to choose those that are better-
dressed, seem more approachable or less threatening. That may be
understandable from a practical point of view, but it introduces bias into
research findings.
Imagine that a researcher wants to understand more about the career goals of
students at a single university. Let’s say that the university has roughly
10,000 students. Suppose we were interested in comparing the differences in
career goals between male and female students at the single university. If this
was the case, we would want to ensure that the sample we selected had a
proportional number of male and female students relative to the population.
To create a quota sample, there are three steps:
(i) Choose the relevant grouping characteristic (here, gender) and divide the
population accordingly.
(ii) Calculate a quota (the number of units that should be included) for each
group.
(iii) Continue to invite units until the quota for each group is met.
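
The three steps above can be sketched as follows. The figures are assumptions for illustration: a sample of 100 with quotas of 60 male and 40 female students, and a made-up stream of students who are approached in order. The function name `quota_sample` is hypothetical.

```python
def quota_sample(stream, quotas, group_of):
    """Accept units in arrival order until every group's quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for unit in stream:
        group = group_of(unit)
        if group in counts and counts[group] < quotas[group]:
            sample.append(unit)
            counts[group] += 1
        if counts == quotas:  # all quotas met, stop inviting
            break
    return sample, counts


# Hypothetical quotas: 60 male and 40 female students in a sample of 100.
quotas = {"male": 60, "female": 40}

# Hypothetical arrival order: students alternate female, male, female, ...
stream = [{"id": i, "gender": "female" if i % 2 == 0 else "male"} for i in range(1000)]

sample, counts = quota_sample(stream, quotas, lambda s: s["gender"])
print(counts)
```

Note that nothing in this procedure is random: who ends up in the sample depends entirely on who happens to be approached first, which is the source of the bias discussed for this method.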

Advantages of quota sampling

i) It is particularly useful when you are unable to obtain a probability sample, but
you are still trying to create a sample that is as representative as possible of
the population being studied. In this respect, it is the non-probability based
equivalent of the stratified random sample.
ii) Unlike probability sampling techniques, especially stratified random
sampling, quota sampling is much quicker and easier to carry out because it
does not require a sampling frame and the strict use of random sampling
techniques.
iii) The quota sample improves the representation of particular strata (groups)
within the population, as well as ensuring that these strata are not over-
represented. For example, it would ensure that we have sufficient male
students taking part in the research (60% of our sample size of 100; hence, 60
male students). It would also make sure we did not have more than 60 male
students, which would result in an over-representation of male students in our
research.
iv) It allows comparison of groups.

Disadvantages of quota sampling

i) In quota sampling, the sample has not been chosen using random selection,
which makes it impossible to determine the possible sampling error.
ii) The lack of random selection can introduce sampling bias. Thus no
statistical inferences can be made from the sample to the population. This can
lead to problems of external validity.
iii) Also, with quota sampling it must be possible to clearly divide the
population into strata; that is, each unit from the population must only belong
to one stratum. In our example, this would be fairly simple, since our strata are
male and female students. Clearly, a student could only be classified as either
male or female. No student could fit into both categories (ignoring transgender
issues).

c) Purposive Sampling
Purposive sampling is a sampling method in which elements are chosen based
on the purpose of the study.
Purposive sampling may involve studying the entire population of some limited
group or a subset of a population. As with other non-probability sampling
methods, purposive sampling does not produce a sample that is representative
of a larger population, but it can be exactly what is needed in some cases,
such as the study of an organization, a community, or some other clearly
defined and relatively limited group.

Advantages of purposive sampling

i) There is a wide range of qualitative research designs that researchers can
draw on. Achieving the goals of such qualitative research designs requires
different types of sampling strategy and sampling technique. One of the major
benefits of purposive sampling is the wide range of sampling techniques that
can be used across such qualitative research designs. These techniques range
from homogeneous sampling through to critical case sampling, expert sampling,
and more.
ii) Whilst the various purposive sampling techniques each have different goals,
they can provide researchers with the justification to make generalisations from
the sample that is being studied, whether such generalisations are theoretical,
analytic and/or logical in nature. However, since each of these types of
purposive sampling differs in terms of the nature and ability to make
generalisations, you should read the articles on each of these purposive
sampling techniques to understand their relative advantages.
iii) Qualitative research designs can involve multiple phases, with each phase
building on the previous one. In such instances, different types of sampling
technique may be required at each phase.
Purposive sampling is useful in these instances because it provides a wide
range of non-probability sampling techniques for the researcher to draw on.
For example, critical case sampling may be used to investigate whether a
phenomenon is worth investigating further, before adopting an expert sampling
approach to examine specific issues further.

Disadvantages of purposive sampling

i) Purposive samples, irrespective of the type of purposive sampling used, can
be highly prone to researcher bias. The idea that a purposive sample has been
created based on the judgement of the researcher is not a good defence when it
comes to alleviating possible researcher biases, especially when compared with
probability sampling techniques that are designed to reduce such biases.
However, this judgemental, subjective component of purposive sampling is only a
major disadvantage when such judgements are ill-conceived or poorly considered;
that is, where judgements have not been based on clear criteria, whether a
theoretical framework, expert elicitation, or some other accepted criteria.
ii) The subjectivity and non-probability based nature of unit selection (i.e.
selecting people, cases/organisations, etc.) in purposive sampling means that
it can be difficult to defend the representativeness of the sample. In other
words, it can be difficult to convince the reader that the judgement you used
to select units to study was appropriate. For this reason, it can also be
difficult to convince the reader that research using purposive sampling
achieved theoretical/analytic/logical generalisation. After all, if different
units had been selected, would the results and any generalisations have been
the same?

d) Snowball Sampling
Snowball sampling is a method in which a researcher identifies one member of
some population of interest, speaks to him/her, and then asks that person to
identify others in the population that the researcher might speak to. This
person is then asked to refer the researcher to yet another person, and so on.
Snowball sampling is very good for cases where members of a special
population are difficult to locate.

For example, this applies to populations that are subject to social stigma and
marginalisation, such as people living with HIV/AIDS, as well as individuals
engaged in illicit or illegal activities, including prostitution and drug use.
Snowball sampling is a useful choice of sampling strategy in such scenarios
because the population of interest is hidden or hard to reach.
However, the method creates a sample with questionable representativeness. The
researcher cannot be sure who is in the sample. In effect, snowball sampling
often leads the researcher into a realm he/she knows little about, and it can
be difficult to determine how the sample compares to the larger population.
There is also the issue of whom respondents refer the researcher to: friends
tend to refer friends and are less likely to refer people they dislike or fear.
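
The chain referral process can be illustrated with a short Python sketch. The referral network, the person labels and the function name below are all hypothetical. The sketch simply follows referrals outward from one starting person until the target sample size is reached.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Follow referrals outward from the seed members, first come first served."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Ask this person to name others in the population.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network: who each person would name.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "G": ["H"],  # a separate cluster that nobody in A's network names
}

sample = snowball_sample(referrals, ["A"], 5)
print(sample)
```

Persons G and H can never enter the sample when the chain starts from A, however large the target size. This is a small picture of why the representativeness of a snowball sample is hard to defend.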

Advantages of Snowball Sampling

(i) The chain referral process allows the researcher to reach populations that
are difficult to sample when using other sampling methods.
(ii) The process is cheap, simple and cost-efficient.
(iii) This sampling technique needs little planning and a smaller workforce
compared to other sampling techniques.

Disadvantages of Snowball Sampling

(i) The researcher has little control over the sampling method. The subjects that
the researcher can obtain depend mainly on the previous subjects that were
observed.
(ii) Representativeness of the sample is not guaranteed. The researcher has no
idea of the true distribution of the population and of the sample.
(iii) Sampling bias is also a concern for researchers using this sampling
technique. Initial subjects tend to nominate people that they know well.
Because of this, it is highly possible that the subjects share the same traits
and characteristics, so the sample that the researcher obtains may be only a
small subgroup of the entire population.

1.4.5 Limitations of Sampling

a) Sampling frame: may need complete enumeration
b) Errors of sampling may be high in small areas
c) May not be appropriate for the study objectives/questions
d) Representativeness may be vague, controversial

1.4.6 Characteristics of Good Sampling
A good sample should:
a) Meet the requirements of the study objectives
b) Provide reliable results
c) Be clearly understandable
d) Be manageable and realistic, so that it can be implemented
e) Be reasonable and timely (time consideration)
f) Be economical (cost consideration)
g) Allow accurate, representative interpretation
h) Be acceptable

Sample Size Determination

Sample size determination is influenced by factors such as the purpose of the
study, the population size, the risk of selecting a "bad" sample, and the
allowable sampling error.
There are several approaches to determining the sample size. These include
using a census for small populations, imitating the sample size of similar
studies, using published tables, and applying formulas to calculate a sample
size.

Using a census for small populations

One approach is to use the entire population as the sample. This is impractical
for large populations but attractive for small ones. A census eliminates
sampling error and provides data on all the individuals in the population.
Moreover, in small populations virtually the entire population would have to be
sampled anyway to achieve a desirable level of precision.

Using a sample size of a similar study

Another approach is to use the same sample size as those of studies similar to
the one you plan. Without reviewing the procedures employed in these studies,
you may run the risk of repeating errors that were made in determining the
sample size for another study. However, a review of the literature in your
discipline can provide guidance about "typical" sample sizes that are used.

Using published tables

One can also rely on published tables which provide the sample size for a given
set of criteria. Table 2.1 and Table 2.2 (Yamane, 1967) present sample sizes
that would be necessary for given combinations of precision, confidence levels,
and variability (search for the article on the web).

NB: i) These sample sizes reflect the number of obtained responses, and not
necessarily the number of surveys mailed or interviews planned (this number is
often increased to compensate for non-response).
ii) The sample sizes in Table 2.2 presume that the attributes being measured
are distributed normally or nearly so. If this assumption cannot be met, then
the entire population may need to be surveyed.
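
As an illustration of the formula approach, the sketch below uses the simplified formula commonly attributed to Yamane (1967), n = N / (1 + N e^2), where N is the population size and e is the desired margin of error. The formula is not given in these notes, so treat it as an assumption. It is usually stated for a 95% confidence level and maximum variability (a proportion of 0.5). The function name is hypothetical.

```python
import math


def yamane_sample_size(population_size, margin_of_error):
    """Simplified sample-size formula n = N / (1 + N * e**2), rounded up.

    Assumption: 95% confidence level and maximum variability (p = 0.5).
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# The university in the earlier example has roughly 10,000 students.
print(yamane_sample_size(10_000, 0.05))  # 385
```

For the university of roughly 10,000 students, a margin of error of 5% gives 10,000 / 26, which rounds up to 385 obtained responses. As noted above, the number of people invited would normally be larger to allow for non-response.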
RAW DATA
When data are collected, the information obtained from each member of a
population or sample is recorded in the sequence in which it becomes
available. This sequence of data recording is random and unranked. Such data,
before they are grouped or ranked, are called raw data.

Raw Data: Data recorded in the sequence in which they are collected and
before they are processed or ranked are called raw data.

Suppose we collect information on the ages (in years) of 50 students selected
from a university. The data values, in the order they are collected, are recorded
in Table 1. For instance, the first student’s age is 21, the second student’s age
is 19 (second number in the first row), and so forth. The data in Table 1 are
quantitative raw data.

Table 1
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
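
A short Python sketch shows the difference between raw and ranked data, using the ages from Table 1. The variable names are illustrative only.

```python
# Ages from Table 1, in the order they were collected (raw data).
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Ranking the data (sorting from smallest to largest) is the first step
# in turning raw data into something easier to read.
ranked = sorted(ages)

print(len(ages))              # number of students
print(ranked[0], ranked[-1])  # youngest and oldest
```

The raw list preserves the order of collection (21 first, then 19), while the ranked list makes the smallest and largest ages (18 and 37) visible at once.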

Suppose we ask the same 50 students about their student status. The
responses of the students are recorded in Table 2. In this table, F, SO, J, and
SE are the abbreviations for freshman, sophomore, junior, and senior,
respectively. This is an example of qualitative (or categorical) raw data.

Table 2
J F SO SE J J SE J J J
F F J F F F SE SO SE J
J F SE SO SO F J F SE SE
SO SE J SO SO J J SO F SO
SE SE F SE J SO F J SO SO

The data presented in Tables 1 and 2 are also called ungrouped data. An
ungrouped data set contains information on each member of a sample or
population individually.
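
Ungrouped qualitative data such as Table 2 can be tallied into categories with a few lines of Python. This is a minimal sketch using the standard library's Counter; the variable names are illustrative.

```python
from collections import Counter

# Student status responses from Table 2, row by row (ungrouped data).
table2 = """
J F SO SE J J SE J J J
F F J F F F SE SO SE J
J F SE SO SO F J F SE SE
SO SE J SO SO J J SO F SO
SE SE F SE J SO F J SO SO
"""

statuses = table2.split()

# Count how many students fall into each category.
tally = Counter(statuses)

for status in ["F", "SO", "J", "SE"]:
    print(status, tally[status])
```

The tally gives 12 freshmen, 12 sophomores, 15 juniors and 11 seniors, which together account for all 50 students.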

Organizing and Graphing Quantitative Data
