REPORT: SIMPLE COMPARATIVE EXPERIMENTS
ANDRÉS MAURICIO ATEHORTÚA BENJUMEA
Alexander Correa Espinal
Faviana Gutiérrez Rôa
SUBJECT: Advanced Experimental Design - 3008475
NATIONAL UNIVERSITY OF COLOMBIA
MEDELLIN HEADQUARTERS
SCHOOL OF MINES
2011
TABLE OF CONTENTS
SUMMARY
ABSTRACT
INTRODUCTION
1. BASIC CONCEPTS
1.1 DEFINITIONS
1.2 GRAPHICAL DESCRIPTION OF VARIABILITY
2. INFERENCES ABOUT THE DIFFERENCES IN MEANS, RANDOMIZED DESIGNS
2.1 HYPOTHESIS TESTING
2.1.1 EXPRESSIONS AND STATISTICS FOR HYPOTHESIS TESTING
2.2 VERIFICATION OF THE ASSUMPTIONS IN THE T TEST
3. INFERENCES ABOUT THE DIFFERENCES IN MEANS, PAIRED COMPARISON DESIGNS
4. INFERENCES ABOUT THE VARIANCES OF NORMAL DISTRIBUTIONS
5. APPLIED EXAMPLE
CONCLUSIONS
RECOMMENDATIONS
REFERENCES
LIST OF CHARTS AND TABLES
Graph 2.1 Characteristic operating curves for two-sided tests with α=0.05
Table 2.1 Tests of Means with Known Variances
Table 3.1 Tests of Variances of Normal Distributions
Graph 5.1 Box plot of C2F6 flow rate
SUMMARY
This document studies the basic concepts and tools needed to design and
analyze simple comparative experiments, which are characterized by a single
study factor with two levels or treatments. It starts from the key concepts of
experimental design and then moves on to the development and analysis of
hypothesis tests (for dependent or independent samples), both for the
comparison of means and for the comparison of variances. Finally, some of the
concepts presented are illustrated with a practical example, with the help of
the statistical package MINITAB 15.
KEYWORDS: Experimental design, simple comparative experiments, Minitab,
hypothesis tests, levels or treatments, significance.
ABSTRACT
This paper reviews the basic concepts and tools needed to design and analyze
simple comparative experiments, which are characterized by a single study
factor with two levels or treatments. The paper begins with the key concepts in
the design of experiments and then moves on to the development and analysis of
hypothesis tests (dependent or independent samples), both for comparison of
means and for comparison of variances. Finally, a practical example illustrates
some of the concepts presented using the statistical package MINITAB 15.
KEY WORDS: Design of experiments, simple comparative experiments, Minitab,
hypothesis testing, levels or treatments, significance.
INTRODUCTION
In everyday practice there are countless situations in which the aim is to
improve a measure, or to approach a specification, in an industrial process
(or a process of any other kind) that is influenced by variables or factors,
whether controllable or not.
Under this condition, both the design and the improvement of a process involve
the statistical analysis of these variables, aimed at optimizing the process
and its result. In this regard, this document aims to provide a review of how
to carry out an analysis of this nature.
Specifically, it addresses problems in which the response of interest depends
on a single factor, or in which at least only one factor is under study. That
factor can be operated at two levels or treatments, and the objective is to
determine whether there is any difference between the means and/or variances
of the two levels.
SIMPLE COMPARATIVE EXPERIMENTS
1. BASIC CONCEPTS
1.1 Definitions
The following are some important concepts in the design of experiments that
will later help in understanding the development and analysis of a simple
comparative experiment (González, 2006).
Experiment: Deliberate change in the operating conditions of a process,
in order to measure the effect on one or more properties of the product.
Experimental unit: Basic object or individual on which a measurement or
representative data point is obtained, for example:
- A piece: in the study of the tensile strength of a certain component
- A lot or group of pieces: in the study of the proportion of defective items
in an operation
Response variable: Characteristic, output variable or property measured on
each unit, whose changes are to be studied.
Controllable factors: Process variables that can be set to a given value or
operating level, since there is a mechanism to change their level. They are
also called input variables, process conditions, or design variables. E.g.
temperature, speed, pressure, concentration, application time, etc.
Uncontrollable or noise factors: Variables that cannot be controlled
during the normal operation of the process. E.g. environmental variables, the
ways in which users use a product, etc.
Studied factors: Variables whose effects on the response are investigated
in the experiment. They can be controllable or uncontrollable (provided they
can be controlled during the experiment). They need to be tested at two levels
at least (two levels is the case studied in this document). In principle, the
factors can affect the mean or the variability of the response variable.
Levels: Values assigned to each study factor.
Random error: Observed variability that cannot be explained by the factors
studied. It is due to 'common or random causes', which generate the inherent
variability of the process. It includes small effects of factors not studied,
variability of measurements made under the same conditions, and the
experimental error.
Experimental error: Component of the random error that represents the errors
of the experimenter in planning and executing the experiment. If the studied
factor or factors influence the response variable, the variability observed in
the response during experimentation is expected to be explained largely by
those factors and to a lesser extent by random error, and this error is
expected to be truly random.
1.2 Graphical description of variability
As an initial aid, it is very common to use graphical methods to analyze the
data from an experiment, since they give a first impression of how the data
behave. The following are some of the methods used in simple comparative
experiments.
Dot diagram: Represents a small set of data (up to about 20 observations).
The dot diagram allows the experimenter to see immediately the location or
central tendency of the observations and their dispersion (Montgomery, 2005).
Box diagram: Shows the minimum, the maximum, the lower and upper quartiles,
and the median in a rectangular box aligned horizontally or vertically. The
box extends from the lower quartile to the upper quartile, and a line is drawn
through the box at the median. Two lines (or whiskers) extend from the edges
of the box to the minimum and maximum values (Montgomery & Runger). These
diagrams give a first indication of whether the means of the treatments
differ, and they also show the degree of symmetry of the data and whether the
dispersion is similar.
Histogram: When there are many data points, it can be impossible to
distinguish them in a dot diagram, so a histogram is used instead. It shows
the central tendency, dispersion, and general shape of the data distribution.
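The five values that a box diagram displays can be computed directly. The following is a minimal sketch in Python, assuming the quartile convention of the standard library's `statistics.quantiles` (other packages may interpolate quartiles differently); the function name and the sample values are hypothetical and are not data from this report.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical observations for one treatment (not data from this report).
sample = [3.2, 2.7, 4.6, 2.6, 3.0, 4.1, 3.8]
print(five_number_summary(sample))
```

Computing the summary for each treatment and comparing the two sets of values side by side gives the same information that the two boxes show graphically.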
2. INFERENCES ABOUT THE DIFFERENCES IN MEANS, RANDOMIZED DESIGNS
When one wants to assess the difference in a response variable between two
different conditions of a controllable factor within a process (known as
levels of the factor), it is easy to make the mistake of simply comparing the
sample means and drawing conclusions from these estimates.
The difference between the means may suggest that the response variable
differs between the two samples by a non-trivial amount. However, this alone
is no evidence that the difference is large enough to affirm that the response
variable really takes a different value at one level than at the other. The
observed difference in the averages may be the result of sampling fluctuation,
with the two levels being identical in reality. Two other samples could well
produce the opposite result, where the level that previously gave the lower
response now gives the higher one.
To avoid this situation, a statistical inference technique known as hypothesis
testing is used. It makes the comparison between two levels or treatments
objective, with knowledge of the risks involved in reaching a wrong
conclusion.
2.1 Hypothesis Testing
A hypothesis test is an inference about the parameters of a probability
distribution or the parameters of a model. The hypothesis reflects a
conjecture about the problem situation. For the comparison of the means of two
treatments, the hypotheses can be stated as follows:
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
The statement H0 is called the null hypothesis and H1 is called the
alternative hypothesis. Two types of errors can be made in hypothesis testing.
If the null hypothesis is rejected when it is true, a type I error occurs; if
the null hypothesis is not rejected when it is false, a type II error occurs.
The general procedure is to specify the probability of type I error (α),
called the significance level, and then to design the test procedure so that
the probability of type II error (β) is small.
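In symbols, and using standard notation that is assumed here rather than taken from the report, the two error probabilities and the power of the test can be written as:

```latex
\begin{align*}
\alpha &= P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) \\
\beta &= P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) \\
\text{Power} &= 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})
\end{align*}
```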
For an appropriate statistical treatment of the observations, it is important
to select a suitable sample size. The choice of sample size and the
probability of type II error (β) are closely related. The sample size is
therefore selected with the help of the operating characteristic curves, which
assume that the two samples have the same size n and that the two populations
share a common standard deviation σ (Montgomery, 2005). The x-axis gives the
parameter d and the y-axis gives β; the curve passing through these
coordinates gives the parameter n*, from which n is obtained with the
following equations:
$d = \frac{|\mu_1 - \mu_2|}{2\sigma}$ (2.1)
$n^* = 2n - 1$ (2.2)
Graph 2.1 Characteristic operating curves for two-sided tests with α=0.05
Source: (Montgomery, 2005)
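The arithmetic around the curves can be sketched as follows. This is a minimal sketch that takes the relations for these curves to be d = |μ1 − μ2| / (2σ) and n* = 2n − 1, as given in Montgomery (2005); the value of n* still has to be read from the chart by eye. The function names and the numbers in the usage lines are hypothetical.

```python
import math

def oc_parameter_d(delta, sigma):
    """Abscissa of the curves: d = |delta| / (2 * sigma)."""
    return abs(delta) / (2.0 * sigma)

def sample_size_from_n_star(n_star):
    """Invert n* = 2n - 1 and round up to a whole number of runs per level."""
    return math.ceil((n_star + 1) / 2)

# Hypothetical inputs: difference to detect 0.5, standard deviation 0.25,
# and n* = 15 read from the chart for the desired beta.
print(oc_parameter_d(0.5, 0.25))
print(sample_size_from_n_star(15))
```

Rounding up is a deliberate choice in this sketch: a fractional n cannot be run, and rounding down would give a β larger than the one read from the chart.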
In many cases it is useful to give an interval within which the parameter or
parameters under study are expected to lie. These intervals are called
confidence intervals. In many experiments it is already known that the
treatment means differ, so the hypothesis test of μ1 = μ2 is of little
interest, and the experimenters are usually more interested in a confidence
interval for the difference in means.
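As an illustration, when the two population variances are unknown but assumed equal, a 100(1 − α) percent confidence interval for μ1 − μ2 is commonly written as below. The symbols (sample means, pooled standard deviation S_p, sample sizes n1 and n2) follow the usual textbook notation, which is assumed here rather than taken from the report.

```latex
\[
\bar{y}_1 - \bar{y}_2 \;\pm\; t_{\alpha/2,\; n_1 + n_2 - 2}\; S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}
\]
```

If the interval contains zero, the data do not show a significant difference between the two means at level α.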
2.1.1 Expressions and statistics for hypothesis testing
A hypothesis test requires calculating a test statistic, which is compared
with a percentage point of the statistical distribution that the statistic is
assumed to follow, in order to form a criterion for rejecting or failing to
reject the null hypothesis.
For the comparison of means in simple comparative experiments, the expression
for the statistic depends on what is known about the variability. If the
variances are known, a statistic that follows a normal distribution is used;
otherwise, a statistic that follows a t distribution is calculated. If, in
addition to being unknown, the variances of the two populations cannot be
assumed equal, the expression for the statistic must be modified.
The following is a summary of these expressions:
Table 2.1 Tests of Means with Known Variances
Source: (Montgomery, 2005)
Table 2.2 Tests of Means with Unknown Variances
Source: (Montgomery, 2005)
Where:
(2.3)
(2.4)
(2.5)
Clarification: In some of the previous expressions the term μ0 appears. In
this case, the test is whether the mean of a sample differs from a specified
value.
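The three cases described in this section (known variances, unknown but equal variances, unknown and unequal variances) can be sketched in Python. These are the standard textbook forms of the statistics, assumed to correspond to the expressions in the tables above; the degrees of freedom in the unequal-variance case use the common Welch-Satterthwaite approximation, which may differ slightly from the variant in the source. All function names and the numbers in the usage lines are hypothetical.

```python
import math

def z_statistic(mean1, mean2, var1, var2, n1, n2):
    """z0 for known population variances var1 and var2."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

def pooled_t_statistic(mean1, mean2, s1, s2, n1, n2):
    """Return (t0, degrees of freedom) assuming equal unknown variances."""
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof
    t0 = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t0, dof

def welch_t_statistic(mean1, mean2, s1, s2, n1, n2):
    """Return (t0, approximate degrees of freedom) for unequal variances."""
    a, b = s1**2 / n1, s2**2 / n2
    t0 = (mean1 - mean2) / math.sqrt(a + b)
    dof = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t0, dof

# Hypothetical summary statistics for two samples of eight observations.
print(z_statistic(10.0, 8.0, 4.0, 4.0, 8, 8))
print(pooled_t_statistic(10.0, 8.0, 2.0, 2.0, 8, 8))
print(welch_t_statistic(10.0, 8.0, 2.0, 2.0, 8, 8))
```

In each case the statistic is then compared with the corresponding percentage point (normal or t) for the chosen α.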
2.2 Verification of the assumptions in the t test
The t-test procedure assumes that both samples are taken from independent
populations that can be described by a normal distribution, that the variances
of both populations are equal, and that the observations are independent
random variables (Montgomery, 2005). If the planning of an experiment
guarantees the randomization of the runs, including the random selection of
the experimental units and materials, the independence assumption will
generally be satisfied. The assumptions of equal variances and normality can
be easily checked with a normal probability plot: if the points fall
approximately along a straight line, the data can be said to come from a
normal distribution, and if the lines for the two samples have very similar
slopes, the variance is assumed to be constant.
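The coordinates of a normal probability plot can be sketched as follows. This is a minimal sketch that assumes the plotting position (j − 0.5)/n for the j-th ordered observation, which is one common convention and is not stated in the report; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered observation with its standard normal score."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()
    # Plotting position (j - 0.5) / n for the j-th ordered value, j = 1..n.
    scores = [standard_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(ordered, scores))

# Hypothetical observations for one treatment (not data from this report).
for value, score in normal_plot_points([3.2, 2.7, 4.6, 2.6, 3.0, 4.1, 3.8]):
    print(f"{value:5.2f}  {score:7.3f}")
```

Plotting the scores against the ordered values for each treatment and judging how closely the points follow a line is the visual check described above; the slope of that line reflects the spread of the sample.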
3. INFERENCES ABOUT THE DIFFERENCES IN MEANS, PAIRED COMPARISON DESIGNS
On many occasions, the interest lies in determining whether there is a
difference between the measurements of a response variable at one level or the
other on the same experimental material. These experiments are known as
paired comparison designs.
For these tests, the following statistic is used:
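$t_0 = \frac{\bar{d}}{S_d / \sqrt{n}}$ (3.1)
Where $\bar{d}$ and $S_d$ are the sample mean and sample standard deviation of
the n paired differences, and:
$d_j = y_{1j} - y_{2j}$, paired difference j (3.2)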
(3.3)
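The paired statistic can be sketched in Python as below. This is a minimal sketch under the usual definition t0 = d̄ / (S_d / √n), where d̄ and S_d are the mean and sample standard deviation of the paired differences; the function name and the measurements in the usage lines are hypothetical.

```python
import math
import statistics

def paired_t_statistic(level1, level2):
    """Return (t0, degrees of freedom) for paired observations."""
    if len(level1) != len(level2):
        raise ValueError("paired samples must have the same length")
    # Difference within each pair (same experimental material).
    diffs = [a - b for a, b in zip(level1, level2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical measurements on three specimens, each tested at both levels.
t0, dof = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(t0, dof)
```

The statistic is compared with the t distribution with n − 1 degrees of freedom, which is why the function returns both values.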
4. INFERENCES ABOUT THE VARIANCES OF NORMAL DISTRIBUTIONS
In many experiments the interest lies in the degree of variability, either in
comparing the variance of a population with a reference variance σ0², or in
comparing it with the variance of another population. In this case, the tests
are the following:
Table 3.1 Variance Tests of Normal Distributions
Source: (Montgomery, 2005)
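For reference, the test statistics usually associated with these two comparisons are shown below in their standard textbook form, which is assumed to match the table above. The first compares a single variance with the reference value and follows a chi-square distribution with n − 1 degrees of freedom; the second compares two variances and follows an F distribution with n1 − 1 and n2 − 1 degrees of freedom.

```latex
\[
\chi_0^2 = \frac{(n - 1) S^2}{\sigma_0^2},
\qquad
F_0 = \frac{S_1^2}{S_2^2}
\]
```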
5. APPLIED EXAMPLE
To illustrate some of the concepts presented, exercise 2-12 of the course
textbook is solved.
In an article in Solid State Technology, "Orthogonal Design for Process
Optimization and Its Application to Plasma Etching" by G.Z. Yin and D.W.
Jillie (May 1987), an experiment is described to determine the effect of the
C2F6 flow rate on the uniformity of the etch on a silicon wafer used in the
manufacture of integrated circuits. The data for the flow rates are as
follows:
a) Does the C2F6 flow rate affect the average etch uniformity?
Use α=0.05
After the data are entered in Minitab, the following output is obtained for
the hypothesis test:
Source  DF     SS     MS     F      P
Factor   1  1.141  1.141  1.82  0.207
Error   10  6.262  0.626
Total   11  7.402
S = 0.7913   R-Sq = 15.41%   R-Sq(adj) = 6.95%
Individual 95% confidence intervals for the mean
based on pooled StDev
Level  N    Mean   StDev
125    6  3.3167  0.7600  (-----------*-----------)
200    6  3.9333  0.8214  (-----------*-----------)
                          -------+---------+---------+---------+--
                              3.00      3.60      4.20      4.80
Since the p-value (0.207) is greater than α=0.05, the null hypothesis cannot be
rejected, and it is concluded that the means do not differ significantly.
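The result can be cross-checked by hand from the summary statistics in the Minitab output. The following is a minimal sketch, assuming the pooled two-sample t statistic is the appropriate one here (with two levels its square equals the F statistic of the one-way analysis of variance). The critical value 2.228 is the tabulated t(0.025, 10) and is taken from a t table, not from the report.

```python
import math

# Summary statistics read from the Minitab output above.
n1, mean1, s1 = 6, 3.3167, 0.7600   # level 125
n2, mean2, s2 = 6, 3.9333, 0.8214   # level 200

dof = n1 + n2 - 2
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof)
t0 = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

# Tabulated two-sided critical value t(0.025, 10); an assumption taken
# from a t table, it does not appear in the report.
T_CRITICAL = 2.228

print(f"pooled S = {pooled_sd:.4f}, t0 = {t0:.3f}, t0^2 = {t0**2:.2f}")
print("reject H0" if abs(t0) > T_CRITICAL else "fail to reject H0")
```

The pooled standard deviation and the square of t0 should agree, up to rounding, with S = 0.7913 and F = 1.82 in the output, and the decision matches the one reached from the p-value.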
b) What is the P value for the test in part a? The value is P = 0.207
c) Does the C2F6 flow rate affect the wafer-to-wafer variability in etch
uniformity? Use α=0.05
Using the tests from Table 3.1:
It is concluded that the C2F6 flow rate does not affect the wafer-to-wafer
variability in etch uniformity, since F0 is less than F0.05,5,5
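The comparison in part c can be reproduced from the standard deviations reported in the Minitab output of part a. This is a minimal sketch: the report does not show which variance was placed in the numerator, so both orientations are computed, and the critical value 5.05 is the tabulated F(0.05, 5, 5), taken from an F table rather than from the report.

```python
# Sample standard deviations read from the Minitab output in part a.
s_125, s_200 = 0.7600, 0.8214
n_125, n_200 = 6, 6

# Tabulated upper 5% point F(0.05, 5, 5); an assumption taken from an F table.
F_CRITICAL = 5.05

# The choice of numerator is an assumption, so both orientations are checked.
f0 = s_125**2 / s_200**2
f0_reversed = s_200**2 / s_125**2

for label, value in (("S_125^2 / S_200^2", f0), ("S_200^2 / S_125^2", f0_reversed)):
    verdict = "reject H0" if value > F_CRITICAL else "fail to reject H0"
    print(f"F0 = {label} = {value:.3f}: {verdict}")
```

Either way the ratio stays well below the critical value, which is consistent with the conclusion stated above.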
d) Draw box plots that help interpret the data from this experiment
Graph 5.1 Box plot of C2F6 flow rate
Source: Author's own work based on MINITAB 15
The box plot suggests that the means may well be equal, since the boxes
overlap, as already concluded in part a. It also shows some difference in
variability, but from part c it is already known that this difference is not
statistically significant.
CONCLUSIONS
Simple comparative experiments can answer a great number of research questions
with a scientific, and therefore objective, methodology that leads to valid
and acceptable conclusions (provided that the design has been done properly).
In many cases, choosing the appropriate test statistic is a subjective step,
since it is left to the expert's judgment whether the variances are considered
homogeneous or non-homogeneous.
Validating the normality assumption can also be a subjective step, as it is
not always easy to determine whether the data fit a straight line in a normal
probability plot.
RECOMMENDATIONS
When designing an experiment, it is very important to know the process under
investigation well beforehand, and to have expert advice on the topic, in
order to take into account all the relevant considerations and variables.
Statistical results cannot override technical knowledge of the problem, since
even when statistically significant conclusions are reached, the effect may in
reality be insignificant or of no practical use.
It is very important to train those in charge of data collection in the
experiments, since otherwise these people can be a large source of
variability, which would mask the results.
REFERENCES
Statistics for Experimenters
edition. John Wiley & Sons. United States of America.
González, N. (2006). Analysis of variance: one factor, fixed effects. National
University of Colombia.
Montgomery, D.C. (2005). Design and Analysis of Experiments. 5th edition. John
Wiley & Sons, Inc.
Montgomery, D.C. & Runger, G.C. Applied probability and statistics for
engineers. 2nd edition. Limusa-Wiley, Mexico.