ES – 71
ENGINEERING DATA
ANALYSIS
ENGR. MARY CRIS L. AYING-TAMPOS
INSTRUCTOR
COURSE DESCRIPTION
This course is designed for undergraduate engineering students with emphasis
on problem solving related to societal issues that engineers and scientists are
called upon to solve. It introduces different methods of data collection and
the suitability of using a particular method for a given situation.
The relationship of probability to statistics is also discussed, providing students
with the tools they need to understand how “chance” plays a role in
statistical analysis. Probability distributions of random variables and their uses
are also considered, along with a discussion of linear functions of random
variables within the context of their application to data analysis and
inference. The course also includes estimation techniques for unknown
parameters; hypothesis testing used in making inferences from sample to
population; and inference for regression parameters and model building for
estimating means and predicting future values of key variables under study.
Finally, statistically based experimental design techniques and analysis of
outcomes of experiments are discussed with the aid of statistical software.
Number of Units for Lecture: 3 units
Number of Contact Hours per Week: 3 hours per week
Class Schedule:
Prerequisites: Math 71
Course Outcomes:
After completing this course, the students must be able to:
1. Apply statistical methods in the analysis of data.
2. Design experiments involving several factors.
COURSE OUTLINE
1. OBTAINING DATA
1.1 Methods of Data Collection
1.2 Planning and Conducting Surveys
1.3 Planning and Conducting Experiments:
Introduction to Design of Experiments
2. PROBABILITY
2.1 Relationship among Events
2.2 Counting Rules Useful in Probability
2.3 Rules of Probability
3. Discrete Probability Distribution
3.1 Random Variables and their Probability Distribution
3.2 Cumulative Distribution Functions
3.3 Expected Values of Random Variables
3.4 Binomial Distribution
3.5 Poisson Distribution
4. Continuous Probability Distribution
4.1 Continuous Random Variables and their Probability
Distribution
4.2 Expected Values of Continuous Random Variables
4.3 Normal Distribution
4.4 Normal Approximation to the Binomial and Poisson
Distribution
4.5 Exponential Distribution
5. Joint Probability Distribution
5.1 Two or More Random Variables
5.1.1 Joint Probability Distributions
5.1.2 Marginal Probability Distribution
5.1.3 Conditional Probability Distribution
5.1.4 More than Two Random Variables
5.2 Linear Functions of Random Variables
5.3 General Functions of Random Variables
6. Sampling Distributions and Point Estimation of Parameters
6.1 Point Estimation
6.2 Sampling Distribution and the Central Limit Theorem
6.3 General Concept of Point Estimation
6.3.1 Unbiased Estimator
6.3.2 Variance of a Point Estimator
6.3.3 Standard Error
6.3.4 Mean Squared Error of an Estimator
7. Statistical Intervals
7.1 Confidence Intervals: Single Sample
7.2 Confidence Intervals: Multiple Samples
7.3 Prediction Intervals
7.4 Tolerance Intervals
8. Test of Hypothesis for a Single Sample
8.1 Hypothesis Testing
8.1.1 One-sided and Two-sided Hypothesis
8.1.2 P-value in Hypothesis Tests
8.1.3 General Procedure for Test of Hypothesis
8.2 Test on the Mean of a Normal Distribution, Variance
Known
8.3 Test on the Mean of a Normal Distribution, Variance
Unknown
8.4 Test on the Variance and Standard Deviation of a Normal
Distribution
8.5 Test on a Population Proportion
9. Statistical Inference of Two Samples
9.1 Inference on the Difference in Means of Two Normal
Distributions, Variances Known
9.2 Inference on the Difference in Means of Two Normal
Distributions, Variances Unknown
9.3 Inference on the Variances of Two Normal Distributions
9.4 Inference on Two Population Proportions
10. Simple Linear Regression and Correlation
10.1 Empirical Models
10.2 Regression: Modelling Linear Relationships –
The Least-Squares Approach
10.3 Correlation: Estimating the Strength of Linear Relation
10.4 Hypothesis Tests in Simple Linear Regression
10.4.1 Use of t-tests
10.4.2 Analysis of Variance Approach to Test
Significance of Regression
10.5 Prediction of New Observations
10.6 Adequacy of the Regression Model
10.6.1 Residual Analysis
10.6.2 Coefficient of Determination
Course References
(1) Myers, R., et al., 2012, “Probability and Statistics for
Engineers and Scientists”, 9th Ed.
(2) Hayter, A., 2012, “Probability and Statistics for
Engineers and Scientists”, 4th Ed.
(3) Soong, T., 2004, “Fundamentals of Probability and
Statistics for Engineers”, 1st Ed.
Grading System
Attendance : 5%
Quizzes/Participation : 15%
Prelim Exam : 20%
Midterm Exam : 20%
Prefinal Exam : 20%
Final Exam : 20%
Total : 100%
Passing Rate : 50%
1. OBTAINING DATA
THE BEGINNINGS OF STATISTICS
❖ The history of statistics can be said to start around 1749,
although the interpretation of the word statistics has
changed over time. In early times, the meaning was
restricted to information about states. In modern terms,
“statistics” means both sets of collected information, as in
national accounts and temperature records, and analytical
work which requires statistical inference.
DEFINITION
❖ Statistics is the science and art of dealing with figures
and facts.
❖ Statistics is well defined as the science that deals with
the collection, organization, presentation, analysis and
interpretation of data in order to be able to draw
judgements or conclusions that help in the decision-
making process.
Two Main Divisions of Statistics
❖ Descriptive Statistics deals with the procedures that
organize, summarize and describe quantitative data.
❖ Inferential Statistics deals with making a judgement or a
conclusion about a population based on the findings
from a sample that is taken from the population.
Statistical Terms
❖ Population or Universe refers to the totality of objects,
persons, places, things used in a particular study.
❖ Sample is any subset of population or few members of a
population.
❖ Data are facts, figures and information collected on some
characteristics of a population or sample.
❖ Parameter is the descriptive measure of a characteristic of
a population.
❖ Statistic is a measure of a characteristic of a sample.
❖ Constant is a characteristic or property of a population or
sample which is common to all members of the group.
❖ Variable is a measure or characteristic or property of a
population or sample that may have a number of
different values.
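The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below (ages of 200 students) is made up for illustration, and the names `population`, `sample`, `population_mean` and `sample_mean` are assumptions, not part of the course material.

```python
import random
import statistics

# Hypothetical population: ages of 200 students (made-up values for illustration).
rng = random.Random(71)  # fixed seed only so the illustration is repeatable
population = [rng.randint(17, 25) for _ in range(200)]

# Parameter: a descriptive measure computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a sample (a subset of the population).
sample = rng.sample(population, 30)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not identical, which is why inferential statistics is needed to judge how far a statistic may be from the parameter it estimates.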
STATISTICAL DATA
❖ A sequence of observations made on a set of objects
included in a sample drawn from a population is known
as statistical data.
❖ Data can be defined as the quantitative or qualitative
value of a variable (e.g. numbers, images, words, figures,
facts or ideas).
❖ It is the lowest unit of information from which other
measurements and analysis can be done.
❖ Data is one of the most important and vital aspects of
any research study.
DATA TYPES
Data
   Quantitative
   Qualitative
QUANTITATIVE
➢ Data that are measures of values or counts, expressed as
numbers.
QUALITATIVE
➢ Defined as data that approximate and
characterize.
➢ Non-numerical in nature.
➢ Collected through methods of observations, one-on-
one interviews and similar methods.
DATA TYPES
Data
   Quantitative
      Continuous
      Discrete
   Qualitative
CONTINUOUS DATA (VARIABLE)
➢ Data that can take the form of decimals or continuous
values of varying degrees of precision.
-Ex. Height, Weight
DISCRETE DATA (DISCONTINUOUS)
➢ Data whose value cannot take the form of decimals.
-Ex. Family Size, Enrolment Size
DATA TYPES
Data
   Quantitative
      Continuous
      Discrete
   Qualitative
      Attribute
      Open
ATTRIBUTE DATA
➢ Data that can be counted for recording and analysis.
-Ex. Size of T-Shirt: XS, M, L, XL, XXL
OPEN
➢ Data that depend on the sample and are not restricted to
a specific set of possible responses or answers.
DATA TYPES
Data
   Quantitative
      Continuous
      Discrete
   Qualitative
      Attribute
         Nominal
         Ordinal
      Open
NOMINAL DATA
➢ Data defined by an operation which allows making
statements only of equality or difference.
-Ex. Gender, Race, Religion, Political Affiliation
ORDINAL DATA
➢ Data defined by an operation whereby members of
a particular group are ranked.
-Ex. Awareness, IQ
Ungrouped (Raw) DATA
➢ Are data which are not organized in any specific way.
They are simply the collection of data as they are
gathered.
Grouped DATA
➢ Are raw data organized into groups or categories with
corresponding frequencies. Organized in this manner,
the data are referred to as a frequency distribution.
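As a rough sketch of how raw data become grouped data, the following turns a list of ungrouped values into a frequency distribution. The scores, the class width of 5 and the helper `class_label` are hypothetical choices made only for this illustration.

```python
from collections import Counter

# Hypothetical raw (ungrouped) data: quiz scores in the order they were gathered.
raw_scores = [12, 7, 15, 9, 18, 11, 7, 14, 20, 16, 9, 13]

def class_label(score, width=5):
    """Return the class interval (e.g. '5-9') that contains the score."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

# Grouped data: each class interval paired with its frequency.
frequency = Counter(class_label(s) for s in raw_scores)

for lower in range(5, 25, 5):
    label = f"{lower}-{lower + 4}"
    print(f"{label:>6}: {frequency[label]}")
```

Other class widths are equally valid; the choice affects how much detail the frequency distribution keeps.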
1.1 METHODS OF DATA
COLLECTION
Data collection is the process of gathering and measuring
information on variables of interest, in an established systematic
fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.
FACTORS TO BE CONSIDERED
BEFORE COLLECTION OF DATA:
❖ Objective and scope of the enquiry
❖ Sources of information
❖ Quantitative expression
❖ Techniques of data collection
❖ Unit of collection
CLASSIFICATION OF DATA
1. PRIMARY DATA
Data which are collected fresh
and for the first time, and thus
happen to be original in character,
are known as PRIMARY DATA.
2. SECONDARY DATA
Data which have been collected by
someone else and which have already
been passed through the statistical
process are known as SECONDARY DATA.
METHODS OF DATA COLLECTION:
PRIMARY DATA
1. Observation
2. Interview
3. Questionnaire
4. Case Study
5. Survey
METHODS OF DATA COLLECTION: PRIMARY DATA
OBSERVATION
Observation method is a method
under which data from the field is collected
with the help of observation by the observer
or by personally going to the field.
ADVANTAGES
• Subjective bias eliminated
• Current information
• Independent of respondent’s variable
DISADVANTAGES
• Time consuming
• Limited information
• Unforeseen factors
TYPES OF OBSERVATION
STRUCTURED AND UNSTRUCTURED
1. Structured Observation
when the observation is characterized by a defined style of
recording the observed information, standardized
conditions of observation, definition of the units to be
observed, and selection of pertinent data of observation.
Example: An auditor performing inventory analysis in a store
2. Unstructured Observation
when observation is done without these characteristics
being thought out before the observation.
Example: Observing children playing with new toys.
TYPES OF OBSERVATION
PARTICIPANT AND NON-PARTICIPANT
1. Participant
when the observer is a member of the group which he or she is
observing.
Advantages: 1. Observation of natural behavior
2. Closeness with the group
3. Better understanding
2. Non-participant
when the observer is observing people without giving any
information to them.
Advantages: 1. Objectivity and neutrality
2. More willingness of the respondent
TYPES OF OBSERVATION
CONTROLLED AND UNCONTROLLED
1. Controlled
when observation takes place according to
definite pre-arranged plans, with experimental
procedure; it is generally done in a laboratory
under controlled conditions.
2. Uncontrolled
when the observation takes place in natural
conditions. It is done to get a spontaneous picture of
life and persons.
INTERVIEW METHOD
This method of collecting data
involves presentation of oral-verbal
stimuli and reply in terms
of oral-verbal responses.
The interview method is an oral-verbal communication
in which the interviewer asks the respondent questions
aimed at getting the information required for the
study.
TYPES OF INTERVIEW
• Personal interviews : the interviewer asks questions,
generally in face-to-face contact with the other person or
persons.
• Structured interviews : a set of pre-decided
questions is used.
• Unstructured interviews : no system of
pre-determined questions is followed.
• Focused interviews : attention is focused on the given
experience of the respondent and its possible effects.
• Clinical interviews : concerned with broad underlying
feelings or motivations or with the course of an individual’s life
experience, rather than with the effects of a specific
experience, as in the case of the focused interview.
• Group interviews : a group of 6 to 8 individuals is
interviewed.
• Qualitative and quantitative interviews : divided on the basis
of subject matter, i.e. whether qualitative or quantitative.
• Individual interviews : the interviewer meets a single person and
interviews that person.
• Selection interviews : done for the selection of people for
certain jobs.
• Depth interviews : deliberately aim to elicit unconscious
as well as other types of material relating especially to
personality dynamics and motivations.
• Telephonic interviews : respondents are contacted by telephone.
QUESTIONNAIRE METHOD
This method of data collection is
quite popular, particularly in
case of big enquiries.
The questionnaire is mailed to respondents who are
expected to read and understand the questions
and write down the reply in the space meant for
the purpose in the questionnaire itself. The
respondents have to answer the questions on their
own.
ADVANTAGES
• Low cost even if the geographical area is large.
• Answers are in respondents' own words, so free from bias.
• Adequate time to think about answers.
• Non-approachable respondents may be conveniently contacted.
• Large samples can be used, so results are more reliable.
DISADVANTAGES
• Low rate of return of duly filled questionnaires.
• Slowest method of data collection.
• Difficult to know whether the expected respondent filled in the form
or it was filled in by someone else.
CASE STUDY METHOD
CASE STUDY METHOD is essentially
an intensive investigation of the
particular unit under
consideration.
ADVANTAGES
• They are less costly and less time-consuming; they are
advantageous when exposure data is expensive or hard to obtain.
• They are advantageous when studying dynamic populations in
which follow-up is difficult.
DISADVANTAGES
• They are subject to selection bias.
• They generally do not allow calculation of incidence
(absolute risk).
SURVEY METHOD
The SURVEY METHOD is one of the
common methods of diagnosing
and solving social problems.
ADVANTAGES
• Relatively easy to administer
• Can be developed in less time (compared to other
data-collection methods)
• Cost-effective, but cost depends on survey mode
DISADVANTAGES
• Respondents may not feel encouraged to provide accurate,
honest answers
• Surveys with closed-ended questions may have a lower validity
rate than other question types.
• Data errors due to question non-responses may exist.
SECONDARY DATA:
SOURCES OF DATA
• Publications of central, state, and local governments
• Technical and trade journals
• Books, magazines, newspapers
• Reports and publications of industry, banks, stock
exchanges
• Reports by research scholars, universities,
economists
• Public Records
FACTORS TO BE CONSIDERED BEFORE USING
SECONDARY DATA
• Reliability of data – Who collected the data, when,
by which methods, etc.
• Suitability of data – The objective, scope, and
nature of the original inquiry should be studied; if
the original study had a different objective, then the
data is not suitable for the current study.
• Adequacy of data – If the level of accuracy is
inadequate or the area covered differs, then the data
is not adequate for the study.
SELECTION OF PROPER METHOD FOR
COLLECTION OF DATA
• Nature, Scope and objective of inquiry
• Availability of Funds
• Time Factor
• Precision Required
EXAMPLE
“Suppose we are interested in finding the average
age of Engineering students. We can collect the age
data by two methods: either by collecting it directly
from each student personally or by getting their
ages from the university record. The data collected
by direct personal investigation is called primary
data and the data obtained from the university
record is called secondary data.”
1.2 PLANNING AND
CONDUCTING SURVEYS
A survey is a method of asking respondents some
well-constructed questions. It is an efficient and
easy-to-administer way of collecting a wide variety
of information.
Surveys can take different forms. They can be used to ask only
one question, or they can ask a series of questions. We can use
surveys to test out people’s opinions or to test a hypothesis.
When designing a survey, the following steps are useful:
1. Determine the goal of your survey: What question do you
want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview,
phone interview, self-administered paper survey, or internet
survey.
4. Decide what questions you will ask in what
order, and how to phrase them. (This is important if
there is more than one piece of information you
are looking for.)
5. Conduct the interview and collect the
information.
6. Analyze the results by making graphs and
drawing conclusions.
In choosing the respondents, sampling techniques are
necessary. Sampling is the process of selecting units (e.g.,
people, organizations) from a population of interest. The sample
must be representative of the target population. The target
population is the entire group a researcher is interested in;
the group about which the researcher wishes to draw
conclusions.
There are two ways of selecting a sample: non-probability
sampling and probability sampling.
Non-Probability Sampling
Non-probability sampling is also called judgment or
subjective sampling. This method is convenient and
economical but the inferences made based on the findings are
not so reliable.
Most common types of non-probability sampling:
In convenience sampling, the researcher uses a means of
obtaining information from respondents that is convenient for
the researcher but can bias the selection of respondents.
In purposive sampling, the selection of respondents is
predetermined according to the characteristics of interest
made by the researcher. Randomization is absent in this type of
sampling.
There are two types of quota sampling: proportional and
non-proportional.
In proportional quota sampling, the major characteristics of
the population are represented by sampling a proportional
amount of each.
Non-proportional quota sampling is a bit less restrictive. In
this method, a minimum number of sampled units in each
category is specified, without concern for having numbers
that match the proportions in the population.
Probability Sampling
In probability sampling, every member of the population has a known,
nonzero chance of being selected as part of the sample; in the simplest
designs this chance is equal for all members. There are several
probability sampling techniques.
Simple Random Sampling
Simple random sampling is the basic sampling technique where
a group of subjects (a sample) is selected for study from a larger
group (a population). Each individual is chosen entirely by chance
and each member of the population has an equal chance of being
included in the sample.
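A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of 500 student ID numbers and a sample size of 25 (both chosen only for illustration):

```python
import random

# Hypothetical sampling frame: ID numbers of 500 students.
population = list(range(1, 501))

# A fixed seed is used only so the illustration is repeatable.
rng = random.Random(2024)

# Draw 25 students without replacement; every ID has the same chance of selection.
sample = rng.sample(population, k=25)
print(sorted(sample))
```

In an actual survey the seed would normally be omitted, so that the selection is not predictable in advance.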
Stratified Sampling
A stratified sample is obtained by taking samples from each
stratum or sub-group of a population. When a sample is to be taken
from a population with several strata, the proportion of each stratum
in the sample should be the same as in the population.
Stratified sampling techniques are generally used when the
population is heterogeneous, or dissimilar, where certain
homogeneous, or similar, sub-populations can be isolated (strata).
Simple random sampling is most appropriate when the entire
population from which the sample is taken is homogeneous. Some
reasons for using stratified sampling over simple random sampling
are:
1. The cost per observation in the survey may be reduced;
2. Estimates of the population parameters may be wanted for each
subpopulation;
3. Increased accuracy at given cost.
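The proportional rule described above can be sketched as follows. The strata (year levels), their sizes and the sample size of 50 are hypothetical, and `proportional_allocation` is an illustrative helper rather than a standard library function; it uses largest-remainder rounding, which is one of several reasonable ways to make the allocations add up.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    Uses largest-remainder rounding so the allocations add up exactly.
    """
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * n / total for name, n in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical strata: student ID lists by year level (sizes are made up).
strata = {
    "1st year": list(range(0, 400)),
    "2nd year": list(range(400, 700)),
    "3rd year": list(range(700, 900)),
    "4th year": list(range(900, 1000)),
}
sizes = {name: len(members) for name, members in strata.items()}
allocation = proportional_allocation(sizes, sample_size=50)

rng = random.Random(71)  # fixed seed only for repeatability
# Simple random sample within each stratum, sized by the allocation.
stratified_sample = {name: rng.sample(strata[name], allocation[name]) for name in strata}
print(allocation)
```

With these made-up sizes the strata make up 40%, 30%, 20% and 10% of the population, so the sample of 50 is split 20, 15, 10 and 5.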
Cluster Sampling
Cluster sampling is a sampling technique where the entire
population is divided into groups, or clusters, and a random sample
of these clusters is selected.
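A short sketch of cluster sampling, assuming a hypothetical population of 12 class sections with 30 students each. It also assumes a one-stage design, in which every member of each selected cluster is included; the text above does not specify this detail.

```python
import random

# Hypothetical population: 12 class sections (clusters) of 30 students each.
clusters = {
    f"Section {i}": [f"S{i:02d}-{j:02d}" for j in range(1, 31)]
    for i in range(1, 13)
}

rng = random.Random(7)  # fixed seed only for repeatability

# Randomly select 3 whole clusters, then include every member of each
# (one-stage cluster sampling, an assumption made for this sketch).
chosen_sections = rng.sample(sorted(clusters), k=3)
cluster_sample = [student for name in chosen_sections for student in clusters[name]]

print(chosen_sections)
print(len(cluster_sample))
```

Unlike stratified sampling, which samples from every sub-group, only the selected clusters contribute members to the sample.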
DESIGNING A SURVEY
Example:
1. Martha wants to construct a survey that shows which
sports students at her school like to play the most.
Step 1: List the goal of the survey
Step 2: What population should she interview?
Step 3: How should she administer the survey?
Step 4: Create a data collection sheet that she can use
to record her results
Step 1: GOAL
The goal of the survey is to find the answer to the question:
“Which sports do students at Martha’s school like to play the
most?”
Step 2: POPULATION
A sample of the population would include a random sample
of the student population in Martha’s school. A good
strategy would be to randomly select students (using dice or
a random number generator) as they walk into an all-school
assembly.
Step 3: METHODS
Face-to-face interviews are a good choice in this case.
Interviews will be easy to conduct since the survey consists of
only one question which can be quickly answered and
recorded, and asking the question face to face will help
eliminate non-response bias.
Step 4: DATA
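One possible form of a data collection sheet is a simple tally, sketched below. The responses are invented purely to show the layout and are not results from any actual survey.

```python
from collections import Counter

# Hypothetical responses to "Which sport do you like to play the most?"
responses = [
    "basketball", "volleyball", "basketball", "badminton",
    "volleyball", "basketball", "swimming", "badminton",
    "basketball", "volleyball",
]

# The data collection sheet: one row per sport, with a tally and a count.
tally = Counter(responses)
for sport, count in tally.most_common():
    print(f"{sport:<12} {'|' * count:<6} {count}")
```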
DESIGNING A SURVEY
Example:
1. Juan wants to construct a survey that shows how
many hours per week the average student at his school
works.
Step 1: List the goal of the survey
Step 2: What population should he interview?
Step 3: How should he administer the survey?
Step 4: Create a data collection sheet that he can use
to record his results
Step 1: GOAL
The goal of the survey is to find the answer to the question “How
many hours per week do you work?”
Step 2: POPULATION
Juan suspects that older students might work more hours per
week than younger students. He decides that a stratified sample
of the student population would be appropriate in this case. The
strata are grade levels 9th through 12th. He would need to find
out what proportion of the students in his school are in each
grade level, and then include the same proportions in his
sample.
Step 3: METHODS
Face-to-face interviews are a good choice in this case since
the survey consists of two short questions which can be
quickly answered and recorded.
Step 4: DATA
1.3 PLANNING AND CONDUCTING
EXPERIMENTS: INTRODUCTION TO
DESIGN OF EXPERIMENTS
The products and processes in the engineering and scientific
disciplines are mostly derived from experimentation. An experiment
is a series of tests conducted in a systematic manner to increase the
understanding of an existing process or to explore a new product or
process.
Design of Experiments, or DOE, is a tool for developing an
experimentation strategy that maximizes learning using minimum
resources. It is a technique used to identify the “vital few”
factors in the most efficient manner and then direct the process to
its best setting to meet the ever-increasing demand for improved
quality and increased productivity.
The methodology of DOE ensures that all factors and their
interactions are systematically investigated, resulting in reliable and
complete information.
Five stages to be carried out for the design of experiments:
1. Planning
At this stage, the objectives of conducting the
experiment or investigation are identified, and the time and available
resources needed to achieve the objectives are assessed.
Experiments which are carefully planned lead to increased
understanding of the product or process.
2. Screening
Screening experiments are used to identify the important
factors that affect the process under investigation out of a large pool
of potential factors.
3. Optimization
After narrowing down the important factors affecting the
process, the next step is to determine the best setting of these
factors to achieve the objectives of the investigation. The objectives
may be to increase yield, to decrease variability, or to find settings
that achieve both at the same time, depending on the product or
process under investigation.
4. Robustness Testing
Once the optimal settings of the factors have been
determined, it is important to make the product or process
insensitive to variations resulting from changes in factors that affect
the process but are beyond the control of the analyst. Such factors
are referred to as noise or uncontrollable factors that are likely to
be experienced in the application environment. It is important to
identify such sources of variation and take measures to ensure that
the product or process is made robust or insensitive to these
factors.
5. Verification
The final stage involves validation of the optimum settings by
conducting a few follow-up experimental runs. This is to confirm
that the process functions as expected and all objectives are
achieved.
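The screening stage can be illustrated with a small sketch. It assumes a two-level full factorial design, which is only one of several possible screening designs, with three hypothetical factors; `simulated_response` is an invented stand-in for real measurements, and its coefficients carry no physical meaning.

```python
from itertools import product

# Hypothetical factors, each at a low (-1) and high (+1) coded level.
factors = ["temperature", "pressure", "time"]

def simulated_response(run):
    """Stand-in for a real measurement; the coefficients are invented."""
    temperature, pressure, time = run
    return 50 + 8 * temperature + 0.5 * pressure + 3 * time

# Full factorial design: every combination of levels (2**3 = 8 runs).
design = list(product([-1, 1], repeat=len(factors)))
responses = [simulated_response(run) for run in design]

def main_effect(index):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[index] == 1]
    low = [y for run, y in zip(design, responses) if run[index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effects = {name: main_effect(i) for i, name in enumerate(factors)}

# Screening: rank factors by the size of their main effect.
for name, effect in sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name:<12} main effect = {effect:+.1f}")
```

In this made-up case, temperature and time would be kept as the "vital few" factors for the optimization stage, while pressure has a comparatively small effect.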
THE BASIS OF CONDUCTING AN
EXPERIMENT
1. With an experiment, the researcher is trying to learn
something new about the world, an explanation of 'why'
something happens.
2. The experiment must maintain internal and external
validity, or the results will be useless.
3. When designing an experiment, a researcher must
follow all of the steps of the scientific method, from
making sure that the hypothesis is valid and testable to
using controls and statistical tests.