Statistical Methods for Environmental Research

Uploaded by

Criss Pizarro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views37 pages

Statistical Methods for Environmental Research

Uploaded by

Criss Pizarro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Environmental Engineering Research Workshop
Statistical Methods in Research

Sarath Raj, PhD

I. What is Statistics?

Statistics is the science of collecting, organizing, analyzing, and interpreting data.

It helps us make informed decisions using data.

Types

Descriptive Statistics:
Summarizes data (mean, median, charts).

Inferential Statistics:
Makes predictions or inferences about a population from a sample.
Descriptive Statistics

The term is often used generically for measures of central tendency and dispersion, as opposed to inferential statistics.

🟥 These statistics describe or summarize the qualities of data.

They are also called “summary statistics” and are univariate (they describe one variable at a time).

Mean
Median
Mode
Range
Standard Deviation, etc.
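As an illustration, Python's standard `statistics` module computes each of the summary statistics listed above. This is a minimal sketch; the PM2.5 readings are hypothetical values chosen only for the example.

```python
import statistics

# Hypothetical daily PM2.5 readings (µg/m³), used only for illustration.
readings = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(readings),
    "median": statistics.median(readings),
    "mode": statistics.mode(readings),
    "range": max(readings) - min(readings),
    "stdev": statistics.stdev(readings),  # sample SD, divides by n - 1
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Note that the single high reading (45) pulls the mean (21) above the median (18).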
Measures of Central Tendency
These measures describe the center, or typical value, of a set of scores or values in the data.

Mean
Median
Mode
The Mean

The “mean” of some data is the average score or value, such as the average age of an MPA student or the average weight of professors who like to eat donuts.

Mean of a sample: $\bar{X} = \frac{\sum X}{n}$

Mean of a population: $\mu = \frac{\sum X}{N}$
The Mean Problem
The main problem with the mean is that it is sensitive to outliers.

For example, the average weight of a group of people would be pulled up sharply if one person in the group weighed 600 pounds.
The Median

Because the mean can be sensitive to extreme values, the median is sometimes more useful and more representative.

The median is simply the middle value among the sorted scores of a variable; with an even number of scores, it is the average of the two middle values.
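The 600-pound example can be checked numerically. This sketch uses hypothetical body weights and shows that one extreme value moves the mean far more than the median.

```python
import statistics

# Hypothetical body weights in pounds, echoing the 600-pound example above.
weights = [150, 160, 170, 180]
with_outlier = weights + [600]

mean_before = statistics.mean(weights)          # 660 / 4 = 165
median_before = statistics.median(weights)      # average of 160 and 170 = 165
mean_after = statistics.mean(with_outlier)      # 1260 / 5 = 252
median_after = statistics.median(with_outlier)  # middle value = 170

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```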
Percentiles

The median is the 50ᵗʰ percentile. Starting from it, we can go up or down and rank the data as being above or below certain thresholds.

You may be familiar with this from standardized tests: if you scored in the 90ᵗʰ percentile, your score was higher than 90% of the sample.
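A percentile rank can be computed by counting how many scores fall below a given score. This is a minimal sketch assuming the "strictly below" convention (other conventions count ties as half); `percentile_rank` is a hypothetical helper and the scores are invented for illustration.

```python
def percentile_rank(scores, score):
    """Percent of scores in the sample strictly below `score`.

    This is one common convention; others count ties as half.
    """
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores for ten students.
scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 92]

print(percentile_rank(scores, 92))  # 90.0: higher than 9 of the 10 scores
```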
The Mode

The most frequent response or value for a variable.

Multiple Modes

Bimodal (two modes)
Multimodal (more than two modes)
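When several values tie for the highest frequency, `statistics.multimode` (Python 3.8+) returns all of them. The survey responses below are hypothetical.

```python
import statistics

# Hypothetical survey responses on a 1-7 scale.
responses = [2, 2, 3, 5, 5, 7]

modes = statistics.multimode(responses)  # every value tied for the highest count
print(modes)  # [2, 5] -> bimodal
```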
Measures of dispersion

Measures of dispersion tell us about variability in the data: how much the values of a variable differ from the minimum to the maximum, and how far apart the scores in between are. We use:
Range
Standard Deviation
Variance (standard deviation squared)
Measures of dispersion

To glean information from data, i.e. to make an inference, we need to see variability in our variables.

Measures of dispersion tell us how much our variables vary from the mean; if they do not vary, it is difficult to infer anything from the data.
Dispersion is also known as the spread or variability of the data.
The Range

$r = h - l$, where h is the highest value and l is the lowest value.

In other words, the range is the difference between the minimum and maximum values of a variable.

Understanding this statistic is important in understanding your data, especially for management and diagnostic purposes.
Standard Deviation

A measure of the typical distance of values from the mean, expressed in the same units as the data.

In other words, it tells you how far cases tend to lie from the mean.
How extreme are your data?

For approximately normally distributed data, about 68% of cases fall within one standard deviation of the mean, and about 95% within two standard deviations.
Standard Deviation

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

X = score for each point in the data
$\bar{X}$ = mean of the scores for the variable
n = sample size (number of observations or cases)
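The formula above translates directly into code. This sketch implements it by hand (`sample_sd` is a hypothetical helper name) and checks it against `statistics.stdev`, using the Group A scores from the t-test example later in these slides.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((X - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Group A test scores from the t-test example later in these slides.
group_a = [70, 75, 72, 74, 73]

print(round(sample_sd(group_a), 4))         # 1.9235
print(round(statistics.stdev(group_a), 4))  # same value from the standard library
```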
Confidence Intervals

A confidence interval gives a range of values that is likely to contain the true population parameter (such as the mean or a proportion). It reflects the precision of your estimate.

🟥 The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter.

Example: a study finds that the average exam score of a sample of students is 75, with a 95% confidence interval of [72, 78].

Interpretation: “We are 95% confident that the true average score of all students lies between 72 and 78.”
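One common way to build a 95% interval for a mean is mean ± t × s / √n. This is a minimal sketch under that assumption, applied to the Group A scores from the t-test slide. To stay within the standard library, the two-sided critical values are hard-coded from a standard t-table (a library such as SciPy can compute them, e.g. with `scipy.stats.t.ppf(0.975, df)`); `mean_ci_95` and `T_CRIT_95` are hypothetical names.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for a few degrees of freedom,
# taken from a standard t-table.
T_CRIT_95 = {4: 2.776, 8: 2.306}

def mean_ci_95(values):
    """95% confidence interval for a mean: mean ± t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    margin = T_CRIT_95[n - 1] * se
    return mean - margin, mean + margin

group_a = [70, 75, 72, 74, 73]
low, high = mean_ci_95(group_a)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [70.41, 75.19]
```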
Inferential statistics
While descriptive statistics summarize the characteristics
of a data set, inferential statistics help you come to
conclusions and make predictions based on your data.

When you have collected data from a sample, you can use
inferential statistics to understand the larger population
from which the sample is taken.

Inferential statistics have two main uses:

✅ making estimates about populations.

✅ testing hypotheses to draw conclusions about populations.
Statistical Significance

A result is called statistically significant if it is unlikely to have occurred by chance alone. A “statistically significant difference” means there is statistical evidence that a difference exists.

The significance level (α) is, in simple cases, defined as the probability of rejecting the null hypothesis when the null hypothesis is actually true.

The decision is often made using the p-value: the probability of obtaining a value of the test statistic at least as extreme as the one actually observed, given that the null hypothesis is true.

If the p-value is less than the significance level, the null hypothesis is rejected. The smaller the p-value, the more significant the result is said to be.
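The p-value definition can be made concrete with an exact permutation test, a technique not covered in these slides and used here only because it follows the definition literally: if the null hypothesis (no difference) is true, the group labels are interchangeable, so we enumerate every relabelling and count how often the statistic is at least as extreme as observed. The data are the Group A vs Group B scores from the t-test slide.

```python
from itertools import combinations

# Scores from the Group A vs Group B example on the t-test slide.
group_a = [70, 75, 72, 74, 73]
group_b = [78, 80, 79, 81, 77]

def mean(values):
    return sum(values) / len(values)

observed = mean(group_a) - mean(group_b)  # test statistic: difference in means
pooled = group_a + group_b

# Under H0 the labels are interchangeable: try every way of choosing
# 5 of the 10 scores as "Group A" and recompute the statistic.
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), len(group_a)):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = mean(a) - mean(b)
    if abs(diff) >= abs(observed) - 1e-9:  # "at least as extreme", two-sided
        extreme += 1
    total += 1

p_value = extreme / total
alpha = 0.05
print(f"p = {p_value:.4f}; reject H0 at alpha={alpha}: {p_value < alpha}")
```

Only the observed split and its mirror image are this extreme, so p = 2/252 ≈ 0.008, well below α = 0.05.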
Degrees of freedom
Degrees of freedom, often represented by df, is the number of
independent pieces of information used to calculate a statistic.
It’s calculated as the sample size minus the number of
restrictions.

Simple Analogy
Imagine you have 3 test scores that must average to 70.
You pick the first score: 65
You pick the second score: 75
The third score? It is fixed: it must be 70 to keep the average at 70 (65 + 75 + 70 = 210 = 3 × 70).

✅ You were free to choose only 2 values.

🔒 The third one is not free—it’s constrained by the total.
So in this case, the degrees of freedom = 2.
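The analogy is a single line of arithmetic: once n − 1 values are chosen, the mean constraint fixes the last one. `last_value_for_mean` is a hypothetical helper written only for this illustration.

```python
def last_value_for_mean(chosen, target_mean, n):
    """Given n - 1 freely chosen values, return the one value that keeps the mean fixed."""
    return target_mean * n - sum(chosen)

free_choices = [65, 75]  # the two values you were free to pick
third = last_value_for_mean(free_choices, target_mean=70, n=3)
df = 3 - 1  # n observations minus 1 constraint (the fixed mean)
print(third, df)  # 70 2
```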
Student’s t-test
A t-test compares two means to determine whether they are significantly different: either the means of two independent groups, or two measurements taken on the same group.

| Type | When to Use | Example |
|---|---|---|
| Independent t-test | Compare two different groups | Group A vs Group B test scores |
| Paired t-test | Compare the same group before and after a treatment | Pre-test vs post-test scores |
Student’s t-test

| Group A | Group B |
|---|---|
| 70 | 78 |
| 75 | 80 |
| 72 | 79 |
| 74 | 81 |
| 73 | 77 |
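A minimal sketch of the independent (Student's) t-test on the table above, assuming equal variances so the two sample variances are pooled. The critical value 2.306 (df = 8, two-sided, α = 0.05) is hard-coded from a t-table; `student_t` is a hypothetical helper name, and `scipy.stats.ttest_ind` is a library alternative.

```python
import math
import statistics

group_a = [70, 75, 72, 74, 73]
group_b = [78, 80, 79, 81, 77]

def student_t(a, b):
    """Independent two-sample Student's t with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

t, df = student_t(group_a, group_b)
t_crit = 2.306  # two-sided 5% critical value for df = 8, from a t-table
print(f"t = {t:.3f}, df = {df}, significant: {abs(t) > t_crit}")
```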
Paired t-test (Before and After)
Used to compare two measurements taken from the same subject, before and after a treatment.

| Student | Before | After | Difference (D) |
|---|---|---|---|
| 1 | 65 | 70 | 5 |
| 2 | 67 | 72 | 5 |
| 3 | 70 | 75 | 5 |
| 4 | 72 | 76 | 4 |
| 5 | 68 | 73 | 5 |
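A paired t-test is a one-sample t-test on the differences D: t = mean(D) / (s_D / √n) with n − 1 degrees of freedom. This sketch applies it to the table above; the critical value for df = 4 is hard-coded from a t-table.

```python
import math
import statistics

before = [65, 67, 70, 72, 68]
after = [70, 72, 75, 76, 73]

# A paired t-test is a one-sample t-test on the within-subject differences.
d = [post - pre for pre, post in zip(before, after)]  # [5, 5, 5, 4, 5]
n = len(d)
t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
df = n - 1
t_crit = 2.776  # two-sided 5% critical value for df = 4, from a t-table
print(f"mean D = {statistics.mean(d)}, t = {t:.2f}, df = {df}, significant: {abs(t) > t_crit}")
```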
ANOVA
Analysis of variance (ANOVA) is a statistical test used to assess the difference between the means of more than two groups.
At its core, ANOVA allows you to compare arithmetic means across several groups simultaneously.
It helps you determine whether the observed differences are likely due to random chance or reflect genuine, meaningful differences.

| Type | When to Use | Example |
|---|---|---|
| One-way ANOVA | Uses one independent variable or factor | Group A vs Group B vs Group C test scores |
| Two-way ANOVA | Uses two independent variables or factors | Previous groups with different species |
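A one-way ANOVA compares the variance between group means with the variance within groups: F = MS_between / MS_within. This is a minimal sketch (`one_way_anova` is a hypothetical helper; `scipy.stats.f_oneway` is a library alternative). Groups A and B are from the t-test slide; Group C is hypothetical. With only two groups, F equals t² from the independent t-test.

```python
import statistics

def one_way_anova(*groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

group_a = [70, 75, 72, 74, 73]
group_b = [78, 80, 79, 81, 77]
group_c = [74, 76, 75, 77, 73]  # hypothetical third group for illustration

f3, dfb, dfw = one_way_anova(group_a, group_b, group_c)
print(f"F({dfb}, {dfw}) = {f3:.2f}")

f2, _, _ = one_way_anova(group_a, group_b)
print(f"two groups: F = {f2:.2f}")  # equals t**2 from the independent t-test
```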
Correlation & Regression

Is there a relationship between x and y?

What is the strength of this relationship? → Pearson’s r

Can we describe this relationship and use it to predict y from x? → Regression

Is the relationship we have described statistically significant? → t test
The relationship between x and y

Correlation: is there a relationship between two variables?

Regression: how well does a certain independent variable predict the dependent variable?

Correlation ≠ Causation

To infer causality, manipulate the independent variable and observe the effect on the dependent variable.
Scattergrams

[Figure: three scatterplots of Y against X illustrating positive correlation, negative correlation, and no correlation]

Variance vs Covariance

Notes on your sample:

If you want to assume that your sample is representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n – 1) in your calculations of variance or covariance.

But if you simply want to describe your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom.
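Python's `statistics` module provides both versions: `pvariance` divides by n and `variance` divides by n − 1. The x values are those from the covariance example.

```python
import statistics

x = [0, 2, 3, 4, 6]  # x values from the covariance example; squared deviations sum to 20

population_var = statistics.pvariance(x)  # 20 / n       = 20 / 5
sample_var = statistics.variance(x)       # 20 / (n - 1) = 20 / 4
print(population_var, sample_var)
```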
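Variance vs Covariance
Do two variables change together?

| Covariance | Variance |
|---|---|
| Gives information on the degree to which two variables vary together | Gives information on the variability of a single variable |

$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Note how similar covariance is to variance: the equation simply multiplies x’s error scores by y’s error scores instead of squaring x’s error scores.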
Covariance

When X increases and Y increases: cov(x, y) is positive.

When X increases and Y decreases: cov(x, y) is negative.
When there is no consistent relationship: cov(x, y) = 0.
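Example of Covariance

| x | y | x_i − x̄ | y_i − ȳ | (x_i − x̄)(y_i − ȳ) |
|---|---|---|---|---|
| 0 | 3 | −3 | 0 | 0 |
| 2 | 2 | −1 | −1 | 1 |
| 3 | 4 | 0 | 1 | 0 |
| 4 | 0 | 1 | −3 | −3 |
| 6 | 6 | 3 | 3 | 9 |

x̄ = 3, ȳ = 3, Σ(x_i − x̄)(y_i − ȳ) = 7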

What does this number tell us?
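Dividing the table's sum of 7 by n − 1 = 4 gives the sample covariance. This sketch reproduces the table's calculation (`sample_cov` is a hypothetical helper name). The positive result says x and y tend to rise together, but its size depends on the units of x and y.

```python
def sample_cov(x, y):
    """Sample covariance: sum((x_i - x̄)(y_i - ȳ)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1)

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
print(sample_cov(x, y))  # 7 / 4 = 1.75
```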

Problem with Covariance

The value obtained by covariance depends on the size of the data’s standard deviations: if they are large, the covariance will be larger than if they are small, even if the relationship between x and y is exactly the same in the large- and small-standard-deviation datasets.
Solution: Pearson’s r

On its own, the covariance value does not tell us much about the strength of a relationship.

Solution: standardise this measure.

Pearson’s r standardises the covariance value by dividing the covariance by the product of the standard deviations of X and Y:

$r = \frac{\operatorname{cov}(x, y)}{s_x s_y}$
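This sketch shows the covariance problem and its solution together: rescaling x by 10 (as a change of units would) multiplies the covariance by 10, while Pearson's r stays the same. The helper names are hypothetical; the data are from the covariance example.

```python
import math

def sample_cov(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the two sample SDs."""
    sx = math.sqrt(sample_cov(x, x))  # cov(x, x) is the variance of x
    sy = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (sx * sy)

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
x_scaled = [10 * xi for xi in x]  # same data, different units

print(round(sample_cov(x, y), 2), round(sample_cov(x_scaled, y), 2))  # 1.75 17.5
print(round(pearson_r(x, y), 2), round(pearson_r(x_scaled, y), 2))    # 0.35 0.35
```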
Regression

Correlation tells you whether there is an association between x and y, but it does not describe the relationship or allow you to predict one variable from the other.

To do this we need REGRESSION!

Best-Fit Line

The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that gives the best prediction of y for any value of x.
This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals (in ordinary least squares, the sum of squared residuals is minimised).

In ŷ = ax + b, a is the slope and b is the intercept.
ŷ = predicted value
y_i = true value
ε = residual error (y_i − ŷ_i)
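For a single predictor, ordinary least squares has a closed form: a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − a·x̄. This is a minimal sketch on the covariance-example data; `fit_line` is a hypothetical helper name.

```python
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

def fit_line(x, y):
    """Ordinary least squares fit of y_hat = a*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept: the line passes through (x̄, ȳ)
    return a, b

a, b = fit_line(x, y)
y_hat = [a * xi + b for xi in x]                   # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # ε = y_i - ŷ_i
print(f"y_hat = {a:.2f}x + {b:.2f}")               # y_hat = 0.35x + 1.95
print(f"sum of squared residuals = {sum(r * r for r in residuals):.3f}")
```

The residuals of an OLS fit with an intercept always sum to (numerically) zero.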
General Linear Model

Linear regression is actually a form of the General Linear Model, where the parameters are a, the slope of the line, and b, the intercept.

$y = ax + b + \varepsilon$

A General Linear Model is any model that describes the data as a linear combination of predictors plus an error term; simple linear regression is the special case of a single straight line.
Multiple Regression

Multiple regression is used to determine the effect of a number of independent variables, x₁, x₂, x₃, etc., on a single dependent variable, y.

The different x variables are combined linearly, and each has its own regression coefficient:

$y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b + \varepsilon$

The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.
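With two predictors, the least-squares coefficients can be solved from the centred normal equations without a matrix library. This sketch is limited to two predictors (for more, a linear-algebra routine such as `numpy.linalg.lstsq` is the usual tool); `fit_two_predictors` is a hypothetical helper, and the data are hypothetical, noise-free values generated from y = 2x₁ + 3x₂ + 1 so the fit should recover those coefficients.

```python
def fit_two_predictors(x1, x2, y):
    """OLS for y = a1*x1 + a2*x2 + b, solved from the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(u * u for u in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    b = my - a1 * m1 - a2 * m2
    return a1, a2, b

# Hypothetical, noise-free data generated from y = 2*x1 + 3*x2 + 1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 * u + 3 * v + 1 for u, v in zip(x1, x2)]

print(fit_two_predictors(x1, x2, y))  # recovers (2.0, 3.0, 1.0)
```

Each coefficient is adjusted for the other predictor through the s12 cross term, which is what "after all the other x variables have been accounted for" means in practice.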
THANK YOU!
Sarath Raj, PhD
