Introduction to Statistics Lab Guide

This document provides an introduction to statistics. It defines key statistical concepts such as descriptive statistics, parameters, statistics, variables, and normal distributions. Descriptive statistics introduced include the mean, median, mode, variance, standard deviation, and graphical representations like histograms and boxplots. The document also discusses inferential statistics and statistical hypotheses. Common inferential statistical tests introduced are the t-test, which can be used to compare one or two samples, and ANOVA, which can be used to compare multiple samples. Key assumptions and concepts for inferential statistics like significance, p-values, and types of errors are also outlined.

Uploaded by

sarfaraz

IB 372

Lab 1: Introduction to Statistics
Fall 2010

Thanks to Steve Paton - Smithsonian Tropical Research Institute for providing the original Spanish version of this file

What are statistics?

Statistics are numbers used to:
Describe DATA
Draw conclusions about DATA
These are called descriptive (or univariate) and inferential (or analytical) statistics, respectively.

Statistic vs. Parameter

Formally (and somewhat confusingly):
A statistic is a measure of some attribute
of a sample.
Whereas, a parameter is the real and
unique measure of the attribute for the
whole population.
Usually, the population is too big to
measure, so in practice, statistics
represent parameters.
(thus, even descriptive stats are usually inferential too)

Variables
A variable is anything we can measure/observe
Three types:
Continuous: values span an uninterrupted range (e.g.
height)
Discrete: only certain fixed values are possible (e.g. counts)
Categorical: values are qualitatively assigned (e.g.
low/med/hi)

Dependence in variables:
Dependent variables depend on independent ones
Independent variable may or may not be prescribed
experimentally
Determining dependence is often not trivial!

Descriptive Statistics

Descriptive statistics
Techniques to summarize data

Numerical
Mean
Variance
Standard deviation
Standard error
Median
Mode
Skew
etc.

Graphical
Histogram
Boxplot
Scatterplot
etc.

The Mean:
Most important measure of central tendency

Population Mean
$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$

Sample Mean
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Additional central tendency measures

Median: the 50th percentile
$M = X_{(n+1)/2}$ (n is odd)
$M = \frac{X_{n/2} + X_{(n/2)+1}}{2}$ (n is even)

Mode: the most common value

1, 1, 2, 4, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10
Which to use: mean, median or mode?
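
To compare the three measures on the sample above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original lab.

```python
import statistics

# The sample from the slide above (already sorted, n = 16).
data = [1, 1, 2, 4, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # n is even, so the average of the 8th and 9th values
mode = statistics.mode(data)      # the most common value

print(mean, median, mode)
```

The few low values pull the mean below the median and mode, which is one reason the choice of measure matters for skewed data.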

Variance:
Most important measure of dispersion

Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

From now on, we'll ignore sample vs. population. But remember:
We are almost always interested in the population, but can measure only a sample.
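
The only difference between the two variance formulas is the divisor. A minimal sketch, assuming seven illustrative weights in kg (the same values appear in a later example); the cross-check against `statistics.pvariance` and `statistics.variance` is only there to confirm the hand computation.

```python
import statistics

# Seven illustrative weights in kg.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(x)
mean = sum(x) / n
ss = sum((xi - mean) ** 2 for xi in x)  # sum of squared deviations from the mean

pop_var = ss / n            # divide by N: treats x as the whole population
sample_var = ss / (n - 1)   # divide by n - 1: treats x as a sample

# The standard library agrees with the hand computation.
assert abs(pop_var - statistics.pvariance(x)) < 1e-12
assert abs(sample_var - statistics.variance(x)) < 1e-12

print(round(pop_var, 4), round(sample_var, 4))
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.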

Additional dispersion measures

Standard deviation: average distance from the mean
$s = \sqrt{s^2}$ (duh!)

Standard error: the accuracy of our estimate of the population mean
$SE = \frac{s}{\sqrt{n}}$
Bigger sample size (n) → smaller error

Range: total span of the data ($X_{max} - X_{min}$)
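
These three measures can be sketched in a few lines. The data are the same illustrative weights used elsewhere in this lab; `se_if_4n` is a hypothetical name for the standard error we would get with four times as many observations and the same s.

```python
import math
import statistics

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # illustrative weights in kg

s = statistics.stdev(x)         # sample standard deviation, the square root of s^2
se = s / math.sqrt(len(x))      # standard error of the mean
data_range = max(x) - min(x)    # Xmax - Xmin

# With the same s, four times the sample size would halve the standard error.
se_if_4n = s / math.sqrt(4 * len(x))

print(round(s, 3), round(se, 3), round(data_range, 1))
```

Because n sits under a square root, halving the standard error costs four times the sampling effort.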

Additional dispersion measures

Coefficient of Variation: standard deviation scaled to the data
$V = \frac{s}{\bar{X}}$

Example:
Data: 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4
X̄ = 1.8 kg
s = 0.43 kg
V = 24%

vs.

Data: 1200, 1400, 1600, 1800, 2000, 2200, 2400
X̄ = 1800 kg
s = 432 kg
V = 24%
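
A short sketch confirms that both data sets in the example share the same V even though their standard deviations differ by a factor of 1000. The helper name `coefficient_of_variation` is an assumption made for this illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

small = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
large = [1200, 1400, 1600, 1800, 2000, 2200, 2400]

v_small = coefficient_of_variation(small)
v_large = coefficient_of_variation(large)

print(f"{v_small:.0%} {v_large:.0%}")
```

Since V is a ratio of two quantities in the same units, it is unitless, which is what makes it useful for comparing variability across data measured on different scales.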

Graphical Statistics

The Friendly Histogram

Histograms represent the distribution
of data
They allow you to visualize the mean,
median, mode, variance, and skew at
once!

Constructing a Histogram is Easy

X (data): 7.4, 7.6, 8.4, 8.9, 10.0, 10.3, 11.5, 11.5, 12.0, 12.3

[Figure: Histogram of X. x-axis: Value; y-axis: Frequency (count)]
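
Constructing a histogram amounts to choosing bins and counting how many values fall in each. A minimal text-only sketch for the ten X values; the bin width of 1 is an assumption, since the bins used in the original figure are not recoverable.

```python
from collections import Counter
import math

x = [7.4, 7.6, 8.4, 8.9, 10.0, 10.3, 11.5, 11.5, 12.0, 12.3]

# Assumption: bins are 1 unit wide, i.e. [7, 8), [8, 9), and so on.
counts = Counter(math.floor(v) for v in x)

# Print one row per bin, including empty bins, with one '#' per observation.
for left in range(min(counts), max(counts) + 1):
    print(f"[{left}, {left + 1}): {'#' * counts[left]}")
```

Changing the bin width changes the picture, so it is worth trying more than one before interpreting the shape.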

Interpreting Histograms
Mean? = 9.2
Median? = 6.5
Mode? = 3
Standard deviation? = 8.3
Variance?
Skew? (which way does the tail point?)
Shape?

[Figure: histogram. x-axis: Value; y-axis: Frequency]

An alternative: Boxplots

Boxplots also summarize a lot of information...

Within each sample: 75th percentile, median, 25th percentile, outliers
Compared across samples

[Figure: boxplots. Axes: Island and Weight (kg)]
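
The numbers a boxplot draws can be computed directly. A minimal sketch using the 16-value sample from the central tendency slide; the "1.5 × IQR" fence for flagging outliers is a common convention and an assumption here, since the slides do not say which rule the figure uses.

```python
import statistics

data = [1, 1, 2, 4, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the height of the box

# Assumption: points beyond 1.5 * IQR from the box are drawn as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < low_fence or v > high_fence]

print(q1, median, q3, outliers)
```

Different software computes quartiles slightly differently, so small discrepancies between packages are expected.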

Normality

The Normal Distribution

aka Gaussian distribution
Occurs frequently in nature
Especially for measures that
are based on sums, such as:
sample means
body weight
error
(aka the Central Limit
Theorem)

Many statistics are based on the assumption of normality
You must make sure your data are normal, or try something else!

[Figure: Sample normal data. Histogram + theoretical distribution (i.e. sample vs. population)]

Properties of the Normal Distribution
Symmetric
Mean = Median = Mode
Theoretical percentiles can be computed exactly
~68% of data are within 1 standard deviation of the mean
~95% within 2 s.d.
>99% within 3 s.d.
"skinny tails"

Amazing!
Handy!
Important!
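
The percentages above can be checked from the theoretical distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} s.d.: {p:.4f}")
```

The same proportions hold for any normal distribution, whatever its mean and standard deviation.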

What if my data aren't Normal?

It's OK!
Although lots of data are Gaussian (because of the CLT), many simply aren't.
Example: Fire return intervals
[Figure: histogram of time between fires (yr)]

Solutions:
Transform data to make it normal (e.g. take logs)
Use a test that doesn't assume normal data
Don't worry, there are plenty
Especially these days...
Many stats work OK as long as data are reasonably normal

Break!

Inferential Statistics:
Introduction

Inference: the process by which we draw conclusions about an unknown based on evidence or prior experience.
In statistics: make conclusions about a population based on samples taken from that population.
Important: Your sample must reflect the population you're interested in, otherwise your conclusions will be misleading!

Statistical Hypotheses
Should be related to a scientific hypothesis!
Very often presented in pairs:
Null Hypothesis (H0):
Usually the "boring" hypothesis of "no difference"

Alternative Hypothesis (HA):
Usually the "interesting" hypothesis of "there is an effect"

Statistical tests attempt to (mathematically) reject the null hypothesis

Significance
Your sample will never match H0 perfectly, even when H0 is in fact true
The question is whether your sample is different enough from the expectation under H0 to be considered significant
If your test finds a significant difference, then you reject H0.

p-Values Measure Significance
The p-value of a test is the probability of observing data at least as extreme as your sample, assuming H0 is true
If p is very small, your data count as evidence against H0
(in other words, if H0 were true, your observed sample would be unlikely)

How small does p have to be?

It's up to you (depends on question)
0.05 is a common cutoff
If p < 0.05, then there is less than a 5% chance that you would observe a sample at least as extreme as yours if the null hypothesis were true.
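
To make the definition concrete, here is a minimal sketch for a hypothetical coin-flipping experiment that is not part of the original lab: 16 heads in 20 flips, with H0 being that the coin is fair. The p-value is computed exactly by adding up the probabilities of every outcome at least as far from the expected 10 heads as the one observed.

```python
from math import comb

n = 20         # hypothetical: number of coin flips
observed = 16  # hypothetical: number of heads seen

def prob(k):
    """Probability of exactly k heads in n flips of a fair coin (H0)."""
    return comb(n, k) / 2 ** n

# "At least as extreme" = at least as far from n/2 heads, in either direction.
distance = abs(observed - n / 2)
p_value = sum(prob(k) for k in range(n + 1) if abs(k - n / 2) >= distance)

print(round(p_value, 4))
```

With the common 0.05 cutoff this result would lead us to reject H0, while a stricter cutoff such as 0.01 would not.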

Proof in statistics
Failing to reject (i.e. "accepting") H0 does not prove that H0 is true!
And accepting HA doesn't prove that HA is true either!
Why?
Statistical inference tries to draw conclusions about the population from a small sample
By chance, the samples may be misleading
Example: if you always reject H0 at p = 0.05, then when H0 is actually true you will be wrong 1 in 20 times!

Errors in Hypothesis Testing

Type I Error: Reject H0 when H0 is actually true
i.e. You find a difference when really there is none
The probability of Type I error is called the "significance level" of a test, and denoted α

Type II Error: Accept H0 when H0 is actually false
i.e. There really is a difference, but you conclude there is none
The probability of Type II error is denoted β (and [1 − β] is called the "power" of the test)
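
The meaning of α can be illustrated by simulation. The sketch below repeatedly samples from a population where H0 is true by construction and counts how often a test at α = 0.05 rejects anyway. It uses a simple z-test with a known standard deviation, which is an assumption made to keep the example within the standard library; the lab itself introduces the t-test instead.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided test
n, trials = 25, 4000

rejections = 0
for _ in range(trials):
    # H0 is true: every sample comes from a population with mean 0 and s.d. 1.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = mean(sample) * n ** 0.5  # z statistic when the population s.d. is known to be 1
    if abs(z) > z_crit:
        rejections += 1          # rejecting a true H0 is a Type I error

type_i_rate = rejections / trials
print(type_i_rate)
```

The observed rate should land close to α, which is exactly what the significance level promises.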

Assumptions of inferential
statistics
All inferential tests are based on
assumptions
If your data cannot meet the assumptions,
the test results are invalid!

In particular:
Inferential tests assume random sampling
Many tests assume the data fit a theoretical
distribution (often normal)
These are parametric tests
Luckily, there are non-parametric alternatives

Inferential Statistics:
Methods

Student's t-Test

Student's t-test
Several versions, all using inference on a sample to test whether the true population mean (μ) is different from __________
The one-sample version tests whether the population mean equals a specified value, e.g.
H0: μ = 0

The two-sample version tests whether the means of two populations are equal
H0: μ1 = μ2

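t-Test Methods
Compute t

For one-sample test:
$t = \frac{\bar{x} - \mu}{SE}$

Remember:
$SE = \frac{s}{\sqrt{n}}$

For two-sample test:
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$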
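
Both t statistics can be computed in a few lines. This is a sketch of the formulas only; the function names and the sample values are made up for illustration, and looking up significance still requires a t table or statistics software.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / SE, with SE = s / sqrt(n)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

def two_sample_t(a, b):
    """t = (mean_a - mean_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    se_diff = math.sqrt(statistics.variance(a) / len(a)
                        + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

# Made-up numbers, purely for illustration.
t_one = one_sample_t([2, 4, 6], mu0=2)
t_two = two_sample_t([1, 2, 3], [4, 5, 6])

print(round(t_one, 3), round(t_two, 3))
```

The sign of t only records which mean is larger; the size of |t| is what gets compared against the critical value.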

t-Test Methods
Once you have t, use a
look-up table or computer
to check for significance
Significance depends on
degrees of freedom
(basically, sample size)
Bigger difference in
means and bigger sample
size both improve ability
to reject H0

How does the t-Test work?

Statistical magic!
Student figured out the
t-distribution shown to
the left.
Given a sample mean and
df, we can see where the
mean is likely to be.

If the null-hypothesis
mean is very unlikely
to fall under the
curve, we reject H0

Reject H0 if your t is in one of the red regions (α = 0.05)

Try it!
Work through the Excel exercises for one- and two-sample t-tests now

ANOVA

ANOVA: ANalysis Of VAriance
Tests for significant effect of 1 or more factors
Each factor may have 2 or more levels
Can also test for interactions between factors
For just 1 factor with 2 levels, ANOVA = t-test
So why can't we just do lots of t-tests for more complicated experiments?

Example: We study tree growth rates on clay vs. sand vs. loam soil (10 trees each)
How many factors? How many levels for each factor?

ANOVA: Concept
Despite the name, ANOVA really looks for
difference in means between groups
(factors & levels)
To do so, we partition the variability in our
data into:
(1) The variability that can be explained by
factors
(2) The leftover unexplained variability (error or
residual variability)

Total variability = Variability due to factors + error
(we only have to calculate two of these values)

ANOVA example continued

Here are the raw data:

[Figure: growth (y) for each replicate plot. Square = clay soil, Diamond = sand soil, Triangle = loam soil]

ANOVA example continued

First find total
variability using Sum
of Squares
Find overall mean
(horizontal line)
Each square is the
distance from one data
point to the mean,
squared
Total Sum of Squares
(SST) is the sum of all
the squared deviations

ANOVA example continued

Now measure
variability unexplained
by factor of interest
(soil)
Find means for each
level
Error Variability (SSE) is
the sum of all squared
deviations from these
level means
Which is greater, SSE or SST?
The remaining variability is due to soil factor (say, SSF). It's easy to compute, since
SST = SSE + SSF
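
The partition can be verified numerically. The raw growth data from the figure are not recoverable, so the sketch below uses made-up values (3 trees per soil type instead of 10) and computes all three sums of squares independently to show that they add up.

```python
# Made-up growth values (3 trees per soil type), purely for illustration.
groups = {
    "clay": [1.0, 2.0, 3.0],
    "sand": [2.0, 3.0, 4.0],
    "loam": [6.0, 7.0, 8.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# SST: squared deviations of every point from the overall mean.
sst = sum((v - grand_mean) ** 2 for v in all_values)

sse = 0.0  # squared deviations of every point from its own level mean
ssf = 0.0  # squared deviations of each level mean from the overall mean
for values in groups.values():
    level_mean = sum(values) / len(values)
    sse += sum((v - level_mean) ** 2 for v in values)
    ssf += len(values) * (level_mean - grand_mean) ** 2

print(sst, sse, ssf)
```

Because SSE can never exceed SST, the interesting question is how much of SST is left over for the factor.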

ANOVA example continued

Next, we calculate degrees of freedom (df)
df is based mainly on sample size
Every time you estimate a parameter from your data, you lose one df
Example: Since we computed the mean of our 30 observations, we only need to know 29 of the values now to determine the last one!

For our example we have:

df_SST = 30 − 1 = 29
df_SSE = (10 − 1) + (10 − 1) + (10 − 1) = 27
df_SSF = df_SST − df_SSE = 2

ANOVA example continued

From SS and df, we compute Mean Square (MS) variability
Finally (!) we test whether the variability explained by our factor is significant,
relative to the remaining variability
The ratio MSsoil/MSerror is F
By statistical magic, we can look up the probability of observing such a large F just by chance.
In other words, we find the p-value associated with H0: Soil has no effect on growth

We can then go back and see which groups differ (e.g. by t-test)

Source   SS      df   MS     F      p
Soil     99.2    2    49.6   4.24   0.025
Error    315.5   27   11.7
Total    414.7   29

What do we conclude?
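
The mean squares and F in the ANOVA table follow directly from the sums of squares and degrees of freedom. A minimal sketch of that arithmetic, using the values given for the soil example; the variable names are illustrative.

```python
ss_soil, ss_error = 99.2, 315.5   # sums of squares for the soil example
df_soil, df_error = 2, 27         # degrees of freedom computed earlier

ms_soil = ss_soil / df_soil       # mean square for the factor
ms_error = ss_error / df_error    # mean square for the error
f_ratio = ms_soil / ms_error      # F = MSsoil / MSerror

print(round(ms_soil, 1), round(ms_error, 1), round(f_ratio, 2))
```

Turning F into a p-value requires the F distribution with (2, 27) degrees of freedom, which is the part left to a table or to statistics software.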

Try it!
Work through the Excel exercise
for
ANOVA now

Chi-square (χ²) Test

In biology, it's common to measure frequency (or count) data
χ² is used to measure the deviation of observed frequencies from an expected or theoretical distribution:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where: O is the observed frequency (# of events, etc.)
E is the expected frequency under H0
Should χ² be big or small to reject H0?

χ² Example

Again we'll also need to know degrees of freedom
For the χ² test,
df = number of groups − 1

Then (again by statistical magic), we can look up how big χ² needs to be (the critical value) to reject H0 at a given significance level

χ² Example

Imagine we conduct an experiment to determine the food preference of rodents, with the following results:

Food            # Eaten
Tuna            31
Peanut butter   69
Fresh fish      35
Cheese          65
n =             200

A reasonable expectation (our null hypothesis) is:

H0: Rodents like all foods equally well
Thus, under H0 our expected frequency for each food is:
200 / 4 = 50

χ² Example

First, we draw up a contingency table:

           Tuna   Peanut butter   Fresh fish   Cheese   Total
Observed   31     69              35           65       200
Expected   50     50              50           50       200

Then we compute χ²:

$\chi^2 = \frac{(31 - 50)^2}{50} + \frac{(69 - 50)^2}{50} + \frac{(35 - 50)^2}{50} + \frac{(65 - 50)^2}{50} = 23.44$

χ² Example

In our example:
χ² = 23.44 and df = 4 − 1 = 3

The critical value for df = 3, α = 0.05 is
$\chi^2_{0.05,3} = 7.815$

Since our χ² is greater than the critical χ², we reject H0. Our results differ significantly from the expectation under H0, suggesting that there is a real difference in food preference.
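
The whole test can be reproduced in a few lines. Evaluating the sum term by term gives (361 + 361 + 225 + 225) / 50 = 23.44, well above the critical value. The sketch below hard-codes the tabulated critical value 7.815 rather than computing it, since the chi-square distribution is not in the Python standard library.

```python
observed = {"Tuna": 31, "Peanut butter": 69, "Fresh fish": 35, "Cheese": 65}

n = sum(observed.values())     # 200 items eaten in total
expected = n / len(observed)   # 50 per food if all foods are liked equally (H0)

# Sum of (O - E)^2 / E over the four foods.
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1         # number of groups - 1
critical = 7.815               # tabulated critical value for df = 3, alpha = 0.05

reject_h0 = chi2 > critical
print(round(chi2, 2), df, reject_h0)
```

A larger χ² means the observed counts sit further from the expected ones, so large values are the ones that lead to rejecting H0.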

Try it!
Work through the Excel exercise
for
the Chi-square test now

Correlation
Correlation measures the strength of the
relationship between two variables
When X gets larger, does Y consistently get
larger (or smaller)?

Often measured with Pearson's correlation coefficient
Usually just called "correlation coefficient"
Almost always represented with the letter r

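Correlation

Computing Pearson's correlation coefficient

$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$

Numerator: amount that X and Y vary together
Denominator: total amount of variability in X and Y

$-1 \le r \le 1$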
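
Pearson's r can be written as a direct translation of its definition: the amount X and Y vary together, divided by the total variability in X and Y. The function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    together = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # X and Y varying together
    var_x = sum((x - mx) ** 2 for x in xs)                       # variability in X
    var_y = sum((y - my) ** 2 for y in ys)                       # variability in Y
    return together / math.sqrt(var_x * var_y)

# Made-up data: Y rises (or falls) in exact proportion to X.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(r_up, r_down)
```

Swapping the two arguments gives the same r, because the formula treats X and Y symmetrically.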

Correlation Examples

Correlation Cautions
Correlation does not imply causality!
(note: doesn't matter which data are X vs. Y)

r can be misleading
it implies nothing about slope
it is blind to outliers and obvious nonlinear relationships
[Figure: Same r in each panel!]
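
A small made-up example shows how r can miss an obvious nonlinear relationship: here Y is completely determined by X, yet the correlation coefficient is zero. The helper `pearson_r` is an illustrative implementation of the standard formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    together = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return together / math.sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # Y depends entirely on X, but not linearly

r = pearson_r(xs, ys)
print(r)
```

This is why a scatterplot should always accompany a correlation coefficient.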

Try it!
Work through the Excel exercise
for
correlations now

Regression
Unlike correlation, regression does imply
a functional relationship between
variables
The dependent variable is a function of the
independent variable(s)

In regression, you propose an algebraic model, then find the "best fit" to your data
Most commonly, the model is a simple line (Y is a linear function of X)

Regression
There are many possible relationships between two
variables...

Linear

Exponential

Quadratic

Hyperbolic

Logistic

Trigonometric

Regression
We'll focus on simple linear regression of one dependent (Y) and one independent (X) variable
The model is:

$Y = a + bX + \varepsilon$

Y = values of the dependent variable
X = values of the independent variable
a, b = "regression coefficients" (what we want to find)
ε = residual or error

Potential Regression Outcomes
Positive Relationship: b > 0
Negative Relationship: b < 0
No Relationship: b = 0

In regression we always plot the independent variable on the x-axis

How do we fit a Regression?

Most common method is "least-squares"

Find a and b to minimize the (squared) distances of data points from the regression line

[Figure: data points (X1, Y1) ... (X6, Y6) scattered around the regression line, with the residual for point 5 marked: $Y_5 - \hat{Y}_5 = \varepsilon_5$]

How do we fit a Regression?
Find individual residuals (ε):

$Y_i - \hat{Y}_i = \varepsilon_i$
(Observed value − Predicted value from regression line = Residual)

Then the sum of all (squared) residuals is:

$\sum_i \varepsilon_i^2$

A computer or clever mathematician can find the a and b that minimize this expression (producing the best-fit line)

Regression Example
Altitude (m)   Temperature (C)
0              25.0
50             24.1
190            23.5
305            21.2
456            20.6
501            20.0
615            18.5
700            17.2
825            17.0

Which is the independent variable?

Regression Example
From the data on the previous slides, we fit a regression line and obtain these parameter estimates:
a = 24.88

b = −0.0101
Thus, our best-fit line has the equation:
temperature = 24.88 − 0.0101 × altitude
Does this work if we change units?
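
The fit can be reproduced with the closed-form least-squares solution for a straight line, b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. The sketch below is one way to do it, not necessarily how the original estimates were produced; the function name `least_squares` is illustrative. It also refits with altitude in km to explore the question about units.

```python
altitude = [0, 50, 190, 305, 456, 501, 615, 700, 825]                 # m
temperature = [25.0, 24.1, 23.5, 21.2, 20.6, 20.0, 18.5, 17.2, 17.0]  # C

def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

a, b = least_squares(altitude, temperature)
print(round(a, 2), round(b, 4))

# Same data with altitude in km instead of m.
a_km, b_km = least_squares([x / 1000 for x in altitude], temperature)
print(round(a_km, 2), round(b_km, 2))
```

The estimates agree with the values above to within rounding. Changing the units of X rescales the slope by the same factor (1000 here) and leaves the intercept unchanged, so the equation only works with the units it was fitted in.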

Regression: How good is the fit?

[Figure: three scatterplots with fitted lines. Perfect fit - all the points on the line; Good fit; OK fit (I'll take it!)]

Regression: How good is the fit?
Formally, we often measure the fit with the coefficient of determination, R².
R² is the proportion of variation in Y explained by the regression
Values range from 0 to 1
0 indicates no relationship, 1 indicates perfect relationship
Note: In simple linear regression, yes, R² is actually the same as Pearson's correlation coefficient (r) squared. But this is just a special case; regressions get much more complicated. Don't get in the habit of confusing the two statistics!
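
Both claims can be checked on the altitude and temperature data from the regression example. This sketch computes R² as 1 − SSE/SST, using the least-squares line, and compares it with r² computed from the correlation formula. The variable names are illustrative.

```python
altitude = [0, 50, 190, 305, 456, 501, 615, 700, 825]                 # m
temperature = [25.0, 24.1, 23.5, 21.2, 20.6, 20.0, 18.5, 17.2, 17.0]  # C

n = len(altitude)
mx, my = sum(altitude) / n, sum(temperature) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(altitude, temperature))
sxx = sum((x - mx) ** 2 for x in altitude)
syy = sum((y - my) ** 2 for y in temperature)  # total variation in Y (SST)

# Least-squares line, then the variation it leaves unexplained (SSE).
b = sxy / sxx
a = my - b * mx
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(altitude, temperature))

r_squared = 1 - sse / syy      # proportion of variation in Y explained
r = sxy / (sxx * syy) ** 0.5   # Pearson's correlation coefficient

print(round(r_squared, 3), round(r ** 2, 3))
```

Note that r is negative here because temperature falls with altitude, while R² carries no sign.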

Regression: Is the fit significant?
We usually ask whether R² is significant by testing the null hypothesis:
H0: b = 0 (the line may as well be flat)
against the alternative:
HA: b ≠ 0 (the best fit isn't a flat line)
Luckily (statistical magic), this can be tested with the t-test!
Depends on degrees of freedom
(increase sample size to improve significance)

For our purposes, statistics software will do this for you

Regression Reservations
Again, regression does assume a functional relationship (unlike correlation), but importantly, it still does not test a causal biological relationship
Some other variable might affect both X and Y
You could even have the relationship backwards!

Be careful extrapolating regression lines:
beyond the original data range, or
to other populations

Try it!
Work through the Excel exercise
for
simple linear regression now
