0% found this document useful (0 votes)
12 views44 pages

Generalized Linear Models Overview

This lecture introduces Generalized Linear Models (GLMs), discussing their motivation, components, and the transition from ordinary linear regression. It highlights the importance of modeling data to understand relationships between variables and outlines the three main components of a GLM: the random component, systematic component, and link function. The lecture also provides examples of exponential class distributions, emphasizing the flexibility of GLMs in handling various types of data.

Uploaded by

mdmaiduli904
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views44 pages

Generalized Linear Models Overview

This lecture introduces Generalized Linear Models (GLMs), discussing their motivation, components, and the transition from ordinary linear regression. It highlights the importance of modeling data to understand relationships between variables and outlines the three main components of a GLM: the random component, systematic component, and link function. The lecture also provides examples of exponential class distributions, emphasizing the flexibility of GLMs in handling various types of data.

Uploaded by

mdmaiduli904
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Lecture 11: Introduction to Generalized Linear Models

Dipankar Bandyopadhyay, Ph.D.

BMTRY 711: Analysis of Categorical Data Spring 2011


Division of Biostatistics and Epidemiology
Medical University of South Carolina

Lecture 11: Introduction to Generalized Linear Models – p. 1/44


Outline

1. Introduction (motivation and history)


2. Review ordinary linear regression
3. Components of a GLM
4. Natural Exponential family

Lecture 11: Introduction to Generalized Linear Models – p. 2/44


Brief Overview of Statistics

Response Variable
Explanatory Variables Binary Nominal Continuous
Binary 2 × 2 table Contingency tables t-tests
logistic regression log-linear models
Nominal Logistic regression Contingency tables ANOVA
Log-linear models log-linear models
Continuous Dose-response models It depends Multiple regression
logistic regression
Some Continuous Logistic regression It depends ANCOVA
and some categorical Multiple regression
Note, in general, most common analyses can be approached from a “modelling” approach.
Some such as the log-linear and logistic are topics for this class.

Lecture 11: Introduction to Generalized Linear Models – p. 3/44


Motivation for Modeling

Why do we want to “model” data?


• The structural form of the model describes the patterns of interactions or associations
in data.
• Inference for the model parameters provides a way to evaluate which explanatory
variable(s) are related to the response variable(s) while statistically controlling for the
other variables.
• Estimated model parameters provide measures of the strength and importance of
effects.
• A model’s predicted values “smooth” the data - That is, they provide good estimates of
the mean of the response variable.
• Modeling enables use to examine general extensions to the methods we have studied
thus far.

Lecture 11: Introduction to Generalized Linear Models – p. 4/44


Review of Ordinary Linear Regression

Suppose you have a continuous response variable (Y ) and continuous and/or discrete
explanatory variables (X’s).

You want to model the responses as a function of the explanatory variables (covariates). For
example,
Yi = β0 + β1 x1i + β2 x2i + ei
where
1. Yi is the response for the ith subject
~ = (β0 , β1 , β2 )′ is a (column) vector of constants (or parameters) that describe the
2. β
shape of the regression “line” (line, curve, etc)
~ i = (1, x1i , x2i ) is the (row) vector of explanatory variables for the ith subject.
3. X
4. ei is the random error assumed to be distributed as N (0, σ 2 )

Lecture 11: Introduction to Generalized Linear Models – p. 5/44


In general, you can view the previous regression model as,

Y = E(Y ) + ǫ

Where
E(Y ) = β0 + β1 x1i + β2 x2i
or in more general terms
= Xn×p βp×1

Thus, E(Y ) is the n × 1 vector of expectations.

Note,
2 3 2 3
X1 x11 , x12 , . . . , x1p
6 X2 7 6 x21 , x22 , . . . , x2p 7
X=6 7=6
6 7 6 7
··· ···
7
4 5 4 5
Xn xn1 , xn2 , . . . , xnp

is called the design matrix.

Lecture 11: Introduction to Generalized Linear Models – p. 6/44


Common Models of this Type

The analysis of continuous data has relied heavily on the linear model presented. These
reflect just a few applications of the linear model.
1. Simple linear regression
2. Multiple regression
3. ANOVA
4. ANCOVA

Lecture 11: Introduction to Generalized Linear Models – p. 7/44


Estimators

The least squares estimator for β is

β̃ = (X ′ X)−1 X ′ Y

The predicted value of Y (denoted as Ŷ ) is

Ŷ = X β̃

Diagnostic of the regression fit can be accomplished with the Hat Matrix

H = X(X ′ X)−1 X ′

As we develop our ‘generalized’ approach, you will notice many similarities.

Lecture 11: Introduction to Generalized Linear Models – p. 8/44


Movement towards a GLM

1. For OLS, we are dependent on the distribution of Y being normal.


2. For categorical data (by definition), the normality assumption is rarely feasible.
3. We may also be interested in other relations of the Xβ with Y . Other mapping
functions that ensure the range of Y remains valid is one of the key justifications.

In terms of a GLM, we have three components related to these limitations.

Lecture 11: Introduction to Generalized Linear Models – p. 9/44


Three Components of a GLM

There are 3 components of a generalized linear model (or GLM)


1. Random Component (the outcome)
2. Systematic Component (the design matrix multiplied by the parameter vector)
3. Link function (the function, g(·) that “links” the systematic component to the random
component)
Nelder & Wedderburn (1972) are generally given credit for unifying a broad class (of existing)
models into the GLM definitions.

They showed that provided the random component was part of the ‘Exponential Class’, the
MLEs for all of the models could be obtained using the same algorithm.

Lecture 11: Introduction to Generalized Linear Models – p. 10/44


Random Component

• The random component of a GLM consists of a response variable Y with the


independent observations (y1 , y2 , . . . , yn ).
• For a GLM, Y needs to have a distribution in the natural exponential family.
• Recall from theory, an exponential class distribution is of the form

f (yi ; θi ) = a(θi )b(yi )exp[yi Q(θi )]

This, in terms of common language, is


• a(θi ) is a function only dependent on the unknown parameter
• b(yi ) is a function of the observed sample
• Q(θi ) is a function only dependent on the unknown parameter

Lecture 11: Introduction to Generalized Linear Models – p. 11/44


Easy Example Exponential Class Variable

Suppose,
Y ∼ Poisson(λ)

Then,
e−λ λy
f (y, λ) = y!

1
= e−λ ( y! )ey log λ

Here θ = λ, a(θ) = a(λ) = e−λ , b(y) = 1/y! and Q(π) = log λ


Thus,
A Poisson random variable is of the exponential class variable.

Lecture 11: Introduction to Generalized Linear Models – p. 12/44


Slightly more complicated example

Suppose,
Yi ∼ Bern(π)

where P (Yi = 1) = π and P (Yi = 0) = 1 − π

Then,
f (yi ; π) = π yi (1 − π)1−yi

π yi
= (1−π)yi (1−π)−1

π
= (1 − π)( 1−π )yi

π )
(yi log 1−π
= (1 − π)(1)e

Here θ = π, a(θ) = a(π) = (1 − π), b(y) = 1 and Q(π) = log(π/(1 − π))

Thus, a Bernoulli random variable is a member of the exponential class.

Lecture 11: Introduction to Generalized Linear Models – p. 13/44


Exponential Class Variables in General

In general,
• The majority of distributions we will be interested in are exponential class variables
• This includes the more common examples of
1. Normal
2. Poisson
3. Binomial
4. Multinomial
• It also includes the less common examples of
1. Gamma
2. Negative Binomial

Lecture 11: Introduction to Generalized Linear Models – p. 14/44


Examples

1. Dichotomous (binary) with a fixed number of trials


• MI / No MI
• Success/Failure
• Correct/Incorrect
• Agree/Disagree
These responses have a Bernoulli distribution.
2. Counts (including cells in a contingency table)
• Number of babies born at MUSC daily
• Number of car wrecks per year in Charleston County
These responses have a Poisson distribution.

Lecture 11: Introduction to Generalized Linear Models – p. 15/44


Most Common Distributions

Although, many distributions are members of the Exponential Class,

For the most part, we will focus on the


1. Binomial
2. Poisson
distributions.

However, the approach we will discuss works equally well for all exponential class
distributions.

Lecture 11: Introduction to Generalized Linear Models – p. 16/44


Systematic Component

Denote a new vector (η1 , η2 , . . . , ηn ) such that

P
ηi = jβj xij , i = 1, . . . , n
= Xi β

• Recall, previously, we let ηi be the E(Y ).


• However, this results in a linear regression model.
• If we want to minimize this dependency, let

η = f (E(Y ))

where f (·) is a function.

Lecture 11: Introduction to Generalized Linear Models – p. 17/44


Link Component

Denote,
E(Y ) = µi

Let, g(·) be a monotonic and differentiable function such that

ηi = g(µi )

Thus,
X
g(µi ) = βj xij , i = 1, . . . , N
j

In words, we are now modeling a function of the mean (or Expectation) as a combination of
linear predictors.

Lecture 11: Introduction to Generalized Linear Models – p. 18/44


Link Function

• The function g(·) is called the “link function” since it links the E(Yi ) to the set of
explanatory variables and their estimates.
• For example, if we let g(x) = x (the identify link) and Y is distributed as a Normal R.V.,
then, we are back to the linear model (either simple linear or multiple linear regression)
• For GLM, you generally have the flexibility to choose what ever link you desire.
• However, there is a Special link that we need to consider

Lecture 11: Introduction to Generalized Linear Models – p. 19/44


Canonical Link

f (yi ; θi ) = a(θi )b(yi )exp[yi Q(θi )]

If we revisit the density function for an exponential, we see a function Q(θi ) that looks
interesting.

Q(θ) is defined as the natural parameter of the distribution.

If we let g(·) be defined such that it transforms the mean to the natural parameter, we have
the Canonical link

Lecture 11: Introduction to Generalized Linear Models – p. 20/44


Example Canonical Link

Suppose
Yi ∼ Bern(π)

where P (Yi = 1) = π and P (Yi = 0) = 1 − π

Then we previously showed that

π )
(yi log 1−π
f (yi ; π) = (1 − π)(1)e

with Q(π) = log(π/(1 − π))

So, if we would let


X
g(π) = log(π/(1 − π)) = βj xj
j

We would have the canonical link of a Bernoulli/Binomial distribution.

Lecture 11: Introduction to Generalized Linear Models – p. 21/44


Recall, the function
g(π) = log(π/(1 − π))

was previously introduced as the ‘log odds’ and was called the logit.

Lets recap what we have just accomplished.

If we let the random component be Bernoulli/Binomial and consider the linking function as
logit, we can model the log odds ratio as a linear function of covariates using
X
g(π) = βj xj
j

Lecture 11: Introduction to Generalized Linear Models – p. 22/44


Since g(π) = log(π/1 − π), we can write the success probability as

π
1−π
= eXβ

π = eXβ − πeXβ

π(1 + eXβ ) = eXβ

eXβ
π = 1+eXβ

Lecture 11: Introduction to Generalized Linear Models – p. 23/44


What are the β’s and the X’s?

For the logistic model, we have


X
g(π) = log(π/(1 − π)) = βj xj
j

To answer the question, consider a model in which you have one predictor (i.e., treatment)
and you observe the response MI/No MI.

Let
(
1 if subject i received the active treatment
x1i =
0 else

and
Xi = [1, xi1 ]

Thus, if subject i is on active drug,


Xi = [1, 1]

and if on placebo
Xi = [1, 0]

Lecture 11: Introduction to Generalized Linear Models – p. 24/44


Then, the odds for a person on placebo would be

π
= eβ0 +β1 ·0 = eβ0
1−π

and for a subject on active drug, the log odds would be

π
= eβ0 +β1 ·1 = eβ0 +β1
1−π

Thus, the odds ratio of a success for comparing active treatment to placebo could be written
as
eβ0 +β1
OR = β
= eβ1
e 0
or that log(OR) = β1 .

If you recall, we introduced this notation when we introduced RD, RR and OR.

Lecture 11: Introduction to Generalized Linear Models – p. 25/44


Other Choices in the Link

An implied advantage of the GLM formulation is that you can specify other links to derive
additional parameter interpretations.

For example, suppose you used the “log” link (log(π) = Xβ) instead of the “logit” link.

Now, the log link is not the canonical link, but that is OKAY. Then,

log(π) = β0 + β1 x1i

or
π = eβ0 +β1 x1i
Therefore, the RR could be viewed as

π|x = 1 eβ0 +β1 β1


RR = = = e
π|x = 0 eβ0

β1 can now be interpreted as log Relative Risk.

Lecture 11: Introduction to Generalized Linear Models – p. 26/44


example

Recall our famous MI example.


Myocardial Infarction
Fatal Attack or No
Nonfatal attack Attack
Placebo 189 10845
Aspirin 104 10933
Previously, we estimated the OR to be

OR = (189 · 10933)/(104 · 10845) = 1.832

which indicates that subjects taking placebo had 1.8 times the odds of having an MI when
compared to subjects taking aspirin.

Lecture 11: Introduction to Generalized Linear Models – p. 27/44


Now using a GLM

For this analysis, we want to use the aspirin group as the reference group and estimate.

Therefore in terms of our regression dummy codes, we want


(
1 if subject i received PLACEBO
x1i =
0 if subject i received ASPIRIN

with the response coding of


(
1 if subject i has either a Fatal MI or a Non Fatal MI
Yi =
0 if subject i does not have an MI

Lecture 11: Introduction to Generalized Linear Models – p. 28/44


Inputting Data

And we could input this data into SAS as


data one;
input y x1 count;
cards;
1 1 189
1 0 104
0 1 10845
0 0 10933
;
run;

Lecture 11: Introduction to Generalized Linear Models – p. 29/44


Modeling Fitting Using SAS

And use PROC GENMOD (generalized linear models) to fit the data
proc genmod descending;
freq count;
model y = x1 /dist = bin link=logit;
estimate ’X1’ x1 1 /exp;
run;
Notes:
1. We have specified our design matrix to include just X1. GENMOD automatically
includes an intercept unless you tell it not to.
2. We used “dist=bin” to specify the distribution of Y as binomial
3. We used “link = logit” to fit the canonical link (logistic link)
4. The estimate statement invokes a contrast and exponentiates the parameter estimates
Don’t worry, we’ll become very familiar with GENMOD over the next few weeks.

Lecture 11: Introduction to Generalized Linear Models – p. 30/44


Selected Results

Response Profile

Ordered Total
Value y Frequency

1 1 293
2 0 21778

PROC GENMOD is modeling the probability that y=’1’.

Note: The most important line is the one that indicates what level of the response is
considered a success. In this case, we used “DESCENDING” to specify y=1 as the success.
(by default, SAS takes the first sorted response category as the success)

Lecture 11: Introduction to Generalized Linear Models – p. 31/44


Analysis Of Parameter Estimates

Standard Wald 95% Confidence


Parameter DF Estimate Error Limits

Intercept 1 -4.6552 0.0985 -4.8483 -4.4620


x1 1 0.6054 0.1228 0.3647 0.8462

Contrast Estimate Results

Standard
Label Estimate Error Alpha Confidence Limits

X1 0.6054 0.1228 0.05 0.3647 0.8462


Exp(X1) 1.8321 0.2251 0.05 1.4400 2.3308
Therefore our estimate of OR = 1.832 with a 95% CI of (1.44, 2.33).

Lecture 11: Introduction to Generalized Linear Models – p. 32/44


Using PROC LOGISTIC

proc logistic descending; freq count; model y = x1;


*estimate X1 x1 1 /exp;
run;

THE OUTPUT

Analysis of Maximum Likelihood Estimates

Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -4.6551 0.0985 2232.4885 <.0001


x1 1 0.6054 0.1228 24.2911 <.0001

Odds Ratio Estimates

Point 95% Wald


Effect Estimate Confidence Limits

x1 1.832 1.440 2.331


Difference? GENMOD is moment-based, but LOGISTIC uses ML!

Lecture 11: Introduction to Generalized Linear Models – p. 33/44


Deviance

For a particular GLM for observations y = (y1 , . . . , yn ), let

l(µ, y)

denote the log likelihood function expressed in terms of the means µ = (µ1 , . . . , µn )

Let
l(µ̂, y)

denote the maximum of the log likelihood for the model.

If we have n observations and fit a model with n parameters, we have a saturated model.

We then have a ‘perfect fit’ of the data (i.e., no degrees of freedom).

Denote the likelihood under the saturated model as

l(y, y)

Lecture 11: Introduction to Generalized Linear Models – p. 34/44


Then, the DEVIANCE of a model is

D = −2[l(µ̂, y) − l(y, y)]

Note:
1. D is distributed as χ2 with df = N − p
2. p is the number of parameters estimated under the alternative (or fitted model).
3. Recall, N in the saturated model is the number of parameters included (one for each
observation).
4. Therefore, using the rules for calculating the df of a contingency table we developed
earlier, df equals the difference in parameters estimated under the null (saturated
model) and the alternative (at least one β not equal to zero)
5. For contingency tables D ≡ G2
6. As we proceed, the Deviance will be used to provide a measure of model fit.

Lecture 11: Introduction to Generalized Linear Models – p. 35/44


Calculation of Deviance by Hand -Okay, not really

If you recall, we used a Poisson log linear model to calculate the Pearson residuals for a
contingency table.

A saturated model for a contingency table is one that contains an interaction term. For

example;
proc genmod;
model count = x1 y x1*y /dist=poi link = log;
run;
produces a saturated model.

Lecture 11: Introduction to Generalized Linear Models – p. 36/44


Proof

The design matrix for this model would be

X = [1, x1i , yi , x1i yi ]

Since x1i = yi = (0, 1)


cell (1,1) has X = (1, 1, 1, 1)
cell (1,2) has X = (1, 1, 0, 0)
cell (2,1) has x = (1, 0, 1, 0)
cell (2,2) has x = (1, 0, 0, 0)

That is, counts for all cells are determined by a combination of Xβ.

Lecture 11: Introduction to Generalized Linear Models – p. 37/44


GENMOD Results

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 0 0.0000 .
Scaled Deviance 0 0.0000 .
Pearson Chi-Square 0 0.0000 .
Scaled Pearson X2 0 0.0000 .
Log Likelihood 181840.4662
Note: df = 0 since we are fitting the saturated model (df = N − N )

l(y, y) = 181840.4662

Lecture 11: Introduction to Generalized Linear Models – p. 38/44


Analysis Of Parameter Estimates

Standard Wald 95% Confidence


Parameter DF Estimate Error Limits

Intercept 1 9.2995 0.0096 9.2808 9.3183


x1 1 -0.0081 0.0136 -0.0346 0.0185
y 1 -4.6552 0.0985 -4.8483 -4.4620
x1*y 1 0.6054 0.1228 0.3647 0.8462
Therefore, cell (1,1)’s count would be

log(count cell 1,1) = 9.2995 − 0.0081 − 4.6552 + 0.6054


= 5.2416
or
count cell 1,1 = e5.2416
= 189

Lecture 11: Introduction to Generalized Linear Models – p. 39/44


The Alternative Model

proc genmod;
model count = x1 y /dist=poi link = log;
run;

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1 25.3720 25.3720


Scaled Deviance 1 25.3720 25.3720
Pearson Chi-Square 1 25.0139 25.0139
Scaled Pearson X2 1 25.0139 25.0139
Log Likelihood 181827.7802
l(µ̂, y) = 181827.7802

Lecture 11: Introduction to Generalized Linear Models – p. 40/44


Analysis Of Parameter Estimates

Standard Wald 95% Confidence


Parameter DF Estimate Error Limits

Intercept 1 9.2956 0.0096 9.2769 9.3144


x1 1 -0.0003 0.0135 -0.0267 0.0261
y 1 -4.3085 0.0588 -4.4238 -4.1932
Thus, the expected cell count for cell (1,1) would be

log(expected count cell 1,1) = 9.2956 − 0.0003 − 4.3085


= 4.9868
or
count cell 1,1 = e4.9868
= 146.467

This is the same values as (with some rounding error) 293 ∗ 11034/22071 = 146.48

Lecture 11: Introduction to Generalized Linear Models – p. 41/44


Therefore,
D = −2(181827.7802 − 181840.4662)
= 25.37

on df = 4 − 3 = 1

Lecture 11: Introduction to Generalized Linear Models – p. 42/44


Summary of Canonical Links

Distribution Natural Parameter Canonical Link


Poisson log(λ) log
Normal µ identity
Binomial log(π/(1 − π)) logit
As stated before, just because the natural parameter suggests a certain link, there is no
requirement to model only using the canonical link.

Lecture 11: Introduction to Generalized Linear Models – p. 43/44


Recap

Some key summary points:


1. We generalized linear models by allowing for the specification of the distribution of Y
and the relationship of the expectation to the design matrix
2. We do not need normality for the regression model
3. GLMs provide a unified theory of modeling that encompasses most of the important
models for continuous and discrete variables
4. As we will see next, model parameters can be estimated by ML
5. By restricting the distributions to only exponential class distributions, we can use the
same algorithm for ML estimation

Lecture 11: Introduction to Generalized Linear Models – p. 44/44

You might also like