The Elements of Statistical Learning: Trevor Hastie Robert Tibshirani Jerome Friedman

The document is the preface to the second edition of 'The Elements of Statistical Learning' by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, highlighting updates made since the first edition. It includes the addition of four new chapters, revisions for clarity, and improvements in accessibility for colorblind readers. The authors express gratitude to contributors and readers for their feedback and support in enhancing the book's content.

Springer Series in Statistics

Trevor Hastie
Robert Tibshirani
Jerome Friedman

The Elements of
Statistical Learning
Data Mining, Inference, and Prediction

Second Edition

To our parents:

Valerie and Patrick Hastie

Vera and Sami Tibshirani

Florence and Harry Friedman

and to our families:

Samantha, Timothy, and Lynda

Charlie, Ryan, Julie, and Cheryl

Melanie, Dora, Monika, and Ildiko



Preface to the Second Edition

In God we trust, all others bring data.

–William Edwards Deming (1900-1993)¹

We have been gratified by the popularity of the first edition of The
Elements of Statistical Learning. This, along with the fast pace of research
in the statistical learning field, motivated us to update our book with a
second edition.
We have added four new chapters and updated some of the existing
chapters. Because many readers are familiar with the layout of the first
edition, we have tried to change it as little as possible. Here is a summary
of the main changes:

¹ On the Web, this quote has been widely attributed to both Deming and Robert W.
Hayden; however Professor Hayden told us that he can claim no credit for this quote,
and ironically we could find no “data” confirming that Deming actually said this.

Chapter                                                   What’s new

1.  Introduction
2.  Overview of Supervised Learning
3.  Linear Methods for Regression                         LAR algorithm and generalizations of the lasso
4.  Linear Methods for Classification                     Lasso path for logistic regression
5.  Basis Expansions and Regularization                   Additional illustrations of RKHS
6.  Kernel Smoothing Methods
7.  Model Assessment and Selection                        Strengths and pitfalls of cross-validation
8.  Model Inference and Averaging
9.  Additive Models, Trees, and Related Methods
10. Boosting and Additive Trees                           New example from ecology; some material
                                                          split off to Chapter 16
11. Neural Networks                                       Bayesian neural nets and the NIPS 2003 challenge
12. Support Vector Machines and Flexible Discriminants    Path algorithm for SVM classifier
13. Prototype Methods and Nearest-Neighbors
14. Unsupervised Learning                                 Spectral clustering, kernel PCA, sparse PCA,
                                                          non-negative matrix factorization, archetypal
                                                          analysis, nonlinear dimension reduction,
                                                          Google page rank algorithm, a direct
                                                          approach to ICA
15. Random Forests                                        New
16. Ensemble Learning                                     New
17. Undirected Graphical Models                           New
18. High-Dimensional Problems                             New

Some further notes:

• Our first edition was unfriendly to colorblind readers; in particular,
we tended to favor red/green contrasts which are particularly troublesome.
We have changed the color palette in this edition to a large
extent, replacing the above with an orange/blue contrast.

• We have changed the name of Chapter 6 from “Kernel Methods” to
“Kernel Smoothing Methods”, to avoid confusion with the machine-learning
kernel method that is discussed in the context of support vector
machines (Chapter 12) and more generally in Chapters 5 and 14.

• In the first edition, the discussion of error-rate estimation in Chapter 7
was sloppy, as we did not clearly differentiate the notions of
conditional error rates (conditional on the training set) and unconditional
rates. We have fixed this in the new edition.

• Chapters 15 and 16 follow naturally from Chapter 10, and the chapters
are probably best read in that order.

• In Chapter 17, we have not attempted a comprehensive treatment
of graphical models, and discuss only undirected models and some
new methods for their estimation. Due to a lack of space, we have
specifically omitted coverage of directed graphical models.

• Chapter 18 explores the “p ≫ N” problem, which is learning in
high-dimensional feature spaces. These problems arise in many areas,
including genomic and proteomic studies, and document classification.
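
To make the “p ≫ N” setting in the last note concrete, here is a minimal sketch on simulated data. The sizes, the sparse coefficient vector, and the penalty value `lam` are all assumptions chosen for illustration; ridge regression is used only as one familiar remedy and is not claimed to be the approach the book itself takes in Chapter 18.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far more features (p) than observations (N).
N, p = 20, 200
X = rng.standard_normal((N, p))

# Assumed ground truth for the simulation: only five features carry signal.
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + 0.1 * rng.standard_normal(N)

# With p > N the p x p Gram matrix X^T X has rank at most N, so the
# least-squares normal equations do not have a unique solution.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

# Adding lam * I (ridge regression) makes the system positive definite,
# so a unique coefficient vector exists.
lam = 1.0
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)

print(f"rank of X^T X: {rank} (p = {p})")
print(f"ridge solution has {beta_ridge.shape[0]} finite coefficients:",
      bool(np.all(np.isfinite(beta_ridge))))
```

The point of the sketch is only the rank deficiency: whenever p exceeds N, some form of regularization or dimension reduction is needed before a linear fit is even well defined.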

We thank the many readers who have found the (too numerous) errors in
the first edition. We apologize for those and have done our best to avoid
errors in this new edition. We thank Mark Segal, Bala Rajaratnam, and Larry
Wasserman for comments on some of the new chapters, and many Stanford
graduate and post-doctoral students who offered comments, in particular
Mohammed AlQuraishi, John Boik, Holger Hoefling, Arian Maleki, Donal
McMahon, Saharon Rosset, Babak Shababa, Daniela Witten, Ji Zhu and
Hui Zou. We thank John Kimmel for his patience in guiding us through this
new edition. RT dedicates this edition to the memory of Anna McPhee.

Trevor Hastie
Robert Tibshirani
Jerome Friedman
Stanford, California
August 2008

Preface to the First Edition

We are drowning in information and starving for knowledge.

–Rutherford D. Roger

The field of Statistics is constantly challenged by the problems that science
and industry bring to its door. In the early days, these problems often came
from agricultural and industrial experiments and were relatively small in
scope. With the advent of computers and the information age, statistical
problems have exploded both in size and complexity. Challenges in the
areas of data storage, organization and searching have led to the new field
of “data mining”; statistical and computational problems in biology and
medicine have created “bioinformatics.” Vast amounts of data are being
generated in many fields, and the statistician’s job is to make sense of it
all: to extract important patterns and trends, and understand “what the
data says.” We call this learning from data.
The challenges in learning from data have led to a revolution in the
statistical sciences. Since computation plays such a key role, it is not surprising
that much of this new development has been done by researchers in other
fields such as computer science and engineering.
The learning problems that we consider can be roughly categorized as
either supervised or unsupervised. In supervised learning, the goal is to
predict the value of an outcome measure based on a number of input measures;
in unsupervised learning, there is no outcome measure, and the goal is to
describe the associations and patterns among a set of input measures.
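
The distinction can be illustrated with a small sketch on simulated data. Everything here is an assumption made for illustration (the data-generating process, the use of least squares for the supervised case, and a correlation matrix for the unsupervised case); it is not drawn from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Three simulated input measures; the second is built to track the first.
X = rng.standard_normal((n, 3))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)

# Supervised setting: an outcome measure y is observed alongside the inputs.
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.standard_normal(n)

# Fit a linear model by least squares and predict the outcome.
A = np.column_stack([np.ones(n), X])          # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef
mse = float(np.mean((y - y_hat) ** 2))

# Unsupervised setting: no outcome is used; we only describe how the
# input measures relate to one another.
corr = np.corrcoef(X, rowvar=False)

print(f"supervised: training mean squared error = {mse:.4f}")
print("unsupervised: correlation matrix of the inputs")
print(np.round(corr, 2))
```

In the first half the outcome `y` drives the fit; in the second half `y` is never touched, and the only result is a description of structure among the inputs.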

This book is our attempt to bring together many of the important new
ideas in learning, and explain them in a statistical framework. While some
mathematical details are needed, we emphasize the methods and their
conceptual underpinnings rather than their theoretical properties. As a result,
we hope that this book will appeal not just to statisticians but also to
researchers and practitioners in a wide variety of fields.
Just as we have learned a great deal from researchers outside of the field
of statistics, our statistical viewpoint may help others to better understand
different aspects of learning:
There is no true interpretation of anything; interpretation is a
vehicle in the service of human comprehension. The value of
interpretation is in enabling others to fruitfully think about an
idea.
–Andreas Buja

We would like to acknowledge the contribution of many people to the
conception and completion of this book. David Andrews, Leo Breiman,
Andreas Buja, John Chambers, Bradley Efron, Geoffrey Hinton, Werner
Stuetzle, and John Tukey have greatly influenced our careers.
Balasubramanian Narasimhan gave us advice and help on many computational
problems, and maintained an excellent computing environment. Shin-Ho
Bang helped in the production of a number of the figures. Lee Wilkinson
gave valuable tips on color production. Ilana Belitskaya, Eva Cantoni, Maya
Gupta, Michael Jordan, Shanti Gopatam, Radford Neal, Jorge Picazo,
Bogdan Popescu, Olivier Renaud, Saharon Rosset, John Storey, Ji Zhu, Mu
Zhu, two reviewers and many students read parts of the manuscript and
offered helpful suggestions. John Kimmel was supportive, patient and
helpful at every phase; MaryAnn Brickner and Frank Ganz headed a superb
production team at Springer. Trevor Hastie would like to thank the
statistics department at the University of Cape Town for their hospitality during
the final stages of this book. We gratefully acknowledge NSF and NIH for
their support of this work. Finally, we would like to thank our families and
our parents for their love and support.
Trevor Hastie
Robert Tibshirani
Jerome Friedman
Stanford, California
May 2001
The quiet statisticians have changed our world; not by discovering
new facts or technical developments, but by changing the
ways that we reason, experiment and form our opinions ....
–Ian Hacking

Contents

Preface to the Second Edition

Preface to the First Edition

1 Introduction

2 Overview of Supervised Learning
   2.1 Introduction
   2.2 Variable Types and Terminology
   2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
      2.3.1 Linear Models and Least Squares
      2.3.2 Nearest-Neighbor Methods
      2.3.3 From Least Squares to Nearest Neighbors
   2.4 Statistical Decision Theory
   2.5 Local Methods in High Dimensions
   2.6 Statistical Models, Supervised Learning and Function Approximation
      2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
      2.6.2 Supervised Learning
      2.6.3 Function Approximation
   2.7 Structured Regression Models
      2.7.1 Difficulty of the Problem
