Visualization
Siddharth R
Scatter Plot
Source: [Link]
Need for Smoothing
● A scatter plot enables visual assessment of relationships or
functional dependencies between variables
● However, this assessment is often difficult in practice because of
noisy data values, sparse data points, and weak
interrelationships
● General pattern vs Precise Nature
● Solution: fitting a smooth curve to the points in the
scatterplot
● Two approaches:
○ Parametric fitting
○ Nonparametric fitting
Parametric Curve Fitting
● Linear Regression (straight line fitting)
○ House Price vs House Size
● Polynomial Regression (curved line fitting)
○ Growth of a tree over time
● Exponential fitting
○ Modeling the spread of a virus over time
● Sinusoidal curve fitting
○ Modeling temperature over time
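A minimal sketch of parametric fitting, using only the Python standard library. It fits a straight line by ordinary least squares and reuses that fit for an exponential curve by regressing log(y) on x. The function names (`fit_line`, `fit_exponential`) and the house-size data are hypothetical, chosen only to illustrate the slide's examples.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a straight-line fit to log(y); needs y > 0."""
    log_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Made-up data: house size (square metres) vs house price (thousands).
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 300, 360]
intercept, slope = fit_line(sizes, prices)
```

In both cases the functional form is chosen in advance and only its parameters are estimated, which is what makes the fit parametric.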
Locally estimated scatterplot smoothing (LOESS)
● LOESS is a non-parametric method, i.e. it doesn’t assume a specific global
functional form for the data
● LOESS is based on local regression, meaning that the regression is
performed locally for each point in the dataset using a subset of the data.
● For each data point, a small neighborhood of surrounding points is selected,
and a weighted linear or polynomial regression is performed on those points.
● LOESS is also called LOWESS, which stands for locally weighted scatterplot
smoothing.
What relationship exists?
● The figure shows 1992 state
voter turnout rates (on the
vertical axis) plotted against the
percentage of high school
graduates in the respective
state populations (on the
horizontal axis).
What is the conclusion now?
● What does the relationship look like?
● A linear model would provide a
misleading depiction of the
relationship
● LOESS helps avoid an inaccurate
representation of the data.
● Two key parameters
○ Smoothing parameter
○ Degree of polynomial
Steps for LOESS
● Assume that the data consist of n observations on two variables, X and Y.
● The data are displayed in a scatter plot with X on the x-axis and Y on the y-axis. The points are
(xi, yi), where i ranges from 1 to n.
● Select a series of m locations that are equally spaced across the range of X
● Perform a series of m weighted regression analyses, one at each location
● These regressions are “local” in the sense that each one uses only the subset of observations
that fall close to its location
● The observations included in each local regression are weighted inversely to their
distance from that location
● After all of the local regressions are completed, the resulting fitted points (one per location) are
plotted in the scatterplot, superimposed over the data points, and connected to form the smooth curve
● How is the neighborhood subset of points selected?
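The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the tricube weight function, the rule for turning the span into a neighborhood size, and all function names are choices made here, since the slides do not fix them. Only locally linear fitting (degree 1) is shown.

```python
def tricube(u):
    """Tricube weight: 1 at u = 0, decreasing to 0 at |u| >= 1 (assumed kernel)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(v, xs, ys, span):
    """Fitted value at location v from a weighted line fit to the nearest points."""
    n = len(xs)
    # Neighborhood size from the span; the rounding rule is an assumption.
    k = min(n, max(3, int(span * n + 1e-9)))
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - v))[:k]
    h = abs(xs[nearest[-1]] - v)          # distance to the farthest neighbor
    if h == 0.0:
        return sum(ys[i] for i in nearest) / k
    w = [tricube((xs[i] - v) / h) for i in nearest]
    sw = sum(w)
    if sw == 0.0:
        return sum(ys[i] for i in nearest) / k
    # Weighted least squares for a straight line.
    mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
    if sxx <= 1e-12 * sw * h * h:         # degenerate window: use weighted mean
        return my
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
    return my + (sxy / sxx) * (v - mx)


def loess(xs, ys, span=0.5, m=50):
    """Evaluate the local fits at m equally spaced locations across the range of X."""
    lo, hi = min(xs), max(xs)
    locations = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    fitted = [local_linear_fit(v, xs, ys, span) for v in locations]
    return locations, fitted
```

Plotting `fitted` against `locations` and joining the points gives the smooth curve. This sketch sorts all n points for each of the m locations, so it favors clarity over speed.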
Bandwidth (or Span)
● Bandwidth (often called the smoothing parameter or span)
controls the size of the neighborhood used for local regression.
● A smaller span means a smaller neighborhood and less
smoothing, while a larger span means more smoothing.
● The span is a proportion of the data, so its value lies between 0 and 1
● Example:
○ A bandwidth of 0.2 uses the nearest 20% of the points for each local regression.
○ A bandwidth of 0.8 uses the nearest 80% of the points.
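A small sketch of how a span could translate into a neighborhood. The helper name `neighborhood` and the rounding rule are assumptions made for illustration; implementations differ on how span × n is rounded to a whole number of points.

```python
def neighborhood(x0, xs, span):
    """Indices of the points nearest to x0, using a fraction `span` of the data."""
    n = len(xs)
    k = max(1, int(span * n + 1e-9))      # number of points in the window
    by_distance = sorted(range(n), key=lambda i: abs(xs[i] - x0))
    return sorted(by_distance[:k])


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
small_window = neighborhood(4.2, xs, span=0.2)   # 2 of the 10 points
large_window = neighborhood(4.2, xs, span=0.8)   # 8 of the 10 points
```

The window is defined by a count of nearest points rather than a fixed width, so it widens automatically where the data are sparse.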
Impact of Bandwidth/Span
● With a larger span, idiosyncratic
observations tend to cancel each
other out
● A larger value also means that
relatively few observations change when
moving from one fitting window to
the next.
● Smaller values make the curve highly
sensitive to noise
● Think of the LOESS curve as a string:
larger values pull it tighter,
producing a straighter curve.
● Trade-off between overfitting (span too
small) and underfitting (span too large)
Also See: [Link]
Degree of Polynomial
● If the degree of the polynomial is set to 1, a linear equation is fit within each window.
● If the degree of the polynomial is set to 2, a quadratic equation is fit within each window.
● The figure shows data on public preferences between two candidates, Walter Mondale and Gary
Hart, over the course of the 1984 presidential primary campaign
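One way to write the two cases as a weighted least-squares problem at a fitting location. The symbols are assumptions introduced here: v is the location, w_i(v) is the distance-based weight of observation i (zero outside the window), and ĝ(v) is the value plotted on the LOESS curve.

```latex
% Degree 1: locally linear fit at location v
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0,\,\beta_1}
  \sum_{i=1}^{n} w_i(v)\,\bigl(y_i - \beta_0 - \beta_1 (x_i - v)\bigr)^2

% Degree 2: locally quadratic fit at location v
(\hat\beta_0, \hat\beta_1, \hat\beta_2) = \arg\min_{\beta_0,\,\beta_1,\,\beta_2}
  \sum_{i=1}^{n} w_i(v)\,\bigl(y_i - \beta_0 - \beta_1 (x_i - v) - \beta_2 (x_i - v)^2\bigr)^2

% In both cases the fitted value at v is the local intercept
\hat g(v) = \hat\beta_0
```

Because x is centered at v, the intercept of each local fit is the fitted value at that location.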
Key points to note
● If the point cloud conforms to a generally monotonic pattern (either increasing
or decreasing), then the degree should be set to 1 for locally linear fitting.
● If the data exhibit some nonmonotone pattern, with local minima and/or
maxima, then the degree should be set to 2 for locally quadratic fitting.
● The residuals are the differences between the actual data points and the
fitted values. They measure how well the LOESS model fits the data.
● Basic LOESS is sensitive to outliers, so a robust variant is often used: after an
initial LOESS fit, residuals are computed, and points with large residuals are
down-weighted in subsequent iterations to make the fit more resistant to outliers.
● What do you think of the computational complexity of LOESS?
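A sketch of the down-weighting step described above. The slides do not say which weight function is used; this example assumes the bisquare function with residuals scaled by six times the median absolute residual, a common choice in robust LOESS. The names `bisquare` and `robustness_weights` are hypothetical.

```python
import statistics


def bisquare(u):
    """Bisquare weight: 1 at u = 0, decreasing to 0 at |u| >= 1."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def robustness_weights(ys, fitted):
    """One weight per observation; large residuals get weights near or at 0."""
    residuals = [y - f for y, f in zip(ys, fitted)]
    s = statistics.median([abs(r) for r in residuals])
    if s == 0.0:
        # Degenerate case (assumption): leave all weights unchanged.
        return [1.0] * len(residuals)
    return [bisquare(r / (6.0 * s)) for r in residuals]


# Made-up example: the last observation is an outlier relative to its fit.
ys = [1.0, 2.0, 3.0, 4.0, 50.0]
fitted = [1.1, 1.9, 3.1, 3.9, 5.0]
weights = robustness_weights(ys, fitted)
```

In the next iteration each observation's distance weight is multiplied by its robustness weight before the local regressions are refit, so the outlier no longer pulls the curve toward itself.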
High Dimensional Data
● Curse of dimensionality
● Dimensionality reduction approaches
○ Feature selection
○ Feature extraction
● Feature Selection
○ Filter method
○ Wrapper method
○ Embedded method
● Feature Extraction
○ PCA
○ Random projection
Next Class
● PCA
● t-SNE
Thank You !!!