Spectral Clustering

      Zitao Liu
Agenda
•   Brief Clustering Review
•   Similarity Graph
•   Graph Laplacian
•   Spectral Clustering Algorithm
•   Graph Cut Point of View
•   Random Walk Point of View
•   Perturbation Theory Point of View
•   Practical Details
CLUSTERING REVIEW
K-MEANS CLUSTERING
• Description
   Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional
   real vector, k-means clustering aims to partition the n observations into k sets
   (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS):

   $$ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2 $$

   where μi is the mean of points in Si.
• Standard Algorithm
   1) k initial "means" (in this case k=3) are randomly selected from the data set.
   2) k clusters are created by associating every observation with the nearest mean.
   3) The centroid of each of the k clusters becomes the new mean.
   4) Steps 2 and 3 are repeated until convergence has been reached.
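The four steps of the standard algorithm can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: the names `kmeans` and `wcss`, the `seed` parameter, and the rule that an empty cluster keeps its old mean are choices made for this example only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Standard (Lloyd) k-means on the rows of X; returns (labels, means)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct observations as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: associate every observation with its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: the centroid of each cluster becomes the new mean
        # (assumption: an empty cluster keeps its previous mean).
        new_means = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        # Step 4: repeat until the means stop moving.
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means

def wcss(X, labels, means):
    """Within-cluster sum of squares for a given assignment."""
    return float(((X - means[labels]) ** 2).sum())
```

The result depends on the random initial means, so in practice the algorithm is often restarted several times and the run with the lowest WCSS is kept.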
GENERAL




  Disconnected graph components
  Groups of points (weak connections between components, strong connections within components)
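GRAPH NOTATION
  $G = (V, E)$: undirected similarity graph with vertex set $V = \{v_1, \dots, v_n\}$
  $W = (w_{ij})_{i,j=1,\dots,n}$: weighted adjacency matrix with $w_{ij} = w_{ji} \ge 0$
  $d_i = \sum_{j=1}^{n} w_{ij}$: degree of vertex $v_i$; $D = \mathrm{diag}(d_1, \dots, d_n)$: degree matrix
  $|A|$: number of vertices in $A \subset V$; $\mathrm{vol}(A) = \sum_{i \in A} d_i$; $\bar{A}$: complement of $A$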
SIMILARITY GRAPH




 All the above graphs are regularly used in spectral clustering!
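As a hedged illustration, the sketch below builds three common similarity graphs from data points. It assumes the graphs meant here are the ε-neighborhood graph, the k-nearest-neighbor graph, and the fully connected graph with Gaussian similarity, as in the von Luxburg tutorial listed in the references. The function names and the parameters `eps`, `k`, and `sigma` are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of the 2-D array X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)

def epsilon_graph(X, eps):
    """Unweighted graph: connect points whose distance is below eps."""
    W = (pairwise_sq_dists(X) < eps ** 2).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_graph(X, k):
    """Symmetric kNN graph: connect i and j if either is among the other's k nearest."""
    D2 = pairwise_sq_dists(X)
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(D2, axis=1)[:, :k]
    A = np.zeros(D2.shape)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrize

def gaussian_graph(X, sigma):
    """Fully connected graph with weights exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```

Each function returns a symmetric weight matrix W with a zero diagonal, which is the input the graph Laplacian needs.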
GRAPH LAPLACIANS
• Unnormalized Graph Laplacian

                         L=D-W
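A small numerical check of L = D - W, as a sketch. It assumes the standard properties of the unnormalized Laplacian (the quadratic form identity f'Lf = 1/2 Σ w_ij (f_i - f_j)², positive semi-definiteness, and the constant vector as eigenvector with eigenvalue 0) as stated in the von Luxburg tutorial listed in the references. The example graph and the function names are hypothetical.

```python
import numpy as np

def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal matrix of vertex degrees."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def pairwise_form(W, f):
    """0.5 * sum_ij w_ij (f_i - f_j)^2, computed directly from the weights."""
    diff = f[:, None] - f[None, :]
    return 0.5 * float((W * diff ** 2).sum())

# Hypothetical example: a triangle with a weakly attached fourth vertex.
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.1],
              [0.0, 0.0, 0.1, 0.0]])
L = unnormalized_laplacian(W)
f = np.array([1.0, -2.0, 0.5, 3.0])

print(np.isclose(f @ L @ f, pairwise_form(W, f)))  # quadratic form identity
print(np.allclose(L @ np.ones(4), 0.0))            # constant vector lies in the null space
print(np.linalg.eigvalsh(L).min() > -1e-12)        # no negative eigenvalues
```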
• Normalized Graph Laplacian

  $$ L_{sym} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} $$

  $$ L_{rw} := D^{-1} L = I - D^{-1} W $$




  Proof. The proof is analogous to the one of Proposition 2, using Proposition 3.
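The sketch below computes the two normalized Laplacians and checks a few of their standard properties numerically. It assumes the usual definitions L_sym = D^{-1/2} L D^{-1/2} and L_rw = D^{-1} L from the von Luxburg tutorial listed in the references, and a graph without isolated vertices (all degrees positive). The function name and the example graph are hypothetical.

```python
import numpy as np

def normalized_laplacians(W):
    """Return (L_sym, L_rw) for a weight matrix W whose degrees are all positive."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]  # D^{-1/2} L D^{-1/2}
    L_rw = L / d[:, None]                                  # D^{-1} L
    return L_sym, L_rw

# Hypothetical example graph: a triangle with a weakly attached fourth vertex.
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.1],
              [0.0, 0.0, 0.1, 0.0]])
d = W.sum(axis=1)
L_sym, L_rw = normalized_laplacians(W)

# Both matrices share the same eigenvalues.
eig_sym = np.linalg.eigvalsh(L_sym)
eig_rw = np.sort(np.linalg.eigvals(L_rw).real)
print(np.allclose(eig_sym, eig_rw))

# Eigenvalue 0: constant vector for L_rw, D^{1/2} 1 for L_sym.
print(np.allclose(L_rw @ np.ones(4), 0.0))
print(np.allclose(L_sym @ np.sqrt(d), 0.0))
```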
ALGORITHM

• Unnormalized Graph Laplacian

                         L=D-W
• Normalized Graph Laplacian

        It really works!!!
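A minimal sketch of unnormalized spectral clustering, assuming the usual three steps from the von Luxburg tutorial listed in the references: build L = D - W, take the first k eigenvectors of L as the columns of a matrix U, and cluster the rows of U with k-means. The function names and the deterministic farthest-first initialization of k-means are assumptions made for this example.

```python
import numpy as np

def kmeans_rows(Y, k, n_iter=100):
    """Small k-means on the rows of Y with farthest-first initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Next center: the row farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W, k):
    """Unnormalized spectral clustering of a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order, so the first k
    # columns belong to the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    # Row i of U is the spectral embedding of vertex i.
    return kmeans_rows(U, k)
```

For the normalized variants, the same sketch would use the eigenvectors of a normalized Laplacian in place of those of L.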
GRAPH CUT

  Disconnected graph components
  Groups of points (weak connections between components, strong connections within components)
Problems!!!
• Sensitive to outliers: minimizing the plain cut often just separates a single vertex from the rest of the graph.

     What we get          What we want
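Solutions
• RatioCut (Hagen and Kahng, 1992)

  $$ \mathrm{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{|A_i|} $$

• Ncut (Shi and Malik, 2000)

  $$ \mathrm{Ncut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)} $$

  Here $\mathrm{cut}(A_i, \bar{A}_i)$ is the total weight of the edges between $A_i$ and its complement, $|A_i|$ is the number of vertices in $A_i$, and $\mathrm{vol}(A_i)$ is the sum of the degrees of the vertices in $A_i$.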
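To make the outlier problem and the two balanced objectives concrete, the sketch below evaluates the plain cut, RatioCut, and Ncut for two bipartitions of a small graph. The graph (two triangles joined by an edge of weight 0.5, plus one outlier vertex attached with weight 0.4) and all function names are hypothetical. The cut between A and its complement is taken as the total weight of the edges crossing between them.

```python
import numpy as np

def cut_value(W, in_A):
    """Total weight of the edges between A (boolean mask) and its complement."""
    return float(W[np.ix_(in_A, ~in_A)].sum())

def ratio_cut(W, in_A):
    """RatioCut(A, complement): cut divided by the number of vertices on each side."""
    c = cut_value(W, in_A)
    return c / in_A.sum() + c / (~in_A).sum()

def ncut(W, in_A):
    """Ncut(A, complement): cut divided by the volume (sum of degrees) of each side."""
    d = W.sum(axis=1)
    c = cut_value(W, in_A)
    return c / d[in_A].sum() + c / d[~in_A].sum()

def example_graph():
    """Two triangles joined by a 0.5 edge, plus an outlier vertex 6."""
    W = np.zeros((7, 7))
    edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
             (2, 3): 0.5, (5, 6): 0.4}
    for (i, j), w in edges.items():
        W[i, j] = W[j, i] = w
    return W

W = example_graph()
outlier = np.arange(7) == 6   # split off the single outlier vertex
balanced = np.arange(7) < 3   # split between the two triangles

for name, A in [("outlier", outlier), ("balanced", balanced)]:
    print(name, cut_value(W, A), ratio_cut(W, A), ncut(W, A))
```

In this example the plain cut is smaller for the outlier split, while both RatioCut and Ncut are smaller for the balanced split.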
Problem!!!
• NP hard

Solution!!!
• Approximation: spectral clustering solves relaxed versions of these problems

      Ncut      --relaxing-->  Normalized Spectral Clustering
      RatioCut  --relaxing-->  Unnormalized Spectral Clustering
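• Approximating RatioCut for k=2
    Our goal is to solve the optimization problem:

    $$ \min_{A \subset V} \mathrm{RatioCut}(A, \bar{A}) $$

    Rewrite the problem in a more convenient form. Given a subset $A \subset V$, define the vector $f = (f_1, \dots, f_n)^T \in \mathbb{R}^n$ with entries

    $$ f_i = \begin{cases} \sqrt{|\bar{A}| / |A|} & \text{if } v_i \in A \\ -\sqrt{|A| / |\bar{A}|} & \text{if } v_i \in \bar{A} \end{cases} $$

    Magic happens!!! With this f, the quadratic form of L equals the objective up to a constant:

    $$ f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2 = |V| \cdot \mathrm{RatioCut}(A, \bar{A}) $$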
    Additionally, we have

    $$ \sum_{i=1}^{n} f_i = 0, \quad \text{i.e. } f \perp \mathbb{1} $$

    and f satisfies

    $$ \lVert f \rVert^2 = \sum_{i=1}^{n} f_i^2 = n $$
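    The problem can therefore be written as

    $$ \min_{A \subset V} f^T L f \quad \text{subject to} \quad f \perp \mathbb{1}, \quad f_i \in \left\{ \sqrt{|\bar{A}| / |A|},\ -\sqrt{|A| / |\bar{A}|} \right\}, \quad \lVert f \rVert = \sqrt{n} $$

                             Relaxation !!!

    $$ \min_{f \in \mathbb{R}^n} f^T L f \quad \text{subject to} \quad f \perp \mathbb{1}, \quad \lVert f \rVert = \sqrt{n} $$

                              Rayleigh-Ritz Theorem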
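    f is the eigenvector corresponding to the second smallest eigenvalue of L.

    Re-convert the real-valued f into a discrete partition:
      1.   Use the sign of f_i as indicator function: v_i is assigned to A if f_i ≥ 0 and to its complement otherwise. Only works for k = 2.
      2.   Cluster the coordinates f_i, for example with k-means. More general, works for any k.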
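The relaxation for k = 2 can be sketched as follows. The code takes the eigenvector of the second smallest eigenvalue of L, splits the vertices by sign, and provides the RatioCut indicator vector so that the identity f'Lf = |V| · RatioCut(A, Ā) can be checked numerically. It assumes a connected graph, and all function names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def ratio_cut(W, in_A):
    """RatioCut of the bipartition given by the boolean mask in_A."""
    c = W[np.ix_(in_A, ~in_A)].sum()
    return c / in_A.sum() + c / (~in_A).sum()

def ratio_cut_indicator(in_A):
    """Vector f with sqrt(|Abar|/|A|) on A and -sqrt(|A|/|Abar|) on the complement."""
    a, b = in_A.sum(), (~in_A).sum()
    return np.where(in_A, np.sqrt(b / a), -np.sqrt(a / b))

def fiedler_bipartition(W):
    """Split vertices by the sign of the second eigenvector of L."""
    _, vecs = np.linalg.eigh(laplacian(W))  # eigenvalues in ascending order
    f = vecs[:, 1]
    return f >= 0
```

The sign of an eigenvector is arbitrary, so which side is labelled True may differ between runs or libraries; only the partition itself is meaningful.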
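• Approximating RatioCut for arbitrary k
    Given a partition of V into k sets $A_1, \dots, A_k$, define k indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})^T$ by

    $$ h_{i,j} = \begin{cases} 1 / \sqrt{|A_j|} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad (i=1,\dots,n;\ j=1,\dots,k) $$

    Let $H \in \mathbb{R}^{n \times k}$ be the matrix with columns $h_1, \dots, h_k$. Then $H^T H = I$ and $\mathrm{RatioCut}(A_1, \dots, A_k) = \mathrm{Tr}(H^T L H)$.

    Problem reformulation:

    $$ \min_{A_1, \dots, A_k} \mathrm{Tr}(H^T L H) \quad \text{subject to} \quad H^T H = I, \quad H \text{ built from indicator vectors as above} $$

                                             Relaxation !!!

    $$ \min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{subject to} \quad H^T H = I $$

                                           Rayleigh-Ritz Theorem

    The optimal H contains the first k eigenvectors of L as columns.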
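• Approximating Ncut for k=2
    Our goal is to solve the optimization problem:

    $$ \min_{A \subset V} \mathrm{Ncut}(A, \bar{A}) $$

    Rewrite the problem in a more convenient form. Define the cluster indicator vector f by

    $$ f_i = \begin{cases} \sqrt{\mathrm{vol}(\bar{A}) / \mathrm{vol}(A)} & \text{if } v_i \in A \\ -\sqrt{\mathrm{vol}(A) / \mathrm{vol}(\bar{A})} & \text{if } v_i \in \bar{A} \end{cases} $$

    Similar to above one can check that:

    $$ (Df)^T \mathbb{1} = 0, \qquad f^T D f = \mathrm{vol}(V), \qquad f^T L f = \mathrm{vol}(V) \cdot \mathrm{Ncut}(A, \bar{A}) $$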
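    The problem can therefore be written as

    $$ \min_{A \subset V} f^T L f \quad \text{subject to} \quad f_i \in \left\{ \sqrt{\mathrm{vol}(\bar{A}) / \mathrm{vol}(A)},\ -\sqrt{\mathrm{vol}(A) / \mathrm{vol}(\bar{A})} \right\}, \quad Df \perp \mathbb{1}, \quad f^T D f = \mathrm{vol}(V) $$

                               Relaxation !!!

    $$ \min_{f \in \mathbb{R}^n} f^T L f \quad \text{subject to} \quad Df \perp \mathbb{1}, \quad f^T D f = \mathrm{vol}(V) $$

    Substituting $g := D^{1/2} f$ gives

    $$ \min_{g \in \mathbb{R}^n} g^T D^{-1/2} L D^{-1/2} g \quad \text{subject to} \quad g \perp D^{1/2} \mathbb{1}, \quad \lVert g \rVert^2 = \mathrm{vol}(V) $$

                                                Rayleigh-Ritz Theorem!!!

    g is the eigenvector for the second smallest eigenvalue of $D^{-1/2} L D^{-1/2}$. Equivalently, $f = D^{-1/2} g$ is the second eigenvector of $D^{-1} L$, i.e. it solves the generalized eigenproblem $L f = \lambda D f$.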
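• Approximating Ncut for arbitrary k
    Define the indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})^T$ by $h_{i,j} = 1 / \sqrt{\mathrm{vol}(A_j)}$ if $v_i \in A_j$ and $h_{i,j} = 0$ otherwise, and let H be the matrix with columns $h_1, \dots, h_k$. Then $H^T D H = I$ and $\mathrm{Ncut}(A_1, \dots, A_k) = \mathrm{Tr}(H^T L H)$.

    Problem reformulation:

    $$ \min_{A_1, \dots, A_k} \mathrm{Tr}(H^T L H) \quad \text{subject to} \quad H^T D H = I, \quad H \text{ built from indicator vectors as above} $$

                                  Relaxation !!!

    Dropping the discreteness condition and substituting $T := D^{1/2} H$ gives

    $$ \min_{T \in \mathbb{R}^{n \times k}} \mathrm{Tr}(T^T D^{-1/2} L D^{-1/2} T) \quad \text{subject to} \quad T^T T = I $$

                                 Rayleigh-Ritz Theorem

    The optimal T contains the first k eigenvectors of $D^{-1/2} L D^{-1/2}$ as columns. Equivalently, $H = D^{-1/2} T$ contains the first k eigenvectors of $D^{-1} L$, i.e. the first k generalized eigenvectors of $L u = \lambda D u$.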
RANDOM WALK

• A random walk on a graph is a stochastic process which
  randomly jumps from vertex to vertex.

• A good clustering is one where the random walk stays within the same
  cluster for a long time and seldom jumps between clusters.

• A balanced partition with a low cut will also have the property
  that the random walk does not have many opportunities to
  jump between clusters.
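• Random walks and Ncut
  The probability of jumping in one step from v_i to v_j is $p_{ij} = w_{ij} / d_i$, so the transition matrix is $P = D^{-1} W$. If the graph is connected and non-bipartite, the random walk has the unique stationary distribution $\pi = (\pi_1, \dots, \pi_n)^T$ with $\pi_i = d_i / \mathrm{vol}(V)$.

  Proposition (Ncut via transition probabilities). Let G be connected and non-bipartite. Run the random walk $(X_t)_{t \ge 0}$ starting with $X_0$ in the stationary distribution $\pi$. For disjoint subsets $A, B \subset V$, write $P(B \mid A) := P(X_1 \in B \mid X_0 \in A)$. Then

  $$ \mathrm{Ncut}(A, \bar{A}) = P(\bar{A} \mid A) + P(A \mid \bar{A}) $$

Proof. First of all observe that

  $$ P(X_0 \in A, X_1 \in B) = \sum_{i \in A, j \in B} \pi_i p_{ij} = \sum_{i \in A, j \in B} \frac{d_i}{\mathrm{vol}(V)} \frac{w_{ij}}{d_i} = \frac{1}{\mathrm{vol}(V)} \sum_{i \in A, j \in B} w_{ij} $$

Using this we obtain

  $$ P(X_1 \in B \mid X_0 \in A) = \frac{P(X_0 \in A, X_1 \in B)}{P(X_0 \in A)} = \left( \frac{1}{\mathrm{vol}(V)} \sum_{i \in A, j \in B} w_{ij} \right) \left( \frac{\mathrm{vol}(A)}{\mathrm{vol}(V)} \right)^{-1} = \frac{\sum_{i \in A, j \in B} w_{ij}}{\mathrm{vol}(A)} $$

 Now the proposition follows directly with the definition of Ncut.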
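The relation between Ncut and the random walk can be checked numerically. The sketch below assumes the standard transition matrix P = D^{-1} W and stationary distribution π_i = d_i / vol(V), and compares Ncut(A, Ā) with the sum of the two one-step jump probabilities between A and its complement. All function names are hypothetical.

```python
import numpy as np

def transition_matrix(W):
    """Row-stochastic matrix P = D^{-1} W of the random walk."""
    return W / W.sum(axis=1, keepdims=True)

def stationary_distribution(W):
    """pi_i = d_i / vol(V) for a connected, non-bipartite graph."""
    d = W.sum(axis=1)
    return d / d.sum()

def jump_probability(W, src, dst):
    """P(X_1 in dst | X_0 in src) when X_0 follows the stationary distribution."""
    P = transition_matrix(W)
    pi = stationary_distribution(W)
    joint = (pi[src][:, None] * P[np.ix_(src, dst)]).sum()
    return float(joint / pi[src].sum())

def ncut(W, in_A):
    """Ncut of the bipartition given by the boolean mask in_A."""
    d = W.sum(axis=1)
    c = W[np.ix_(in_A, ~in_A)].sum()
    return float(c / d[in_A].sum() + c / d[~in_A].sum())
```

For any boolean mask `A`, `jump_probability(W, A, ~A) + jump_probability(W, ~A, A)` should agree with `ncut(W, A)` up to floating-point error.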
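• What is commute distance
  The commute distance (also called resistance distance) $c_{ij}$ between two vertices v_i and v_j is the expected time it takes the random walk to travel from v_i to v_j and back.

  Points which are connected by a short path in the graph and lie in the same
  high-density region of the graph are considered closer to each other than
  points which are connected by a short path but lie in different high-density
  regions of the graph.

                    Well-suited for Clustering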
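• How to calculate commute distance
  Generalized inverse (also called pseudo-inverse or Moore-Penrose inverse): decompose $L = U \Lambda U^T$, where U contains the orthonormal eigenvectors of L as columns and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Then

  $$ L^{\dagger} := U \Lambda^{\dagger} U^T $$

  where $\Lambda^{\dagger}$ is diagonal with entries $1/\lambda_i$ if $\lambda_i \neq 0$ and 0 otherwise.

  Proposition 6 (commute distance). Let G be connected and write $L^{\dagger} = (l^{\dagger}_{ij})$. Then

  $$ c_{ij} = \mathrm{vol}(V) \left( l^{\dagger}_{ii} - 2 l^{\dagger}_{ij} + l^{\dagger}_{jj} \right) = \mathrm{vol}(V) (e_i - e_j)^T L^{\dagger} (e_i - e_j) $$

   This result was published by Klein and Randic (1993), where it was
   proved by methods of electrical network theory.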
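A minimal sketch of the commute distance computation through the pseudo-inverse of L. It assumes a connected graph and the standard formula c_ij = vol(V) (e_i - e_j)' L† (e_i - e_j). The function names and the tolerance `tol` used to decide which eigenvalues count as zero are hypothetical. The second function returns the rows of U (Λ†)^{1/2}, whose squared Euclidean distances, scaled by vol(V), reproduce the commute distances.

```python
import numpy as np

def laplacian_pinv_factors(W, tol=1e-10):
    """Eigenvectors U of L and the diagonal of the pseudo-inverse of Lambda."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    lam_pinv = np.zeros_like(lam)
    nonzero = lam > tol           # invert only the non-zero eigenvalues
    lam_pinv[nonzero] = 1.0 / lam[nonzero]
    return U, lam_pinv

def commute_distances(W):
    """Matrix of commute distances c_ij = vol(V) (l_ii - 2 l_ij + l_jj)."""
    U, lam_pinv = laplacian_pinv_factors(W)
    L_pinv = (U * lam_pinv) @ U.T          # U Lambda^+ U^T
    diag = np.diag(L_pinv)
    vol = np.asarray(W, dtype=float).sum() # vol(V) = sum of all degrees
    return vol * (diag[:, None] - 2.0 * L_pinv + diag[None, :])

def commute_embedding(W):
    """Rows z_i of U (Lambda^+)^{1/2}; c_ij = vol(V) * ||z_i - z_j||^2."""
    U, lam_pinv = laplacian_pinv_factors(W)
    return U * np.sqrt(lam_pinv)
```

On a path graph with three vertices and unit weights, the expected round trip between the two end vertices takes 8 steps, which gives a simple sanity check for the formula.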
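• Proposition 6's consequence: construct an embedding
  $\sqrt{c_{ij}}$ is a Euclidean distance function on the vertices of the graph. With the eigendecomposition $L = U \Lambda U^T$ and $\Lambda^{\dagger}$ inverting the non-zero eigenvalues, map v_i to the i-th row $z_i$ of $U (\Lambda^{\dagger})^{1/2}$. Then $c_{ij} = \mathrm{vol}(V) \lVert z_i - z_j \rVert^2$.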
RANDOM WALK
• A loose relation between spectral clustering and commute distance.
PERTURBATION THEORY
• Perturbation theory studies the question of how eigenvalues
  and eigenvectors of a matrix A change if we add a small
  perturbation H.
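A small numerical illustration of the perturbation question, as a sketch. It starts from the Laplacian A of two disconnected triangles (the ideal case) and adds the Laplacian H of one weak edge between them. As a yardstick it uses Weyl's inequality for symmetric matrices, a standard result that is brought in here as an assumption: each sorted eigenvalue moves by at most the spectral norm of H. The function name and the edge weight `eps` are hypothetical.

```python
import numpy as np

def max_eigenvalue_shift(A, H):
    """Largest change in any sorted eigenvalue when symmetric A is perturbed by symmetric H."""
    before = np.linalg.eigvalsh(A)
    after = np.linalg.eigvalsh(A + H)
    return float(np.abs(after - before).max())

# Ideal case: Laplacian of two disconnected triangles (block diagonal).
tri = np.array([[2.0, -1.0, -1.0],
                [-1.0, 2.0, -1.0],
                [-1.0, -1.0, 2.0]])
zeros = np.zeros((3, 3))
A = np.block([[tri, zeros], [zeros, tri]])

# Perturbation: Laplacian of a single weak edge between vertices 2 and 3.
eps = 0.05
H = np.zeros((6, 6))
H[2, 2] = H[3, 3] = eps
H[2, 3] = H[3, 2] = -eps

shift = max_eigenvalue_shift(A, H)
bound = np.linalg.norm(H, 2)  # spectral norm of the perturbation
print(shift, bound)
```

The unperturbed matrix has two zero eigenvalues, one per component. After the perturbation the second eigenvalue becomes slightly positive but stays well separated from the remaining eigenvalues near 3.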
Below we will see that the size of the eigengap can also be used in a different context
as a quality criterion for spectral clustering, namely when choosing the number k of
clusters to construct.
PRACTICAL DETAILS
• Constructing the similarity graph
• Computing Eigenvectors
     1.   How to compute the first eigenvectors efficiently for large L
     2.   Numerical eigensolvers converge to some orthonormal basis of the
          eigenspace.


• Number of Clusters
    [Figure panels: Well Separated | More Blurry | Overlap So Much]



Eigengap Heuristic usually works well if the data contains very well pronounced
clusters, but in ambiguous cases it also returns ambiguous results.
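The eigengap heuristic can be sketched as follows: sort the eigenvalues of the Laplacian in ascending order and pick the k for which the gap between the k-th and (k+1)-th eigenvalue is largest. The sketch uses the unnormalized Laplacian for simplicity, which is an assumption; the function name `eigengap_k` and the cap `k_max` are hypothetical.

```python
import numpy as np

def eigengap_k(W, k_max):
    """Number of clusters k (1 <= k <= k_max) with the largest gap lambda_{k+1} - lambda_k."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    # Smallest k_max + 1 eigenvalues, in ascending order.
    lam = np.linalg.eigvalsh(L)[:k_max + 1]
    gaps = np.diff(lam)  # gaps[i] = lam[i + 1] - lam[i]
    # Index i corresponds to k = i + 1 clusters.
    return int(np.argmax(gaps)) + 1
```

When the clusters overlap, the gaps become similar in size and the returned k is correspondingly unreliable, which matches the caveat above.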
• The k-means step
      It is not strictly necessary. Other techniques are also used to extract the final partition.

• Which graph Laplacian should be used?
       Why is normalized spectral clustering better than unnormalized spectral clustering?
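   Objective 1: find a partition such that points in different clusters are dissimilar to each other, i.e. minimize the between-cluster similarity $\mathrm{cut}(A, \bar{A})$.

   Both RatioCut and Ncut directly implement Objective 1.

   Objective 2: find a partition such that points in the same cluster are similar to each other, i.e. maximize the within-cluster similarities $W(A, A)$ and $W(\bar{A}, \bar{A})$.

   Only Ncut implements Objective 2.

 Normalized spectral clustering implements both clustering objectives mentioned above,
 while unnormalized spectral clustering only implements the first objective.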
REFERENCES
•   Ulrike von Luxburg. A Tutorial on Spectral Clustering. Max Planck Institute for
    Biological Cybernetics Technical Report No. TR-149.
•   Marina Meila and Jianbo Shi. A random walks view of spectral segmentation. In
    Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS),
    2001.
•   A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an
    algorithm. Advances in Neural Information Processing Systems (NIPS),14, 2001.
•   A. Azran and Z. Ghahramani. A new Approach to Data Driven Clustering. In
    International Conference on Machine Learning (ICML),11, 2006.
•   A. Azran and Z. Ghahramani. Spectral Methods for Automatic Multiscale Data
    Clustering. In IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 2006.
•   Arik Azran. A Tutorial on Spectral Clustering.
    http://videolectures.net/mlcued08_azran_mcl/
SPECTRAL CLUSTERING




            Thank you