1
Analytics and Visualization of Big Data
Fadel M. Megahed
Lecture 21: Mining Social Network Graphs
Department of Industrial and Systems Engineering
Spring 13
Preface: Network and Communities 2
We can think of networks as something looking like this:
Source: Slide Adapted Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
Preface: Network and Communities 3
Goal: Finding Densely Linked Clusters
Source: Slide Adapted Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
Outline for Topics Covered in Chapter 10 4
• What is a Social Network?
Social Networks • Social Network as Graphs
as Graphs •
•
Varieties of Social Networks
Graphs with Several Node Types
Clustering of • Distance Measures
• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities
Direct Discovery • Finding Cliques
• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs
• What makes a good partition?
Partitioning of • Normalized Cuts
Graphs •
•
Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
Social Networks as Graphs 5
Essential Characteristics of
a social network are:
• Building Block Entities
(Entities are typically people)
• At least one relationship
between entities of a network
• Assumption: nonrandomness
Naturally modeled as
undirected graphs:
• Entities Nodes
• Nodes connected if there is a
relationship between entities Source: Click here
• Degree labeling the edges
Are Facebook/Twitter the only Networks that are Social? 6
Other than “friends”
networks, there are
many examples that
exhibit locality of
relationships:
• Telephone Networks
• Email Networks
• Collaboration Networks
• Airport Networks
Graphs with Several Node Types 7
Entities can be of different
types (e.g. Facebook has
people, pages, and network)
A natural way to represent
that is through a k-partite
graph
• Consists of k sets of nodes (no
edges between nodes of same
set)
Figure: [Link] network
8
• What is a Social Network?
Social Networks • Social Network as Graphs
as Graphs •
•
Varieties of Social Networks
Graphs with Several Node Types
Clustering of • Distance Measures
• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities
Direct Discovery • Finding Cliques
• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs
• What makes a good partition?
Partitioning of • Normalized Cuts
Graphs •
•
Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
How do you define a distance measure for a social network?9
If we are to apply standard clustering techniques, our
first step would be to define a distance measure.
Question: What would be a suitable distance
measure for the graph below?
• Hint: The closer the nodes, the better
A B D E
C G F
Problems in Hierarchical Clustering 10
Consider the graph in the Figure below.
A B D E
C G F
Questions:
1. Based on the geometry of the graph, identify some
of the large clusters that we can obtain.
2. By using hierarchical clustering (bottom-up
approach), what is the probability that we cluster B
and D together first?
Problems with Point Assignment (k-Means Clustering) 11
If we start by picking two points at random, they
might be in the same cluster.
If we start by picking one point at random, and select
a point furthest away from it, we can still mess it up –
Example??
Even if we get two reasonable starting points, e.g. B
and F, how will we assign D?
With a larger # nodes, the problem gets more complicated as you can imagine
Betweenness 12
Definition: Betweenness of edge (a,b) is defined as
the number of pairs of nodes (x, y) such that edge (a,b)
lies on the shortest path between them.
Properties:
• High Betweenness is bad!!
Interpretation: High scores suggest that (a,b) runs between
two different communities
Example/Exercise: Calculate the betweenness
A B D E
C G F
Betweenness – An Observation 13
The betweenness score for edges of a graph behave
something like (i.e. not exactly) a distance measure on
the nodes of a graph. Therefore, we can cluster by taking
out the edges in an increasing order of betweenness!!
Example:
A B D E
C G F
What is the interpretation of the first and last sets of clusters?
14
• What is a Social Network?
Social Networks • Social Network as Graphs
as Graphs •
•
Varieties of Social Networks
Graphs with Several Node Types
Clustering of • Distance Measures
• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities
Direct Discovery • Finding Cliques
• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs
• What makes a good partition?
Partitioning of • Normalized Cuts
Graphs •
•
Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
Side Note: Complete Bipartite Graphs 15
Definition: It consists of s
nodes on one side and t Bo
Jackson
nodes on the other, with
all possible edges between NFL
the nodes of one side and
the other present. Deion
Sanders
It is possible that a
MLB
bipartite graph with
many edges has a large
Drew
complete bipartite Hanson
subgraph
Searching for Small Communities 16
We want to enumerate
complete bipartite Bo
Jackson
subgraphs (ks,t)
• ks,t : s nodes on the left and t NFL
nodes on the right
• Note that the book defines
Deion
ks,t differently Sanders
In example, s=3 and t=2; MLB
• Note it is more efficient to
have s<=t (i.e. rotate graph Drew
if we were to make any real Hanson
calculations)
A 3 Step Plan 17
Two points:
(1) Dense bipartite graph: the signature of a community
(2) Complete bipartite subgraph Ks,t
• Ks,t = graph on s nodes, each links to the same t other nodes
Plan:
How do we solve (2) in a giant graph?
Similar problems were solved on big non-graph data?
Details Regarding Frequent Itemset Enumeration 18
Setting:
Market: Universe U of n items
Baskets: m subsets of U: S1, S2, …, Sm ⊆ U (Si is a set of
items one person bought)
Support: Frequency threshold f
Goal:
Find all items in T that were bought together at least
f times (T s.t. T ⊆ Si of ≥ f sets Si )
What’s the connection between the itemsets and
complete bipartite graphs?
From Itemsets to Bipartite ks,t 19
Source: Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
From Itemsets to Bipartite ks,t 20
Source: Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
From Itemsets to Bipartite ks,t – Summary 21
Analytical result:
Complete bipartite subgraphs Ks,t are embedded in
larger dense enough graphs (i.e., the communities)
• Biparite subgraphs act as “signatures” of communities
Algorithmic result:
Frequent itemset extraction and dynamic programming
finds graphs Ks,t
• Method is super scalable
22
• What is a Social Network?
Social Networks • Social Network as Graphs
as Graphs •
•
Varieties of Social Networks
Graphs with Several Node Types
Clustering of • Distance Measures
• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities
Direct Discovery • Finding Cliques
• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs
• What makes a good partition?
Partitioning of • Normalized Cuts
Graphs •
•
Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
What makes a good partition? 23
A good partition has the following properties:
• Maximize the number of within-group connections
• Minimize the number of between-group connections
A B D E
C G F
Best (and
shortest) cut
What makes a good partition? 24
A good partition has the following properties:
• Maximize the number of within-group connections
• Minimize the number of between-group connections
A B D E
C G F
H
Best
cut
Shortest
cut
Typically, we want the clusters to be similar in size
Normalized Cuts 25
A proper definition of a “good” cut must produce
balanced sets.
Suppose we want to divide the figure into two distinct
sets of nodes: S and T, then the normalized cut is:
Example: Identify the ncut:
Smallest cut A B D E
Optimal cut
C G F
H
Some Matrices that Describe the Graphs 26
Adjacency Matrix (A)
n×n matrix
A=[aij], aij= 1 if edge exists between node i and j
0 otherwise A B C D E F G
A 0 1 1 0 0 0 0
A B D E B 1 0 1 1 0 0 0
C 1 1 0 0 0 0 0
D 0 1 0 0 1 1 1
C G F E 0 0 0 1 0 1 0
F 0 0 0 1 1 0 1
Important Property: G 0 0 0 1 0 1 0
Symmetric Matrix
Some Matrices that Describe the Graphs 27
Degree Matrix (D)
n×n matrix
D=[dii], dii= degree of node
A B C D E F G
A 2 0 0 0 0 0 0
A B D E B 0 3 0 0 0 0 0
C 0 0 2 0 0 0 0
D 0 0 0 4 0 0 0
C G F E 0 0 0 0 2 0 0
F 0 0 0 0 0 3 0
Important Property: G 0 0 0 0 0 0 2
Diagonal
Some Matrices that Describe the Graphs 28
Laplacian Matrix (L)
n×n matrix
L=D-A = [lij], lii= dii
lij = -1 if edge exists A B C D E F G
0 otherwise A 2 -1 -1 0 0 0 0
A B D E B -1 3 -1 -1 0 0 0
C -1 -1 2 0 0 0 0
D 0 -1 0 4 -1 -1 -1
C G F E 0 0 0 -1 2 -1 0
F 0 0 0 -1 -1 3 -1
G 0 0 0 -1 0 -1 2
Eigenvalues of the Laplacian Matrix 29
Partition based on the smallest eigenvector
30
Analytics and Visualization of Big Data
Fadel M. Megahed
Lecture 21: Mining Social Network Graphs
Department of Industrial and Systems Engineering
Spring 13