0% found this document useful (0 votes)
142 views30 pages

Mining Social Network Graphs Analysis

This document discusses mining social network graphs. It begins by introducing social networks as graphs, with nodes representing entities and edges representing relationships. It describes different types of social networks and how graphs can represent networks with multiple node types. It then discusses techniques for clustering social network graphs, including calculating betweenness to identify dense clusters. It also covers direct discovery of communities by finding complete bipartite subgraphs, analogous to frequent itemset mining. Partitioning graphs to identify communities is also briefly discussed.

Uploaded by

TechTown
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
142 views30 pages

Mining Social Network Graphs Analysis

This document discusses mining social network graphs. It begins by introducing social networks as graphs, with nodes representing entities and edges representing relationships. It describes different types of social networks and how graphs can represent networks with multiple node types. It then discusses techniques for clustering social network graphs, including calculating betweenness to identify dense clusters. It also covers direct discovery of communities by finding complete bipartite subgraphs, analogous to frequent itemset mining. Partitioning graphs to identify communities is also briefly discussed.

Uploaded by

TechTown
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

1

Analytics and Visualization of Big Data

Fadel M. Megahed

Lecture 21: Mining Social Network Graphs

Department of Industrial and Systems Engineering


Spring 13
Preface: Network and Communities 2

We can think of networks as something looking like this:

Source: Slide Adapted Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
Preface: Network and Communities 3

Goal: Finding Densely Linked Clusters

Source: Slide Adapted Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]
Outline for Topics Covered in Chapter 10 4

• What is a Social Network?


Social Networks • Social Network as Graphs
as Graphs •

Varieties of Social Networks
Graphs with Several Node Types

Clustering of • Distance Measures


• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities

Direct Discovery • Finding Cliques


• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs

• What makes a good partition?


Partitioning of • Normalized Cuts
Graphs •

Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
Social Networks as Graphs 5

 Essential Characteristics of
a social network are:
• Building Block  Entities
(Entities are typically people)
• At least one relationship
between entities of a network
• Assumption: nonrandomness

 Naturally modeled as
undirected graphs:
• Entities  Nodes
• Nodes connected if there is a
relationship between entities Source: Click here
• Degree  labeling the edges
Are Facebook/Twitter the only Networks that are Social? 6

 Other than “friends”


networks, there are
many examples that
exhibit locality of
relationships:
• Telephone Networks
• Email Networks
• Collaboration Networks
• Airport Networks
Graphs with Several Node Types 7

 Entities can be of different


types (e.g. Facebook has
people, pages, and network)

 A natural way to represent


that is through a k-partite
graph
• Consists of k sets of nodes (no
edges between nodes of same
set)

 Figure: [Link] network


8

• What is a Social Network?


Social Networks • Social Network as Graphs
as Graphs •

Varieties of Social Networks
Graphs with Several Node Types

Clustering of • Distance Measures


• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities

Direct Discovery • Finding Cliques


• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs

• What makes a good partition?


Partitioning of • Normalized Cuts
Graphs •

Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
How do you define a distance measure for a social network?9
 If we are to apply standard clustering techniques, our
first step would be to define a distance measure.

 Question: What would be a suitable distance


measure for the graph below?
• Hint: The closer the nodes, the better 
A B D E

C G F
Problems in Hierarchical Clustering 10

 Consider the graph in the Figure below.


A B D E

C G F
Questions:
1. Based on the geometry of the graph, identify some
of the large clusters that we can obtain.

2. By using hierarchical clustering (bottom-up


approach), what is the probability that we cluster B
and D together first?
Problems with Point Assignment (k-Means Clustering) 11

 If we start by picking two points at random, they


might be in the same cluster.

 If we start by picking one point at random, and select


a point furthest away from it, we can still mess it up –
Example??

 Even if we get two reasonable starting points, e.g. B


and F, how will we assign D?

With a larger # nodes, the problem gets more complicated as you can imagine 
Betweenness 12

 Definition: Betweenness of edge (a,b) is defined as


the number of pairs of nodes (x, y) such that edge (a,b)
lies on the shortest path between them.

 Properties:
• High Betweenness is bad!!
 Interpretation: High scores suggest that (a,b) runs between
two different communities

 Example/Exercise: Calculate the betweenness


A B D E

C G F
Betweenness – An Observation 13

The betweenness score for edges of a graph behave


something like (i.e. not exactly) a distance measure on
the nodes of a graph. Therefore, we can cluster by taking
out the edges in an increasing order of betweenness!!

Example:

A B D E

C G F

What is the interpretation of the first and last sets of clusters?


14

• What is a Social Network?


Social Networks • Social Network as Graphs
as Graphs •

Varieties of Social Networks
Graphs with Several Node Types

Clustering of • Distance Measures


• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities

Direct Discovery • Finding Cliques


• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs

• What makes a good partition?


Partitioning of • Normalized Cuts
Graphs •

Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
Side Note: Complete Bipartite Graphs 15

 Definition: It consists of s
nodes on one side and t Bo
Jackson
nodes on the other, with
all possible edges between NFL
the nodes of one side and
the other present. Deion
Sanders
 It is possible that a
MLB
bipartite graph with
many edges has a large
Drew
complete bipartite Hanson
subgraph 
Searching for Small Communities 16

 We want to enumerate
complete bipartite Bo
Jackson
subgraphs (ks,t)
• ks,t : s nodes on the left and t NFL
nodes on the right
• Note that the book defines
Deion
ks,t differently Sanders

 In example, s=3 and t=2; MLB


• Note it is more efficient to
have s<=t (i.e. rotate graph Drew
if we were to make any real Hanson
calculations)
A 3 Step Plan 17

Two points:
(1) Dense bipartite graph: the signature of a community
(2) Complete bipartite subgraph Ks,t
• Ks,t = graph on s nodes, each links to the same t other nodes

Plan:
How do we solve (2) in a giant graph?
Similar problems were solved on big non-graph data?
Details Regarding Frequent Itemset Enumeration 18

Setting:
 Market: Universe U of n items
 Baskets: m subsets of U: S1, S2, …, Sm ⊆ U (Si is a set of
items one person bought)
 Support: Frequency threshold f

Goal:
 Find all items in T that were bought together at least
f times (T s.t. T ⊆ Si of ≥ f sets Si )
 What’s the connection between the itemsets and
complete bipartite graphs?
From Itemsets to Bipartite ks,t 19

Source: Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]


From Itemsets to Bipartite ks,t 20

Source: Jure Leskovic, Stanford CS246, Lecture Notes, see [Link]


From Itemsets to Bipartite ks,t – Summary 21

Analytical result:
 Complete bipartite subgraphs Ks,t are embedded in
larger dense enough graphs (i.e., the communities)
• Biparite subgraphs act as “signatures” of communities

Algorithmic result:
 Frequent itemset extraction and dynamic programming
finds graphs Ks,t
• Method is super scalable
22

• What is a Social Network?


Social Networks • Social Network as Graphs
as Graphs •

Varieties of Social Networks
Graphs with Several Node Types

Clustering of • Distance Measures


• Applying Standard Clustering Techniques
Social Network • Betweenness
Graphs • Using Betweenness to Find Communities

Direct Discovery • Finding Cliques


• Complete Bipartite Graphs
of Communities • Finding Complete Bipartite Subgraphs

• What makes a good partition?


Partitioning of • Normalized Cuts
Graphs •

Some Matrices that Describe Graphs
Eigenvalues of the Laplacian Matrix
What makes a good partition? 23

 A good partition has the following properties:


• Maximize the number of within-group connections
• Minimize the number of between-group connections

A B D E

C G F
Best (and
shortest) cut
What makes a good partition? 24

 A good partition has the following properties:


• Maximize the number of within-group connections
• Minimize the number of between-group connections

A B D E

C G F
H
Best
cut

Shortest
cut
Typically, we want the clusters to be similar in size
Normalized Cuts 25

 A proper definition of a “good” cut must produce


balanced sets.

 Suppose we want to divide the figure into two distinct


sets of nodes: S and T, then the normalized cut is:

Example: Identify the ncut:


 Smallest cut A B D E
 Optimal cut

C G F
H
Some Matrices that Describe the Graphs 26

Adjacency Matrix (A)


 n×n matrix
 A=[aij], aij= 1 if edge exists between node i and j
0 otherwise A B C D E F G
A 0 1 1 0 0 0 0
A B D E B 1 0 1 1 0 0 0
C 1 1 0 0 0 0 0
D 0 1 0 0 1 1 1
C G F E 0 0 0 1 0 1 0
F 0 0 0 1 1 0 1
Important Property: G 0 0 0 1 0 1 0
 Symmetric Matrix
Some Matrices that Describe the Graphs 27

Degree Matrix (D)


 n×n matrix
 D=[dii], dii= degree of node
A B C D E F G
A 2 0 0 0 0 0 0
A B D E B 0 3 0 0 0 0 0
C 0 0 2 0 0 0 0
D 0 0 0 4 0 0 0
C G F E 0 0 0 0 2 0 0
F 0 0 0 0 0 3 0
Important Property: G 0 0 0 0 0 0 2
 Diagonal
Some Matrices that Describe the Graphs 28

Laplacian Matrix (L)


 n×n matrix
 L=D-A = [lij], lii= dii
lij = -1 if edge exists A B C D E F G
0 otherwise A 2 -1 -1 0 0 0 0
A B D E B -1 3 -1 -1 0 0 0
C -1 -1 2 0 0 0 0
D 0 -1 0 4 -1 -1 -1
C G F E 0 0 0 -1 2 -1 0
F 0 0 0 -1 -1 3 -1
G 0 0 0 -1 0 -1 2
Eigenvalues of the Laplacian Matrix 29

 Partition based on the smallest eigenvector


30

Analytics and Visualization of Big Data

Fadel M. Megahed

Lecture 21: Mining Social Network Graphs

Department of Industrial and Systems Engineering


Spring 13

You might also like