Optimizing Rank Aggregation Methods
Optimizing Rank Aggregation Methods
NIR AILON
Google Research, New York, NY
MOSES CHARIKAR
Princeton University, Princeton, NJ
AND
ALANTHA NEWMAN
DIMACS, Rutgers University, New Brunswick, NJ 23
Abstract. We address optimization problems in which we are given contradictory pieces of input
information and the goal is to find a globally consistent solution that minimizes the extent of dis-
agreement with the respective inputs. Specifically, the problems we address are rank aggregation, the
feedback arc set problem on tournaments, and correlation and consensus clustering. We show that
for all these problems (and various weighted versions of them), we can obtain improved approxima-
tion factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a
long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no
polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
Categories and Subject Descriptors: F.2 [Theory of Computation]: Analysis of Algorithms and
Problem Complexity; G.2 [Mathematics of Computing]: Discrete Mathematics
The work of N. Ailon was done while he was at Princeton University, partly supported by a Princeton
University Honorific Fellowship.
M. Charikar was supported by a National Science Foundation (NSF) ITR grant CCR-0205594,
DOE Early Career Principal Investigator award DE-FG02-02ER25540, NSF CAREER award CCR-
0237113, and Alfred P. Sloan fellowship and a Howard B. Wentz Jr. Junior Faculty award.
The work of A. Newman was done while she was visiting Princeton University, supported by M.
Charikar’s Alfred P. Sloan fellowship.
Authors’ addresses: N. Ailon, Google NY, 4th Floor, 76 Ninth Avenue, New York NY 10011, e-mail:
nailon@[Link]; M. Charikar, Department of Computer Science, Princeton University, 35 Olden
Street, Princeton, NJ 08540, e-mail: moses@[Link]; A. Newman, DIMACS Center, CoRE
Building, 4th Floor, Rutgers University, 96 Frelinghuysen Road, Piscataway, NJ 08854, e-mail:
[Link]@[Link].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along with the
full citation. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute
to lists, or to use any component of this work in other works requires prior specific permission and/or
a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701,
New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@[Link].
C 2008 ACM 0004-5411/2008/10-ART23 $5.00 DOI 10.1145/1411509.1411513 [Link]
10.1145/1411509.1411513
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:2 N. AILON ET AL.
1. Introduction
The problem of aggregating inconsistent information from many different sources
arises in numerous contexts and disciplines. For example, the problem of ranking a
set of contestants or a set of alternatives based on possibly conflicting preferences
is a central problem in the areas of voting and social choice theory. Combining k
different complete ranked lists on the same set of n elements into a single ranking,
which best describes the preferences expressed in the given k lists, is known as the
problem of rank aggregation. This problem dates back to as early as the late 18th
century when Condorcet and Borda each proposed voting systems for elections with
more than two candidates [Condorcet 1785; Borda 1781]. There are numerous ap-
plications in sports, databases, and statistics [Dwork et al. 2001a; Fagin et al. 2003]
in which it is necessary to effectively combine rankings from different sources.
Another example of aggregating information is the problem of integrating possibly
contradictory clusterings from existing data sets into a single representative cluster-
ing. This problem is known as consensus clustering or ensemble clustering and can
be applied to remove noise and incongruencies from data sets [Filkov and Skiena
2003] or combine information from multiple classifiers [Strehl 2002].
In the last half century, rank aggregation has been studied and defined from
a mathematical perspective. In particular, Kemeny proposed a precise criterion
for determining the “best” aggregate ranking [Kemeny 1959; Kemeny and Snell
1962].1 Given n candidates and k permutations of the candidates, {π1 , π2 , . . . , πk },
a Kemeny optimal ranking of the candidates is the ranking π that minimizes a
“sum of distances”, ik d(π, πi ), where d(π j , πk ) denotes the number of pairs
of candidates that are ranked in different orders by π j and πk .2 For example, if
π j = (1, 2, 3, 4) and πk = (2, 3, 1, 4), then d(π j , πk ) = 2 since elements 1 and 2
appear in different orders in the two rankings as do elements 1 and 3. In other words,
a Kemeny optimal ranking minimizes the number of pairwise disagreements with
the given k rankings. Throughout this article we will refer to the problem of finding
a Kemeny optimal ranking as RANK-AGGREGATION.
More recently, RANK-AGGREGATION has been studied from a computational per-
spective. Finding a Kemeny optimal ranking is NP-hard [Bartholdi et al. 1989]
and remains NP-hard even when there are only four input lists to aggregate
[Dwork et al. 2001a]. This motivates the problem of finding a ranking that ap-
proximately minimizes the number of disagreements with the given input rankings.
Several 2-approximation algorithms are known [Diaconis and Graham 1977; Dwork
1
Historically known as Kemeny aggregation.
The distance function d(·, ·) is in fact a distance function and is known as the Kendall tau distance.
2
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:3
et al. 2001a]. In fact, if we take the best of the input rankings, then the number of
disagreements between this ranking and the k input rankings is no more than twice
optimal.
The feedback arc set problem on tournaments is closely related to the RANK-
AGGREGATION problem. A tournament is a directed graph G = (V, A) such that
for each pair of vertices i, j ∈ V , either (i, j) ∈ A of ( j, i) ∈ A. The minimum
feedback arc set is the smallest set A ⊆ A such that (V, A − A ) is acyclic. The
size of this set is exactly the minimum number of backward edges induced by
a linear ordering of V . Throughout the article, we refer to this problem as FAS-
TOURNAMENT. This problem turns out to be useful in studying RANK-AGGREGATION,
but is also interesting in its own right. For example, imagine a sports tournament
where each player plays against every other player once: How should we rank
the players based on these possibly non-transitive (inconsistent) outcomes? The
complementary problem to finding a minimum feedback arc set is the maximum
acyclic subgraph problem, also known as the linear ordering problem. RANK-
AGGREGATION can be cast as a special case of weighted FAS-TOURNAMENT, where
the objective is to minimize the total weight of backward edges in a linear order of
the vertices. When the weight of edge (i, j) is the fraction of input rankings that
order i before j, solving RANK-AGGREGATION is equivalent to solving this weighted
FAS-TOUR-NAMENT instance.
The last problem we consider is that of clustering objects based on complete
but possibly conflicting pairwise information. An instance of this problem can be
represented by a graph with a vertex for each object and an edge labeled (+) or
(−) for each pair of vertices, indicating that two elements should be in the same
or different clusters, respectively. The goal is to cluster the elements so as to min-
imize the number of “−” edges within clusters and “+” edges crossing clusters.
This problem is known as CORRELATION-CLUSTERING (on complete graphs) [Bansal
et al. 2004]. A useful application of CORRELATION-CLUSTERING is optimally com-
bining the output of different machine learning classifiers [Bansal et al. 2004;
Strehl 2002]. Bansal et al. [2004] provide in-depth descriptions of other applica-
tions of CORRELATION-CLUSTERING. An analog to RANK-AGGREGATION is known as
CONSENSUS-CLUSTERING. In this problem, we are given k clusterings of the same
set of n elements. The goal is to find a clustering that minimizes the number of
pairwise disagreements with the given k clusterings. This problem can also be used
to optimally combine datasets. For example, CONSENSUS-CLUSTERING has been ap-
plied to the problem of integrating data resulting from experiments that measure
gene expression [Filkov and Skiena 2003].
1.1. PREVIOUS WORK. The minimum feedback arc set problem can be approx-
imated to within a factor of O(log n log log n) in general graphs [Even et al. 1998;
Seymour 1995] and has (at least) the same approximation hardness as the vertex
cover problem [Karp 1972], which is 1.36 [Dinur and Safra 2002]. More than a
decade ago, Bang-Jensen and Thomassen [1992] conjectured that FAS-TOURNAMENT
is NP-hard. However, for the past decade, no progress has been made on settling
this conjecture. In contrast, the minimum feedback vertex set problem on tourna-
ments is NP-hard [Speckenmeyer 1989] and is approximable to within 5/2 [Cai
et al. 2001].
We are not aware of any approximation for FAS-TOURNAMENT that improves on
the bound for the feedback arc set problem in general graphs. The complementary
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:4 N. AILON ET AL.
maximization problem on tournaments has been studied; Arora et al. [1996] and
Frieze and Kannan [1999] gave PTASs for the maximum acyclic subgraph problem
in dense graphs, which implies a PTAS for the problem on tournaments. Inter-
estingly, since the appearance of the conference version of this work [Ailon et al.
2005], Kenyon-Mathieu and Schudy [2007] used the maximization PTAS as a main
component in a minimization PTAS. This significantly improves on the result in this
work for the ranking problems (in particular for RANK-AGGREGATION), since here
we guarantee only constant approximation factors. Neverthelss, our algorithms are
very simple and practical and more suitable for applications. Refer to Section 10
for a complete survey and comparison with followup work.
There are two well-known factor 2-approximation algorithms for sc Rank-Aggre-
gation. Since both RANK-AGGREGATION and CONSENSUS-CLUSTERING are equivalent
to finding the median of a set of points with a metric distance function, it easy to
see that choosing one of the given lists or given clusters at random, yields a 2-
approximation algorithm. We refer to these algorithms as PICK-A-PERM and PICK-
A-CLUSTER, respectively. The Spearman’s footrule distance between two permuta-
tions πi and π j on n elements is defined to be: F(πi , π j ) = nk=1 |πi (k) − π j (k)|.
The footrule distance is no more than twice the Kemeny distance [Diaconis
and Graham 1977] and can be computed in polynomial time via a minimum
cost matching [Dwork et al. 2001a, 2001b]. These observations yield another
2-approximation.
CORRELATION-CLUSTERING has been studied both on general and complete graphs.
Both the minimization and maximization versions have been investigated. Bansal
et al. [2004] gave the first constant-factor approximation for the problem of min-
imizing disagreements on the complete graph. This factor was improved to 4
by rounding a linear program Charikar et al. [2003]. The weighted version of
CORRELATION-CLUSTERING, in which edges have fractional ± assignments has also
been studied. Each edge is assigned fractional values w ij+ and w ij− rather than a dis-
crete “+” or “−” label. When the edge weights satisfy the probability constraints
(i.e., w ij+ + w ij− = 1 for all edges), the best previous approximation factor was 7
[Gionis et al. 2005; Bansal et al. 2004]. When the edge weights satisfy the prob-
ability and the triangle inequality constraints (see Section 1.2), the best previous
approximation factor was 3 [Gionis et al. 2005]. CORRELATION-CLUSTERING on com-
plete graphs is MAX-SNP-hard [Charikar et al. 2003] and CONSENSUS-CLUSTERING
is NP-hard [Wakabayashi 1998]. However, CONSENSUS-CLUSTERING is not
known to be NP-hard if the number of input clusters is constant [Filkov and Skiena
2003].
1.2. OUR RESULTS. We give improved approximation algorithms for the fol-
lowing optimization problems:
—FAS-TOURNAMENT,
—RANK-AGGREGATION,
—CORRELATION-CLUSTERING, and
—CONSENSUS-CLUSTERING.
We show that they can all be approximated using essentially the same remarkably
simple algorithm. For example, the algorithm for FAS-TOURNAMENT, called KWIK-
SORT, is as follows: First, we pick a random vertex i to be the “pivot” vertex. Second,
we place all vertices connected to i with an in-edge on the left side of i and all
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:5
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:6 N. AILON ET AL.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:7
For a weighted FAS-TOURNAMENT instance, we will apply our algorithm for FAS-
TOURNAMENT on an unweighted graph to a majority tournament, which is an un-
weighted tournament that corresponds to the input weighted tournament. Similarly,
a weighted CORRELATION-CLUSTERING instance has a corresponding unweighted
majority instance.
Note that although the majority instances depend on the weights of the weighted
instances, they are unweighted instances.
We will use (i, j, k) to denote the directed triangle (i → j, j → k, k → i). It
will be clear from the context whether a triangle is the set of its vertices or its edges.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:8 N. AILON ET AL.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:9
the breaking vertex with probability 1/3, because all vertices input to a recursive
call are chosen as pivot with equal probability. Therefore, any edge e = (i, j) of
t becomes a backward edge with probability 1/3 (still, conditioned on At ). More
formally, if we let Be denote the event that e becomes a backward edge, then
1
Pr [Be ∧ At ] = Pr [Be |At ] Pr [At ] =pt .
3
The event Be ∧ At means that the backwardness of edge e was charged to triangle
t to which it is incident. The main observation of this proof is as follows: for two
different triangles t, t ∈ T sharing an edge e, the events Be ∧ At and Be ∧ At
are disjoint. Indeed, an edge e can be charged to only one triangle t incident to e.
Therefore, for all e ∈ E,
1
pt ≤ 1 . (1)
t:e∈t 3
So { pt /3}t∈T is a fractional packing of T . Thus, C OPT ≥ t∈T pt /3 = E[C KS ]/3,
as required.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:10 N. AILON ET AL.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:11
PICK-A-PERM({π1 , π2 , . . . πk })
Output a permutation πi chosen uniformly at random
from the input permutations.
(In practice, we can pick the permutation πi that minimizes the cost, but we use
the randomized version for the analysis.) Let C PAP denote the cost of PICK-A-PERM
on the RANK-AGGREGATION instance. Let G w = (V, Aw ) be the corresponding
unweighted majority tournament. Let z(e) = 2w(e)w(e), where w(e) and w(e) are
defined as in Section 4. We claim that
E[C PAP ] = z(e) . (4)
e∈Aw
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:12 N. AILON ET AL.
as required.
11 ∗
LEMMA 5.2. For all t ∈ T , 37 w(t) + 47 z(t) ≤ 7
c (t) , and for all e ∈ Aw ,
3
7
w(e) + 47 z(e) ≤ 11
7
c∗ (e).
PROOF. The second inequality in the lemma is obtained by verifying the simple
fact that w(e) ≤ c∗ (e) and z(e) ≤ 2c∗ (e) for all e ∈ Aw . To prove the first inequality,
we want to show that
3 4 11
f (t) = w(t) + z(t) − c∗ (t) ≤ 0, (5)
7 7 7
where (slightly changing notation) t = (w 1 , w 2 , w 3 ) and
w(t) = w 1 + w 2 + w 3
z(t) = 2w 1 (1 − w 1 ) + 2w 2 (1 − w 2 ) + 2w 3 (1 − w 3 )
c∗ (t) = 1 − w 2 + 1 − w 3 + w 1
1/2 ≤ w 1 ≤ w j ≤ 1 for j = 2, 3
w1 + w2 + w3 ≤ 2
The proof can be completed by finding the global maximum of f (t) on the
defined polytope using standard techniques of multivariate calculus.
Note that for (w 1 , w 2 , w 3 ) = (1/2, 3/4, 3/4) we obtain w(t) = 2, z(t) = 5/4 and
c∗ (t) = 1, so (5) is tight. Theorem 5.3 follows from Theorem 5.1 and Lemma 5.2,
using β = 3/7 and γ = 11/7:
THEOREM 5.3. The best of KWIKSORT on G w and PICK-A-PERM is an expected
11/7 approximation for RANK-AGGREGATION.
In using Theorem 5.1 to derive bounds, we can also take advantage of a priori
knowledge of the system of weights w. We illustrate this using the special case of
only k = 3 voters, a case of independent interest in applications [Chaudhuri et al.
2006].
LEMMA 5.4. If k = 3, then for all t ∈ T , 25 w(t) + 35 z(t) ≤ 65 c∗ (t) and for all
e ∈ Aw , 25 w(e) + 35 z(e) ≤ 65 c∗ (e).
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:13
PROOF. In this special case, we have that w(e) ∈ {2/3, 1} for all e ∈ Aw , and
w(e1 ) = w(e2 ) = w(e3 ) = 2/3 for all t = (e1 , e2 , e3 ) ∈ T , therefore w(t) =
2, z(t) = 4/3 and c∗ (t) ≥ 4/3. The inequalities can now be easily verified.
Theorem 5.5 follows from Theorem 5.1 and Lemma 5.4, using β = 2/5 and
γ = 6/5.
KWIKCLUSTER(G = (V, E + , E − ))
If V = ∅ then return ∅
Pick random pivot i ∈ V .
Set C = {i}, V = ∅.
For all j ∈ V, j = i:
If (i, j) ∈ E + then
Add j to C
Else (If (i, j) ∈ E − )
Add j to V
Return C ∪ KWIKCLUSTER(G ) .
4
A CORRELATION-CLUSTERING instance with no bad triplets induces a consistent clustering, just as a
tournament with no 3-cycles is acyclic. Our algorithms have an optimal cost of 0 on these instances.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:14 N. AILON ET AL.
the first one among them was chosen as pivot. Let pt denote the probability of At .
The analysis continues identically to that of KWIKSORT.
THEOREM 6.1. Algorithm KWIKCLUSTER is a randomized expected 3-approxi-
mation algorithm for CORRELATION-CLUSTERING.
Now let (V, w + , w − ) be a weighted CORRELATION-CLUSTERING instance, where
n
w , w − ∈ (R+ )(2) . Unlike weighted FAS-TOURNAMENT, we will only consider
+
weight systems that satisfy the probability constraints w ij+ + w ij− = 1. We create the
unweighted majority CORRELATION-CLUSTERING instance G w = (V, E w+ , E w− ) and
return the clustering generated by KWIKCLUSTER(G w ).
Triangle inequality constraints in weighted CORRELATION-CLUSTERING have the
following form: for all i, j, k, w ij+ +w jk+ +w ik− ≤ 2. (Equivalently, w ik− ≤ w ij− +w jk− .)
Theorem 6.2 is analogous to Theorem 4.3:
THEOREM 6.2. Algorithm KWIKCLUSTER on G w is a 5 (respectively, 2) approx-
imation for weighted
CORRELATION-CLUSTERING with probability constraints (respectively, with prob-
ability and triangle inequality constraints combined).
The proof is almost identical to that of Theorem 4.3, with “+ + −” (bad) triplets
in G w replacing the role of directed (bad) triangles in tournaments.
Solving CONSENSUS-CLUSTERING is equivalent to solving weighted CORRELATION-
CLUSTERING with w ij+ (respectively, w ij− ) as the fractional number of input clusters
with a (+) (respectively, (−)) relation between i and j. This weighted CORRELATION-
CLUSTERING instance obeys both the probability constraints and the triangle inequal-
ity constraints, but we can do better than the 2 approximation guaranteed by The-
orem 6.2. Analysis almost identical to the one in Section 5 gives an expected 11/7
approximation for this case. The KWIKCLUSTER is coupled with PICK-A-CLUSTER,
which is defined analogously to PICK-A-PERM: Simply return a cluster chosen uni-
formly at random from the list.
THEOREM 6.3. The best of KWIKCLUSTER on G w and PICK-A-CLUSTER has an
expected approximation ratio of at most 11
7
for CONSENSUS-CLUSTERING.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:15
LP-KWIKSORT(V, x) LP-KWIKCLUSTER(V, x + , x − )
A recursive algorithm for rounding A recursive algorithm for rounding
the LP for weighted FAS-TOURNAMENT. the LP for weighted CORRELATION-
Given an LP solution: CLUSTERING.
x = {xij }i, j∈V , returns an ordering Given an LP solution:
on the vertices. x + = {xij+ }i< j , x − = {xij− }i< j ,
returns a clustering of the vertices.
If V = ∅ then return empty-list
Pick random pivot i ∈ V. If V = ∅ then return ∅
Set VR = ∅, VL = ∅. Pick random pivot i ∈ V.
Set C = {i}, V = ∅.
For all j ∈ V, j = i:
With probability xji For all j ∈ V, j = i :
Add j to VL . With probability xij+
Else (W/ prob. xij = 1 − xji ) Add j to C.
Add j to VR . Else (W/ prob. xij− = 1 − xij+ )
Add j to V .
Return order
LP-KWIKSORT(VL , x), i, Return clustering
LP-KWIKSORT(VR , x) {C}∪LP-KWIKCLUSTER(V , x + , x − )
—2 when the weights satisfy the probability and the triangle inequality constraints,
and
—4/3 for RANK-AGGREGATION.
The result for RANK-AGGREGATION is obtained by returning the better of LP-
KWIKSORT and PICK-A-PERM.
THEOREM 7.2. Our clustering LP rounding algorithm LP-KWIKCLUSTERING
obtains the following approximation ratios on weighted CORRELATION-CLUSTERING
instances:
—5/2 when the weights satisfy the probability constraints,
—2 when the weights satisfy the probability and the triangle inequality constraints,
and
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:16 N. AILON ET AL.
Now we notice that for any t = {i, j, k} and t = {i, j, k } (two triplets sharing
a pair i, j), the events At ∧ (Bijt ∨ Bjit ) and At ∧ (Bijt ∨ Bjit ) are disjoint, because a
pair i, j can be split into two different recursion branches only once. Thus,
pt (qijt + qjit ) ≤ 1 .
t:i, j∈t
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:17
The above expression is exactly the probability that the pair i, j is dangerously
charged. Therefore, the total expected cost of LP-KWIKSORT is E[C LKS ] = B LKS +
F LKS , where
B LKS = pt y(t)
t
F LKS
= 1− pt (qij + qji ) cij∗ .
t t
It is easy to see that 0 ≤ FLP
PAP
≤ 2FLP (because z ij ≤ 2cij∗ , and t:i, j∈t pt (qijt +
qjit ) ≤ 1). Along with F LKS = FLP , this implies that 23 F LKS + 13 FLPPAP
≤ 43 FLP .
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:18 N. AILON ET AL.
For all i = j,
j + q{ j}i + q{ij} ) ≤ 1
t t t
pt (q{i}
t:{i, j}⊆t
t {i, j}⊆t
t ∗
FLP = 1− pt q{i} j + q{ j}i + q{ij} cij .
t t
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:19
LEMMA 7.5. If the weight system satisfies the probability constraints (respec-
tively, probability constraints and triangle inequality constraints), then for any
t ∈ T,
∗
y(t) ≤ τ t
q{i} j + q{ j}i + q{ ji} cij ,
t t
{i, j}⊆t
PAC
BLP = pt t
q{i} j + q{ j}i + q{ij} z ij
t t
t {i, j}⊆t
PAC
FLP = 1− t
pt (q{i} j + q{t j}i + q{tij} ) z ij ≥ 0.
i< j t:{i, j}⊆t
z ij = 2w ij+ w ij−
LEMMA 7.6. For all t = {i, j, k},
2 1 t 4 t
y(t) + q{i} j + q{t j}i + q{tij} z ij ≤ q{i} j + q{t j}i + q{tij} cij∗ .
3 3 {i, j}⊆t 3 {i, j}⊆t
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:20 N. AILON ET AL.
Let ⊆ denote the triangle inequality and probability constraints for ranking
polytope, that is
= {(a1 , a2 , a3 ) ∈ : 1 ≤ a1 + a2 + a3 ≤ 2} .
We define three functions, piv, pap, lp : R6 → R, as follows:
piv(x, w) = x1 x2 w 3 + (1 − x1 )(1 − x2 )(1 − w 3 )
+ x2 x3 w 1 + (1 − x2 )(1 − x3 )(1 − w 1 )
+ x3 x1 w 2 + (1 − x3 )(1 − x1 )(1 − w 2 );
(1) Linearity in w. The functions f and g are linear in w (for x fixed). Therefore, f
obtains its maximum on (x, w) for w which is some vertex of , and similarly g
obtains its maximum value on (x, w) for w, which is some vertex of . For f , it
suffices to check w = (0, 0, 0) and w = (0, 0, 1) (due to symmetry), and for g,
it suffices to check w = (0, 0, 1). Let f̃ (x) = f (x, 0, 0, 0), fˆ (x) = f (x, 0, 0, 1)
and g̃(x) = g(x, 0, 0, 1). It remains to show that f̃ , fˆ , g̃ : R3 → R are bounded
above by 0 on .
(2) Trilinearity in x. For i = 1, 2, 3, the functions f̃ , fˆ and g̃ are linear in xi when
x j ’s are fixed for j ∈ {1, 2, 3} \ {i}. This means that any point x ∈ such that
x + tei ∈ for all t ∈ [−ε, ε] for some ε > 0 and some i ∈ {1, 2, 3} (where ei
is a standard basis element of R3 ) is not a strict local maximum of f̃ , fˆ and g̃
in , so these points x can be ignored. The points that are left are x ∈ such
that that x1 + x2 + x3 = 1 or x1 + x2 + x3 = 2.
Let Hk ⊆ R3 denote the hyperplane x1 + x2 + x3 = k for k = 1, 2, and let
k = ∩ Hk . The closed polytopes k are two dimensional and the polynomials
f̃ , fˆ and g̃ are of total degree 3 and maximal degree 2 in each variable. It is tedious
yet elementary to verify that the maxima are obtained in accordance with Table II.
Lemma 7.4 is equivalent to proving that h = 2piv/3 + pap/3 − 4lp/3 ≤ 0 for
all (x, w) ∈ × . The trilinearity in x still holds true for h, so as before we can
assume that either x ∈ 1 or x ∈ 2 . We can assume without loss of generality
(by symmetry) that x ∈ 2 , that is, x1 + x2 + x3 = 2. When x is fixed, then h
is a (possibly degenerate) concave paraboloid in w. In case of nondegeneracy, its
unique global maximum is obtained when ∇w h = 0, which can be easily verified
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:21
function\domain 1 2
f̃ 0 at (1/2, 0, 1/2) 0 at (1, 0, 1)
fˆ 0 at (0, 0, 1) 0 at (1, 0, 1)
g̃ 0 at (0, 0, 1) 0 at (1, 0, 1)
to be solved by w = w∗ = (w 1∗ , w 2∗ , w 3∗ ) defined by
x2 x3
w 1∗ = + 2x1 − 1
x2 x3 + (1 − x2 )(1 − x3 )
x3 x1
w 2∗ = + 2x2 − 1 (9)
x3 x1 + (1 − x3 )(1 − x1 )
x1 x2
w 3∗ = + 2x3 − 1
x1 x2 + (1 − x1 )(1 − x2 )
(the paraboloid in w is degenerate if and only if any of the denominators in (9) are
0, equivalently xi = 0 and x j = 1 for some i, j. But this implies that after possibly
permuting coordinates, x = (1, 1, 0). But h(1, 1, 0, w) = −2w 32 /3 ≤ 0, proving
the desired assertion trivially). Since we are assuming x1 + x2 + x3 = 2, we have
that for any 1 ≤ i < j ≤ 3, xi + x j ≥ 1, equivalently xi x j ≥ (1 − xi )(1 − x j ).
Therefore (9) implies w i ≥ 12 + 2xi − 1 for i = 1, 2, 3. Summing up, we obtain
w 1 + w 2 + w 3 ≥ − 32 + 2(x1 + x2 + x3 ) = 52 > 2. In other words, (9) implies that w∗
and are strictly on different sides of H2 . Let w = (w 1 , w 2 , w 3 ) be any point in .
Consider the straight line passing through w and w∗ , and let w the intersection of
this line with H2 . Restricted to (and for our fixed x ∈ 2 ) h is a parabola, attaining
its maximum on w∗ . Therefore h(x, w ) ≥ h(x, w ), and we can assume in what
follows that w = w ∈ H2 (we must drop the assumption that w ∈ though).
We change variables and let h̃ : R4 → R be defined by h̃(x1 , x2 , w 1 , w 2 ) =
h(x1 , x2 , 2 − x1 − x2 , w 1 , w 2 , 2 − w 1 − w 2 ). We reduced the problem to proving
that h̃ ≤ 0 on {x1 ≤ 1, x2 ≤ 1, x1 + x2 ≥ 1} × R2 . It is elementary to verify, using
vanishing derivatives, that for (x1 , x2 ) fixed, the maximum of h̃ is obtained when
(w 1 , w 2 ) = (x1 , x2 ). Substituting, we get h̃(x1 , x2 , x1 , x2 ) = −2(−1 + x1 )(−1 +
x2 )(−1 + x2 + x3 ), which is less than or equal to 0 because x1 + x2 ≥ 1 and
x1 , x2 ≤ 1.
8.2. POLYHEDRAL INEQUALITIES FOR CLUSTERING. Let ⊆ R3 denote the prob-
ability constraints polytope as defined in (7). Let ⊆ denote the triangle
inequality and probability constraints polytope for clustering, that is,
= {(a1 , a2 , a3 ) ∈ : a3 ≤ a1 + a2 , a1 ≤ a2 + a3 , a2 ≤ a3 + a1 } .
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:22 N. AILON ET AL.
TABLE III. MAXIMUM OF f 000 , f 001 , f 011 , f 111 , g000 , g011 , g111 ON 1, 2, 3
function \ domain 1 2 3
f 000 0 at (0, 0, 0) 0 at (0, 0, 0) 0 at (0, 0, 0)
f 001 0 at (1/2, 0, 1/2) −1 at (0, 1/2, 1/2) 0 at (1/2, 1/2, 1)
f 011 −3/2 at (1/2, 0, 1/2) 0 at (0, 1, 1) 0 at (0, 1, 1)
f 111 0 at (1, 0, 1) 0 at (1, 1, 0) 0 at (1, 0, 1)
g000 0 at (0, 0, 0) 0 at (0, 0, 0) 0 at (0, 0, 0)
g011 −1 at (1, 0, 1) 0 at (0, 1, 1) 0 at (0, 1, 1)
g111 0 at (1, 0, 1) 0 at (1, 1, 0) 0 at (1, 0, 1)
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:23
constraint 1 2 3
00**** 0:000000 0:000000 0:000000
11**** 0:11011 12 0:11011 12 infeasible
**01* − 23 :101011 0:011011 0:011011
0**0** 0:000000 0:000000 0:000000
1**0** − 23 :110011 − 23 :110011 − 23 :101011
0***0* 0:000000 0:000000 0:000000
1***0* 0:101101 − 23 :110101 0:101101
x x0ww* 0:000000 0:000000 0:000000
x x*ww0 0:000000 0:000000 0:000000
01**** infeasible 0:011 12 11 0:011 12 11
***00* 0:000000 0:000000 0:000000
***11* 0:101111 0:11011 12 0:101111
0**1** − 54 :0001 14 34 0:011111 0:011111
1**1** 0:1 59 5 59 5
1
64 64 64 64
0:11011 12 0:1011 12 1
0***1* − 53 :000 12 1 12 0:011 12 11 0:011 12 11
1***1* 0:11011 12 0:11011 12 0:101111
x x1ww* infeasible infeasible 0: 12 21 1 12 21 1
x x*ww1 0:110111 0:110111 0: 12 21 1 12 21 1
The constraint 0**0*1 means, as an example, x1 = 0, w1 = 0, w3 = 1.
A constraint of the form x x0ww* means x1 = x2 , x3 = 0, w 1 = w 2 . The
maxima are denoted by M:x1 x2 x3 w 1 w 2 w 3 , where M is the maximum value,
attained at (x1 , x2 , x3 , w 1 , w 2 , w 3 ).
Boundary Cases. We can now assume that either: (1) at least two of x1 , x2 , x3 ,w 1 ,
w 2 , w 3 are in {0, 1}, or, (2) x1 = x2 , w 1 = w 2 and at least one of x3 , w 3 are in
{0, 1}. In addition, the function h is trilinear in x, so we may assume (as above) that
x ∈ 1 ∪ 2 ∪ 3 . This reduces the problem to proving inequalities for polynomials
of total degree at most 4 and maximal degree at most 3 (respectively, 2) in each
x-variable (respectively, w-variable), in 3-dimensional polytopes. We summarize
the analysis in Table IV.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:24 N. AILON ET AL.
The variance makes this approach fail. By blowing up G and using a concentration
property of the random variable counting the number of backward edges in A R ,
we can use this construction (see similar random digraph constructions in Newman
[2000] and Newman and Vempala [2001]).
We pick an integer k = poly(n) (chosen later). The blow-up digraph G k =
(V k , Ak ) is defined as follows:
Vk = {v 1 , . . . , v k }
v∈V
Ak = {(u i , v j )|(u, v) ∈ A, i, j ∈ {1, . . . , k}} .
We observe that the minimum feedback arc set of G k is exactly k 2 times the
minimum feedback arc set of G. Indeed, it suffices to consider only rankings π
on V k that rank the vertices v 1 , . . . , v k as one block for all v ∈ V (as explained
in Alon [2006], if v i <π v j are not adjacent in the ranking, then either moving v i
immediately to the left of v j or moving v j immediately to the right of v i will result
in a ranking inducing no more feedback edges than π).
Now we turn G k into a tournament T k = {V k , Ak ∪ AkR } using the construction
defined above. For a ranking π of V k , let f R (π ) denote the number of feedback
edges in AkR with respect to π. Denote by μ the expected value of f R (π), which is the
same for all π, and can be efficiently computed. We claim that for k = poly(n), √ with
probability at least 2/3, all rankings π satisfy | f R (π) − μ| = O((nk)3/2 log(nk)).
This would imply, using the above observation, that, for big enough k = poly(n),
the size of the minimum feedback arc set of T k can be used to √ efficiently recover
the size of the minimum feedback arc set of G, because (nk)3/2 log(nk) = o(k 2 ).
To prove the claim, for any fixed ranking π, set a random indicator variable X wπ z
for every nonedge {w, z} of G k that equals k
1 iffπ the edge between w and z in A R is
backward with respect to π. So f R (π ) = X w z . A simple application of Chernoff
bounds [Alon and Spencer 1992] and union bound (over all possible (nk)! rankings)
completes the proof of the claim. It follows that unless FAS-DIGRAPH ∈ B P P, we
cannot solve FAS-TOURNAMENT in polynomial time.
We wish to thank Noga Alon for ideas significantly simplifying the proof [Alon
2006]. Our initial hardness result was via max-SNP hardness of FAS-DIGRAPH, and
Noga Alon pointed out that the same idea also works with the weaker NP-hardness.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:25
problem of learning how to rank has been revisited in the context of reduction to
binary preference learning. We refer the reader to a recent paper by Ailon and Mohri
[2008], which is inspired by this work and improves a result by Balcan et al. [2007]
(inspired by Coppersmith et al. [2006]).
On the clustering side, Ailon and Charikar [2005] extended results here to hier-
archical clulstering, a problem well studied in phylogeny. They generalize KWIK-
CLUSTER to that setting and obtain constant factor approximation guarantees.
REFERENCES
AILON, N. 2008. Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica. DOI
10.1007/s00453-008-9211-1.
AILON, N., AND CHARIKAR, M. 2005. Fitting tree metrics: Hierarchical clustering and phylogeny. In Pro-
ceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Pittsburgh,
PA). IEEE Computer Society Press, Los Alamitos, CA, 73–82.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
23:26 N. AILON ET AL.
AILON, N., CHARIKAR, M., AND NEWMAN, A. 2005. Aggregating inconsistent information: Ranking and
clustering. In Proceedings of the 37th Annual Symposium on the Theory of Computing (STOC) (Boston,
MA). ACM, New York, 684–693.
AILON, N., AND MOHRI, M. 2008. An efficient reduction of ranking to classification. In Conference on
Learning Theory (COLT) (Helsinki, Finland).
ALON, N. 2006. Ranking tournaments. SIAM J. Disc. Math. 20, 1, 137–142.
ALON, N., AND SPENCER, J. H. 1992. The Probabilistic Method. Wiley, New York.
ARORA, S., FRIEZE, A., AND KAPLAN, H. 1996. A new rounding procedure for the assignment problem
with applications to dense graph arrangement problems. In Proceedings of the 37th Annual Symposium
on the Foundations of Computer Science (FOCS) (Burlington, VT). IEEE Computer Society Press, Los
Alamitos, CA, 24–33.
BALCAN, M.-F., BANSAL, N., BEYGELZIMER, A., COPPERSMITH, D., LANGFORD, J., AND SORKIN, G. B. 2007.
Robust reductions from ranking to classification. In Proceedings of the Conference on Learning Theory
(COLT). Lecture Notes in Computer Science, vol. 4539. Springer-Verlag, New York, 604–619.
BANG-JENSEN, J., AND THOMASSEN, C. 1992. A polynomial algorithm for the 2-path problem in semi-
complete graphs. SIAM J. Disc. Math. 5, 3, 366–376.
BANSAL, N., BLUM, A., AND CHAWLA, S. 2004. Correlation clustering. Mach. Learn. J. (Special Issue on
Theoretical Advances in Data Clustering) 56, 1–3, 89–113. (Extended abstract appeared in FOCS 2002,
pages 238–247.)
BARTHOLDI, J., TOVEY, C. A., AND TRICK, M. 1989. Voting schemes for which it can be difficult to tell
who won the election. Social Choice Welf. 6, 2, 157–165.
BORDA, J. C. 1781. Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences.
CAI, M.-C., DENG, X., AND ZANG, W. 2001. An approximation algorithm for feedback vertex sets in
tournaments. SIAM J. Comput. 30, 6, 1993–2007.
CHARIKAR, M., GURUSWAMI, V., AND WIRTH, A. 2003. Clustering with qualitative information. In Pro-
ceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Boston,
MA). IEEE Computer Society Press, Los Alamitos, CA, 524–533.
CHAUDHURI, K., CHEN, K., MIHAESCU, R., AND RAO, S. 2006. On the tandem duplication-random loss
model of genome rearrangement. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete
Algorithm (SODA’06). ACM, New York, 564–570.
CONDORCET, M.-J. 1785. Éssai sur l’application de l’analyse à la probabilité des décisions rendues à la
pluralité des voix.
COPPERSMITH, D., FLEISCHER, L., AND RUDRA, A. 2006. Ordering by weighted number of wins gives a
good ranking for weighted tournaments. In Proceedings of the 17th Annual ACM-SIAM Symposium on
Discrete Algorithm (SODA’06). ACM, New York, 776–782.
DIACONIS, P., AND GRAHAM, R. 1977. Spearman’s footrule as a measure of disarray. J. Roy. Stat. Soc.,
Ser. B 39, 2, 262–268.
DINUR, I., AND SAFRA, S. 2002. On the importance of being biased. In Proceedings of the 34th Annual
Symposium on the Theory of Compututing (STOC). ACM, New York, 33–42.
DWORK, C., KUMAR, R., NAOR, M., AND SIVAKUMAR, D. 2001a. Rank aggregation methods for the web.
In Proceedings of the 10th International Conference on the World Wide Web (WWW10) (Hong Kong,
China), 613–622.
DWORK, C., KUMAR, R., NAOR, M., AND SIVAKUMAR, D. 2001b. Rank aggregation revisited. Manuscript.
(Available from: [Link]
EVEN, G., NAOR, J. S., SUDAN, M., AND SCHIEBER, B. 1998. Approximating minimum feedback sets and
multicuts in directed graphs. Algorithmica 20, 2, 151–174.
FAGIN, R., KUMAR, R., AND SIVAKUMAR, D. 2003. Efficient similarity search and classification via rank
aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of
Data (San Diego, CA). ACM, New York, 301–312.
FILKOV, V., AND SKIENA, S. 2003. Integrating microarray data by consensus clustering. In Proceedings of
International Conference on Tools with Artificial Intelligence (ICTAI) (Sacramento, CA). 418–425.
FRIEZE, A., AND KANNAN, R. 1999. Quick approximations to matrices and applications. Combinator-
ica 19, 2, 175–220.
GIONIS, A., MANNILA, H., AND TSAPARAS, P. 2005. Clustering aggregation. In Proceedings of the 21st
International Conference on Data Engineering (ICDE) (Tokyo, Japan).
HÄSTAD, J. 2001. Some optimal inapproximability results. J. ACM 48, 798–859.
KARP, R. M. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computa-
tions. Plenum Press, New York, 85–104.
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.
Aggregating Inconsistent Information: Ranking and Clustering 23:27
Journal of the ACM, Vol. 55, No. 5, Article 23, Publication date: October 2008.