Method For Node Ranking in A Linked Database

The patent US 6,285,999 B1 describes a method for ranking nodes in a linked database, such as the World Wide Web, based on the importance of documents that cite them. The ranking is determined not just by the number of citations but also by the significance of the citing documents, using an iterative procedure to calculate ranks. This method aims to improve search engine results by enhancing the selectivity and relevance of the documents returned in searches.

Uploaded by

habibs.cmsafrica

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views12 pages

Method For Node Ranking in A Linked Database

Uploaded by

habibs.cmsafrica

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

US006285999B1

(12) United States Patent (10) Patent No.: US 6,285,999 B1

Page (45) Date of Patent: Sep. 4, 2001

(54) METHOD FOR NODE RANKING INA Craig Boyle “To link or not to link: An empirical comparison
LINKED DATABASE of Hypertext linking strategies”. ACM 1992, pp. 221–231.*
L. Katz, “A new status index derived from sociometric
(75) Inventor; Lawrence Page, Stanford, CA (US) analysis,” 1953, Psychometricka, vol. 18, pp. 39–43.
(73) Assignee: The Board of Trustees of the Leland C.H. Hubbell, “An input-output approach to clique identi
Stanford Junior University, Stanford, fication sociometry,” 1965, pp. 377–399.
CA (US) Mizruchi et al., “Techniques for disaggregating centrality
scores in social networks,” 1996, Sociological Methodology,
(*) Notice: Subject to any disclaimer, the term of this pp. 26–48.
patent is extended or adjusted under 35 E. Garfield, “Citation analysis as a tool in journal evalua
U.S.C. 154(b) by 0 days. tion,” 1972, Science, vol. 178, pp. 471–479.
Pinski et al., “Citation influence for journal aggregates of
(21) Appl. No.: 09/004,827 scientific publications: Theory, with application to the lit
(22) Filed: Jan. 9, 1998 erature of physics,” 1976, Inf. Proc. And Management, vol.
12, pp. 297–312.
Related U.S. Application Data N. Geller, “On the citation influence methodology of Pinski
(60) Provisional
1997.
application No. 60/035,205, filed on Jan. 10, and Narin,” 1978, Inf. Proc. And Management, vol. 14, pp.
93–95.
(51) Int. Cl." … G06F 17/30 P. Doreian, “Measuring the relative standing of disciplinary
(52) U.S. Cl. .................................... 707/5; 707/7, 707/501 journals,” 1988, Inf. Proc. And Management, vol. 24, pp.
(58) Field of Search .................................... 707/100, 5, 7, 45–56.
707/513, 1–3, 10, 104, 501; 345/440; 38.2/226,
229, 230, 231 (List continued on next page.)
(56) References Cited Primary Examiner—Thomas Black
U.S. PATENT DOCUMENTS Assistant Examiner—Uyen Le
(74) Attorney, Agent, or Firm—Harrity & Snyder L.L.P.
4,953,106 + 8/1990 Gansner et al. ..................... 345/440 (57) ABSTRACT
5,450,535 + 9/1995 North ................ ... 395/140
5,748,954 5/1998 Mauldin .... -- ... 395/610
5,752,241 * 5/1998 Cohen ........... ... 707/3 A method assigns importance ranks to nodes in a linked
5,832,494 * 11/1998 Egger et al. ...... 707/102 database, such as any database of documents containing
5,848,407 * 12/1998 Ishikawa et al. ........................ 707/2 citations, the world wide web or any other hypermedia
6,014,678 * 1/2000 Inoue et al. .......................... 707/501 database. The rank assigned to a document is calculated
OTHER PUBLICATIONS
from the ranks of documents citing it. In addition, the rank
of a document is calculated from a constant representing the
S. Jeromy Carriere et al., “Web Query: Searching and Visu probability that a browser through the database will ran
alizing the Web through Connectivity”, Computer Networks domly jump to the document. The method is particularly
and ISDN Systems 29 (1997), pp. 1257–1267.* useful in enhancing the performance of search engine results
Wang et al “Prefetching in Worl Wide Web”, IEEE 1996, pp. for hypermedia databases, such as the world wide web,
28–32.* whose documents have a large variation in quality.
Ramer et al. “Similarity, Probability and Database Organi
sation: Extended Abstract”, 1996, pp. 272.276.” 29 Claims, 3 Drawing Sheets
US 6,285,999 B1
Page 2

OTHER PUBLICATIONS Carriere et al., “WebOuery: Searching and visualizing the

P. Doreian, “A measure of standing for citation networks web through connectivity,” 1997, Proc. 6” International
within a wider environment,” 1994, Inf. Proc. And Manage World Wide Web Conference, pp. 1–14.
ment, vol. 30, pp. 21–31. Arocena et al., “Applications of a web query language,”
Botafogo et al., “Structural analysis of hypertext: Identifying 1997, Computer Networks and ISDN Systems, vol. 29, No.
hierarchies and useful metrics,” 1992, ACM Trans. Inc. 8–13, pp. 1305–1316.
Systems, vol. 10, pp. 142–180.
Mark E. Frisse, “Searching for information in a hypertext Jon M. Kleinberg, “Authoritative sources in a hyperlinked
medical handbook,” 1988, Communications of the ACM, environment,” 1998, Proc. Of the 9” Annual ACM-SIAM
vol. 31, No. 7, pp. 880–886. Symposium on Discrete Algorithms, pp. 668–677.
Massimo Marchiori, “The quest for correct information on Henzinger et al., “Measuring index quality using random
the Web: Hyper search engines,” 1997, Computer Networks
and ISDN Systems, vol. 29, No. 8–13, pp. 1225–1235. walks on the web”, 1999, Proc. of the 8” International World
Oliver A. McBryan, “GENVL and WWWW: Tools for Wide Web Conference, pp. 213–225.
taming the web,” 1994, Proceedings of the first International
Wold Wide Web Conference, pp. 1–13. * cited by examiner
U.S. Patent Sep. 4, 2001 Sheet 1 of 3 US 6,285,999 B1

tº . . .) A DS

FIG. 1
U.S. Patent Sep. 4, 2001 Sheet 2 of 3 US 6,285,999 B1

FIG. 2
U.S. Patent Sep. 4, 2001 Sheet 3 of 3 US 6,285,999 B1

START
101

SELECT AN INITIALN-DIMENSIONAL VECTOR p,

103

COMPUTE AN APPROXIMATION p, TO A
STEADY-STATE PROBABILITY p. IN
ACCORDANCE WITH THE EQUATION p, - Ap,

105

DETERMINE A RANK r(k) FOR NODE k FROM A

kh comPONENT OF p.

DONE

FIG. 3
US 6,285,999 B1
1 2
METHOD FOR NODE RANKING IN A Hyperlink Search Engine, developed by IDD Information
LINKED DATABASE Services, ([Link] uses backlink informa
tion (i.e., information from pages that contain links to the
CROSS-REFERENCES TO RELATED current page) to assist in identifying relevant web docu
APPLICATIONS ments. Rather than using the content of a document to
This application claims priority from U.S. provisional determine relevance, the technique uses the anchor text of
links to the document to characterize the relevance of a
patent application Ser. No. 60/035,205 filed Jan. 10, 1997, document. The idea of associating anchor text with the page
which is incorporated herein by reference. the text points to was first implemented in the World Wide
STATEMENT REGARDING GOVERNMENT 10 Web Worm (Oliver A. McBryan, GENVL and WWWW.
SUPPORT Tools for Taming the Web, First International Conference on
the World Wide Web, CERN, Geneva, May 25–27, 1994).
This invention was supported in part by the National The Hyperlink Search Engine has applied this idea to assist
Science Foundation grant number IRI-9411306-4. The Gov in determining document relevance in a search. In particular,
ernment has certain rights in the invention. 15 search query terms are compared to a collection of anchor
text descriptions that point to the page, rather than to a
FIELD OF THE INVENTION keyword index of the page content. A rank is then assigned
This invention relates generally to techniques for analyz to a document based on the degree to which the search terms
ing linked databases. More particularly, it relates to methods match the anchor descriptions in its backlink documents.
for assigning ranks to nodes in a linked database, such as any 20 The well known idea of citation counting is a simple
database of documents containing citations, the world wide method for determining the importance of a document by
web or any other hypermedia database. counting its number of citations, or backlinks. The citation
rank r(A) of a document which has n backlink pages is
BACKGROUND OF THE INVENTION simply
25
Due to the developments in computer technology and its
increase in popularity, large numbers of people have recently In the case of databases whose content is of relatively
started to frequently search huge databases. For example, uniform quality and importance it is valid to assume that a
internet search engines are frequently used to search the highly cited document should be of greater interest than a
entire world wide web. Currently, a popular search engine 30
document with only one or two citations. Many databases,
might execute over 30 million searches per day of the however, have extreme variations in the quality and impor
indexable part of the web, which has a size in excess of 500 tance of documents. In these cases, citation ranking is overly
Gigabytes. Information retrieval systems are traditionally simplistic. For example, citation ranking will give the same
judged by their precision and recall. What is often neglected, rank to a document that is cited once on an obscure page as
to a similar document that is cited once on a well-known and
however, is the quality of the results produced by these 35
highly respected page.
search engines. Large databases of documents such as the
web contain many low quality documents. As a result, SUMMARY
searches typically return hundreds of irrelevant or unwanted
documents which camouflage the few relevant ones. In order Various aspects of the present invention provide systems
to improve the selectivity of the results, common techniques 40 and methods for ranking documents in a linked database.
allow the user to constrain the scope of the search to a One aspect provides an objective ranking based on the
specified subset of the database, or to provide additional relationship between documents. Another aspect of the
search terms. These techniques are most effective in cases invention is directed to a technique for ranking documents
where the database is homogeneous and already classified within a database whose content has a large variation in
into subsets, or in cases where the user is searching for well 45 quality and importance. Another aspect of the present inven
known and specific information. In other cases, however, tion is to provide a document ranking method that is scalable
these techniques are often not effective because each con and can be applied to extremely large databases such as the
straint introduced by the user increases the chances that the world wide web. Additional aspects of the invention will
desired information will be inadvertently eliminated from become apparent in view of the following description and
the search results. 50 associated figures.
Search engines presently use various techniques that One aspect of the present invention is directed to taking
attempt to present more relevant documents. Typically, advantage of the linked structure of a database to assign a
documents are ranked according to variations of a standard rank to each document in the database, where the document
vector space model. These variations could include (a) how rank is a measure of the importance of a document. Rather
recently the document was updated, and/or (b) how close the 55 than determining relevance only from the intrinsic content of
search terms are to the beginning of the document. Although a document, or from the anchor text of backlinks to the
this strategy provides search results that are better than with document, a method consistent with the invention deter
no ranking at all, the results still have relatively low quality. mines importance from the extrinsic relationships between
Moreover, when searching the highly competitive web, this documents. Intuitively, a document should be important
measure of relevancy is vulnerable to “spamming” tech 60 (regardless of its content) if it is highly cited by other
niques that authors can use to artificially inflate their docu documents. Not all citations, however, are necessarily of
ment’s relevance in order to draw attention to it or its equal significance. A citation from an important document is
advertisements. For this reason search results often contain more important than a citation from a relatively unimportant
commercial appeals that should not be considered a match to document. Thus, the importance of a page, and hence the
the query. Although search engines are designed to avoid 65 rank assigned to it, should depend not just on the number of
such ruses, poorly conceived mechanisms can result in citations it has, but on the importance of the citing docu
disappointing failures to retrieve desired information. ments as well. This implies a recursive definition of rank: the
US 6,285,999 B1
3 4
rank of a document is a function of the ranks of the ments B and C are pointers to document A. In this case we
documents which cite it. The ranks of documents may be say that B and C are backlinks of A, and that A is a forward
calculated by an iterative procedure on a linked database. link of B and of C. Documents B and C also have other
Because citations, or links, are ways of directing forward links to documents that are not shown.
attention, the important documents correspond to those Although the ranking method of the present invention is
documents to which the most attention is directed. Thus, a superficially similar to the well known idea of citation
high rank indicates that a document is considered valuable counting, the present method is more subtle and complex
by many people or by important people. Most likely, these than citation counting and gives far superior results. In a
are the pages to which someone performing a search would simple citation ranking, the rank of a document A which has
like to direct his or her attention. Looked at another way, the 10 n backlink pages is simply
importance of a page is directly related to the steady-state
probability that a random web surfer ends up at the page According to one embodiment of the present method of
after following a large number of links. Because there is a ranking, the backlinks from different pages are weighted
larger probability that a surfer will end up at an important differently and the number of links on each page is normal
page than at an unimportant page, this method of ranking 15 ized. More precisely, the rank of a page A is defined
pages assigns higher ranks to the more important pages. according to the present invention as
In one aspect of the invention, a computer implemented
method is provided for scoring linked database documents.
The method comprises the steps of: 20
obtaining a plurality of documents, at least some of the
documents being linked documents, at least some of the where B1, . . . , B, are the backlink pages of A, r(B1), . . . ,
documents being linking documents, and at least some r(B) are their ranks, [B], . . . , |B, are their numbers of
of the documents being both linked documents and forward links, and O. is a constant in the interval [0,1], and
linking documents, each of the linked documents being 25 N is the total number of pages in the web. This definition is
pointed to by a link in one or more of the linking clearly more complicated and subtle than the simple citation
documents; assigning a score to each of the linked rank. Like the citation rank, this definition yields a page rank
documents based on scores of the one or more linking that increases as the number of backlinks increases. But the
documents; and processing the linked documents present method considers a citation from a highly ranked
according to their scores. 30 backlink as more important than a citation from a lowly
Additional aspects, applications and advantages will ranked backlink (provided both citations come from back
become apparent in view of the following description and link documents that have an equal number of forward links).
associated figures. In the present invention, it is possible, therefore, for a
BRIEF DESCRIPTION OF THE DRAWINGS
document with only one backlink (from a very highly ranked
35 page) to have a higher rank than another document with
FIG. 1 is a diagram of the relationship between three many backlinks (from very low ranked pages). This is not
linked hypertext documents according to the invention. the case with simple citation ranking.
FIG. 2 is a diagram of a three-document web illustrating The ranks form a probability distribution over web pages,
the rank associated with each document in accordance with so that the sum of ranks over all web pages is unity. The rank
the present invention. 40 of a page can be interpreted as the probability that a surfer
will be at the page after following a large number of forward
FIG. 3 is a flowchart of one embodiment of the invention. links. The constant Cº. in the formula is interpreted as the
DETAILED DESCRIPTION
probability that the web surfer will jump randomly to any
web page instead of following a forward link. The page
Although the following detailed description contains 45 ranks for all the pages can be calculated using a simple
many specifics for the purposes of illustration, anyone of iterative algorithm, and corresponds to the principal eigen
ordinary skill in the art will appreciate that many variations vector of the normalized link matrix of the web, as will be
and alterations to the following details are within the scope discussed in more detail below.
of the invention. Accordingly, the following embodiments of In order to illustrate the present method of ranking,
the invention are set forth without any loss of generality to, 50 consider the simple web of three documents shown in FIG.
and without imposing limitations upon, the claimed inven 2. For simplicity of illustration, we assume in this example
tion. For support in reducing the present invention to that r=0. Document A has a single backlink to document C,
practice, the inventor acknowledges Sergey Brin, Scott and this is the only forward link of document C, so
Hassan, Rajeev Motwani, Alan Steremberg, and Terry Wino r(A)=r(C).
grad. 55 Document B has a single backlink to document A, but this
A linked database (i.e. any database of documents con is one of two forward links of document A, so
taining mutual citations, such as the world wide web or other r(B)=r(A)/2.
hypermedia archive, a dictionary or thesaurus, and a data Document C has two backlinks. One backlink is to
base of academic articles, patents, or court cases) can be document B, and this is the only forward link of document
represented as a directed graph of N nodes, where each node 60 B. The other backlink is to document A via the other of the
corresponds to a web page document and where the directed two forward links from A. Thus
connections between nodes correspond to links from one r(C)=r(B)+r(A)/2.
document to another. A given node has a set of forward links In this simple illustrative case we can see by inspection
that connect it to children nodes, and a set of backward links that r(A)=0.4, r(B)=0.2, and r(C)=0.4. Although a typical
that connect it to parent nodes. FIG. 1 shows a typical 65 value for O is ~0.1, if for simplicity we set os-0.5 (which
relationship between three hypertext documents A, B, and C. corresponds to a 50% chance that a surfer will randomly
As shown in this particular figure, the first links in docu jump to one of the three pages rather than following a
US 6,285,999 B1
5 6
forward link), then the mathematical relationships between present invention. Because the rank of various different
the ranks become more complicated. In particular, we then documents can vary by orders of magnitude, it is convenient
have to define a logarithmic rank

The solution in this case is r(A)=1%9, r(B)=1%9, and ke [1, N]

r(C)=1%9.
In practice, there are millions of documents and it is not which assigns a rank of 0 to the lowest ranked node and
possible to find the solution to a million equations by 10
increases by 1 for each order of magnitude in importance
inspection. Accordingly, in the preferred embodiment a higher than the lowest ranked node.
simple iterative procedure is used. As the initial state we “FIG. 3 shows one embodiment of a computer imple
may simply set all the ranks equal to 1/N. The formulas are mented method for calculating an importance rank for N
then used to calculate a new set of ranks based on the linked nodes of a linked database. At a step 101, an initial
existing ranks. In the case of millions of documents, suffi 15 N-dimensional vector po is selected. An approximation p, to
cient convergence typically takes on the order of 100 itera a steady-state probability p. in accordance with the equation
tions. It is not always necessary or even desirable, however, p, HA"po is computed at a step 103. Matrix A can be an NXN
to calculate the rank of every page with high precision. Even transition probability matrix having elements A?ilj] repre
approximate rank values, using two or more iterations, can senting a probability of moving from node i to node j. At a
provide very valuable, or even superior, information. 20 step 105, a rank ri?k] for node k from a k” component of p,
The iteration process can be understood as a steady-state is determined.”.
probability distribution calculated from a model of a random In one particular embodiment, a finite number of itera
surfer. This model is mathematically equivalent to the expla tions are performed to approximate poo. The initial distri
nation described above, but provides a more direct and bution can be selected to be uniform or non-uniform. A
concise characterization of the procedure. The model 25 uniform distribution would set each component of po equal
includes (a) an initial N-dimensional probability distribution to 1/N. A non-uniform distribution, for example, can divide
vector po where each component po?i] gives the initial the initial probability among a few nodes which are known
probability that a random surfer will start at a node i, and (b) a priori to have relatively large importance. This non
an NXN transition probability matrix A where each compo uniform distribution decreases the number of iterations
ment A[i][j] gives the probability that the surfer will move 30 required to obtain a close approximation to poo and also is
from node i to node j. The probability distribution of the one way to reduce the effect of artificially inflating relevance
graph after the surfer follows one link is pi-Apo, and after by adding unrelated terms.
two links the probability distribution is ps=Ap-A*p". In another particular embodiment, the transition matrix A
Assuming this iteration converges, it will converge to a is given by
steady-state probability 35

ps = lim A^po,

where 1 is an NXN matrix consisting of all 1s, o is the

which is a dominant eigenvector of A. The iteration circu 40
probability that a surfer will jump randomly to any one
lates the probability through the linked nodes like energy of the N nodes, and B is a matrix whose elements
flows through a circuit and accumulates in important places. B[i][j] are given by
Because pages with no links occur in significant numbers
and bleed off energy, they cause some complication with 1
computing the ranking. This complication is caused by the 45 - — if node i points to node j
B[i][j] = { ni
fact they can add huge amounts to the “random jump” factor. 0 otherwise,
This, in turn, causes loops in the graph to be highly empha
sized which is not generally a desirable property of the
model. In order to address this problem, these childless where n, is the total number of forward links from node
pages can simply be removed from the model during the 50 i. The (1–0) factor acts as a damping factor that
iterative stages, and added back in after the iteration is limits the extent to which a document’s rank can be
complete. After the childless pages are added back in, inherited by children documents. This models the
however, the same number of iterations that was required to fact that users typically jump to a different place in
remove them should be done to make sure they all receive the web after following a few links. The value of O.
a value. (Note that in order to ensure convergence, the norm 55 is typically around 15%. Including this damping is
of p, must be made equal to 1 after each iteration.) An important when many iterations are used to calculate
alternate method to control the contribution of the childless the rank so that there is no artificial concentration of
nodes is to only estimate the steady state by iterating a small rank importance within loops of the web.
number of times. Alternatively, one may set Cº-0 and only iterate a few
The rank r?i] of a node i can then be defined as a function 60 times in the calculation.
of this steady-state probability distribution. For example, the Consistent with the present invention, there are several
rank can be defined simply by ri]=po?i]. This method of ways that this method can be adapted or altered for various
calculating rank is mathematically equivalent to the iterative purposes. As already mentioned above, rather than including
method described first. Those skilled in the art will appre the random linking probability O. equally among all nodes,
ciate that this same method can be characterized in various 65 it can be divided in various ways among all the sites by
different ways that are mathematically equivalent. Such changing the 1 matrix to another matrix. For example, it
characterizations are obviously within the scope of the could be distributed so that a random jump takes the surfer
US 6,285,999 B1
7 8
to one of a few nodes that have a high importance, and will This can allow this ranking model to fill holes in the usage
not take the surfer to any of the other nodes. This can be very data, and provide a more accurate or comprehensive picture.
effective in preventing deceptively tagged documents from Thus, although this method of ranking does not necessar
receiving artificially inflated relevance. Alternatively, the ily match the actual traffic, it nevertheless measures the
random linking probability could be distributed so that degree of exposure a document has throughout the web.
random jumps do not happen from high importance nodes, Another important application and embodiment of the
and only happen from other nodes. This distribution would present invention is directed to enhancing the quality of
model a surfer who is more likely to make random jumps results from web search engines. In this application of the
from unimportant sites and follow forward links from present invention, a ranking method according to the inven
important sites. A modification to avoid drawing unwar 10 tion is integrated into a web search engine to produce results
ranted attention to pages with artificially inflated relevance far superior to existing methods in quality and performance.
A search engine employing a ranking method of the present
is to ignore local links between documents and only consider invention provides automation while producing results com
links between separate domains. Because the links from parable to a human maintained categorized system. In this
other sites to the document are not directly under the control approach, a web crawler explores the web and creates an
of a typical web site designer, it is then difficult for the 15
index of the web content, as well as a directed graph of
designer to artificially inflate the ranking. A simpler nodes corresponding to the structure of hyperlinks. The
approach is to weight links from pages contained on the nodes of the graph (i.e. pages of the web) are then ranked
same web server less than links from other servers. Also, in according to importance as described above in connection
addition to servers, internet domains and any general mea with various exemplary embodiments of the present inven
sure of the distance between links could be used to deter 20 tion.
mine such a weighting. The search engine is used to locate documents that match
Additional modifications can further improve the perfor the specified search criteria, either by searching full text, or
mance of this method. Rank can be increased for documents by searching titles only. In addition, the search can include
whose backlinks are maintained by different institutions and the anchor text associated with backlinks to the page. This
authors in various geographic locations. Or it can be 25 approach has several advantages in this context. First,
increased if links come from unusually important web anchors often provide more accurate descriptions of web
locations such as the root page of a domain. pages than the pages themselves. Second, anchors may exist
Links can also be weighted by their relative importance for images, programs, and other objects that cannot be
within a document. For example, highly visible links that are
near the top of a document can be given more weight. Also, 30 indexed by a text-based search engine. This also makes it
links that are in large fonts or emphasized in other ways can possible to return web pages which have not actually been
be given more weight. In this way, the model better approxi crawled. In addition, the engine can compare the search
mates human usage and authors’ intentions. In many cases terms with a list of its backlink document titles. Thus, even
it is appropriate to assign higher value to links coming from though the text of the document itself may not match the
pages that have been modified recently since such informa 35 search terms, if the document is cited by documents whose
tion is less likely to be obsolete. titles or backlink anchor text match the search terms, the
Various implementations of the invention have the advan document will be considered a match. In addition to or
tage that the convergence is very fast (a few hours using instead of the anchor text, the text in the immediate vicinity
current processors) and it is much less expensive than of the backlink anchor text can also be compared to the
building a full-text index. This speed allows the ranking to 40
be customized or personalized for specific users. For search terms in order to improve the search.
example, a user’s home page and/or bookmarks can be given Once a set of documents is identified that match the search
a large initial importance, and/or a high probability of a terms, the list of documents is then sorted with high ranking
random jump returning to it. This high rating essentially documents first and low ranking documents last. The rank
indicates to the system that the person’s homepage and/or 45 ing in this case is a function which combines all of the above
bookmarks does indeed contain subjects of importance that factors such as the objective ranking and textual matching.
should be highly ranked. This procedure essentially trains If desired, the results can be grouped by category or site as
the system to recognize pages related to the person’s inter well.
ests. The present method of determining the rank of a It will be clear to one skilled in the art that the above
document can also be used to enhance the display of 50 embodiments may be altered in many ways without depart
documents. In particular, each link in a document can be ing from the scope of the invention. Accordingly, the scope
annotated with an icon, text, or other indicator of the rank of of the invention should be determined by the following
the document that each link points to. Anyone viewing the claims and their legal equivalents.
document can then easily see the relative importance of What is claimed is:
various links in the document. 55 1. A computer implemented method of scoring a plurality
The present method of ranking documents in a database of linked documents, comprising:
can also be useful for estimating the amount of attention any obtaining a plurality of documents, at least some of the
document receives on the web since it models human documents being linked documents, at least some of the
behavior when surfing the web. Estimating the importance documents being linking documents, and at least some
of each backlink to a page can be useful for many purposes 60 of the documents being both linked documents and
including site design, business arrangements with the linking documents, each of the linked documents being
backlinkers, and marketing. The effect of potential changes pointed to by a link in one or more of the linking
to the hypertext structure can be evaluated by adding them documents;
to the link structure and recomputing the ranking. assigning a score to each of the linked documents based
Real usage data, when available, can be used as a starting 65 on scores of the one or more linking documents and
point for the model and as the distribution for the alpha processing the linked documents according to their
factor. SCOreS.
US 6,285,999 B1
9 10
2. The method of claim 1, wherein the assigning includes: processing the linked documents according to their
identifying a weighting factor for each of the linking updated ranks.
documents, the weighting factor being dependent on 10. A computer implemented method of ranking a plural
the number of links to the one or more linking ity of linked documents, comprising:
documents, and 5 automatically performing a random traversal of a plurality
adjusting the score of each of the one or more linking of linked documents, the random traversal including
documents based on the identified weighting factor. selecting a random link to traverse in a current linked
3. The method of claim 1, wherein the assigning includes: document;
identifying a weighting factor for each of the linking for each linked document that is traversed, assigning a
documents, the weighting factor being dependent on an 10 rank to the linked document that is dependent on the
number of times the linked document has been tra
estimation of a probability that a linking document will versed; and
be accessed, and processing the plurality of linked documents according to
adjusting the score of each of the one or more linking their rank.
documents based on the identified weighting factor. 15 11. The method of claim 10, wherein there is a predeter
4. The method of claim 1, wherein the assigning includes: mined probability that the next linked document to be
identifying a weighting factor for each of the linking traversed will be a random one according to a distribution of
documents, the weighting factor being dependent on the plurality of linked documents.
the URL, host, domain, author, institution, or last 12. The method of claim 1, wherein the processing
update time of the one or more linking documents, and 20 includes:
adjusting the score of each of the one or more linking displaying links to the linked documents as a directory
documents based on the identified weighting factor. listing.
5. The method of claim 1, wherein the assigning includes: 13. The method of claim 1, wherein the processing
identifying a weighting factor for each of the linking includes:
documents, the weighting factor being dependent on 25 displaying links to the linked documents, and
whether the one or more linking documents are selected displaying annotations representing the score of each of
documents or roots, and the linked documents.
adjusting the score of each of the one or more linking 14. The method of claim 13, wherein the annotations are
documents based on the identified weighting factor. bars, icons, or text.
6. The method of claim 1, wherein the assigning includes: 30
15. The method of claim 1, further comprising:
identifying a weighting factor for each of the linking processing the linked documents based on textual match
documents, the weighting factor being dependent on ing.
the importance, visibility or textual emphasis of the 16. The method of claim 15, wherein the textual matching
links in the one or more linking documents, and includes matching anchor text associated with the links.
adjusting the score of each of the one or more linking 35
17. The method of claim 1, further comprising:
documents based on the identified weighting factor. processing the linked documents based on groupings of
7. The method of claim 1, wherein the assigning includes: the linked documents.
identifying a weighting factor for each of the linking 18. A computer-readable medium that stores instructions
documents, the weighting factor being dependent on a executable by one or more processing devices to perform a
particular user’s preferences, the rate at which users 40
method for determining scores for a plurality of linked
access the one or more linking documents, or the documents, comprising:
importance of the one or more linking documents, and instructions for obtaining a plurality of documents, at
adjusting the score of each of the one or more linking least some of the documents being linked documents, at
documents based on the identified weighting factor. 45 least some of the documents being linking documents,
8. A computer implemented method of determining a and at least some of the documents being both linked
score for a plurality of linked documents, comprising: documents and linking documents, each of the linked
obtaining a plurality of linked documents; documents being pointed to by a link in one or more of
selecting one of the linked documents; the linking documents;
assigning a score to the selected document that is depen 50 instructions for determining a score for each of the linked
dent on scores of documents that link to the selected documents based on scores for the one or more linking
document; and documents; and
processing the linked documents according to their instructions for processing the linked documents accord
SCOreS. ing to their scores.
9. A computer implemented method of ranking a plurality 55 19. A computer-readable medium that stores instructions
of linked documents, comprising: executable by one or more processors to perform a method
obtaining a plurality of documents, at least some of the for scoring documents, comprising:
documents being linked documents and at least some of instructions for searching a plurality of documents, at
the documents being linking documents, at least some least some of the documents being linked documents
of the linking documents also being linked documents, 60 and at least some of the documents being linking
each of the linked documents being pointed to by a link documents, at least some of the linking documents also
in one or more of the linking documents; being linked documents, each of the linked documents
generating an initial estimate of a rank for each of the being pointed to by a link in one or more of the linking
linked documents; documents;
updating the estimate of the rank for each of the linked 65 instructions for scoring each of the linked documents
documents using ranks for the one or more linking based on scores for the one or more linking documents;
documents; and and
US 6,285,999 B1
11 12
instructions for providing the linked documents based on 25. The method of claim 1, wherein the assigning a score
their scores. includes:
20. The method of claim 1, wherein the assigning a score associating one or more backlinks with each of the linked
includes: documents, each of the backlinks corresponding to one
determining the score based on (1) a number of the linking 5 of the linking documents that links to the linked
documents that link to the linked document and (2) an document,
importance of the linking documents. assigning a weight to each of the backlinks, and
21. The method of claim 20, wherein the importance of determining a score for each of the linked documents
the linking documents is based on a number of documents based on a sum of the weights assigned to the backlinks
that link to the linking documents. 10
associated with the linked document.
22. The method of claim 1, wherein the assigning a score 26. The method of claim 25, wherein the weights assigned
includes:
to each of the backlinks are independent of text of the
associating one or more backlinks with each of the linked corresponding linking documents.
documents, each of the backlinks corresponding to one 15 27. The method of claim 1, wherein the assigning a score
of the linking documents that links to the linked includes:
document, determining the score primarily based on linking infor
assigning a weight to each of the backlinks, and mation.
determining a score for each of the linked documents 28. The method of claim 1, wherein the assigning a score
based on a number of backlinks for the linked docu 20 includes:
ment and the weights assigned to the backlinks. determining the score substantially independent of user
23. The method of claim 22, wherein the processing of the query content.
linked documents includes: 29. The method of claim 1, wherein the assigning a score
organizing the linked documents based on the determined includes:
SCOreS. 25 iteratively determining the score for a linked document,
24. The method of claim 22, wherein the assigning a the score being primarily based on document-linking
weight includes: information and substantially independent of user
assigning different weights to at least some of the back query content.
links associated with at least one of the linked docu
mentS.
UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION
PATENT NO. : 6,285,999 B1 Page 1 of 1
APPLICATION NO. : 09/004827
DATED ; September 4, 2001
INVENTOR(S) : Lawrence Page
It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

In the Specification
Column 1, lines 13-15 should be replaced with the following paragraph:
This invention was made with Government support under contract 9411306 awarded by the National
Science Foundation. The Government has certain rights in this invention.

Signed and Sealed this

Sixth Day of August, 2013

Teresa Stanek Rea

Acting Director of the United States Patent and Trademark Office

Node Ranking Method for Databases
No ratings yet
Node Ranking Method for Databases
12 pages
Document Scoring Method Patent
No ratings yet
Document Scoring Method Patent
10 pages
Document Scoring in Linked Databases
No ratings yet
Document Scoring in Linked Databases
12 pages
Pagerank: Standing On The Shoulders of Giants
No ratings yet
Pagerank: Standing On The Shoulders of Giants
10 pages
PageRank's Historical Foundations
No ratings yet
PageRank's Historical Foundations
10 pages
Competitor Link Analysis Techniques
No ratings yet
Competitor Link Analysis Techniques
43 pages
Understanding Information Networks and the Web
No ratings yet
Understanding Information Networks and the Web
37 pages
Web Search Engine Architecture Overview
No ratings yet
Web Search Engine Architecture Overview
29 pages
Understanding Search Engine Functionality
No ratings yet
Understanding Search Engine Functionality
10 pages
Topic Distillation Method Patent
No ratings yet
Topic Distillation Method Patent
15 pages
SCS313 WebDataMining Units3and4
No ratings yet
SCS313 WebDataMining Units3and4
26 pages
Link Analysis: PageRank and HITS Explained
No ratings yet
Link Analysis: PageRank and HITS Explained
47 pages
Understanding Web Mining Techniques
No ratings yet
Understanding Web Mining Techniques
22 pages
Efficient PageRank Approximation Method
No ratings yet
Efficient PageRank Approximation Method
16 pages
Comparing PageRank and HITS Algorithms
No ratings yet
Comparing PageRank and HITS Algorithms
21 pages
Network Analysis of Wikipedia Entries
No ratings yet
Network Analysis of Wikipedia Entries
12 pages
Eigenvector Insights of PageRank Algorithm
No ratings yet
Eigenvector Insights of PageRank Algorithm
13 pages
Deeper Inside Pagerank: Amy N. Langville and Carl D. Meyer
No ratings yet
Deeper Inside Pagerank: Amy N. Langville and Carl D. Meyer
46 pages
Exploring PageRank Model Enhancements
No ratings yet
Exploring PageRank Model Enhancements
33 pages
Ranking Popular Topics with Experts
No ratings yet
Ranking Popular Topics with Experts
15 pages
9 Search Searching
No ratings yet
9 Search Searching
15 pages
The Linear Algebra Behind Google
No ratings yet
The Linear Algebra Behind Google
13 pages
Understanding Link Analysis Concepts
No ratings yet
Understanding Link Analysis Concepts
28 pages
Equilibrium Sets in Directed Networks
No ratings yet
Equilibrium Sets in Directed Networks
33 pages
Chapter 6
No ratings yet
Chapter 6
22 pages
Google Page Rank Algorithm Analysis
No ratings yet
Google Page Rank Algorithm Analysis
5 pages
Confidence-Based Page Ranking Method
No ratings yet
Confidence-Based Page Ranking Method
6 pages
Advanced Data Mining Techniques
No ratings yet
Advanced Data Mining Techniques
59 pages
Using Social Annotation and Web Log To Enhance Search Engine
No ratings yet
Using Social Annotation and Web Log To Enhance Search Engine
6 pages
Web and Text Mining Overview
No ratings yet
Web and Text Mining Overview
36 pages
Comprehensive Guide to Web Mining Techniques
No ratings yet
Comprehensive Guide to Web Mining Techniques
36 pages
Web Search Fundamentals: Link Analysis
No ratings yet
Web Search Fundamentals: Link Analysis
86 pages
Understanding PageRank and Its Impact
No ratings yet
Understanding PageRank and Its Impact
4 pages
Ranking Hypertext Search Results Method
No ratings yet
Ranking Hypertext Search Results Method
29 pages
Icwet 1094
No ratings yet
Icwet 1094
6 pages
Overview of Web Mining Techniques
100% (1)
Overview of Web Mining Techniques
63 pages
Node Ranking in Labeled Directed Graphs
No ratings yet
Node Ranking in Labeled Directed Graphs
10 pages
SCS313 Assignment3 4 Answers
No ratings yet
SCS313 Assignment3 4 Answers
16 pages
Eigenvector Insights of PageRank Algorithm
No ratings yet
Eigenvector Insights of PageRank Algorithm
13 pages
Understanding Search Engine Functionality
No ratings yet
Understanding Search Engine Functionality
42 pages
Social Network Analysis Techniques
No ratings yet
Social Network Analysis Techniques
73 pages
Logo - File 3
No ratings yet
Logo - File 3
4 pages
Pagerank Computation and Keyword Search On Distributed Systems and P2P Networks
No ratings yet
Pagerank Computation and Keyword Search On Distributed Systems and P2P Networks
17 pages
Understanding Page Rank in Data Mining
No ratings yet
Understanding Page Rank in Data Mining
7 pages
Advanced Analysis of Algorithms: Dept of CS & IT University of Sargodha
No ratings yet
Advanced Analysis of Algorithms: Dept of CS & IT University of Sargodha
51 pages
Understanding Link Analysis and PageRank
No ratings yet
Understanding Link Analysis and PageRank
57 pages
Web Crawler Performance Evaluation
No ratings yet
Web Crawler Performance Evaluation
34 pages
Anatomy of a Large-Scale Search Engine
No ratings yet
Anatomy of a Large-Scale Search Engine
27 pages
Deep Crawling of Web Sites Using Frontier Technique: Samantula Hemalatha
No ratings yet
Deep Crawling of Web Sites Using Frontier Technique: Samantula Hemalatha
11 pages
WEights and Directions
No ratings yet
WEights and Directions
34 pages
Understanding PageRank Algorithm
No ratings yet
Understanding PageRank Algorithm
3 pages
SEO: The PAGE RANK Algorithm: Presidency University, Bengaluru School of Engineering
No ratings yet
SEO: The PAGE RANK Algorithm: Presidency University, Bengaluru School of Engineering
56 pages
Big Data Analytics: PageRank & Applications
No ratings yet
Big Data Analytics: PageRank & Applications
20 pages
Math Behind the PageRank Algorithm
No ratings yet
Math Behind the PageRank Algorithm
8 pages
Web and Text Mining
No ratings yet
Web and Text Mining
73 pages
Hubiness in Big Data Analysis
No ratings yet
Hubiness in Big Data Analysis
42 pages
Enhancing Search Engines with Data Mining
No ratings yet
Enhancing Search Engines with Data Mining
4 pages
Google PageRank Algorithm Explained
100% (1)
Google PageRank Algorithm Explained
10 pages
Mathematical Foundations of PageRank
No ratings yet
Mathematical Foundations of PageRank
22 pages
David Cohn: Business Objects Specialist
No ratings yet
David Cohn: Business Objects Specialist
6 pages
CM FeatureDescription70
No ratings yet
CM FeatureDescription70
75 pages
GSEB Class 11 Computer Queries Overview
No ratings yet
GSEB Class 11 Computer Queries Overview
7 pages
PHP Basics: Variables, Arrays, and Syntax
No ratings yet
PHP Basics: Variables, Arrays, and Syntax
18 pages
BCA 3rd Semester Syllabus 2025
No ratings yet
BCA 3rd Semester Syllabus 2025
27 pages
NSIC Online Training: Big Data & Hadoop
No ratings yet
NSIC Online Training: Big Data & Hadoop
2 pages
Fmp12 Tutorial
No ratings yet
Fmp12 Tutorial
82 pages
Salesforce Real Estate Management System
No ratings yet
Salesforce Real Estate Management System
5 pages
Snowflake Data Loading Best Practices
No ratings yet
Snowflake Data Loading Best Practices
51 pages
DBMS 3rd Semester Guess Paper 2024
No ratings yet
DBMS 3rd Semester Guess Paper 2024
2 pages
Resettlement Program Officer Role at MHPL
No ratings yet
Resettlement Program Officer Role at MHPL
6 pages
Oracle Database 21c - Distributed Data
No ratings yet
Oracle Database 21c - Distributed Data
42 pages
Infor ERP XA Power Link Overview
No ratings yet
Infor ERP XA Power Link Overview
112 pages
Web-Based Payroll Management System
No ratings yet
Web-Based Payroll Management System
3 pages
Next Generation Parametric Design - s12
No ratings yet
Next Generation Parametric Design - s12
14 pages
Fulltext01 3 PDF
No ratings yet
Fulltext01 3 PDF
61 pages
Mcafee Epolicy Orchestrator 5.10.0 Installation Guide
0% (1)
Mcafee Epolicy Orchestrator 5.10.0 Installation Guide
100 pages
Complaint Management System Report
No ratings yet
Complaint Management System Report
20 pages
Doculink Configuration Steps
80% (5)
Doculink Configuration Steps
31 pages
ServiceNow Clone Data Management Guide
No ratings yet
ServiceNow Clone Data Management Guide
10 pages
64-Week AI/ML Learning Roadmap
No ratings yet
64-Week AI/ML Learning Roadmap
3 pages
Grade 7 Computer Operations Curriculum
No ratings yet
Grade 7 Computer Operations Curriculum
31 pages
Java Parking Allotment System Report
No ratings yet
Java Parking Allotment System Report
11 pages
Taskbot Evaluation and Control Room Access
No ratings yet
Taskbot Evaluation and Control Room Access
13 pages
Portal Troubleshoot
No ratings yet
Portal Troubleshoot
145 pages
Odoo CRM: Managing Lost Opportunities
No ratings yet
Odoo CRM: Managing Lost Opportunities
225 pages
BNA Computer Systems and DAG Analysis
No ratings yet
BNA Computer Systems and DAG Analysis
7 pages
Nanda Kishore Resume
No ratings yet
Nanda Kishore Resume
2 pages
Cloud Computing Data Management Guide
No ratings yet
Cloud Computing Data Management Guide
6 pages
Login & Account Creation System Overview
No ratings yet
Login & Account Creation System Overview
2 pages