CLARANS in Spatial Web Mining

This document discusses spatial data mining and web mining. For spatial data mining, it covers spatial data types and attributes, spatial queries and data structures like quad trees and R-trees. It also discusses spatial rules, clustering algorithms and applications. For web mining, it provides an overview of web content mining, structure mining and usage mining. It describes techniques like crawlers, PageRank, HITS and pattern discovery from web server logs.

Uploaded by

rekha

Spatial & Web Mining

• Spatial Data: Introduction
• Spatial Data Overview
• Spatial Rule
• Spatial Clustering Algorithm
Spatial Object

• Contains both spatial and nonspatial attributes.
• Must have a location-type attribute:
– Latitude/longitude
– Zip code
– Street address
• May retrieve an object using either (or
both) spatial or nonspatial attributes.
Spatial Data Mining
Applications
• Geology
• GIS Systems
• Environmental Science
• Agriculture
• Medicine
• Robotics
• May involve both spatial and
temporal aspects
Spatial Queries
• Spatial selection may involve specialized
selection comparison operations:
– Near
– North, South, East, West
– Contained in
– Overlap/intersect
• Region (Range) Query – find objects that
intersect a given region.
• Nearest Neighbor Query – find the object closest to
an identified object.
• Distance Scan – find objects within a certain
distance of an identified object, where the distance
is made increasingly larger.
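The three query types above can be sketched over a handful of point objects. This is a minimal brute-force illustration, not an indexed implementation; the object names, coordinates, and function names are all hypothetical.

```python
from math import hypot

# Hypothetical point objects: name -> (x, y) location.
objects = {
    "school": (2.0, 3.0),
    "park": (5.0, 1.0),
    "library": (2.5, 3.5),
    "mall": (9.0, 9.0),
}

def region_query(objs, xmin, ymin, xmax, ymax):
    """Names of point objects that fall inside the query rectangle."""
    return sorted(n for n, (x, y) in objs.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

def nearest_neighbor(objs, name):
    """The object closest to the named object (excluding itself)."""
    x0, y0 = objs[name]
    return min((n for n in objs if n != name),
               key=lambda n: hypot(objs[n][0] - x0, objs[n][1] - y0))

def distance_scan(objs, name, radii):
    """Yield (radius, names within that radius) for increasingly large radii."""
    x0, y0 = objs[name]
    for r in sorted(radii):
        yield r, sorted(n for n, (x, y) in objs.items()
                        if n != name and hypot(x - x0, y - y0) <= r)
```

Every function scans all objects, which is what the spatial data structures in the following slides are meant to avoid.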
Spatial Data Structures
• Data structures designed specifically to store or
index spatial data.
• Often based on B-tree or Binary Search Tree
• Cluster data on disk based on geographic
location.
• May represent complex spatial structure by
placing the spatial object in a containing
structure of a specific geographic shape.
• Techniques:
– Quad Tree
– R-Tree
– k-D Tree
MBR

• Minimum Bounding Rectangle

• Smallest rectangle that completely
contains the object
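As a small sketch of the definition above, the MBR of an object given as a list of vertices is just the minimum and maximum of its coordinates. The tuple layout `(xmin, ymin, xmax, ymax)` and the function names are assumptions made for illustration.

```python
def mbr(points):
    """Smallest axis-aligned rectangle containing all points, as
    (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a, b):
    """True if two MBRs overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical triangular object.
triangle = [(1, 1), (4, 2), (2, 5)]
triangle_mbr = mbr(triangle)
```

Testing MBRs for intersection is a cheap filter: if the MBRs do not intersect, the objects cannot intersect either.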
MBR Examples
Quad Tree

• Hierarchical decomposition of the space
into quadrants (MBRs)
• Each level in the tree represents the
object as the set of quadrants which
contain any portion of the object.
• Each level is a more exact
representation of the object.
• The number of levels is determined by
the degree of accuracy desired.
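The level-by-level refinement described above can be sketched as follows. This is an illustrative decomposition only, assuming the object is itself given as a rectangle `(xmin, ymin, xmax, ymax)`; the function names are hypothetical.

```python
def quadrants(cell):
    """Split a rectangular cell into its four equal quadrants."""
    xmin, ymin, xmax, ymax = cell
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

def overlaps(a, b):
    """True if two rectangles share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def quad_levels(space, obj, depth):
    """One list per level: the quadrants at that level containing part of obj."""
    levels, current = [], [space]
    for _ in range(depth):
        current = [q for c in current for q in quadrants(c) if overlaps(q, obj)]
        levels.append(current)
    return levels

# Hypothetical 8x8 space containing a 2x2 object.
levels = quad_levels((0, 0, 8, 8), (1, 1, 3, 3), depth=3)
covered_area = [sum((c[2] - c[0]) * (c[3] - c[1]) for c in lvl) for lvl in levels]
```

In this example the covered area shrinks from 16 at the first level to 4 at the third, which is the exact area of the object.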
Quad Tree Example
R-Tree

• As with the Quad Tree, the region is
divided into successively smaller
rectangles (MBRs).
• Rectangles need not be of the same
size or number at each level.
• Rectangles may actually overlap.
• Lowest level cell has only one object.
• Tree maintenance algorithms similar
to those for B-trees.
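A search over an R-tree descends only into nodes whose MBR intersects the query region. The sketch below uses a small hand-built tree with hypothetical objects and overlapping internal rectangles; it does not cover insertion or node splitting.

```python
def intersects(a, b):
    """True if rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical two-level R-tree; the two internal MBRs overlap,
# and each lowest-level cell holds exactly one object.
tree = {"mbr": (0, 0, 10, 10), "children": [
    {"mbr": (0, 0, 5, 5), "children": [
        {"mbr": (1, 1, 2, 2), "object": "a"},
        {"mbr": (3, 3, 5, 5), "object": "b"}]},
    {"mbr": (4, 4, 10, 10), "children": [
        {"mbr": (6, 6, 7, 8), "object": "c"},
        {"mbr": (8, 4, 10, 6), "object": "d"}]},
]}

def search(node, query):
    """Return objects whose MBR intersects the query rectangle."""
    if not intersects(node["mbr"], query):
        return []  # prune the whole subtree
    if "object" in node:
        return [node["object"]]
    return [o for child in node["children"] for o in search(child, query)]
```

Because rectangles may overlap, a query can require descending into more than one child, as the query `(4, 4, 6, 6)` does here.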
R-Tree Example
k-D Tree

• Designed for multi-attribute data, not
necessarily spatial
• Variation of binary search tree
• Each level is used to index one of the
dimensions of the spatial object.
• Lowest level cell has only one object
• Divisions not based on MBRs but
successive divisions of the dimension
range.
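The idea of indexing one dimension per level can be sketched as below. Note one assumption: this common variant stores a point at every node (the median along the current dimension) rather than keeping objects only in the lowest-level cells as the slide describes. The point set is hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a k-D tree; level `depth` splits on dimension depth mod k."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build_kdtree(pts[:mid], depth + 1),
            "right": build_kdtree(pts[mid + 1:], depth + 1)}

def range_search(node, lo, hi, found=None):
    """Collect points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    found = [] if found is None else found
    if node is None:
        return found
    p, axis = node["point"], node["axis"]
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        found.append(p)
    if lo[axis] <= p[axis]:      # left subtree may still hold matches
        range_search(node["left"], lo, hi, found)
    if hi[axis] >= p[axis]:      # right subtree may still hold matches
        range_search(node["right"], lo, hi, found)
    return found

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kdtree = build_kdtree(points)
```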
k-D Tree Example
Spatial Rules
• Characteristic Rule
The average family income in Dallas is
$50,000.
• Discriminant Rule
The average family income in Dallas is
$50,000, while in Plano the average income is
$75,000.
• Association Rule
The average family income in Dallas for
families living near White Rock Lake is
$100,000.
Spatial Clustering

• Detect clusters of irregular shapes

• Use of centroids and simple distance
approaches may not work well.
• Clusters should be independent of
order of input.
CLARANS Extensions

• Remove main memory assumption of
CLARANS.
• Use spatial index techniques.
• Use sampling and R*-tree to identify
central objects.
• Change cost calculations by reducing
the number of objects examined.
• Voronoi Diagram
Voronoi
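For context, the base CLARANS search that these extensions modify can be sketched as a randomized search over sets of k medoids. This is a minimal in-memory sketch of the basic algorithm, not of the extensions listed above; the parameter names `numlocal` and `maxneighbor` follow common descriptions of CLARANS, and the data points are hypothetical.

```python
import random
from math import dist

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def clarans(points, k, numlocal=4, maxneighbor=50, seed=0):
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):                 # independent restarts
        current = rng.sample(range(n), k)     # random starting medoids
        cost = total_cost(points, current)
        tried = 0
        while tried < maxneighbor:
            # A neighbor differs from the current set in exactly one medoid.
            i = rng.randrange(k)
            candidate = rng.choice([j for j in range(n) if j not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < cost:
                current, cost, tried = neighbor, neighbor_cost, 0
            else:
                tried += 1
        if cost < best_cost:
            best, best_cost = sorted(current), cost
    return best, best_cost

# Two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best, cost = clarans(points, k=2)
```

Each call to `total_cost` examines every object, which is the cost the extensions aim to reduce by examining fewer objects.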
Web Mining Outline

• Introduction
• Web Content Mining
• Web Structure Mining
• Web Usage Mining
Web Mining Taxonomy
Web Content Mining

• Extends work of basic search engines

• Search Engines
– IR application
– Keyword based
– Similarity between query and document
– Crawlers
– Indexing
– Profiles
– Link analysis
Crawlers
• Robot (spider) traverses the hypertext
structure in the Web.
• Collect information from visited pages
• Used to construct indexes for search
engines
• Traditional Crawler – visits entire Web (?)
and replaces index
• Periodic Crawler – visits portions of the
Web and updates subset of index
• Incremental Crawler – selectively searches
the Web and incrementally modifies index
• Focused Crawler – visits pages related to a
particular subject
Focused Crawler

• Classifier relates documents to topics
• Classifier also determines how useful
outgoing links are
• Hub Pages contain links to many
relevant pages. They must be visited even if
their relevance score is not high.
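The crawl order implied by these points can be sketched as a best-first traversal over a priority queue. Everything below is hypothetical: the in-memory link graph stands in for the Web, and the `relevance` and `hub_score` tables stand in for classifier output.

```python
import heapq

# Hypothetical in-memory "web": page -> outgoing links.
links = {
    "seed": ["hub", "sports", "news"],
    "hub": ["mining1", "mining2"],
    "sports": ["scores"],
    "news": [], "mining1": [], "mining2": [], "scores": [],
}
# Hypothetical classifier output: relevance of each page to the topic.
relevance = {"seed": 0.9, "hub": 0.3, "sports": 0.1, "news": 0.2,
             "mining1": 0.8, "mining2": 0.7, "scores": 0.05}
# Hypothetical estimate of how useful a page's outgoing links are.
hub_score = {"hub": 0.85}

def focused_crawl(start, budget):
    """Visit up to `budget` pages, most promising first."""
    frontier = [(-relevance[start], start)]   # max-heap via negated priority
    seen, visited = {start}, []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for nxt in links[page]:
            if nxt not in seen:
                seen.add(nxt)
                # A hub is worth visiting even when its own relevance is low.
                priority = max(relevance[nxt], hub_score.get(nxt, 0.0))
                heapq.heappush(frontier, (-priority, nxt))
    return visited
```

With a budget of four pages, the crawler visits the low-relevance hub early and reaches the relevant pages behind it, skipping the off-topic branches.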
Focused Crawler
Context Focused Crawler

• Context Graph:
– Context graph created for each seed
document.
– Root is the seed document.
– Nodes at each level show documents with links
to documents at next higher level.
– Updated during crawl itself.
• Approach:
1. Construct context graph and classifiers using
seed documents as training data.
2. Perform crawling using classifiers and context
graph created.
Context Graph
Virtual Web View
• Multiple Layered DataBase (MLDB) built on
top of the Web.
• Each layer of the database is more generalized
(and smaller) and centralized than the one
beneath it.
• Upper layers of MLDB are structured and can be
accessed with SQL type queries.
• Translation tools convert Web documents to
XML.
• Extraction tools extract desired information to
place in first layer of MLDB.
• Higher levels contain more summarized data
obtained through generalizations of the lower
levels.
Personalization

• Web access or contents tuned to better fit the
desires of each user.
• Manual techniques identify user’s
preferences based on profiles or
demographics.
• Collaborative filtering identifies
preferences based on ratings from similar
users.
• Content based filtering retrieves pages
based on similarity between pages and user
profiles.
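Collaborative filtering can be sketched in a few lines: find the most similar user by rating vectors, then suggest pages that user rated and the target user has not seen. The ratings table is hypothetical, and cosine similarity is one assumed choice of similarity measure among several.

```python
from math import sqrt

# Hypothetical ratings: user -> {page: rating}.
ratings = {
    "ann": {"p1": 5, "p2": 4, "p3": 1},
    "bob": {"p1": 5, "p2": 5, "p4": 4},
    "cat": {"p1": 1, "p3": 5, "p5": 5},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[p] * v[p] for p in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user):
    """Pages rated by the most similar user that `user` has not rated."""
    others = [o for o in ratings if o != user]
    nearest = max(others, key=lambda o: cosine(ratings[user], ratings[o]))
    return sorted(p for p in ratings[nearest] if p not in ratings[user])
```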
Web Structure Mining

• PR(p) = c (PR(1)/N_1 + … + PR(n)/N_n)
– PR(i): PageRank for a page i which points
to target page p.
– N_i: number of links coming out of page i
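The formula can be applied iteratively until the ranks settle. In this sketch, c is treated as a normalization constant chosen each round so that the ranks sum to 1; that choice, the link graph, and the assumption that every page has at least one outgoing link are illustrative only.

```python
def pagerank(out_links, iterations=100):
    """Iterate PR(p) = c * sum(PR(i) / N_i) over pages i that link to p."""
    pages = list(out_links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for i, targets in out_links.items():
            for p in targets:
                new[p] += pr[i] / len(targets)   # PR(i) / N_i
        c = 1.0 / sum(new.values())              # normalization constant
        pr = {p: c * v for p, v in new.items()}
    return pr

# Hypothetical link graph: page -> pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

For this graph the ranks converge to 0.4 for A and C and 0.2 for B: page B receives only half of A's rank, while C receives the other half plus all of B's.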
CLEVER

• Identify authoritative and hub pages.

• Authoritative Pages:
– Highly important pages.
– Best source for requested information.
• Hub Pages:
– Contain links to highly important pages.
HITS

• Hyperlink-Induced Topic Search

• Based on a set of keywords, find set of
relevant pages – R.
• Identify hub and authority pages for these.
– Expand R to a base set, B, of pages linked to or
from R.
– Calculate weights for authorities and hubs.
• Pages with highest ranks in R are returned.
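The weight calculation can be sketched as the usual mutual-reinforcement iteration: a page's authority weight is the sum of the hub weights of pages linking to it, and its hub weight is the sum of the authority weights of pages it links to. The base set below is hypothetical, and the fixed iteration count is an assumption.

```python
from math import sqrt

def hits(out_links, iterations=50):
    """Return (hub, authority) weights for the pages in a base set."""
    pages = set(out_links) | {q for ts in out_links.values() for q in ts}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub weights of pages pointing to the page.
        auth = {p: 0.0 for p in pages}
        for p, targets in out_links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub: sum of authority weights of pages the page points to.
        hub = {p: sum(auth[q] for q in out_links.get(p, [])) for p in pages}
        # Normalize so the weights stay bounded.
        a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical base set B: page -> pages it links to.
base = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(base)
```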
Web Usage Mining
Applications
• Personalization
• Improve structure of a site’s Web
pages
• Aid in caching and prediction of future
page references
• Improve design of individual pages
• Improve effectiveness of e-commerce
(sales and advertising)
Web Usage Mining Activities
• Preprocessing Web log
– Cleanse
– Remove extraneous information
– Sessionize
Session: Sequence of pages referenced by one user at a
sitting.
• Pattern Discovery
– Count patterns that occur in sessions
– A pattern is a sequence of page references in a session.
– Similar to association rules
• Transaction: session
• Itemset: pattern (or subset)
• Order is important
• Pattern Analysis
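The preprocessing and pattern discovery steps can be sketched on a tiny cleaned log. The log entries are hypothetical, and the 30-minute inactivity timeout used to split sessions is an assumed heuristic, since the slides do not define how a "sitting" ends.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned log: (user id, timestamp in seconds, page).
log = [
    ("u1", 0, "A"), ("u1", 30, "B"), ("u1", 60, "C"),
    ("u2", 10, "A"), ("u2", 50, "B"),
    ("u1", 4000, "A"), ("u1", 4020, "B"),
]

def sessionize(entries, timeout=1800):
    """Split each user's page references into sessions on long gaps."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_user[user].append((ts, page))
    sessions = []
    for visits in by_user.values():
        current, last_ts = [], None
        for ts, page in visits:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)   # gap too long: close the session
                current = []
            current.append(page)
            last_ts = ts
        sessions.append(current)
    return sessions

def count_patterns(sessions, length=2):
    """Number of sessions containing each consecutive pattern of given length."""
    counts = Counter()
    for s in sessions:
        counts.update({tuple(s[i:i + length])
                       for i in range(len(s) - length + 1)})
    return counts

sessions = sessionize(log)
support = count_patterns(sessions)
```

Here the session plays the role of the transaction and the ordered page pair plays the role of the itemset, with order preserved.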
Web Usage Mining Issues

• Identification of the exact user is not possible.
• The exact sequence of pages referenced
by a user cannot be determined because of caching.
• Session not well defined
• Security, privacy, and legal issues
Data Structures

• Keep track of patterns identified
during Web usage mining process
• Common techniques:
– Trie
– Suffix Tree
– Generalized Suffix Tree
– WAP Tree
Trie vs. Suffix Tree

• Trie:
– Rooted tree
– Edges labeled with a character (page)
from the pattern
– Path from root to leaf represents pattern.
• Suffix Tree:
– Single child collapsed with parent. Edge
contains labels of both prior edges.
Trie and Suffix Tree
Generalized Suffix Tree

• Suffix tree for multiple sessions.

• Contains patterns from all sessions.
• Maintains count of frequency of
occurrence of a pattern in the node.
• WAP Tree:
Compressed version of generalized suffix
tree
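A simplified version of this structure can be sketched as an uncompressed trie holding every suffix of every session, with a frequency count in each node. It omits the edge-collapsing step that turns a trie into a suffix tree, and the sessions are hypothetical.

```python
def build_suffix_trie(sessions):
    """Insert every suffix of every session; each node counts occurrences."""
    root = {"count": 0, "children": {}}
    for s in sessions:
        for start in range(len(s)):
            node = root
            for page in s[start:]:
                node = node["children"].setdefault(
                    page, {"count": 0, "children": {}})
                node["count"] += 1
    return root

def pattern_count(root, pattern):
    """Occurrences of a consecutive pattern across all sessions."""
    node = root
    for page in pattern:
        node = node["children"].get(page)
        if node is None:
            return 0
    return node["count"]

# Hypothetical sessions; each character stands for a page.
trie = build_suffix_trie(["ABC", "AB", "BCA"])
```

Looking up a pattern costs time proportional to the pattern length, regardless of how many sessions were inserted.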
Types of Patterns

• Algorithms have been developed to
discover different types of patterns.
• Properties:
– Ordered – Characters (pages) must occur in the
exact order in the original session.
– Duplicates – Duplicate characters are allowed in
the pattern.
– Consecutive – All characters in the pattern must
occur consecutively in the given session.
– Maximal – Not a subsequence of another pattern.
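The ordered, consecutive, and maximal properties can be made concrete with three small checks. The function names and the example session are hypothetical; sessions and patterns are written as strings with one character per page.

```python
def occurs_ordered(pattern, session):
    """True if the pattern's pages appear in the session in the same
    order, with gaps allowed."""
    it = iter(session)
    return all(page in it for page in pattern)

def occurs_consecutive(pattern, session):
    """True if the pattern appears in the session with no gaps."""
    n = len(pattern)
    return any(list(session[i:i + n]) == list(pattern)
               for i in range(len(session) - n + 1))

def maximal(patterns):
    """Keep only patterns that are not a subsequence of another pattern."""
    return [p for p in patterns
            if not any(p != q and occurs_ordered(p, q) for q in patterns)]
```

For the session `"ABCD"`, the pattern `"AC"` is ordered but not consecutive, while `"BC"` is both.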
Pattern Types

• Association Rules
None of the properties hold
• Episodes
Only ordering holds
• Sequential Patterns
Ordered and maximal
• Forward Sequences
Ordered, consecutive, and maximal
• Maximal Frequent Sequences
All properties hold
Episodes

• Partially ordered set of pages

• Serial episode – totally ordered with
time constraint
• Parallel episode – partially ordered with
time constraint
• General episode – partially ordered
with no time constraint
