0% found this document useful (0 votes)
8 views21 pages

OLE DB for Data Mining Overview

The document discusses Microsoft's OLE DB for Data Mining (OLDB DM), which provides an industry standard interface for building data mining applications. It introduces the basic components and operations supported by OLDB DM, including data mining models, populating models from training data, and using models to predict attributes for new data. The goal is to make it easier for developers to build data mining applications that can store, maintain, and programmatically use data mining models across different software and databases.

Uploaded by

api-27259648
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views21 pages

OLE DB for Data Mining Overview

The document discusses Microsoft's OLE DB for Data Mining (OLDB DM), which provides an industry standard interface for building data mining applications. It introduces the basic components and operations supported by OLDB DM, including data mining models, populating models from training data, and using models to predict attributes for new data. The goal is to make it easier for developers to build data mining applications that can store, maintain, and programmatically use data mining models across different software and databases.

Uploaded by

api-27259648
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Appendix A: An Introduction to

Microsoft’s OLE OLDB for Data


Mining
 Introduction
 Overview and design philosophy
 Basic components
 Data set components
 Data mining models
 Operations on data model
 Concluding remarks
October 15, 2008 Data Mining: Concepts and Techniques 1
Why OLE DB for Data
Mining?
 Industry standard is critical for data mining
development, usage, interoperability, and
exchange
 OLEDB for DM is a natural evolution from OLEDB
and OLDB for OLAP
 Building mining applications over relational
databases is nontrivial
 Need different customized data mining
algorithms and methods
 Significant work on the part of application
builders
 Goal: ease the burden
October 15, 2008 of developing
Data Mining: Concepts and Techniques mining 2
Motivation of OLE DB for DM

 Facilitate deployment of data mining models


 Generating data mining models
 Store, maintain and refresh models as data is
updated
 Programmatically use the model on other data
set
 Browse models
 Enable enterprise application developers to
participate in building data mining solutions

October 15, 2008 Data Mining: Concepts and Techniques 3


Features of OLE DB for DM

 Independent of provider or software


 Not specialized to any specific mining
model
 Structured to cater to all well-known
mining models
 Part of upcoming release of Microsoft SQL
Server 2000

October 15, 2008 Data Mining: Concepts and Techniques 4


Overview

Data mining
 Core relational engine
applications
exposes OLE DB in a
language-based API OLE DB OLAP/DM
 Analysis server exposes
OLE DB OLAP and OLE DB Analysis Server

DM OLE DB
 Maintain SQL metaphor RDB engine
 Reuse existing
October 15, 2008
notions
Data Mining: Concepts and Techniques 5
Key Operations to Support Data
Mining Models

 Define a mining model


 Attributes to be predicted

 Attributes to be used for prediction

 Algorithm used to build the model

 Populate a mining model from training


data
 Predict attributes for new data
 Browse a mining model fro reporting and
visualization
October 15, 2008 Data Mining: Concepts and Techniques 6
DMM As Analogous to A Table in
SQL
 Create a data mining module object
 CREATE MINING MODEL [model_name]

 Insert training data into the model and train it


 INSERT INTO [model_name]

 Use the data mining model


 SELECT relation_name.[id],

[model_name].[predict_attr]
 consult DMM content in order to make predictions

and browse statistics obtained by the model


 Using DELETE to empty/reset
 Predictions on datasets: prediction join between a
model and a data set (tables)
 Deploy DMM by just writing SQL queries!
October 15, 2008 Data Mining: Concepts and Techniques 7
Two Basic Components

 Cases/caseset: input data


 A table or nested tables (for hierarchical data)
 Data mining model (DMM): a special type of table
 A caseset is associated with a DMM and meta-info
while creating a DMM
 Save mining algorithm and resulting abstraction
instead of data itself
 Fundamental operations: CREATE, INSERT INTO,
PREDICTION JOIN, SELECT, DELETE FROM, and
DROP
October 15, 2008 Data Mining: Concepts and Techniques 8
Flatterned Representation of
Caseset
Customer
Product
s
Customer Purchases
ID
Gender Customer ID
Product Problem: Lots of replication!
Hair Color Name
Age Quantity

Age Prob Product Type


Car
CID Gend Hair Age Age prob Prod Quan Type Car
Car prob
Owernershi 1 Male Black 35 100% TV 1 Elec Car 100%
p 1 Male Black 35 100% VCR 1 Elec Car 100%
Customer ID
1 Male Black 35 100% Ham 6 Food Car 100%
Car 1 Male Black 35 100% TV 1 Elec Van 50%
Car Prob 1 Male Black 35 100% VCR 1 Elec Van 50%
1 Male Black 35 100% Ham 6 Food Van 50%

October 15, 2008 Data Mining: Concepts and Techniques 9


Logical Nested Table
Representation of Caseset

 Use Data Shaping Service to generate a


hierarchical rowset
 Part of Microsoft Data Access
Components (MDAC) products
Product Car
CID Gend Hair Age Age prob Purchases Ownership
Car
Prod Quan Type Car
prob
TV 1 Elec Car 100%
1 Male Black 35 100% VCR 1 Elec
Van 50%
Ham 6 Food

October 15, 2008 Data Mining: Concepts and Techniques 10


More About Nested Table

 Not necessary for the storage subsystem


to support nested records
 Cases are only instantiated as nested
rowsets prior to training/predicting data
mining models
 Same physical data may be used to
generate different casesets

October 15, 2008 Data Mining: Concepts and Techniques 11


Defining A Data Mining Model

 The name of the model


 The algorithm and parameters
 The columns of caseset and the
relationships among columns
 “Source columns” and “prediction
columns”

October 15, 2008 Data Mining: Concepts and Techniques 12


Example

EATE MINING MODEL [Age Prediction] %Name of Model

stomer ID] LONG KEY, %source column


nder] TEXT DISCRETE, %source column
e] Double DISCRETIZED() PREDICT, %prediction colu
oduct Purchases] TABLE %source column

oduct Name] TEXT KEY, %source column


uantity] DOUBLE NORMAL CONTINUOUS, %source column
oduct Type] TEXT DISCRETE RELATED TO [Product Name]
%source column

NG [Decision_Trees_101] %Mining algorithm use

October 15, 2008 Data Mining: Concepts and Techniques 13


Column Specifiers

 KEY
 ATTRIBUTE
 RELATION (RELATED TO clause)
 QUALIFIER (OF clause)
 PROBABILITY: [0, 1]

 VARIANCE

 SUPPORT

 PROBABILITY-VARIANCE

 ORDER

 TABLE

October 15, 2008 Data Mining: Concepts and Techniques 14


Attribute Types

 DISCRETE
 ORDERED
 CYCLICAL
 CONTINOUS
 DISCRETIZED
 SEQUENCE_TIME

October 15, 2008 Data Mining: Concepts and Techniques 15


Populating A DMM

 Use INSERT INTO statement

 Consuming a case using the data mining

model

 Use SHAPE statement to create the

nested table from the input data

October 15, 2008 Data Mining: Concepts and Techniques 16


Example: Populating a DMM

TO [Age Prediction]

r ID], [Gender], [Age],


Purchases](SKIP, [Product Name], [Quantity], [Product Type])

[Customer ID], [Gender], [Age] FROM Customers ORDER BY [Cust

[CustID], {product Name], [Quantity], [Product Type] FROM Sale


Y [CustID]}
Customer ID] TO [CustID]

ct Purchases]

October 15, 2008 Data Mining: Concepts and Techniques 17


Using Data Model to Predict

 Prediction join
 Prediction on dataset D using DMM M

 Different to equi-join

 DMM: a “truth table”


 SELECT statement associated with
PREDICTION JOIN specifies values
extracted from DMM

October 15, 2008 Data Mining: Concepts and Techniques 18


Example: Using a DMM in
Prediction

SELECT t.[Customer ID], [Age Prediction].[Age]


FROM [Age Prediction]
PRECTION JOIN
(SHAPE
{SELECT [Customer ID], [Gender] FROM Customers ORDER BY
[Customer ID]}
APPEND
(
{SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER
BY [CustID]}
RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
)
AS t
ON [Age Prediction].[Gender]=t.[Gender] AND
[Age Prediction].[Product Purchases].[Product Name]=t.[Product
Purchases].[Product Name]
October 15, 2008
AND
Data Mining: Concepts and Techniques 19
Browsing DMM

 What is in a DMM?

 Rules, formulas, trees, …, etc

 Browsing DMM

 Visualization

October 15, 2008 Data Mining: Concepts and Techniques 20


Concluding Remarks
 OLE DB for DM integrates data mining and
database systems
 A good standard for mining application

builders
 How can we be involved?

 Provide association/sequential pattern

mining modules for OLE DB for DM?


 Design more concrete language

primitives?
 References

October 15, [Link]


 2008 Data Mining: Concepts and Techniques 21

You might also like