Appendix A: An Introduction to
Microsoft’s OLE OLDB for Data
Mining
Introduction
Overview and design philosophy
Basic components
Data set components
Data mining models
Operations on data model
Concluding remarks
October 15, 2008 Data Mining: Concepts and Techniques 1
Why OLE DB for Data
Mining?
Industry standard is critical for data mining
development, usage, interoperability, and
exchange
OLEDB for DM is a natural evolution from OLEDB
and OLDB for OLAP
Building mining applications over relational
databases is nontrivial
Need different customized data mining
algorithms and methods
Significant work on the part of application
builders
Goal: ease the burden
October 15, 2008 of developing
Data Mining: Concepts and Techniques mining 2
Motivation of OLE DB for DM
Facilitate deployment of data mining models
Generating data mining models
Store, maintain and refresh models as data is
updated
Programmatically use the model on other data
set
Browse models
Enable enterprise application developers to
participate in building data mining solutions
October 15, 2008 Data Mining: Concepts and Techniques 3
Features of OLE DB for DM
Independent of provider or software
Not specialized to any specific mining
model
Structured to cater to all well-known
mining models
Part of upcoming release of Microsoft SQL
Server 2000
October 15, 2008 Data Mining: Concepts and Techniques 4
Overview
Data mining
Core relational engine
applications
exposes OLE DB in a
language-based API OLE DB OLAP/DM
Analysis server exposes
OLE DB OLAP and OLE DB Analysis Server
DM OLE DB
Maintain SQL metaphor RDB engine
Reuse existing
October 15, 2008
notions
Data Mining: Concepts and Techniques 5
Key Operations to Support Data
Mining Models
Define a mining model
Attributes to be predicted
Attributes to be used for prediction
Algorithm used to build the model
Populate a mining model from training
data
Predict attributes for new data
Browse a mining model fro reporting and
visualization
October 15, 2008 Data Mining: Concepts and Techniques 6
DMM As Analogous to A Table in
SQL
Create a data mining module object
CREATE MINING MODEL [model_name]
Insert training data into the model and train it
INSERT INTO [model_name]
Use the data mining model
SELECT relation_name.[id],
[model_name].[predict_attr]
consult DMM content in order to make predictions
and browse statistics obtained by the model
Using DELETE to empty/reset
Predictions on datasets: prediction join between a
model and a data set (tables)
Deploy DMM by just writing SQL queries!
October 15, 2008 Data Mining: Concepts and Techniques 7
Two Basic Components
Cases/caseset: input data
A table or nested tables (for hierarchical data)
Data mining model (DMM): a special type of table
A caseset is associated with a DMM and meta-info
while creating a DMM
Save mining algorithm and resulting abstraction
instead of data itself
Fundamental operations: CREATE, INSERT INTO,
PREDICTION JOIN, SELECT, DELETE FROM, and
DROP
October 15, 2008 Data Mining: Concepts and Techniques 8
Flatterned Representation of
Caseset
Customer
Product
s
Customer Purchases
ID
Gender Customer ID
Product Problem: Lots of replication!
Hair Color Name
Age Quantity
Age Prob Product Type
Car
CID Gend Hair Age Age prob Prod Quan Type Car
Car prob
Owernershi 1 Male Black 35 100% TV 1 Elec Car 100%
p 1 Male Black 35 100% VCR 1 Elec Car 100%
Customer ID
1 Male Black 35 100% Ham 6 Food Car 100%
Car 1 Male Black 35 100% TV 1 Elec Van 50%
Car Prob 1 Male Black 35 100% VCR 1 Elec Van 50%
1 Male Black 35 100% Ham 6 Food Van 50%
October 15, 2008 Data Mining: Concepts and Techniques 9
Logical Nested Table
Representation of Caseset
Use Data Shaping Service to generate a
hierarchical rowset
Part of Microsoft Data Access
Components (MDAC) products
Product Car
CID Gend Hair Age Age prob Purchases Ownership
Car
Prod Quan Type Car
prob
TV 1 Elec Car 100%
1 Male Black 35 100% VCR 1 Elec
Van 50%
Ham 6 Food
October 15, 2008 Data Mining: Concepts and Techniques 10
More About Nested Table
Not necessary for the storage subsystem
to support nested records
Cases are only instantiated as nested
rowsets prior to training/predicting data
mining models
Same physical data may be used to
generate different casesets
October 15, 2008 Data Mining: Concepts and Techniques 11
Defining A Data Mining Model
The name of the model
The algorithm and parameters
The columns of caseset and the
relationships among columns
“Source columns” and “prediction
columns”
October 15, 2008 Data Mining: Concepts and Techniques 12
Example
EATE MINING MODEL [Age Prediction] %Name of Model
stomer ID] LONG KEY, %source column
nder] TEXT DISCRETE, %source column
e] Double DISCRETIZED() PREDICT, %prediction colu
oduct Purchases] TABLE %source column
oduct Name] TEXT KEY, %source column
uantity] DOUBLE NORMAL CONTINUOUS, %source column
oduct Type] TEXT DISCRETE RELATED TO [Product Name]
%source column
NG [Decision_Trees_101] %Mining algorithm use
October 15, 2008 Data Mining: Concepts and Techniques 13
Column Specifiers
KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
PROBABILITY: [0, 1]
VARIANCE
SUPPORT
PROBABILITY-VARIANCE
ORDER
TABLE
October 15, 2008 Data Mining: Concepts and Techniques 14
Attribute Types
DISCRETE
ORDERED
CYCLICAL
CONTINOUS
DISCRETIZED
SEQUENCE_TIME
October 15, 2008 Data Mining: Concepts and Techniques 15
Populating A DMM
Use INSERT INTO statement
Consuming a case using the data mining
model
Use SHAPE statement to create the
nested table from the input data
October 15, 2008 Data Mining: Concepts and Techniques 16
Example: Populating a DMM
TO [Age Prediction]
r ID], [Gender], [Age],
Purchases](SKIP, [Product Name], [Quantity], [Product Type])
[Customer ID], [Gender], [Age] FROM Customers ORDER BY [Cust
[CustID], {product Name], [Quantity], [Product Type] FROM Sale
Y [CustID]}
Customer ID] TO [CustID]
ct Purchases]
October 15, 2008 Data Mining: Concepts and Techniques 17
Using Data Model to Predict
Prediction join
Prediction on dataset D using DMM M
Different to equi-join
DMM: a “truth table”
SELECT statement associated with
PREDICTION JOIN specifies values
extracted from DMM
October 15, 2008 Data Mining: Concepts and Techniques 18
Example: Using a DMM in
Prediction
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PRECTION JOIN
(SHAPE
{SELECT [Customer ID], [Gender] FROM Customers ORDER BY
[Customer ID]}
APPEND
(
{SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER
BY [CustID]}
RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
)
AS t
ON [Age Prediction].[Gender]=t.[Gender] AND
[Age Prediction].[Product Purchases].[Product Name]=t.[Product
Purchases].[Product Name]
October 15, 2008
AND
Data Mining: Concepts and Techniques 19
Browsing DMM
What is in a DMM?
Rules, formulas, trees, …, etc
Browsing DMM
Visualization
October 15, 2008 Data Mining: Concepts and Techniques 20
Concluding Remarks
OLE DB for DM integrates data mining and
database systems
A good standard for mining application
builders
How can we be involved?
Provide association/sequential pattern
mining modules for OLE DB for DM?
Design more concrete language
primitives?
References
October 15, [Link]
2008 Data Mining: Concepts and Techniques 21