
Multibase: Integrating Distributed Databases

Multibase is a software system for integrating access to existing, heterogeneous, distributed databases without requiring changes to the local databases. It provides users with a unified global schema and query language to access multiple databases in a unified way by mapping queries to the individual databases and schemas.


Multibase: integrating heterogeneous distributed database systems*
by JOHN MILES SMITH, PHILIP A. BERNSTEIN, UMESHWAR DAYAL, NATHAN
GOODMAN, TERRY LANDERS, KEN W. T. LIN, and EUGENE WONG
Computer Corporation of America
Cambridge, Massachusetts

National Computer Conference, 1981

ABSTRACT

Multibase is a software system for integrating access to pre-existing, heterogeneous, distributed databases. The system suppresses differences of DBMS, language, and data models among the databases and provides users with a unified global schema and a single high-level query language. Autonomy for updating is retained with the local databases. The architecture of Multibase does not require any changes to local databases or DBMSs. There are three principal research goals of the project. The first goal is to develop appropriate language constructs for accessing and integrating heterogeneous databases. The second goal is to discover effective global and local optimization techniques. The final goal is to design methods for handling incompatible data representations and inconsistent data. Currently the project is in the first year of a planned three year effort. This paper describes the basic architecture of Multibase and identifies some of the avenues to be taken in subsequent research.

* This research was jointly supported by the Defense Advanced Research Projects Agency of the Department of Defense and the Naval Electronic Systems Command and was monitored by the Naval Electronic Systems Command under Contract No. N00039-80-C-0402. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Naval Electronic Systems Command or the U.S. Government.

1. INTRODUCTION

What is Multibase?

The database approach to data processing requires that all of the data relevant to an enterprise be stored in an integrated database. By "integrated," we mean that a single schema (i.e., database description) describes the entire database, that all accesses to the database are expressed relative to that schema, and that such accesses are processed against a single (logical) copy of the database. Unfortunately, in the real world many databases are not integrated. Often, the data relevant to an enterprise is implemented by many independent databases, each with its own schema. Such databases are nonintegrated. Furthermore, these databases may be managed by different database management systems (DBMS), perhaps on different hardware. In this case, in addition to being nonintegrated the databases are distributed and heterogeneous. Thus, the real world of nonintegrated, heterogeneous, distributed databases differs greatly from the more ideal world of an integrated database.

Nonintegrated, heterogeneous, distributed databases arise for several reasons. First, many of these databases were created before the benefits of integrated databases were well understood. In those days, total integration was not a principal database design goal. Second, the lack of a central database administrator for some enterprises has made it difficult for independent organizations within an enterprise to produce an integrated database suitable for all of them. Third, the large size of many data processing applications has made distribution a necessity, simply to handle the volume of work. Since integrated distributed DBMSs have not been available, it has been necessary to implement applications on different machines. Since different applications often have different performance and functionality requirements, different DBMSs were often selected to run on these machines to meet these different requirements. Many data processing organizations have experienced these problems, so there are many nonintegrated, heterogeneous, distributed databases in the world.

A principal problem in using databases of this type is that of integrated retrieval. In such databases, each independent database has its own schema, expressed in its own data model, and can be accessed only by its own retrieval language. Since different databases in general have different schemata, different data models, and different retrieval languages, many difficulties arise in formulating and implementing retrieval requests (called queries) that require data from more than one database. These difficulties include the following: resolving incompatibilities between the databases, such as differences of data types and conflicting schema names; resolving inconsistencies between copies of the same information stored in different databases; and transforming a query expressed in the user's language into a set of queries expressed in the many different languages supported by the different sites. Implementing such a query usually consumes months of programming time, making it a very expensive activity. Sometimes, the necessary effort is so great that implementing the query is not feasible at all.

Multibase is a software system that helps integrate nonintegrated, heterogeneous, distributed databases. Its main goal is to present the illusion of an integrated database to users without requiring that the database be physically integrated. It accomplishes this by allowing users to view the database through a single global schema and by allowing them to access the data using a high level query language. Queries posed in this language are entirely processed by Multibase as if the database were integrated, homogeneous, and non-distributed. Multibase uses the Functional Data Model [1] to define the global schema, and the language DAPLEX [1] as the high level query language.

Implementation Objectives

There are many approaches to the design of the Multibase system. In deciding which approach to choose, we begin with the following design objectives.

1. Generality: we do not want to design an application-specific Multibase system. Instead, we want to provide powerful generalized tools that can be used to integrate various database systems for various applications with a minimum of programming effort.
2. Extendability: we want a design that allows expansion of functionality without major modification. There are areas in the Multibase design where substantial research effort is still required, so we must be able to add additional features to the Multibase system as we learn more about the problems.
3. Compatibility: we want a design that does not render existing software invalid, because such software represents a very large investment. Thus, we must leave the existing interface to the local DBMS intact.

The proposed architecture of the Multibase system consists of two basic components: a schema design aid and a run-time query processing subsystem. The schema design aid provides tools to the "integrated" database designer to design the global schema and to define a mapping from the local databases to the global schema. The run-time query processing subsystem then uses the mapping definition to translate global queries into local queries, ensuring that the local queries are executed correctly and efficiently by local DBMSs. The schema design aid is discussed first.

Schema Architecture

The Multibase architecture has three levels of schemata: a global schema (GS) at the top level, an integration schema (IS) and one local schema (LS) per local database at the middle level, and one local host schema (LHS) per local database at the bottom level. These components and their interrelationships are depicted in Figure 1.

[Figure 1. Schema architecture: the global schema at the top; the local schemata and the integration schema in the middle; the local host schemata at the bottom.]

The local host schemata are the original existing schemata defined in local data models and used by the local DBMSs. For example, they can be relational, file, or CODASYL schemata. Each of these LHSs is translated into a local schema (LS) defined in the Functional Data Model. By expressing the LSs in a single data model, higher levels of the system need not be concerned with data model differences among the local DBMSs. In addition, there is an integration schema that describes a database containing information needed for integrating databases. For example, suppose one database records the speed of ships in miles per hour, while the other records it in kilometers per hour. To integrate these two databases, we need information about the mapping between these two scales. This information is stored in the integration database.

The LSs and IS are mapped, via a view mapping, into the global schema (GS). The GS allows users to pose queries against what appears to be a homogeneous and integrated database. Roughly speaking, the LHS to LS mapping provides homogeneity and the LS and IS to GS mapping provides integration. The schema design aid provides tools to the database designer to define LSs, the GS, and the mapping among them and the LHSs.

Query Processing Architecture

The architecture of the run-time query processing subsystem consists of the Multibase software and local DBMSs.

[Figure 2. Run-time query processing subsystem: global queries enter the Multibase software, which sends local queries to the local DBMSs.]

These components and their interrelationships are depicted in Figure 2. The users submit queries over the global schema (called global queries) to the Multibase software, which translates them into subqueries over local schemata (called local queries). These local queries are then sent to local DBMSs to be executed.

Since the global queries are posed against the global schema without any knowledge of the distribution of the data and the availability of "fast access paths," the Multibase software must optimize queries so they can be executed efficiently. In addition, the translation process must also be correct; that is, the local queries must retrieve exactly the information that the original global query requests.

Meeting the Objectives

The proposed architecture meets the objective of generality. The only component of the Multibase system that is customized for the application is the global schema and its mapping definition to the local schemata. The only component of Multibase that is customized for the local DBMSs is the interface software that allows Multibase to communicate with the heterogeneous DBMSs in a single language. These are only small components of the Multibase system. Thus, most of Multibase is neither application-specific nor DBMS-specific. Multibase also meets the objective of compatibility, because local databases are not modified; therefore, existing application programs can still access local databases through local DBMSs. And as the details of the architecture are discussed in later sections, it will become clear that the objective of extendability is also met.

Project Status

The Multibase project is a three-year effort. Within the first two years, the research problems in the system design will be resolved and evaluated, using a "breadboard" implementation of the system. In the final year, a revised design will be developed and implemented in ADA. The ADA version will be made available for experimental testing within the Navy "Command and Control" environment.

It is anticipated that the major research problems are

1. basic architecture of the system,
2. global and local optimization, and
3. handling incompatible data.

At the time of this writing, an architecture has been designed that supports a restricted version of DAPLEX with reasonable efficiency and that can be tailored to handle certain kinds of data incompatibility. This basic architecture is currently being implemented as a breadboard system. Subsequently, research will be devoted to removing the restrictions on DAPLEX and investigating algorithms for processing incompatible data. The breadboard system will then be enhanced to include the new capabilities. This paper describes the basic architecture developed to date.

Organization

The architecture of the Multibase system is expanded in more detail in Section 2. The process of mapping each LHS to an LS and merging LSs into a GS is discussed in Section 3. Section 3 also discusses the problem of data incompatibility and inconsistency. The method by which user queries are translated into efficient local queries is discussed in Section 4. Section 5 is a summary.

2. QUERY PROCESSING ARCHITECTURE

The architecture of the Multibase run-time subsystem consists of

1. a query translator,
2. a query processor,
3. a local database interface (LDI) for each local DBMS, and
4. local DBMSs.

A global query references entity types and functions defined in the global schema. Before it can be processed, it must be translated by the query translator into a query referencing only entity types and functions defined in the local schemata. In other words, the query translator translates a global query over the global schema into a global query over the disjoint union of local schemata. The query processor decomposes the global query over the disjoint union of local schemata into individual local queries over local schemata. The query processor also does query optimization and coordinates the execution of local queries. The LDI translates local queries received from the query processor into queries expressed in the local DML and translates the results of the local queries into a format expected by the query processor. These components and their interrelationships are depicted in Figure 3.

The User Interface

The global schema is expressed in the functional data model [1]. In this data model, a schema is composed of entity types and functions between entity types. Each entity type contains a set of entities, and functions map entities into entities. Functions can be single-valued or multi-valued, and can be partially defined or totally defined.

The functional data model was selected because it embodies the main structures of both the flat file data models, such as the relational model, and the link structured data models, such as CODASYL. Entity types correspond roughly to relations in the relational model or record types in the CODASYL model. Functions correspond to owner-coupled sets in the CODASYL model.

The query language that we use with the functional data model is called DAPLEX. DAPLEX is a high level language that operates on data in the functional data model and is designed to be especially easy to use by end users.
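As a rough illustration of the functional data model described under "The User Interface," the following Python sketch models an entity type as a named set of entities and a function as a mapping defined on an entity type, with flags for single-valued versus multi-valued and partial versus total. The class names `EntityType` and `Function`, the method `apply`, and the sample ship data are all hypothetical; they are not taken from Multibase or DAPLEX.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """An entity type: a named set of entities (here, opaque string ids)."""
    name: str
    entities: set = field(default_factory=set)


@dataclass
class Function:
    """A function defined on an entity type (hypothetical helper class).

    multi_valued: each entity maps to a set of results.
    total: every entity of the domain must have a value.
    """
    name: str
    domain: EntityType
    multi_valued: bool = False
    total: bool = False
    mapping: dict = field(default_factory=dict)

    def apply(self, entity):
        if entity not in self.domain.entities:
            raise KeyError(f"{entity!r} is not an entity of {self.domain.name}")
        if entity not in self.mapping:
            if self.total:
                raise ValueError(f"total function {self.name} is undefined on {entity!r}")
            # A partial function may simply have no value for this entity.
            return set() if self.multi_valued else None
        return self.mapping[entity]


# Hypothetical sample data, loosely modeled on the paper's ship examples.
ship = EntityType("ship", {"s1", "s2"})
shipclass = EntityType("shipclass", {"k1"})

name = Function("name", ship, total=True, mapping={"s1": "Alpha", "s2": "Beta"})
consists_of = Function("consists-of", shipclass, multi_valued=True,
                       mapping={"k1": {"s1", "s2"}})
flag = Function("flag", ship, mapping={"s1": "US"})  # partially defined

print(name.apply("s1"))
print(sorted(consists_of.apply("k1")))
print(flag.apply("s2"))
```

In this sketch a multi-valued function such as `consists_of` plays the role the paper assigns to a CODASYL owner-coupled set, while single-valued functions such as `name` play the role of attributes.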

[Figure 3. Run-time query processing subsystem: a query over the global schema enters the query translator, which produces a query over the disjoint union of LSs and IS; the query processor issues a query over the IS and queries over LS1 ... LSn; LDI1 ... LDIn turn these into queries over LHS1 ... LHSn for DBMS1 ... DBMSn.]

Query Translator

The query translator receives global queries expressed in DAPLEX over the GS and translates them into queries expressed in an internal language over the disjoint union of LSs and IS.

To perform the translation, the query translator must use the mapping that defines how entity types and functions of the GS are constituted from the entity types and functions of the LS and the IS. The query translator uses these mapping definitions to substitute global entity types and global functions in the global query by their mapping definitions. The substitution results in a query containing only entity types and functions of the LSs and the IS. Therefore references by the global query to entities in the GS are now expressed as references to the actual entities at particular sites that implement the global GS. Any extra data needed from the integration database to resolve incompatibilities among LSs is now explicitly referenced in the translated query.

The query produced by the query translator only references data in the LS and the IS. Thus, we can imagine that this query is posed against a database state that is the disjoint union of the LSs together with the IS. This disjoint union is a homogeneous and centralized view of the distributed heterogeneous database.

The language used for defining the mapping between schemata must be compatible with the global DML. Otherwise, it would be awkward to translate the query from the GS to LSs and IS using conventional query modification techniques. (Query modification composes the given query, which is a function from GS states to answer states, with the mapping from LS and IS states to GS states, to produce a query from LS and IS states to answer states [2].) Therefore, we propose to use the same language DAPLEX as both the query and mapping language. The process of constructing the global schema from the local schemata is discussed in Section 3.

Query Processor

The query processor translates a query over the disjoint union of LSs and IS into a query processing strategy. This strategy includes the following: a set of queries, each of which is posed against exactly one LS or the IS; a set of "move" operations to ship the results of these queries between the local DBMSs and the query processor; and a set of queries that is executed locally by the query processor to integrate the results of the LS and IS queries. The main goal of this translation is to minimize the total cost of evaluating the query, where cost is measured by local processing time and communication volume.

A query processing strategy is produced in two steps. First, the query is translated into an internal representation called a query graph. Using this representation, the query processor isolates those subqueries of the given query (which are essentially subgraphs of the query graph) that can be entirely evaluated at one local DBMS. Thus, the result of the first step is the set of single-site subqueries of the given query.

The second step is to combine the single-site queries with move operations and local queries issued by the query processor. Move operations serve two purposes. First, they are used to gather the results of the single-site queries back to the query processor. These results can be integrated by the query processor by executing a query local to itself. The integrated results may be the answer to the query, in which case they are returned to the user. Second, they may be used as input to other single-site queries. In this case, a move operation is issued to ship the data to the local DBMS that needs it. The method by which single-site queries, move operations, and queries local to the query processor are sequenced to produce a correct and efficient strategy is discussed in Section 4.

Local Database Interface (LDI)

Local queries posed against the LSs are sent by the query processor to the LDIs in an internal format. The LDI translates these local queries into programs in the local DML and programming language over the local host schema (LHS). This translation is optimized to minimize the processing time of the translated query. When the local DBMS uses a high level (i.e., set-at-a-time) language, such as DAPLEX, this translation is fairly direct. However, when the local DBMS uses a low level (i.e., record-at-a-time) language, such as CODASYL DML embedded in COBOL, this translation may be quite complex and may require nontrivial optimization. Translation methods for a file system and CODASYL language are described in Section 4.

To do the translation, the LDI must have information about how entity types and functions in the LS are mapped to objects in the LHS. These mappings are defined using the rules discussed below.
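The two-step construction of a query processing strategy described under "Query Processor" can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Multibase algorithm: the query graph is reduced to a dictionary of nodes tagged with the schema (LS or IS) that holds them plus a list of join edges, only the first purpose of move operations (gathering results at the query processor) is modeled, and cost minimization is ignored. The function names and the sample graph are hypothetical.

```python
from collections import defaultdict


def single_site_subqueries(nodes, edges):
    """Step 1: isolate subgraphs that can be evaluated at one site.

    nodes: {node_name: site}; edges: iterable of (node_a, node_b) joins.
    Returns a sorted list of (site, [node names]), one entry per maximal
    connected subgraph whose nodes all live at the same site.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        if nodes[a] == nodes[b]:  # only same-site joins stay together
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted((nodes[root], sorted(members)) for root, members in groups.items())


def build_strategy(nodes, edges):
    """Step 2 (simplified): one query and one move per single-site subquery,
    followed by an integration query run locally by the query processor."""
    steps = []
    for site, members in single_site_subqueries(nodes, edges):
        steps.append(("query", site, members))
        steps.append(("move", site, "query processor"))
    steps.append(("integrate", "query processor", None))
    return steps


# Hypothetical query graph: two entity types in LS1, one in LS2, one in the IS.
nodes = {"ship": "LS1", "shipclass": "LS1", "position": "LS2", "scale": "IS"}
edges = [("ship", "shipclass"), ("ship", "position"), ("position", "scale")]

for step in build_strategy(nodes, edges):
    print(step)
```

A fuller strategy would also emit moves that ship intermediate results from one local DBMS to another, and would choose among alternative orderings by estimated local processing time and communication volume.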

3. SCHEMA INTEGRATION ARCHITECTURE

"Schema Integration" is the process of defining a global schema and its mapping from the existing local schemata. The general architecture of this design process is discussed in this section.

There is one local host schema (LHS) for each local database. Each LHS can be expressed in a relational, CODASYL, or a file language. To merge these LHSs we must convert them into a common data model first. Otherwise, we would be mixing relations from a relational model with record types and set types from a CODASYL model. Thus the first step of schema integration is to translate LHSs into Local Schemata (LS) defined in the Functional Data Model of DAPLEX.

The second step is to merge LSs into a GS. To do this, an integration schema which defines an integration database is often needed. An integration database contains: information about mapping between different scales used by different LSs for the same entity type; statistical information about imprecise data; and other information needed for reconciling inconsistency between copies of the same data stored in different databases. The integration schema and LSs are then used to define a global schema.

The overall architecture of schema integration consists of

a) a global schema,
b) a mapping language,
c) local schemata (LS) and an integration schema (IS),
d) a mechanized local-to-host schema translator, and
e) local host schemata (LHS) and local DBMSs.

These components and their interrelationships are depicted in Figure 4. The local host schemata are translated into local schemata by the mechanized local host schema translator, and local schemata and the IS are mapped into the GS by using the mapping language facility.

[Figure 4. Schema integration architecture: users access the Global Schema; the Mapping Language Facility connects it to LS1, LS2, ... and the Integration Schema; the Mechanized Local Host Schema Translator connects each LS to its LHS and DBMS.]

Mapping between LHS and LS

Since an LHS can be defined in the relational, CODASYL, or file model, how an LHS is mapped into an LS depends on the data model used.

CODASYL model

If an LHS is defined in the CODASYL model, then it consists of record types and set types. The functional data model consists of entity types and functions on entity types. So, to map the LHS into an LS one simply maps record types and set types into entity types and functions respectively.

The concept of record type in the CODASYL model is very similar to that of entity type in the functional data model. A record in the CODASYL model has a record ID, and one or several attributes. The record ID uniquely identifies the record, and the attributes describe properties of the record. Similarly, in the functional data model, an entity is an object of interest, and the functions defined on the entity return values that describe the properties of the entity. Therefore, a record type corresponds to an entity type, and the attributes of the record type correspond to functions defined on the entity type.

If an attribute of a record type is a key (in CODASYL terminology, a key is the data item(s) declared "NO DUPLICATE ALLOWED") then the corresponding function must be a totally defined one-to-one mapping. If the attribute is a repeating group (declared to have multiple occurrences in a CODASYL model), then the corresponding function is a set-valued function.

A set type in the CODASYL model is a mapping between an owner record type and one or several member record types. A set type maps an owner record to a set of member records, or, conversely, a set type maps a member record to a unique owner record. Therefore, a set type resembles a function that maps an owner entity to a set of member entities, or, conversely, maps a member entity to a unique owner entity.

In a CODASYL model, a set type implies not only certain semantic information but also the existence of access paths. For example a set type "work-in" between "department" and "employee" record types implies that the employees owned by a department work in that department. But it also implies that there is an access path from a department record to the employee records owned by that department and another access path from each employee record to its own department record. Since the LSs will be used for query optimization, we
must capture all this access path information in the LSs. Therefore, for each set type in an LHS, not only a set-valued function from the owner entity type to the member entity type, but also a single-valued function from each of the member entity types to the owner entity type must be defined in the corresponding LS.

In a CODASYL model, a record type can be declared to have a "LOCATION MODE CALC USING KEY." This means that an index file is created for the key, and the record type is directly accessible through the indexed key. Therefore, for each record type with "CALC KEY" in the LHS, a system set function of which the domain is the key value and the range is the entity type (corresponding to the record type) must be defined in the LS. This system set function will be used only for query processing optimization. It is not visible to the database designer. Therefore, it cannot be incorporated into the global schema. This restriction is imposed to preserve the data independence of the global schema.

For example, the CODASYL schema shown in Figure 5 is translated into the schema in the functional data model shown in Figure 6. In Figure 6, the inverse of a function F is denoted by "F-inv."

    Set types (owner -> member):
      all-class    system    -> shipclass
      all-ship     system    -> ship
      consistsof   shipclass -> ship
      positions    ship      -> trackhist

    Shipclass Record
      * classname      char(24)
        length         char(6)
        draft          char(2)
        beam           char(3)
        displacement   char(5)
        endurance      char(3)

    Ship Record
      * UIC            char(6)
        VCN            char(5)
        name           char(26)
        type           char(4)
        flag           char(2)
        owner          char(2)
        hull           char(4)

    Trackhist Record
     ** DTG            char(10)
        speed          char(3)
        latitude       char(5)
        longitude      char(6)
        course         char(3)

    *  primary key
    ** key within a set

Figure 5. A CODASYL schema

Relational model

A relational database schema consists of a set of relation definitions. To translate a relational LHS to a functional LS we essentially map each relation to an entity type. A tuple of a relation in a relational model is similar to an entity in a functional data model. A tuple is uniquely identified by its primary key and has one or more attributes, just as an entity has one or more functional values. Therefore, to map a relational model LHS into a functional data model LS, for each relation in the LHS an entity type is defined in the LS, and for each attribute of the relation a function is defined on the corresponding entity type. The range of the function is the domain of the attribute. If the attribute is a primary key, then the function must be totally defined and one-to-one. If it is a candidate key, then the function can be partially defined, but it must still be one-to-one. In any case, due to the relational format, the function must be single-valued, not set-valued.

For example, the relational LHS shown in Figure 7 is translated into the functional data model LS shown in Figure 8.

File model

A file model consists of record files and indexed fields (keys) in those files. A record file consists of a set of records of the same type, which is similar to the concept of record type in the CODASYL model or a relation in the relational model. To map a file LHS to a functional data model LS, for each record file in LHS a corresponding entity type must be defined in the LS, and for each field of the record file a function must be defined on the entity type. Since a key supports an access path to the record file, for each key of a record file, a system function must be defined whose domain is the key field's entity type and whose range is the entity type corresponding to the record file. This system function is not visible to the database designer; it is used only for query optimization.

Integration of LSs

To integrate LSs into a global schema, the database designer designs an integration schema that defines an integration database. He then designs a global schema and defines it in terms of the LSs and the Integration Schema by using the view support facility.

An integration database contains information needed for merging entity types and their functions. For example, two entity types, E1 and E2, from two schemata are shown in Figure 9. These two entity types represent information about ships. There are two functions defined on each entity type; one function returns the ship-id of a ship and the other returns the ship-class of the ship. The ship-class of E1 and E2 are coded differently. A sample of entities and their functional values are also shown in Figure 9. To merge E1 and E2 into a single entity type, a uniform code must be defined, and the two existing codes must be mapped to the new code. Definitions of the new code and the mapping function are shown in Figure 10, and a sample of the function is shown in Figure 11.
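The ship-class example can be sketched in Python: two local entity types with differently coded classes are brought under one uniform code by a mapping function stored in the integration database. The entity samples follow Figure 9; the numeric values assigned by `F`, and the helper name `to_global`, are assumptions made for illustration. The sketch also assumes that E1 and E2 describe disjoint sets of ships, which holds for the sample data.

```python
# Local samples as in Figure 9: entity id -> (ship id, local class code).
E1 = {"e11": (1212, "c1"), "e12": (1240, "c3"), "e13": (2341, "c5")}
E2 = {"e21": (3440, "d2"), "e22": (3651, "d3"), "e23": (4411, "d4")}

# Integration database: f : (code1 union code2) -> code.
# The uniform code values below are assumed for illustration.
F = {"c1": 1, "c2": 2, "c3": 3, "c4": 4, "c5": 5,
     "d1": 6, "d2": 7, "d3": 8, "d4": 9}


def to_global(local_entities, code_map):
    """Translate local (shipid, local code) pairs into {shipid: uniform code}."""
    return {shipid: code_map[local_code]
            for shipid, local_code in local_entities.values()}


# Global entity type E: with disjoint inputs, E is the union of both translations.
E = {**to_global(E1, F), **to_global(E2, F)}
print(E)
```

Because the mapping lives in a separate table, neither local database has to change its own coding, which is the point of keeping such information in the integration database.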

The definitions of the new code and the function are stored in the integration database. A global schema defined on the two local schemata and the integration schema is shown in Figure 12.

As the discussion above indicates, integration of local schemata which are not disjoint involves two activities: merging of entity types and merging of their functions. These activities are discussed in the next section. Two special problems relating to schema integration, the creation of new entity types and the integration of incompatible data, are discussed in subsequent sections.

   type shipclass is entity
      classname : string(1..24);
      length : string(...);
      draft : string(...);
      beam : string(...);
      displacement : string(...);
      endurance : string(...);
      consists-of : set of ship;
   end entity;

   type ship is entity
      UIC : string(1..6);
      VCN : string(1..5);
      name : string(1..26);
      type : string(1..4);
      flag : string(1..2);
      owner : string(1..2);
      hull : string(1..4);
      positions : set of trackhist;
      consists-of_inv : shipclass;
   end entity;

   type system is entity
      all-class : set of shipclass;
      all-ship : set of ship;
   end entity;

   type trackhist is entity
      DTG : string(1..10);
      speed : string(1..3);
      latitude : string(1..5);
      longitude : string(1..6);
      course : string(1..3);
      positions_inv : ship;
   end entity;

   Figure 6—A schema in the functional data model

   Relation Platform
      VesselName   char(26)
      class        char(25)
      type         char(6)
      hull         char(6)
      flag         char(2)
      category     char(4)
    { PIF          char(4)
      NOSICID      char(8)
      IRCS         char(8)

   Relation Position
    { PIF          char(4)
      NOSICID      char(8)
      DTG          char(10)
      latitude     char(5)
      longitude    char(6)
      bearing      char(3)
      course       char(3)
      speed        char(3)

   * primary key

   Figure 7—A relational model

   type platform is entity
      VesselName : string(1..26);
      class : string(1..25);
      type : string(1..6);
      hull : string(1..6);
      flag : string(1..2);
      category : string(1..4);
      PIF : string(1..4);
      NOSICID : string(1..8);
      IRCS : string(1..8);
   end entity;

   type position is entity
      PIF : string(1..4);
      NOSICID : string(1..8);
      DTG : string(1..10);
      latitude : string(1..5);
      longitude : string(1..6);
      bearing : string(1..3);
      course : string(1..3);
      speed : string(1..3);
   end entity;

   Figure 8—A schema for the functional data model

   type E1 is entity                 type E2 is entity
      shipid1 : integer;                shipid2 : integer;
      class1 : code1;                   class2 : code2;
   end entity;                       end entity;

   E1     shipid1   class1           E2     shipid2   class2
   e11    1212      c1               e21    3440      d2
   e12    1240      c3               e22    3651      d3
   e13    2341      c5               e23    4411      d4

   Figure 9—Local schemata

   type code is entity
   end entity;

   Define a new function
      f : (code1 union code2) -> code.

   Figure 10—Integration database

Merging Entity Types and Functions

To merge two entity types, say E1 and E2 in Figure 9, into an entity type, say E in Figure 12, the database designer must first determine whether the set of entities of type E1 is disjoint from the set of entities of type E2. If E1 and E2 are disjoint, then E is simply the union of E1 and E2. If E1 and E2 are not disjoint, then the condition under which two entities from E1 and E2 respectively are identical must be specified. To specify this condition, entities of E1 and E2 must be identifiable by their attributes. Therefore, for each entity type to be merged, a function or combination of functions of the entity type must be a primary key. Entities from the two entity types being merged can then be compared on their primary key values.
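The merge rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: records are plain dictionaries, the function names `are_disjoint` and `merge_entity_types` are hypothetical, and the sample values are invented so that the two entity sets overlap on one key.

```python
def are_disjoint(e1, e2, key1, key2):
    """True if no primary key value occurs in both entity sets."""
    return {r[key1] for r in e1}.isdisjoint({r[key2] for r in e2})


def merge_entity_types(e1, e2, key1, key2):
    """Merge two entity sets into one, keyed by primary key value.

    Two entities are treated as the same merged entity exactly when
    their primary key values are equal. Each merged entry remembers
    which local record(s) it came from.
    """
    merged = {}
    for rec in e1:
        merged[rec[key1]] = {"E1": rec}
    for rec in e2:
        merged.setdefault(rec[key2], {})["E2"] = rec
    return merged


# Hypothetical sample data (not the values of Figure 9).
e1 = [{"shipid1": 1212, "class1": "c1"}, {"shipid1": 1240, "class1": "c3"}]
e2 = [{"shipid2": 1240, "class2": "d3"}, {"shipid2": 3440, "class2": "d2"}]

disjoint = are_disjoint(e1, e2, "shipid1", "shipid2")
merged = merge_entity_types(e1, e2, "shipid1", "shipid2")
```

When the sets are disjoint the result is simply their union. When they overlap, the shared key value yields a single merged entity that carries both local records, which is where a reconciliation step would later apply.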
Two entities from the entity types being merged are specified as identical if and only if they have identical primary key values.

[Figure 11—Sample of function f: a table assigning each value of code1 (c1 to c5) and code2 (d1 to d4) a value of code (1 to 9).]

   type E is entity
      shipid : integer;
      class : code;
   end entity;

   Figure 12—Global schema

In Figure 13, entity types E1 and E2 (which are assumed to overlap) are merged into an entity type E. The syntax used is a subset of DAPLEX. Notice that "shipid1" and "shipid2" are assumed to be primary keys of E1 and E2 respectively. Further, it is assumed that an E1 entity and an E2 entity are identical if and only if they have the same primary key values.

Creation of a New Entity Type and its Functions

Merging two entity types into a single entity type is a special case of creating a new entity type. Essentially, a new entity type may be created which is a combination of the existing entity types. However, this combination does not create new objects in the database. Rather, it simply presents many existing objects of different types as objects of a single type to the global schema users. Properties of the new global entities are simply those that previously existed in the local schemata. However, in some cases, a database designer may want to design a more sophisticated global schema in which new (virtual) objects derive their properties (attributes) from many dissimilar existing objects. An example is used to illustrate this process, and general principles can be drawn from the example.

   Local schema 1:

   type supplier1 is entity
      sname : string(...);
      sno : integer;
      supplying : set of supply1;
   end entity;

   type parts1 is entity
      pname : string(15);
      pno : integer;
      supplied-by : set of supply1;
   end entity;

   type supply1 is entity
      sno : integer;
      pno : integer;
   end entity;

   Local schema 2:

      supply2
      | sno | pno |

   type supply2 is entity
      sno : integer;
      pno : integer;
   end entity;

   Figure 14—Two local schemata

Suppose a global schema with two entity types, "supplier" and "parts," is to be designed from two local schemata shown in Figure 14. The global schema must capture all the information contained in both schemata. Notice that in the second schema, "supplier" and "parts" entities do not exist, but their existence is implied by the presence of supplier numbers and part numbers: "sno" and "pno." To capture this information, virtual "supplier" and "parts" entities corresponding to those "sno" and "pno" must be created in the global schema. A definition of the global schema is shown in Figure 15. Notice that in the definition primary keys "[Link]" and "[Link]" are used to map the new entities to existing entities in the first schema and the implied entities in the second.
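The construction of virtual entities described above (and defined in DAPLEX in Figure 15) can be sketched in Python. This is a minimal illustration, assuming each local entity set is a list of dictionaries; the function name `build_global_schema` and the sample rows are hypothetical.

```python
def build_global_schema(supplier1, parts1, supply1, supply2):
    """Create virtual supplier and parts entities from two local schemata.

    A supplier exists for every sno found in supplier1 or supply2, and a
    part exists for every pno found in parts1 or supply2. The derived
    function supplying(s) collects the parts linked to s by a supply
    record in either local schema.
    """
    suppliers = {s["sno"] for s in supplier1} | {y["sno"] for y in supply2}
    parts = {p["pno"] for p in parts1} | {y["pno"] for y in supply2}

    supplying = {sno: set() for sno in suppliers}
    for y in list(supply1) + list(supply2):
        if y["sno"] in suppliers and y["pno"] in parts:
            supplying[y["sno"]].add(y["pno"])
    return suppliers, parts, supplying


# Hypothetical sample data.
supplier1 = [{"sno": 1, "sname": "Acme"}]
parts1 = [{"pno": 10, "pname": "bolt"}]
supply1 = [{"sno": 1, "pno": 10}]
supply2 = [{"sno": 2, "pno": 20}, {"sno": 1, "pno": 20}]

suppliers, parts, supplying = build_global_schema(
    supplier1, parts1, supply1, supply2
)
```

Supplier 2 and part 20 appear only as numbers in the second schema, yet both become entities of the global schema, which is the point of the virtual entity types.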
   type E is entity
      shipid : integer;
      class : code;
   end entity;

   for each x in E1 where not (shipid1(x) is in shipid2(E2))
   loop
      create new E(shipid => shipid1(x),
                   class => f(class1(x)));
   end loop;

   for each x in E2
   loop
      create new E(shipid => shipid2(x),
                   class => f(class2(x)));
   end loop;

   Figure 13—The mapping definition of entity type E

Data Incompatibility

Several sources of data incompatibility are discussed in this section. The objective of the discussion is to show how the proposed architecture allows us to incorporate our present understanding of incompatible data into Multibase. The details of solutions to the problem are to be fully investigated later in the project.

Some sources of data imprecision are:

1. Scale difference. For example, in one database four values (cold, cool, warm, hot) are used to classify climates
of cities, while in another database the average temperatures in Fahrenheit may be recorded.
2. Level of Abstraction. For example, in one database "labor cost" and "material cost" may be recorded separately, while in another they are combined into "total cost." Another example is recording an employee's "average salary" instead of his or her "salary history" for the previous five years.
3. Inconsistency Among Copies of the Same Information. Certain information about an entity may appear in several databases, and the values may be different due to timing, errors, obsolescence, etc.

   type supplier is entity
      sno : integer;
      supplying : set of parts;
   end entity;

   type parts is entity
      name : string(15);
      no : integer;
   end entity;

   for each x in (sno(supplier1) union sno(supply2))
   loop
      create supplier (sno => x);
   end loop;

   for each y in (pno(parts1) union pno(supply2))
   loop
      create parts (pno => y);
   end loop;

   for each s in supplier loop
      supplying(s) := (p in parts where
                         (for some y1 in supply1:
                            sno(s) = sno(y1) and pno(p) = pno(y1)) or
                         (for some y2 in supply2:
                            sno(s) = sno(y2) and pno(p) = pno(y2)));
   end loop;

   Figure 15—A global schema

There are many other sources of data incompatibility. Data incompatibility must be resolved if different databases are to be integrated. The architecture of schema integration developed previously can be extended to handle the problem.

Let E1 and E2 be two entity types, and f1 and f2 be functions defined on E1 and E2 respectively. If E1 and E2 have been merged into an entity type E, then f1 and f2 can be merged into the function f defined on E as follows,

   f(e) = T1(f1(e))         if e in E1 - (E1 intersect E2)
          T2(f2(e))         if e in E2 - (E1 intersect E2)
          g(f1(e), f2(e))   if e in (E1 intersect E2)

The transformations T1 and T2 are typically used to map the ranges of f1 and f2 into a common range as discussed in the section "Merging Entity Types and Functions." On the other hand, the function g is used to reconcile any inconsistencies between the values of f1 and f2 over the same entity. Typically, g will involve accessing data described in the integration schema.

For example, in Figure 16, the entity types E4 and E5 are merged into the entity type E6 by using functions IS2 and IS3 of the integration database. In the figure, the data values of the entities and functions are shown in tabular form. In this example, T1 and T2 transform the climate of cities from two different scales, (cold, cool, warm, hot) and Fahrenheit, into a unified scale (temperature range, probability) by combining E4 with IS2 and E5 with IS3. The function g could return all the (temperature range, probability) pairs from the two databases without any further processing, as is shown in Figure 16. Alternatively, g could use some statistical technique to process sets of (temperature range, probability) pairs, and return a simpler but descriptive summary of those pairs. For example, the function g could return the average value and the standard deviation of the distribution represented by these pairs; it can make statistical estimation and return a confidence interval; or it can do time series analysis and return information about the spectral function.

The above examples are merely illustrative of potential data integration problems and their solutions. More complete approaches to the problem will be fully investigated later in the project.

   E4 (of LS1)             IS2 (of integration database)
   city1     climate       climate   range of temp   probability
   Boston    cold          cold      0 - 20 F        20%
   Norfolk   cool          cold      20 - 40 F       40%
   Dallas    warm          cold      40 - 60 F       25%
   Miami     hot           cold      60 - 80 F       10%
                           cold      80 - 100 F      5%
                           cool      0 - 20 F        10%
                           cool      20 - 40 F       20%

   E5 (of LS2)             IS3 (of integration database)
   city2     mean temp     mean temp   range of temp   probability
   Denver    52 F          52 F        0 - 20 F        20%
   Chicago   54 F          52 F        20 - 40 F       35%
   Los Ang   75 F          52 F        40 - 60 F       30%

   E6 (of global schema)
   city      temp range    probability
   Boston    0 - 20 F      20%
   Boston    20 - 40 F     40%

   Figure 16—Example of data incompatibility

4. RUN-TIME QUERY PROCESSING SUBSYSTEM

Overall Architecture

Now we will show how the schema mappings developed during schema integration are utilized to drive query processing over the global schema. As we explained in Section 2, the run-time subsystem consists of a query translator and a query processor. Here we will expand these two components in further detail.

A "Global Database Manager" (GDM) is that part of the Multibase System which consists of the query translator and the query processor. A query over the global schema is normally sent to the nearest site that has a Global Database Manager (GDM). There may be one or more GDMs in a Multibase system. A GDM stores a copy of global schema,
local schemata, integration schema, and the mapping definitions among them. It uses this information to parse, translate, and decompose queries over the global schema into local queries over local schemata, and coordinates execution of the local queries. The structure of a GDM and its interface with local DBMSs is shown in Figure 17.

A query expressed in DAPLEX over the global schema is first parsed by the parser and a parse tree is generated. Components of the parse tree, which are entities and functions of the global schema, are then replaced by their corresponding definitions, which are expressed in terms of the local schemata LSs. The result is a parse tree consisting of entities and functions of the local schemata. The parser is part of the query translator.

The parse tree is then simplified to eliminate the inefficient boolean components. For example, the boolean expression "(a > 5) or (a < 20)" is reduced to "true," and "(a > 5) and (a < 2)" is reduced to "false." The query simplifier is also part of the query translator.

The parse tree is then decomposed by the decomposer into subtrees. Each subtree represents a local query referencing only entities and functions of a single local schema.

The "ACCESS PLANNER" transforms the local queries into "data movement" and "local processing" steps. Depending on the memory size and processing power of each individual site, and the capacity of the communication channels, the "ACCESS PLANNER" may move data and distribute the computing load among several sites, or it may move data to a central site which has large memory and computing power and do most of the processing there. In doing this planning, the "ACCESS PLANNER" tries to produce steps which minimize the cost of processing the query. The meaning of "cost" depends on the individual systems being integrated. It may mean the amount of data moved between sites, or the amount of processing time.

The execution of the access plan is coordinated by the "EXECUTION STRATEGIST." It sequences the steps of the access plan and it makes sure that the data needed by a step are there before the step is initiated.

The "EXECUTION STRATEGIST" communicates with local DBMSs through the Local Database Interface (LDI). The LDIs receive "data move" and "local processing" steps from the "EXECUTION STRATEGIST," translate these steps into programs in the local query language or Data Manipulation Language (DML), or call local routines to process these steps, and translate the results of these steps into the format expected by the "EXECUTION STRATEGIST." The LDI may reside in a GDM if the local site does not have enough memory or CPU power; otherwise it resides with the individual local DBMS at the local site.

The query processor to be described in this section is oriented towards the initial breadboard system. It is designed to handle restricted versions of the user interface language and view mapping language with reasonable efficiency. Subsequent research is needed to extend the query processor to efficiently handle the unrestricted languages.
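The boolean simplification performed by the query simplifier can be illustrated with a small Python sketch. It is not the Multibase simplifier: it assumes the variable ranges over the real numbers, handles only two comparisons with "<" or ">" on the same variable, and uses hypothetical function names. A result of None means the pair cannot be reduced to a constant.

```python
import math


def interval(op, k):
    """Open interval of reals satisfying (a op k), for op in '<' or '>'."""
    if op == ">":
        return (k, math.inf)
    if op == "<":
        return (-math.inf, k)
    raise ValueError(f"unsupported operator: {op}")


def simplify_and(c1, c2):
    """Return False if (a op1 k1) and (a op2 k2) is unsatisfiable, else None."""
    a, b = interval(*c1), interval(*c2)
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return False if lo >= hi else None


def simplify_or(c1, c2):
    """Return True if (a op1 k1) or (a op2 k2) holds for every real a, else None."""
    a, b = interval(*c1), interval(*c2)
    if min(a[0], b[0]) == -math.inf and max(a[1], b[1]) == math.inf:
        left = a if a[0] == -math.inf else b    # piece unbounded below
        right = b if left is a else a           # piece unbounded above
        if right[0] < left[1]:                  # the two pieces overlap
            return True
    return None


always_true = simplify_or((">", 5), ("<", 20))    # (a > 5) or (a < 20)
always_false = simplify_and((">", 5), ("<", 2))   # (a > 5) and (a < 2)
undecided = simplify_and((">", 5), ("<", 20))     # satisfiable, kept as is
```

Reducing a clause to a constant this way lets the translator drop whole branches of the parse tree before decomposition.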
[Figure 17—Run time query processing subsystem. Components shown: Query Translator (Parser, View Mapper, Query Simplifier) with the Global Schema and Views; Local Schemata LSi; Integration Schema; Query Processor (Decomposer, Access Planner, Query Optimizer) with its Workspace; EXECUTION STRATEGIST; LDI1, LDI2, LDI3; DBMS1, DBMS2, DBMS3; Integration Database.]

Within the "Query Processor," the database is modelled as a collection of entity types and links. A link L from entity type R to entity type S is a function from entities of R to entities of S; S is called the owner entity type and R is called the member entity type relative to L. We assume that if L links R to S, then L, R, and S are all stored at the same site. We also assume that there is a database schema describing the entity types and links of the database.

We will sketch the Multibase query processing strategy in four steps. First, we define the set of queries that can be posed. Second, we define the set of basic operations that Multibase is capable of executing. Third, we describe how to translate a query into a sequence of basic operations that solve the query. Finally, we describe how to translate a local query posed over a CODASYL local host schema into a program in a low level Data Manipulation Language.

Queries

A query consists of a target list and a qualification. A target list consists of a set of function terms of the form A(R) where R is an entity type and A is a non-link function of R. A qualification is a conjunction of selection clauses, join clauses, and link clauses. A selection clause is a formula of the form (A(R) op k) where A(R) is a function term, op is one of {=, ≠, <, >, ≤, ≥} and k is a constant. A join clause is a formula of the form (A(R) = B(S)) where A(R) and B(S) are function terms. A link clause is a formula of the form (L(R) = S) where L is a link from R to S.

Let r and s be entities in R and S respectively. We say that
r satisfies the selection clause (A(R) op k) if the A-value of r is op-related to k (i.e., (A(r) op k)). We say that r and s satisfy the join clause (A(R) = B(S)) if the A-value of r equals the B-value of s (i.e., A(r) = B(s)). And we say that r and s satisfy the link clause L(R) = S if L connects r and s (i.e., L(r) = s).

Let R1, ..., Rn be the entity types referenced by qualification q, and let r1, ..., rn be entities in R1, ..., Rn respectively. We say that r1, ..., rn satisfy the qualification q if r1, ..., rn satisfy all of the clauses of q.

Let Q be a query consisting of target list T = (Aj1(Ri1), ..., Ajm(Rim)) and qualification q. Let R1, ..., Rn be the entity types referenced in T and q. The answer to Q is the set of all tuples of the form (Aj1(ri1), ..., Ajm(rim)) such that r1, ..., rn are in R1, ..., Rn (respectively) and r1, ..., rn satisfy q. Given a database R1, ..., Rn and a query Q, our goal is to compute the answer to Q efficiently.

The subset of DAPLEX that we have just described makes the following simplifications:

1. Set expressions in range predicates and qualifications have been "flattened out," and quantifiers eliminated. This allows us to utilize existing view algorithms for relational databases. Further research will be devoted to handling the novel aspects of view processing in the DAPLEX functional model.
2. The type-subtype hierarchy is not explicitly handled. This hierarchy will be useful in the schema integration step. However, the mechanics of interpreting queries against the hierarchy require further research.

A query graph QG(N,E) is an undirected labelled graph that represents a query Q. The nodes, N, of QG are the entity types referenced in Q. Each node is labelled by the entity type name of the node, the non-link functions of the entity type that appear in the target list, and the selection clauses of Q's qualification that reference the entity type. The edge set E of QG contains one edge (R,S) for each join clause or link clause that references R and S. Each edge is labelled by its corresponding clause(s).

A query is called natural if (a) join clauses are of the form (A(R) = A(S)), that is, the functions referenced in both terms of a join clause have the same name; and (b) if A is a non-link function of two entity types R and S, then A(R) and A(S) are "connected" by a sequence of join clauses. There is a simple and efficient algorithm that, given a database description and a query Q, renames the functions of the entity types where necessary to produce an equivalent natural query Q'; Q and Q' are equivalent in the sense that they produce the same answer for any database state (up to the renaming of fields). We will therefore assume, without loss of generality, that our queries are natural. Given that we deal only with natural queries, the edge labels corresponding to join clauses are unnecessary. Also, target lists need only contain function names, instead of function terms.

Given a join clause (A(R) = A(S)) and a selection clause (A(R) op k), we can deduce that (A(S) op k). We assume that the qualification of each query is augmented by all clauses that can be deduced in this way. A simple and efficient transitive closure algorithm is sufficient for performing such deductions.

Basic operations

There are three types of sites in the breadboard Multibase: File, CODASYL, and GDM. Each type of site is capable of executing a different set of basic operations. This section describes these basic operations.

1. File Select. If record type R is stored at a File site S, then the only operation that can be applied to R at S is a selection of the form

      R[(A1 = k1) and (A2 = k2) and ... and (An = kn)].

   The result of the selection is a record type consisting of the set of all records r in R such that r[Ai] = ki for i = 1, ..., n; this result is always transmitted to the GDM.
2. File Semijoin. In principle, File select can be generalized into File semijoin by performing selections iteratively. Let R be a File file and S a GDM file, and suppose A1, ..., An are fields of R and S. Then the semijoin of R by S on A1, ..., An, denoted R[A1, ..., An]S, equals

      {r in R | (there exist s in S)
                (r.A1 = s.A1) and ... and (r.An = s.An)}.

   This can be computed by the following program.

      Result := ∅;
      for each s in S
      loop
         k1 := s.A1; ...; kn := s.An;
         Result := Result ∪ R[(A1 = k1) and ... and (An = kn)];
      end loop;

   In practice, this operation may place an unacceptable load on the File system and hence may not be usable.
3. CODASYL tree queries. The basic operation that can be performed at a CODASYL site S is to solve a natural tree query (defined below), returning the result to the GDM. A natural tree query Q at site S has two properties: (1) All record types referenced in Q must be stored at S. (2) Let Q' be Q minus its join clauses (i.e., all clauses of Q' are selections or links), and let QG' be the query graph of Q'; then QG' must be a tree.

   To solve a tree query Q using CODASYL DML, one essentially expands the cartesian product of the record types referenced by Q and evaluates the qualification on each element of the cartesian product. We describe how this cartesian product can be systematically generated in the section "Processing CODASYL Tree Queries."
4. CODASYL Tree Semijoins. The preceding operation can be generalized into a semijoin-like operation. Let Q be a CODASYL tree query and S a GDM record type, and suppose A1, ..., An are fields of S and fields of record types of Q. Let Q' have the same qualification as Q, and the target list augmented by A1, ..., An. Finally, let R' be the result of Q'. The semijoin of Q by S on A1, ..., An, denoted Q < A1, ..., An], equals

      {r' in R' | (there exist s in S)
                  (r'.A1 = s.A1) and ... and (r'.An = s.An)}.
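The File semijoin program of item 2 above can be sketched in Python as a loop of equality selections. This is an illustration only: records are assumed to be dictionaries held in memory, and `file_select` and `file_semijoin` are hypothetical names standing in for operations that Multibase would send to a File site.

```python
def file_select(records, conditions):
    """R[(A1 = k1) and ... and (An = kn)]: keep records matching every equality."""
    return [r for r in records if all(r[a] == k for a, k in conditions.items())]


def file_semijoin(r_records, s_records, fields):
    """Semijoin of R by S on the given fields, one selection per record of S."""
    result, seen = [], set()
    for s in s_records:
        conditions = {a: s[a] for a in fields}      # k1 := s.A1; ...; kn := s.An
        for r in file_select(r_records, conditions):
            if id(r) not in seen:                   # set union: no duplicates
                seen.add(id(r))
                result.append(r)
    return result


# Hypothetical sample data.
R = [{"pno": 1, "pname": "bolt"}, {"pno": 2, "pname": "nut"}, {"pno": 3, "pname": "gear"}]
S = [{"pno": 1}, {"pno": 3}, {"pno": 1}]

reduced = file_semijoin(R, S, ["pno"])
```

The sketch issues one selection per record of S, which makes visible why the text warns that the operation may load the File system unacceptably.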
   The semijoin of Q by S can be computed as follows. Suppose A1, ..., An are fields of R1, ..., Rn respectively, where R1, ..., Rn are record types of Q. (R1, ..., Rn need not be distinct.) Augment the qualification of Q' by adding the clauses (R1.A1 = k1) ... (Rn.An = kn), and execute the following program.

      Result := ∅;
      for each s in S loop
         k1 := s.A1; ...; kn := s.An;
         Result := Result ∪ Q';
      end loop;
5. GDM Queries. The GDM can process any natural query Q provided (1) all entity types referenced in Q are stored at the GDM, and (2) Q contains no link clauses. Suppose Q references entity types R1, ..., Rn. Q is processed by constructing a request to the local DBMS (the Datacomputer for the initial breadboard system) of the form:

      for each r1 in R1 where (selection clauses on R1)
         for each r2 in R2 where (selection clauses on R2)
                           and (join clauses on R1 and R2)
            ...
            for each rn in Rn where (selection clauses on Rn)
                              and (join clauses on R1 and Rn)
                              and (join clauses on R2 and Rn)
                              ...
                              and (join clauses on Rn-1 and Rn).
               print (target list).

   It is important that the "for" statements be in a "reasonable" order for performance reasons. Optimization techniques developed by Wong for the SDD-1 DM [3] are directly applicable.

Query Decomposition

To solve a query Q, we must decompose it into a sequence of basic operations. Our basic strategy is to find subqueries of Q that can be entirely solved at File and CODASYL sites, move the results of these subqueries to the GDM, and solve the remainder of the query at the GDM.

To follow this strategy, we must isolate File and CODASYL subqueries of Q. File subqueries are easy to find. We simply find entity types in Q that are stored at File sites. For each such entity type R, we produce a subquery consisting of the selection clauses on R.

Let QG be the query graph of Q. To find CODASYL subqueries, we begin by deleting from QG all entity types not stored at a CODASYL site and all join clauses. Each connected component of the resulting graph includes entity types and links that are stored at the same site, because no link can connect two entity types stored at different sites (cf. the section on "Overall Architecture"). If a connected component is a tree, then it corresponds to a tree query and can be solved by the CODASYL site. If it has a cycle, then it must be further decomposed into two or more tree queries. (In the breadboard version of Multibase, we will only handle queries whose CODASYL subqueries are tree queries; if some CODASYL subquery is cyclic, the query cannot be processed.)

Having extracted the File and CODASYL subqueries, we must now choose an order for these subqueries to be executed. As a first-cut solution, we propose to solve all File and CODASYL subqueries before processing the results of any of these subqueries at the GDM. This strategy will be an especially poor performer if a File or CODASYL subquery has no selection clauses. For such cases, we recommend use of File and CODASYL semijoin operations, so that the results of some subqueries can be used to reduce the cost of other subqueries. However, this tactic brings us into the realm of new query optimization algorithms and will require further research.

Processing CODASYL Tree Queries

Let Q be a CODASYL tree query and QG its tree. The following algorithm compiles Q into a program that solves Q. The program contains statements of the form:

1. for r in set(s) loop ... end loop; where S owns R via set;
2. r := set_inv(s); where R owns S via set. Note that set_inv is the inverse function of set and is always a function.

Algorithm

1. Do a pre-order traversal of QG. The result is a list of the nodes of QG. Call this list P.
2. Let R and S be nodes of QG, with R the parent of S.
   Cases
      R is the root of QG: replace "R" by "for r in R loop" in P.
      R owns S: replace "S" by "for s in set(r)" in P.
      S owns R: replace "S" by "s := set_inv(r)" in P.
3. Push loop independent assignments up as high as possible.
4. Add an "output (target list)" statement, add selections and joins as high as possible, and tack on enough ends to balance the fors.

As an example let QG be the query graph of Figure 18.

1. Preorder traversal: R, S, T, U, V.
2. for r in R loop
      for s in L1(r) loop
         t := L2_inv(r)
            for u in L3(t) loop
               v := L4_inv(t)
3. Push up T and V; add an output statement; add ends to balance the fors.
   for r in R loop
      t := L2_inv(r);
      v := L4_inv(t);
      for s in L1(r) loop
         for u in L3(t) loop
            output (target list);
         end loop;
      end loop;
   end loop;

[Figure 18—A query graph]

5. SUMMARY

This report describes the architecture of the Multibase system. Details of the components of the architecture to be implemented in the initial breadboard version are also described. Although additional research is required to fill in the details of optimization and incompatible data handling, the architecture already contains several innovative ideas in integrating distributed heterogeneous databases. These include the following:

1. the idea of using an integration database to resolve data incompatibility;
2. the idea of using a mapping language to uniformly define the global schema in terms of the local schemata and the integration schema; and
3. the idea of using query modification and query graph decomposition to transform a global query into local queries and queries over the integration database.

REFERENCES

1. Shipman, D., "The Functional Data Model and the Data Language DAPLEX," SIGMOD 79, Boston, MA, 1979.
2. Stonebraker, M.R., "Implementation of Integrity Constraints and Views by Query Modifications," Proc. ACM-SIGMOD Conf., San Jose, CA, 1975, pp. 65-78.
3. Wong, E., "Retrieving Dispersed Data from SDD-1: A System for Distributed Databases," 1977 Berkeley Workshop on Distributed Data Management and Computer Networks, Univ. of CA, Berkeley, CA, May 1977.
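The compilation algorithm of the section "Processing CODASYL Tree Queries" can be sketched in Python for the example tree. The sketch covers the pre-order traversal, the owner and member cases, the hoisting of assignments, and the closing of loops; selections and joins are left out. The tree encoding (a dictionary of children with a flag saying whether the parent owns the child) and all function names are assumptions made for this illustration.

```python
from collections import namedtuple

Stmt = namedtuple("Stmt", "kind var text src")


def preorder(tree, root):
    """Steps 1 and 2: emit one statement per node in pre-order."""
    v = root.lower()
    stmts = [Stmt("loop", v, f"for {v} in {root} loop", None)]

    def visit(parent):
        p = parent.lower()
        for child, link, parent_owns_child in tree.get(parent, []):
            c = child.lower()
            if parent_owns_child:       # parent owns child: iterate over members
                stmts.append(Stmt("loop", c, f"for {c} in {link}({p}) loop", p))
            else:                       # child owns parent: single-valued inverse
                stmts.append(Stmt("assign", c, f"{c} := {link}_inv({p});", p))
            visit(child)

    visit(root)
    return stmts


def hoist(stmts):
    """Step 3: move each assignment up to just after its source variable is bound."""
    out = []
    for st in stmts:
        if st.kind == "assign":
            pos = next(i for i, o in enumerate(out) if o.var == st.src) + 1
            while pos < len(out) and out[pos].kind == "assign":
                pos += 1                # keep earlier hoisted assignments first
            out.insert(pos, st)
        else:
            out.append(st)
    return out


def render(stmts):
    """Step 4 (without selections and joins): add output and balance the loops."""
    lines, depth = [], 0
    for st in stmts:
        lines.append("  " * depth + st.text)
        if st.kind == "loop":
            depth += 1
    lines.append("  " * depth + "output (target list);")
    while depth > 0:
        depth -= 1
        lines.append("  " * depth + "end loop;")
    return lines


def compile_tree_query(tree, root):
    return render(hoist(preorder(tree, root)))


# Tree read off the example: R owns S via L1, T owns R via L2,
# T owns U via L3, and V owns T via L4.
example_tree = {
    "R": [("S", "L1", True), ("T", "L2", False)],
    "T": [("U", "L3", True), ("V", "L4", False)],
}
program = compile_tree_query(example_tree, "R")
```

Printing `"\n".join(program)` gives a nested-loop program in which both assignments sit directly under the outermost loop, matching the shape of the worked example.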
