2011
M. Haseeb Minhas
Digitally signed by M. Haseeb Minhas. Date: 2011.12.15
Integrated Technology
Object Oriented Model Integration
This report focuses on methodologies for integrating technology, providing a brief description of how existing database systems and information systems can be reused.
M. Haseeb Minhas 5/23/2011
ABSTRACT
Today, information systems form the basic infrastructure of an organization, owing to the tremendous investments made over the past three decades. Information systems have progressed from file systems and database systems to management information systems (MIS) and executive information systems (EIS). With the development of each new technology, there is a need to integrate, redesign and re-implement existing information systems in the format the new technology requires, instead of throwing the old systems away. This approach has obvious benefits, particularly if it can be automated or supported by methods and tools.

Large organizations have numerous heterogeneous databases for MIS operations, and there is a need to integrate them into a corporate database for decision support systems. To do so, schema integration must be performed to resolve the conflicts between databases with respect to data names, types, and semantics. Schema integration must be done before data integration, which is mainly concerned with automating the loading of data from the source databases into an integrated database. It is also essential for companies to enhance and evolve their existing database schemas to meet new data requirements.

This report focuses on methodologies for integrating technology, providing a brief description of how existing database systems and information systems can be reused, including:
1) The integration of hierarchical database systems into relational database and object-oriented technology.
2) The integration of multiple databases.
3) Techniques of schema and data integration.

In general, the major components of an information system are databases for production operation and expert systems for managerial decision making. The methodologies discussed in this report aim to protect the investment that companies have already put into these systems by finding methods of integrating them with new technologies.
TABLE OF CONTENTS
INTRODUCTION
    The Need
    The Issues
        Selecting a Database Model
        Database Conversion
        Integration of Multiple Databases
        Integration of Database with external Systems
THE SOLUTION AND APPROACH TO INTEGRATION
    Schema translation
    Data conversion:
    Program translation:
    RELIKEDB
CONCLUSION
BIBLIOGRAPHY
INTRODUCTION
Information systems are under continuous pressure to advance. The data management core of an information system is recognized as one of the most critical and difficult portions of software to evolve and integrate. The recent growth in database technology has encouraged wider use of database management systems in different types of organizations. In addition to new database system installations, there is considerable interest in converting conventional file-oriented systems to database systems and in upgrading outdated database systems to newer database technology.
The Need
Over the last few years, a number of database systems have come onto the market using hierarchical, network, relational, object-oriented, EDS (Expert Database System) and XML data models. As a result of this proliferation, many large organizations have found that they must support various types of database systems simultaneously. At the same time, as the performance of relational database systems has improved, the need has arisen to convert a company's non-relational database systems to relational ones. Database system integration is a complex task, and acquiring and running a new system is both a long-term commitment and an investment for an organization. It is therefore vital for an organization to understand the objectives of committing to a new environment, as well as the problems that may lead to the collapse of such a project.
The Issues
The following are the major strategic issues that must be considered in the early stage of the integration process.

Selecting a Database Model
Hierarchical database systems require users to navigate through the database from one point to the next. This is difficult because of the level of skill and experience the navigation requires. The connections between sets of data are hard-coded into the data structure, and adding a new relationship requires a new access path. These hard-coded relationships require a complex data definition language (DDL) and data manipulation language (DML). In comparison, a relational database provides relations whose access paths are not pre-established; instead, they are based on matching values in separate tables using a join operation. This allows a relational database to provide better flexibility and data independence as the need for information changes over time.

However, relational database systems also have disadvantages. For instance, the semantics of a relational database are often hidden within its many relationships and cannot be extracted without the user's help. Also, relations stored in the database must be at least in first normal form, which prevents the representation of multi-valued or set attributes. Furthermore, relational data models accept entities in a fixed form, and a structural change to an entity requires changes to all the instances of that entity in the database. It is thus very difficult to change a single instance without affecting the whole database.

Database Conversion
Data conversion can be very complicated if the existing data organization is very different from the new database model. As with program conversion, some software vendors provide utilities for data conversion. However, most utilities are neither up to date, logically sound, nor properly organized. A good utility depends on a full understanding of management and technical requirements. Redesigning the application system around the database concept should be considered as an alternative to conversion.

Integration of Multiple Databases
In an organization, different departments have often developed their own relational database systems according to their own requirements and localized applications. As a result, large quantities of data are fragmented across a variety of databases, and the data is redundant and inconsistent. There is no global view of all the data, and the inconsistent data therefore does not sufficiently support the information needs of an organization operating in a dynamic business environment. It is vital for the data to provide current information to support decision making in an organization.
There is a great need to create a global view of all existing data by integrating it into a global database that supports dynamic and complex business activities.

Integration of Database with external Systems
Integrating existing databases with new computing technology is another issue. The integration updates the existing systems to meet a new requirement. Schematic and operational heterogeneity is a crucial problem, because the different systems operate independently and their data or knowledge may include structural and representational discrepancies (i.e., conflicts). These discrepancies can be:
Domain and Naming conflicts: Different systems use different names and values to represent the same concepts.
Meta-data conflicts: The same concepts are represented at the schema level in one system and at the instance level in another.
Structural conflicts: Different data models (hierarchical, network, relational, object-oriented, and XML) are used together, representing the same concepts with different structures.
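To make the domain and naming conflicts listed above concrete, the following is a minimal sketch of resolving them with mapping tables. The two source systems ("sales" and "payroll"), their attribute names, and their value codes are hypothetical and chosen only for illustration; real integrations usually need far richer mappings.

```python
# Hypothetical example: two departmental databases describe the same employee
# using different attribute names and different value encodings.

# Naming conflicts: each source attribute name maps to one global name.
NAME_MAP = {
    "sales": {"emp_no": "employee_id", "sex": "gender"},
    "payroll": {"staff_id": "employee_id", "gender_code": "gender"},
}

# Domain conflicts: each source encodes gender with different values.
VALUE_MAP = {
    "sales": {"gender": {"M": "male", "F": "female"}},
    "payroll": {"gender": {1: "male", 2: "female"}},
}


def to_global(source, record):
    """Rename attributes and recode values into the global schema."""
    result = {}
    for name, value in record.items():
        global_name = NAME_MAP[source].get(name, name)
        recode = VALUE_MAP[source].get(global_name, {})
        # Values without a recoding rule are copied unchanged.
        result[global_name] = recode.get(value, value)
    return result


sales_row = {"emp_no": 17, "sex": "F"}
payroll_row = {"staff_id": 17, "gender_code": 2}

print(to_global("sales", sales_row))
print(to_global("payroll", payroll_row))
```

Once both rows are expressed in the same global names and values, they can be compared and merged, which is the precondition for detecting redundant or inconsistent data.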
THE SOLUTION AND APPROACH TO INTEGRATION
Object-oriented databases offer solutions to many of these problems. An object-oriented model is a logical schema in the form of objects with names, properties, and behavior, which captures the semantics and complexity of the data using the concepts of class, instance, and inheritance. A class is a description of an entity, and an instance is an occurrence of a class. Classes may inherit the attributes of one or more superclasses and thus capture some of the code and data that describe their common characteristics (Hughes, 1991). An object-oriented model is thus more reusable and flexible in schema evolution and data storage, making it more productive than a relational database. Object-oriented databases simplify the programming of database updates and provide faster access to stored data by blurring the distinction between programming language and database.

To integrate successfully with this model, all parties within an organization must have a common ground on which to discuss their individual needs, goals, expectations and constraints. The involvement of all parties in all phases of integration (planning, requirements, design, construction, implementation and operations) is vital. This should result in management commitment, documentation that is understandable to all parties, and a jointly owned, user-oriented set of structured models of the system's design.

On the technical side, to integrate an object-oriented database, the static data needs to be mapped through schema translation and data conversion. The dynamic behavior of each mapped class needs to be captured by translating each I/O statement into the operations (methods) of that class. The overall integration can be separated into three main parts:
Schema translation
This process involves solving conflicts between source databases, capturing the semantics of entities, generalization, and categorization of the relations, and merging each pair of existing relational schemas in the source databases into a new integrated schema. In schema translation, there are two approaches:
a) Direct translation: A non-relational schema can be translated directly into a relational one; however, this may cause loss of information, because the primitive mode of operation cannot identify all the semantics.
b) Indirect translation: The logical schema is mapped into a conceptual model that contains all the original semantics. User input and a knowledge base can be used to recapture the semantics of the conceptual schema. The conceptual schema is then automatically mapped to a relational schema.
To translate a relational schema to an object-oriented schema, the schema is first mapped to the ER model, then to UML (Unified Modeling Language), and finally translated into the object-oriented model of the target database.

Data conversion:
The objective is to merge data from the source databases into the new global database without any loss of information. The conversion must transform the data structure of the sources into that of the target integrated global database whilst preserving its semantics. There are three conventional approaches to conversion:
a) Parallel Conversion: This is a very safe approach in which the application programs and data for the new system are converted while the existing system is still in operation. However, managing both systems simultaneously requires extra effort and cost.
b) Direct Cut-Over: This approach is cost effective and is used for small systems. The converted applications and data replace the old ones at a specified point in time.
c) Phase-In: This approach is used when the system is very large and cannot be completely converted at once. It divides the whole conversion process into several phases.
In data conversion itself, there are two approaches:
a) Physical conversion: The physical data of the non-relational database is directly converted to the physical data of the relational database. This can be done using an interpreter approach or a generator approach. The former is a direct translation from one data item to another. The latter provides a generator that produces a program to accomplish the physical data conversion.
b) Logical conversion: The non-relational database is unloaded to sequential files in a logical sequence similar to the relational model. The sequential files can then be uploaded to a target relational database. This approach is concerned with the logical sequence of the data rather than the physical attributes of each data item.
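The logical conversion approach can be illustrated with a small sketch that unloads a hierarchical structure into sequential files and then uploads them into a relational database. The department and employee records, the column names, and the use of in-memory CSV text and SQLite are all assumptions made for the example; they stand in for whatever source DBMS and file format a real conversion would use.

```python
import csv
import io
import sqlite3

# Hypothetical hierarchical data: each department record owns employee records.
hierarchy = [
    {"dept_no": "D1", "name": "Sales", "employees": [
        {"emp_no": 1, "name": "Ann"},
        {"emp_no": 2, "name": "Bob"},
    ]},
    {"dept_no": "D2", "name": "Payroll", "employees": [
        {"emp_no": 3, "name": "Cy"},
    ]},
]


def unload(tree):
    """Unload the hierarchy into two sequential files (in-memory CSV text).

    Each child row carries its parent's key, so the parent-child link
    survives as a foreign key.
    """
    dept_file, emp_file = io.StringIO(), io.StringIO()
    dept_writer, emp_writer = csv.writer(dept_file), csv.writer(emp_file)
    for dept in tree:
        dept_writer.writerow([dept["dept_no"], dept["name"]])
        for emp in dept["employees"]:
            emp_writer.writerow([emp["emp_no"], emp["name"], dept["dept_no"]])
    return dept_file.getvalue(), emp_file.getvalue()


def upload(dept_text, emp_text):
    """Load the sequential files into a relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, "
        "dept_no TEXT REFERENCES department(dept_no))"
    )
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     csv.reader(io.StringIO(dept_text)))
    # CSV fields are read back as text, so the numeric key is converted.
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        ((int(no), name, dept)
         for no, name, dept in csv.reader(io.StringIO(emp_text))),
    )
    return conn


conn = upload(*unload(hierarchy))
rows = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.dept_no = d.dept_no ORDER BY e.emp_no"
).fetchall()
print(rows)
```

The final query recovers each employee's department through a join on the copied parent key, where the hierarchical database would have required navigation from the department record.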
Program translation:
In program translation, there are four approaches:
a) Rewrite: After the non-relational schema has been translated into a relational schema, the software can be rewritten to run on the relational database.
b) Bridge: A relational interface layer can be used to translate relational commands into non-relational ones. The user can then use relational DML commands to extract and manipulate data in the underlying non-relational database system.
c) Emulation: A virtual environment can be used that maps the source program commands into functionally equivalent commands in the target system. Each non-relational DML statement is substituted by relational DML statements that access the converted relational database.
d) Decompile: Another approach is to transform a program written in a low-level language into an equivalent but more conceptual version, which is then re-implemented to meet the new environment, database files, and DBMS requirements.

RELIKEDB
Even though the above approaches resolve many problems, the main difficulty still lies in the translation of semantics. It is also difficult to determine the unique or primary keys, and to find out whether there is a one-to-one relation or multiple relationships between the parent and the child node in the network schema. This complication carries over from the database schema into the program itself. The automation of direct translation as described above is still a challenge to database researchers.

To resolve these problems, the RELIKEDB (Relational-like-database) (Fong, 1993) approach can be used. RELIKEDB provides schema translation in which user input is part of the process. Direct schema translation from a hierarchical schema into a relational one cannot guarantee the capture of all the semantics of the original conceptual schema. The user input yields a relational schema that is closer to the user's expectations and that preserves the existing record keys, relationships, and attributes. RELIKEDB also provides data conversion algorithms to unload a hierarchical database into sequential files, which can then be uploaded into a relational database.
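The role of user input in schema translation can be sketched as follows. This is not the RELIKEDB algorithm itself, only an illustration of the idea under simplifying assumptions: the hierarchical schema, the record and field names, and the `translate` helper are hypothetical, every column is typed as text, and the user is assumed to confirm that each chosen key is unique on its own.

```python
import sqlite3

# Hypothetical hierarchical schema: record type -> (parent type, field names).
HIERARCHICAL_SCHEMA = {
    "department": (None, ["dept_no", "dept_name"]),
    "employee": ("department", ["emp_no", "emp_name"]),
    "skill": ("employee", ["skill_code", "proficiency"]),
}

# User input: which field identifies each record type. The schema alone does
# not reveal this, which is why the user is asked.
USER_KEYS = {"department": "dept_no", "employee": "emp_no", "skill": "skill_code"}


def translate(schema, keys):
    """Return one CREATE TABLE statement per record type.

    Assumes each user-supplied key is unique by itself; otherwise the
    parent key would have to be combined with it.
    """
    statements = []
    for record, (parent, fields) in schema.items():
        columns = [
            f"{field} TEXT PRIMARY KEY" if field == keys[record] else f"{field} TEXT"
            for field in fields
        ]
        if parent is not None:
            # The parent-child link becomes a foreign key column.
            parent_key = keys[parent]
            columns.append(f"{parent_key} TEXT REFERENCES {parent}({parent_key})")
        statements.append(f"CREATE TABLE {record} ({', '.join(columns)})")
    return statements


statements = translate(HIERARCHICAL_SCHEMA, USER_KEYS)
conn = sqlite3.connect(":memory:")
for statement in statements:
    print(statement)
    conn.execute(statement)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Changing an entry in `USER_KEYS` changes the generated keys and foreign keys, which is the sense in which the user steers the translation toward the schema they expect.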
In program translation, RELIKEDB provides an open data structure by adding secondary indices to the existing hierarchical database. This eliminates the navigational access path otherwise required to retrieve a target record from a system record. Instead, each target record type can be accessed directly without database navigation. The database access time is thus reduced and the program conversion effort simplified. RELIKEDB also provides algorithms to translate SQL statements into hierarchical DML statements. These are sound solutions to the program conversion problem. However, there is at present no standard object-oriented database DML to assist in program translation from a relational to an object-oriented form.
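The effect of a secondary index on a hierarchical database can be shown with a small sketch. The nested-dictionary representation, the record types, and the function names are assumptions made for illustration and do not reflect RELIKEDB's actual data structures.

```python
# Hypothetical hierarchical database as nested records.
database = {
    "type": "department", "key": "D1", "children": [
        {"type": "employee", "key": "E1", "children": [
            {"type": "skill", "key": "S1", "children": []},
        ]},
        {"type": "employee", "key": "E2", "children": [
            {"type": "skill", "key": "S2", "children": []},
        ]},
    ],
}


def navigate(root, path):
    """Conventional access: follow keys downward from the root record."""
    record = root
    for key in path:
        record = next(c for c in record["children"] if c["key"] == key)
    return record


def build_secondary_index(root):
    """Map (record type, key) to the record so it can be fetched directly."""
    index = {}
    stack = [root]
    while stack:
        record = stack.pop()
        index[(record["type"], record["key"])] = record
        stack.extend(record["children"])
    return index


index = build_secondary_index(database)
via_navigation = navigate(database, ["E2", "S2"])
via_index = index[("skill", "S2")]
print(via_navigation is via_index)
```

With the index in place, a program that needs a skill record no longer has to know that skills sit below employees, which is what makes a value-based, relational-style access to the same data possible.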
CONCLUSION
Database integration can be accomplished by upgrading an obsolete record-based hierarchical database into a relational one, or by upgrading a relational database into an object-oriented database. The integration process includes schema translation, data conversion, and program translation. The notion of upgrading to an object-oriented system is very attractive because of the increase in productivity and user friendliness. The problems in database integration lie in handling the different data structures of the various data models. In addition, existing systems can become obsolete as user requirements and production databases change. The suggested solution is to upgrade the record-based data models of hierarchical databases to table-oriented relational databases or to object-oriented databases. The object-oriented paradigm is seen as the most common technique for reusing conventional software and knowledge bases, and object-oriented technology is still growing.
BIBLIOGRAPHY
Fong, J. and Ho, M. (1993) Knowledge-based approach for abstracting hierarchical and network schema semantics, Lecture Notes in Computer Science, ER 93, Springer Verlag.
Fong, J. (1995) Mapping Extended Entity Relationship Model to Object Modeling Technique, ACM SIGMOD Record, Vol. 24, No. 3, pp. 18-22.
Fong, J. (1996) Adding a Relational Interface to a Nonrelational Database, IEEE Software, September, pp. 89-97.
Hughes, J. (1991) Object-Oriented Databases, Prentice Hall Inc.
Rumbaugh, J. et al. (1991) Object-Oriented Modelling and Design, Prentice Hall Inc, pp. 183-185.