CORBA Architecture Case Study Overview
CORBA RMI
Programming in a multi-language RMI system such as CORBA RMI requires more of the programmer
than programming in a single-language RMI system such as Java RMI. The following new concepts
need to be learned:
• the object model offered by CORBA;
• the interface definition language and its mapping onto the implementation language.
In particular, the programmer defines remote interfaces for the remote objects and then uses an interface
compiler to produce the corresponding proxies and skeletons. But in CORBA, proxies are generated in
the client language and skeletons in the server language.
The term CORBA object is used to refer to remote objects. Thus, a CORBA object implements an IDL
interface, has a remote object reference and is able to respond to invocations of methods in its IDL
interface. A CORBA object can be implemented by a language that is not object oriented, for example
without the concept of class. Since implementation languages will have different notions of class or
even none at all, the class concept does not exist in CORBA. Therefore classes cannot be defined in
CORBA IDL, which means that instances of classes cannot be passed as arguments. However, data
structures of various types and arbitrary complexity can be passed as arguments.
CORBA IDL
A CORBA IDL interface specifies a name and a set of methods that a client can request.
IDL supports fifteen primitive types, constructed types and a special type called Object.
The primitive types include short, long, unsigned short, unsigned long, float, double, char, boolean, octet and any.
Constructed types such as arrays and sequences must be defined using typedefs and passed by value.
Interfaces and other IDL type definitions can be grouped into logical units called modules.
CORBA Architecture
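Figure. Main components of the CORBA architecture: client, ORB (on both the client and the server side),
object adapter, skeleton, servant, implementation repository and interface repository. A request travels
from the client to the server.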
The architecture is designed to support the role of an object request broker that enables clients to invoke
methods in remote objects, where both clients and servers can be implemented in a variety of
programming languages. The CORBA architecture contains three additional components: the object
adapter, the implementation repository and the interface repository. CORBA provides for both static
and dynamic invocations. Static invocations are used when the remote interface of the CORBA object
is known at compile time, enabling client stubs and server skeletons to be used. If the remote interface
is not known at compile time, dynamic invocation must be used. Most programmers prefer to use static
invocation because it provides a more natural programming model.
ORB core
The role of the ORB core is that of the communication module. In addition, an ORB core provides an
interface that includes the following:
• operations enabling it to be started and stopped;
• operations to convert between remote object references and strings;
• operations to provide argument lists for requests using dynamic invocation.
Object adapter
The role of an object adapter is to bridge the gap between CORBA objects with IDL interfaces and the
programming language interfaces of the corresponding servant classes. This role also includes that of
the remote reference and dispatcher modules. An object adapter has the following tasks:
• it creates remote object references for CORBA objects;
• it dispatches each RMI via a skeleton to the appropriate servant;
• it activates and deactivates servants.
An object adapter gives each CORBA object a unique object name, which forms part of its remote
object reference. The same name is used each time an object is activated. The object name may be
specified by the application program or generated by the object adapter. Each CORBA object is
registered with its object adapter, which may keep a remote object table that maps the names of CORBA
objects to their servants. Each object adapter has its own name, which also forms part of the remote
object references of all of the CORBA objects it manages. This name may either be specified by the
application program or generated automatically.
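The Portable Object Adapter (POA) supports CORBA objects with two kinds of lifetime: those restricted
to the lifetime of the process in which their servants are instantiated, and those that can span the
instantiations of servants in multiple processes. The former have transient object references and the
latter have persistent object references.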
The POA allows CORBA objects to be instantiated transparently; and in addition, it separates the
creation of CORBA objects from the creation of the servants that implement those objects. Server
applications such as databases with large numbers of CORBA objects can create servants on demand,
only when the objects are accessed. In this case, they may use database keys for the object names;
alternatively, they may use a single servant to support all of these objects. In addition, it is possible to
specify policies to the POA, for example, as to whether it should provide a separate thread for each
invocation, whether the object references should be persistent or transient and whether there should be
a separate servant for each CORBA object. The default is that a single servant can represent all of the
CORBA objects for its POA.
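As a rough illustration of the remote object table and of on-demand servant creation described above, the following Python sketch models an object adapter inside a single process. It is not CORBA or POA API code: the class names `ObjectAdapter` and `AccountServant`, the `servant_factory` callback, the adapter name "BankPOA" and the tuple used as a reference are all assumptions made for this example.

```python
class AccountServant:
    """Hypothetical servant implementing one account object."""

    def __init__(self, key):
        self.key = key
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class ObjectAdapter:
    """Toy object adapter: names objects, activates servants, dispatches calls."""

    def __init__(self, adapter_name, servant_factory):
        self.adapter_name = adapter_name
        self.servant_factory = servant_factory
        self.remote_object_table = {}  # object name -> servant

    def create_reference(self, object_name):
        # The reference carries both names; no servant is created yet.
        return (self.adapter_name, object_name)

    def dispatch(self, reference, method, *args):
        adapter_name, object_name = reference
        if adapter_name != self.adapter_name:
            raise LookupError("reference belongs to another adapter")
        servant = self.remote_object_table.get(object_name)
        if servant is None:
            # Activate on demand, the first time the object is accessed.
            servant = self.servant_factory(object_name)
            self.remote_object_table[object_name] = servant
        return getattr(servant, method)(*args)


adapter = ObjectAdapter("BankPOA", AccountServant)
ref = adapter.create_reference("acct-42")  # e.g. a database key as object name
first = adapter.dispatch(ref, "deposit", 10)
second = adapter.dispatch(ref, "deposit", 5)
print(first, second, len(adapter.remote_object_table))
```

Creating the reference and creating the servant are separate steps here, which mirrors the separation the POA provides; a real adapter would also apply threading and lifetime policies.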
Skeletons
Skeleton classes are generated in the language of the server by an IDL compiler. As before, remote
method invocations are dispatched via the appropriate skeleton to a particular servant and the skeleton
unmarshals the arguments in request messages and marshals exceptions and results in reply messages.
Client stubs/proxies
These are in the client language. The class of a proxy (for object oriented languages) or a set of stub
procedures (for procedural languages) is generated from an IDL interface by an IDL compiler for the
client language. As before, the client stubs/proxies marshal the arguments in invocation requests and
unmarshal exceptions and results in replies.
Implementation repository
An implementation repository is responsible for activating registered servers on demand and for
locating servers that are currently running. The object adapter name is used to refer to servers when
registering and activating them. An implementation repository stores a mapping from the names of
object adapters to the pathnames of files containing object implementations. Object implementations
and object adapter names are generally registered with the implementation repository when server
programs are installed. When object implementations are activated in servers, the hostname and port
number of the server are added to the mapping.
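The mapping kept by an implementation repository can be pictured as a small table. The sketch below is only a model of that table in Python, not an interface offered by any ORB; the class `ImplementationRepository`, its method names, the pathname and the host name are hypothetical.

```python
class ImplementationRepository:
    """Toy table: object adapter name -> pathname and, once running, address."""

    def __init__(self):
        self.table = {}

    def register(self, adapter_name, pathname):
        # Done when the server program is installed.
        self.table[adapter_name] = {"pathname": pathname, "address": None}

    def record_activation(self, adapter_name, hostname, port):
        # Done when the object implementation is activated in a server.
        self.table[adapter_name]["address"] = (hostname, port)

    def locate(self, adapter_name):
        entry = self.table[adapter_name]
        if entry["address"] is None:
            # A real repository would start the server from the pathname here.
            return ("needs_activation", entry["pathname"])
        return ("running", entry["address"])


repo = ImplementationRepository()
repo.register("BankPOA", "/opt/bank/bin/bank_server")
before = repo.locate("BankPOA")
repo.record_activation("BankPOA", "host1.example.org", 9000)
after = repo.locate("BankPOA")
print(before, after)
```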
Not all CORBA objects need to be activated on demand. Some objects, for example callback objects
created by clients, run once and cease to exist when they are no longer needed. They do not use the
implementation repository.
An implementation repository generally allows extra information to be stored about each server, for
example access control information as to who is allowed to activate it or to invoke its operations. It is
possible to replicate information in implementation repositories in order to provide availability or fault
tolerance.
Interface repository
The role of the interface repository is to provide information about registered IDL interfaces to clients
and servers that require it. For an interface of a given type it can supply the names of the methods and
for each method, the names and types of the arguments and exceptions. Thus, the interface repository
adds a facility for reflection to CORBA. Suppose that a client program receives a remote reference to a
new CORBA object. Also suppose that the client has no proxy for it; then it can ask the interface
repository about the methods of the object and the types of parameter each of them requires. When an
IDL compiler processes an interface, it assigns a type identifier to each
IDL type it encounters. For each interface registered with it, the interface repository provides a mapping
between the type identifier of that interface and the interface itself. Thus, the type identifier of an
interface is sometimes called the repository ID because it may be used as a key to IDL interfaces
registered in the interface repository.
Every CORBA remote object reference includes a slot that contains the type identifier of its interface,
enabling clients that hold it to enquire of its type with the interface repository. Those applications that
use static (ordinary) invocation with client proxies and IDL skeletons do not require an interface
repository. Not all ORBs provide an interface repository.
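The lookup described above, from the type identifier held in a remote object reference to a description of the interface, can be sketched with a dictionary. This is an illustrative model rather than the CORBA Interface Repository API: the `Bank/Account` interface, its methods and the dictionary layout are invented for the example. Only the general `IDL:module/interface:version` shape of the repository ID follows CORBA convention.

```python
# Hypothetical repository contents, keyed by repository ID (type identifier).
INTERFACE_REPOSITORY = {
    "IDL:Bank/Account:1.0": {
        "deposit": {"params": [("amount", "long")], "raises": []},
        "withdraw": {"params": [("amount", "long")], "raises": ["InsufficientFunds"]},
    },
}


def describe(remote_reference):
    """Look up the interface of a reference through its type identifier slot."""
    type_id = remote_reference["type_id"]
    return INTERFACE_REPOSITORY[type_id]


def parameter_types(remote_reference, method_name):
    """Return the parameter types a client must supply for one method."""
    method = describe(remote_reference)[method_name]
    return [idl_type for _name, idl_type in method["params"]]


# A client holds a reference but has no proxy for its interface.
reference = {"type_id": "IDL:Bank/Account:1.0", "object_name": "acct-42"}
method_names = sorted(describe(reference))
withdraw_types = parameter_types(reference, "withdraw")
print(method_names, withdraw_types)
```

With this information a client could build the argument list for a dynamic invocation without a compiled proxy.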
Experiment-02
Aim 4- Software Simulation for Clock Synchronization in Distributed System using Lamport’s
Algorithm
These form the top levels of a naming tree with the form shown below.
There are five commonly used groups of TLDs, and one group of specialized domains being used
for internationalized domain names (IDNs). [p512]
The gTLDs are grouped into categories:
• Generic
• Generic-restricted
• Sponsored
The generic gTLDs (generic appears twice) are open for unrestricted use. The others (generic-restricted
and sponsored) are limited to various sorts of uses or are constrained as to what entity may assign names
from the domain.
There is a "new gTLD" program in the works that may significantly expand the current set, possibly to
several hundred or even thousand. This program and policies relating to TLD management in general
are maintained by the Internet Corporation for Assigned Names and Numbers (ICANN).
Because some of these two-letter country codes of ccTLDs are suggestive of other uses and meanings,
various countries have been able to find commercial windfalls from selling names within their ccTLDs.
For example, the domain name [Link] is really a registration in the Pacific island of Tuvalu, which has
been selling domain names associated with the television entertainment industry. This is called
a domain hack.
The names below a TLD in the DNS name tree are further partitioned into subdomains, which is very
common practice, especially for the ccTLDs.
Fully qualified domain name (FQDN) *
The example names we have seen so far are known as fully qualified domain names (FQDNs). They
are sometimes written more formally with a trailing period (e.g., [Link].). This trailing period indicates
that the name is complete; no additional information should be added to the name when performing a
name resolution.
Unqualified domain name *
An unqualified domain name, which is used in combination with a default domain or domain search
list set during system configuration, has one or more strings appended to the end. During configuration,
a system is typically assigned a default domain extension and search list using DHCP. For example, the
default domain [Link] might be configured in systems at the computer science department at
UC Berkeley. If a user on one of these machines types in the name vangogh, the local resolver software
converts this name to the FQDN [Link]. before invoking a resolver to
determine vangogh’s IP address.
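The conversion from an unqualified name to candidate FQDNs can be sketched as follows. This is a simplified model: the function name `candidate_fqdns` and the domain `cs.example.edu` are placeholders chosen for the example, and real resolvers apply further rules (for instance about names that already contain periods) that are not modelled here.

```python
def candidate_fqdns(name, search_list):
    """Return the FQDNs a resolver would try for `name`, in order."""
    if name.endswith("."):
        # Trailing period: the name is complete, nothing is appended.
        return [name]
    return [f"{name}.{domain}." for domain in search_list]


single = candidate_fqdns("vangogh", ["cs.example.edu"])
several = candidate_fqdns("vangogh", ["cs.example.edu", "example.edu"])
complete = candidate_fqdns("vangogh.cs.example.edu.", ["example.edu"])
print(single, several, complete)
```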
A domain name consists of a sequence of labels separated by periods. The name represents a location
in the name hierarchy, where the period is the hierarchy delimiter and descending down the tree takes
place from right to left in the name.
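To make the right-to-left reading concrete, the short sketch below splits a name into labels and lists them in the order in which the tree is descended from the root. The name `www.cs.example.edu.` and the function `path_from_root` are illustrative only.

```python
def path_from_root(domain_name):
    """Return the labels of a domain name ordered from the root downwards."""
    labels = domain_name.rstrip(".").split(".")
    return list(reversed(labels))


path = path_from_root("www.cs.example.edu.")
print(path)
```

Here the top-level label comes first in the returned list and the host label comes last.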
The hierarchical structure of the DNS name space allows different administrative authorities to manage
different parts of the name space. For example, creating a new DNS
name [Link] would likely require dealing with the owner of the [Link] subdomain
only. The [Link] and edu portions of the name space would not require alteration, so the owners
of those would not need to be bothered. This feature of DNS is one key aspect of its scalability. No
single entity is required to administer all the changes for the entire DNS name space. [p516]
Name Servers and Zones
A person responsible for managing part of the active DNS name space is supposed to arrange for at
least two name servers or DNS servers to hold information about the name space so that Internet users
can perform queries on the names.
The DNS (formed by servers) is a distributed system whose primary job is to provide name-to-address
mappings; however, it can also provide a wide array of additional information.
A zone, as the unit of administrative delegation, is a subtree of the DNS name space that can be
administered separately from other zones. Every domain name exists within some zone (even the TLDs
that exist in the root zone). Whenever a new record is added to a zone, the DNS administrator for the
zone allocates a name and additional information (usually an IP address) for the new entry into the name
server’s database. For example:
• At a small campus, one person could do this each time a new server is added to the network;
• In a large enterprise the responsibility would have to be delegated (probably by departments or
other organizational units), as one person likely could not keep up with the work.
A DNS server can contain information for more than one zone. At any hierarchical change point in a
domain name (i.e., wherever a period appears), a different zone and containing server may be accessed
to provide information for the name. This is called a delegation. A common delegation approach uses
a zone for implementing a second-level domain name, such as [Link]. In this domain, there may
be individual hosts (e.g., [Link]) or other domains (e.g., [Link]). Each zone has a
designated owner or responsible party who is given authority to manage the names, addresses, and
subordinate zones within the zone. Often this person manages not only the contents of the zone but also
the name servers that contain the zone’s database(s).
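One way to picture delegation is to ask which zone is responsible for a given name: the most specific delegated zone whose name is a suffix of it. The sketch below assumes a hypothetical set of delegated zones and a helper named `enclosing_zone`; it models only the naming side and performs no DNS queries.

```python
def enclosing_zone(name, zones):
    """Return the most specific zone in `zones` that contains `name`."""
    labels = name.rstrip(".").lower().split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return "."  # fall back to the root zone


zones = {"edu", "example.edu", "cs.example.edu"}  # assumed delegations
host_zone = enclosing_zone("www.cs.example.edu.", zones)
parent_zone = enclosing_zone("ftp.example.edu", zones)
root_zone = enclosing_zone("example.com", zones)
print(host_zone, parent_zone, root_zone)
```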
For redundancy, zone information is supposed to exist in at least two places: there should be at least
two servers containing information for each zone. All of these servers contain identical information
about a zone. Among the servers, a primary server contains the zone database in a disk file, and one or
more secondary servers obtain copies from the primary using a process called a zone transfer. DNS
has a special protocol for performing zone transfers, but copies of a zone’s contents can also be obtained
using other means (e.g., the rsync utility).
Experiment-06
Clearly, there is no architectural support for making remote procedure calls. A local procedure
call generally involves placing the calling parameters on the stack and executing some form
of a call instruction to the address of the procedure. The procedure can read the parameters
from the stack, do its work, place the return value in a register and then return to the address
on top of the stack. None of this exists for calling remote procedures. We’ll have to simulate
it all with the tools that we do have, namely local procedure calls and sockets for network
communication. This simulation makes remote procedure calls a language-level construct as
opposed to sockets, which are an operating system level construct. This means that our
compiler will have to know that remote procedure call invocations need the presence of special
code.
The entire trick in making remote procedure calls work is in the creation of stub functions that
make it appear to the user that the call is really local. A stub function looks like the function
that the user intends to call but really contains code for sending and receiving messages over
a network. The following sequence of operations takes place (from p. 693 of W. Richard
Stevens' UNIX Network Programming):
Figure 1. Functional steps in a remote procedure call
The sequence of operations, depicted in Figure 1, is:
1. The client calls a local procedure, called the client stub. To the client process, it appears
that this is the actual procedure. The client stub packages the arguments to the remote
procedure (this may involve converting them to a standard format) and builds one or more
network messages. The packaging of arguments into a network message is called
marshaling.
2. Network messages are sent by the client stub to the remote system (via a system call to
the local kernel).
3. Network messages are transferred by the kernel to the remote system via some protocol
(either connectionless or connection-oriented).
4. A server stub procedure on the server receives the messages. It unmarshals the arguments
from the messages and possibly converts them from a standard form into a machine-specific
form.
5. The server stub executes a local procedure call to the actual server function, passing it the
arguments that it received from the client.
6. When the server is finished, it returns to the server stub with its return values.
7. The server stub converts the return values (if necessary) and marshals them into one or
more network messages to send to the client stub.
8. Messages get sent back across the network to the client stub.
9. The client stub reads the messages from the local kernel.
10. It then returns the results to the client function (possibly converting them first).
The client code then continues its execution.
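The ten steps above can be sketched in a few lines of Python. This is a minimal single-process model, not a real RPC system: JSON stands in for the standard wire format, the function `send_and_receive` replaces the kernel and the network, and all names (`remote_add`, `server_stub`, `SERVER_FUNCTIONS`) are invented for the illustration.

```python
import json


# --- server side ---
def add(a, b):
    """The actual server function."""
    return a + b


SERVER_FUNCTIONS = {"add": add}


def server_stub(request_bytes):
    """Steps 4 to 7: unmarshal, call the real function, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    function = SERVER_FUNCTIONS[request["procedure"]]
    result = function(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


# --- stand-in for the network ---
def send_and_receive(request_bytes):
    """Steps 2, 3, 8 and 9; a real system would use sockets here."""
    return server_stub(request_bytes)


# --- client side ---
def remote_add(a, b):
    """Client stub (steps 1 and 10): looks like add() to the caller."""
    message = {"procedure": "add", "args": [a, b]}
    request_bytes = json.dumps(message).encode("utf-8")  # marshaling
    reply_bytes = send_and_receive(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


print(remote_add(2, 3))
```

The caller of `remote_add` never sees bytes or messages, which is the point of the stub: the call site is indistinguishable from a local call.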
The major benefits of RPC are twofold: the programmer can use procedure call semantics,
and writing distributed applications is simplified because RPC hides all of the network code
in stub functions. Application programs don’t have to worry about details (such as sockets,
port numbers, byte ordering). Using the OSI reference model, RPC is a presentation layer
service.
If a function may be run any number of times without harm, it is idempotent (e.g., time of day,
math functions, read static data). Otherwise, it is a nonidempotent function (e.g., append or
modify a file).
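The distinction matters when a request may be executed more than once, for example if a client resends it. The sketch below simulates such repetition in a single process; the helper `call_with_retries` and the two sample functions are hypothetical.

```python
def call_with_retries(function, attempts):
    """Simulate a request that ends up being executed `attempts` times."""
    result = None
    for _ in range(attempts):
        result = function()
    return result


static_data = (1, 2, 3)
log = []


def read_static():
    """Idempotent: repeating the call changes nothing."""
    return sum(static_data)


def append_entry():
    """Non-idempotent: every execution modifies the log."""
    log.append("entry")
    return len(log)


once = call_with_retries(read_static, 1)
thrice = call_with_retries(read_static, 3)
appended = call_with_retries(append_entry, 3)
print(once, thrice, appended, log)
```

Repeating the read gives the same answer and leaves the data unchanged, while repeating the append leaves three entries where the caller intended one.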
Figure 2. Compilation steps for Remote Procedure Calls
What about performance?
A regular procedure call is fast: typically only a few instruction cycles. What about a remote
procedure call? Think of the extra steps involved. Just calling the client stub function and
getting a return from it incurs the overhead of a procedure call. On top of that, we need to
execute the code to marshal parameters, call the network routines in the OS (incurring a
context switch), deal with network latency, have the server receive the message and switch to
the server process, unmarshal parameters, call the server function, and do it all over again on
the return trip. Without a doubt a remote procedure call will be much slower.
Aim 7- Study of Distributed Database Management System.
Definition
A distributed database is a database that is not limited to one system; it is spread over different
sites, i.e., on multiple computers or over a network of computers. A distributed database system is
located on various sites that don’t share physical components. This may be required when a particular
database needs to be accessed by various users globally. It needs to be managed such that for the users
it looks like one single database.
The software that creates and administers the distributed database and provides data to the users is called
the Distributed Database Management System (DDBMS). It coordinates the access to the data at various
nodes in the distributed network environment. A distributed database management system is a software
system that permits the management of a distributed database and makes the distribution transparent to
the users.
• Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit requires
its own set of local data. Thus, the overall database of the organization becomes distributed.
• Need for Sharing of Data − The multiple organizational units often need to communicate with
each other and share their data and resources. This demands common databases or replicated
databases that should be used in a synchronized manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common data.
Distributed database systems support both kinds of processing by providing synchronized data.
• Database Recovery − One of the common techniques used in DDBMS is replication of data
across different sites. Replication of data automatically helps in data recovery if the database at
any site is damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, database failure may become almost inconspicuous to users.
• Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all queries
have to pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.
• Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.
• Data integrity − The need for updating data in multiple sites poses problems of data integrity.
• Overheads for improper data distribution − Responsiveness of queries is largely dependent
upon proper data distribution. Improper data distribution often leads to very slow response to
user requests.
• In a homogeneous distributed database, the sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• Autonomous − Each database is independent and functions on its own. The databases are integrated by
a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.
• In a heterogeneous distributed database, a site may not be aware of other sites and so there is
limited co-operation in processing user requests.
• Un-federated − The database systems employ a central coordinating module through which
the databases are accessed.
• Distribution − It states the physical distribution of data across the different sites.
• Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is done
in accordance with the frequency of access. This takes into consideration the fact that the frequency of
accessing the tables varies considerably from site to site. The number of copies of the tables (or portions)
depends on how frequently the access queries execute and on the sites that generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and each
fragment can be stored at different sites. This considers the fact that it seldom happens that all data
stored in a table is required at a given site. Moreover, fragmentation increases parallelism and provides
better disaster recovery. Here, there is only one copy of each fragment in the system, i.e. no redundant
data.
• Vertical fragmentation
• Horizontal fragmentation
• Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are initially fragmented
in any form (horizontal or vertical), and then these fragments are partially replicated across the
different sites according to the frequency of accessing the fragments.
Data Replication
Data replication is the process of storing separate copies of the database at two or more sites. It is a
popular fault tolerance technique of distributed databases.
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (combination
of horizontal and vertical). Horizontal fragmentation can further be classified into two techniques:
primary horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table.
Vertical fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a
Student table having the following schema.
STUDENT
Now, the fees details are maintained in the accounts section. In this case, the designer will fragment
the database as follows −
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
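The reconstructiveness requirement for vertical fragments can be checked with a small in-memory experiment using Python's `sqlite3` module. Only `STUDENT`, `STD_FEES`, `Regd_No` and `Fees` come from the text; the columns `Name` and `Course`, the second fragment `STD_INFO` and the sample rows are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (
        Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees INTEGER
    );
    INSERT INTO STUDENT VALUES (1, 'Asha', 'Computer Science', 500);
    INSERT INTO STUDENT VALUES (2, 'Ben', 'Physics', 450);

    -- Vertical fragments: each one keeps the primary key Regd_No.
    CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT;
    CREATE TABLE STD_INFO AS SELECT Regd_No, Name, Course FROM STUDENT;
""")

# Reconstruct the original table by joining the fragments on the key.
rebuilt = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, f.Fees
    FROM STD_INFO AS i JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
    ORDER BY i.Regd_No
""").fetchall()
original = conn.execute(
    "SELECT Regd_No, Name, Course, Fees FROM STUDENT ORDER BY Regd_No"
).fetchall()
print(rebuilt == original)
```

Because both fragments carry `Regd_No`, a join on that key is enough to recover every original row.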
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal
fragment must have all columns of the original base table.
For example, in the student schema, if the details of all students of the Computer Science course need to
be maintained at the School of Computer Science, then the designer will horizontally fragment the
database as follows −
CREATE TABLE COMP_STD AS SELECT * FROM STUDENT
WHERE Course = 'Computer Science'; -- column name Course is assumed
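Horizontal fragments can be checked for reconstructiveness in the same spirit: the union of the fragments should give back the original rows. The sketch below uses Python's `sqlite3` module with an in-memory database. The column names `Name`, `Course` and `Fees`, the complementary fragment `OTHER_STD` and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (
        Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees INTEGER
    );
    INSERT INTO STUDENT VALUES (1, 'Asha', 'Computer Science', 500);
    INSERT INTO STUDENT VALUES (2, 'Ben', 'Physics', 450);
    INSERT INTO STUDENT VALUES (3, 'Chen', 'Computer Science', 500);

    -- Horizontal fragments: every fragment keeps all columns.
    CREATE TABLE COMP_STD AS
        SELECT * FROM STUDENT WHERE Course = 'Computer Science';
    CREATE TABLE OTHER_STD AS
        SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

comp_rows = conn.execute("SELECT Regd_No FROM COMP_STD ORDER BY 1").fetchall()

# Reconstruct the original table as the union of the fragments.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD ORDER BY 1"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY 1").fetchall()
print(comp_rows, rebuilt == original)
```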
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
Experiment-08
Memory Management
The Amoeba memory model is simple and efficient. A process’ address space consists of one or more
segments mapped onto user-specified virtual addresses. When a process is executing, all its segments
are in memory. There is no swapping or paging at present, thus Amoeba can only run programs that fit
in physical memory. The primary advantage of this scheme is simplicity and high performance. The
primary disadvantage is that it is not possible to run programs larger than physical memory.
Input/Output
I/O is also handled by kernel threads. To read raw blocks from a disk, for example, a user process having
the appropriate authorization does RPCs with a disk I/O thread in the kernel. The caller is not aware
that the server is actually a kernel thread, since the interface to kernel threads and user threads is
identical. Generally speaking, only file servers and similar system-like processes communicate with
kernel I/O threads.
Bullet File Server
The standard Amoeba file server has been designed for high performance and is called the Bullet server.
It stores files contiguously on disk, and caches whole files contiguously in core. Except for very large
files, when a user program needs a file, it will request that the Bullet server send it the entire file in a
single RPC. A dedicated machine with at least 16 MB of RAM is needed for the Bullet file server for
installation (except on the Sun 3 where there is a maximum of 12 MB). The more RAM the better, in
fact. The performance is improved with a larger file cache. The maximum file size is also limited by
the amount of physical memory available to the Bullet server.
Machines on which Amoeba Runs
Amoeba currently runs on the following architecture