Oracle RAC Concepts and Administration
For
Technology Solutions Group
By
Trushant Bagate
Trushant Bagate (TSG)
Table of contents
A. ORACLE CLUSTERWARE BENEFITS
B. ORACLE RAC ARCHITECTURE
C. ORACLE CLUSTERWARE COMPONENTS
D. CLUSTERWARE PROCESSES STARTUP SEQUENCE
E. AUTOMATIC STORAGE MANAGEMENT (ASM)
F. NEW INITIALIZATION PARAMETERS
G. CONNECTION ESTABLISHMENT IN ORACLE RAC
H. CACHE FUSION
I. BACKGROUND PROCESS AND THEIR ROLES
J. ORACLE RAC TROUBLESHOOTING
K. SRVCTL COMMANDS
L. CRSCTL COMMANDS
A. ORACLE CLUSTERWARE BENEFITS
Eliminate unplanned downtime
Reduce or eliminate planned downtime for maintenance
Increase throughput by enabling applications to run on all the nodes in a cluster
Increase throughput on demand for cluster-aware applications by adding servers
Reduce the total cost for infrastructure
B. ORACLE RAC ARCHITECTURE
C. ORACLE CLUSTERWARE COMPONENTS
Oracle Cluster Registry:
The OCR maintains cluster configuration information that each node of the cluster uses to
determine the state of the cluster. The OCR also maintains the following information about cluster resources:
o Databases
o Instances
o Services
Each node in the cluster keeps a copy of the OCR in memory for better performance and is also
responsible for updating the OCR on shared storage as required.
Voting Disk:
Oracle Clusterware uses the voting disk files to determine which nodes are currently members
of the cluster.
The voting disk files are also used together with other cluster components, such as CRS, to maintain the
cluster's integrity.
o Both the voting disks and the OCR must reside on shared storage that is accessible by all nodes in the cluster.
o Configure at least 3 voting disks; the maximum is 15.
o To ensure high availability, multiplex the OCR in up to 5 locations.
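The reason for configuring at least 3 voting disks can be illustrated with a small sketch. It assumes the usual majority rule, under which a node must be able to access more than half of the configured voting disks to stay in the cluster. The function names are hypothetical and are not part of any Oracle tool.

```python
def has_voting_quorum(accessible: int, configured: int) -> bool:
    """Return True if a node can reach a strict majority of the voting disks."""
    if not 0 <= accessible <= configured:
        raise ValueError("accessible must be between 0 and configured")
    return accessible > configured // 2


def tolerated_failures(configured: int) -> int:
    """Number of voting disks that can be lost while a majority is still reachable."""
    return (configured - 1) // 2


if __name__ == "__main__":
    for total in (1, 3, 5):
        print(total, "voting disk(s) tolerate", tolerated_failures(total), "failure(s)")
```

With a single voting disk no failure is tolerated, while three disks tolerate the loss of one. An even number adds no extra tolerance over the next lower odd number, which is why odd counts are normally used.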
Oracle Clusterware Stack
Cluster Ready Services Stack
o Cluster Ready Services (CRS): For managing high availability operations in a cluster.
o Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling
which nodes are members of the cluster and by notifying members when a node joins or
leaves the cluster.
o Automatic Storage Management (ASM): Provides disk management for Oracle
Clusterware.
o Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for
Oracle Clusterware.
o Event Management (EVM): A background process that publishes events that Oracle
Clusterware creates.
o Oracle Notification Service (ONS): A publish-and-subscribe service for communicating Fast
Application Notification (FAN) events.
o Oracle Agent (oraagent): Supports Oracle-specific requirements and complex
resources. Runs server callout scripts when FAN events occur. This process was known
as RACG in Oracle 11g R1.
o Oracle Root Agent (orarootagent): A specialized oraagent process that helps CRSD manage
resources owned by root, such as the network and the Grid virtual IP address.
High Availability Services Stack
o Grid Plug and Play (gpnpd): Provides access to the Grid Plug and Play profile and
coordinates updates to the profile among the nodes of the cluster to ensure that all of
the nodes have the most recent profile.
o Grid Interprocess Communication (GIPC): A helper daemon for communications
infrastructure. Currently has no functionality; to be activated in a later release.
o Multicast Domain Name Service (mDNS): Allows DNS requests. The mDNS process is a
background process on Linux and UNIX, and a service on Windows.
o Oracle Grid Naming Service (GNS): A gateway between the cluster mDNS and external
DNS servers. The gnsd process performs name resolution within the cluster.
D. CLUSTERWARE PROCESSES STARTUP SEQUENCE
From 11gR2 onward, the ASM spfile is stored by default in the first ASM disk group created. The OCR and
voting files also reside in ASM, so ASM must be started before the cluster can start. However, the ASM
spfile is itself stored in ASM. So how does the cluster start?
When the cluster start command is executed, the following takes place in sequence:
1. OHASD is started. It accesses the OLR to complete OHASD initialization.
2. OHASD brings up GPnPD and CSSD.
3. CSSD accesses the GPnP profile to get the required bootstrap information.
4. The GPnP profile consists of:
Network classifications (Public/Private)
SPFILE location
ASM disk string
Digital signature information
5. CSSD scans the headers of all ASM disks, as indicated by ASM_DISKSTRING, to identify the location
of the voting file. Once the voting file is accessed, CSSD can complete its initialization and start or
join the cluster.
6. Starting ASM does not require the disk group to be open. All the necessary information
is present in the disk headers. OHASD reads the header of the ASM disk containing the spfile, as
indicated by ASM_DISKSTRING, and reads the contents of the ASM spfile. The ASM instance
is then started.
7. Once the ASM instance is started and the disk group is mounted, all disks are accessible for read and write,
including the OCR.
8. OHASD starts CRSD, which accesses the OCR in ASM.
9. Clusterware completes its initialization and brings up the services under its control.
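The ordering in the steps above can be viewed as a dependency graph. The sketch below is only an illustration of that ordering: the labels and the dependency table are shorthand written for this example, modelled on the listed steps, and are not read from any Clusterware configuration.

```python
from graphlib import TopologicalSorter

# Each key can start only after everything in its set is available.
# The edges are an assumption that mirrors the numbered steps above.
DEPENDENCIES = {
    "OHASD": {"OLR"},
    "GPnPD": {"OHASD"},
    "CSSD": {"OHASD", "GPnPD", "voting file (found via disk headers)"},
    "ASM instance": {"CSSD", "ASM spfile (read from disk header)"},
    "OCR": {"ASM instance"},
    "CRSD": {"OHASD", "OCR"},
    "cluster resources": {"CRSD"},
}


def startup_order(dependencies=DEPENDENCIES):
    """Return one valid start order in which every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, component in enumerate(startup_order(), start=1):
        print(position, component)
```

The point of the model is that neither the voting file nor the ASM spfile depends on a mounted disk group, which is what breaks the apparent circular dependency between ASM and the cluster.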
E. AUTOMATIC STORAGE MANAGEMENT (ASM)
ASM provides a portable, high-performance database file system and simplifies database
administration.
ASM spreads data across the disks and distributes the I/O load across all available resources to
optimize performance.
ASM provides integrated mirroring across disks.
Space can be added dynamically without shutting down the database.
It is advised to use a separate ORACLE_HOME for the ASM installation.
ASM can be configured using DBCA.
A separate instance (ASM) starts in order to manage ASM disks, resources and connectivity.
Both ASM and database instances have access to a common set of disks, organized into disk groups.
ASM background processes
ARBn : Performs the actual rebalance data extent movements in an Automatic Storage
Management instance. More than one process can run at a time, named ARB0, ARB1, and so on.
ASMB : Runs in a database instance that is using an ASM disk group and communicates with the
ASM instance in managing storage and providing statistics.
GMON: Maintains disk membership in ASM disk groups.
MARK: This process marks ASM allocation units as stale following a missed write to an offline
disk. This essentially tracks which extents require resync for offline disks.
RBAL: This process runs in both database and ASM instances. In the database instance, it does a
global open of ASM disks; in an ASM instance, it coordinates rebalance activity for disk
groups.
ASM Diskgroup
ASM uses disk groups to store datafiles. An ASM disk group is a collection of disks that ASM
manages as a unit.
Within a disk group, ASM exposes a file system interface for Oracle database files.
The content of files that are stored in a disk group is evenly distributed, or striped, to eliminate
hot spots and to provide uniform performance across the disks.
The performance is comparable to the performance of raw devices.
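The idea of even distribution can be shown with a toy model. The sketch below places file extents on disks in simple round-robin order. This is an assumption made for illustration only: ASM uses its own allocation logic, and the disk names here are hypothetical.

```python
from collections import Counter


def stripe_extents(num_extents, disks):
    """Map each extent number to a disk, cycling through the disks in order."""
    if not disks:
        raise ValueError("a disk group needs at least one disk")
    return {extent: disks[extent % len(disks)] for extent in range(num_extents)}


def extents_per_disk(placement):
    """Count how many extents landed on each disk."""
    return Counter(placement.values())


if __name__ == "__main__":
    placement = stripe_extents(12, ["DISK1", "DISK2", "DISK3"])
    print(extents_per_disk(placement))
```

Because every disk receives the same share of extents, reads and writes against the file are spread over all disks instead of concentrating on one.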
F. NEW INITIALIZATION PARAMETERS
Unique parameters in RAC Instance
o instance_name - Specifies the unique name of this instance
o instance_number - Specifies the unique number that maps to the instance
o thread - Specifies the number of the redo thread used by the instance
Non-Unique parameters in RAC Instance
o cluster_database - Specifies whether RAC is enabled or not
o cluster_database_instances - Equal to the number of instances in the cluster
o cluster_interconnects - Specifies the additional interconnects available for use
o active_instance_count - Specifies the number of instances that will be active within the cluster
o remote_listener - Specifies a network name that resolves to an address or address list of
Oracle Net remote listeners
o local_listener - Specifies a network name that resolves to an address or address list of Oracle
Net local listeners
Parameters in ASM Instance
o instance_type - This parameter must be set to ASM
o asm_diskgroups - Lists the names of the disk groups that will be mounted by the ASM instance
o asm_diskstring - Limits the set of disks that ASM considers for discovery
o asm_power_limit - Specifies the maximum power on an ASM instance for disk rebalance
operations
o asm_preferred_read_failure_groups - Specifies the failure groups that contain preferred
read disks
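The distinction between unique and non-unique RAC instance parameters can be expressed as a simple consistency check. The sketch below works on plain dictionaries and does not query a database; the function name and the sample values are hypothetical.

```python
# Parameters that must have a different value on every instance.
UNIQUE_PER_INSTANCE = ("instance_name", "instance_number", "thread")


def check_rac_parameters(instances):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in UNIQUE_PER_INSTANCE:
        values = [inst[name] for inst in instances]
        if len(set(values)) != len(values):
            problems.append(f"{name} must differ on every instance: {values}")
    for inst in instances:
        declared = inst["cluster_database_instances"]
        if declared != len(instances):
            problems.append(
                f"{inst['instance_name']}: cluster_database_instances={declared}, "
                f"expected {len(instances)}"
            )
    return problems


if __name__ == "__main__":
    sample = [
        {"instance_name": "ORCL1", "instance_number": 1, "thread": 1,
         "cluster_database_instances": 2},
        {"instance_name": "ORCL2", "instance_number": 2, "thread": 2,
         "cluster_database_instances": 2},
    ]
    print(check_rac_parameters(sample))
```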
G. CONNECTION ESTABLISHMENT IN ORACLE RAC
I. Private IP: This IP is used for the node interconnect. Systems cannot be accessed through this IP from
the outside world.
II. Public IP: This IP is used to access the system for day-to-day tasks, monitoring, etc.
III. Virtual IP (VIP): This IP is required for failover when a node is down. It moves to a
surviving node.
IV. SCAN IP: The SCAN name resolves to one or more IP addresses; these IP addresses are called
SCAN VIPs or SCAN IPs.
VIP:
The VIP is used to deal with TCP timeouts. When a client connects to a TNS alias, it uses a TCP
connection to an IP address defined in the [Link] file.
When using RAC, we define multiple addresses in the TNS alias so that the client can fail over when an IP
address, listener or instance is unavailable.
TCP timeouts can differ from platform to platform and from implementation to implementation, which
makes it difficult to predict the failover time.
The virtual IP (VIP) provides fast connection establishment in failover situations.
Using a virtual IP avoids the TCP/IP timeout problem because Oracle Notification Service
(ONS) maintains communication between the nodes and listeners.
Once ONS finds that a listener or node is down, it notifies the other nodes and listeners.
When a new connection is attempted to the failed node or listener, the virtual IP of
the failed node is automatically diverted to a surviving node, and the session is established on that
surviving node.
This process does not wait for the TCP/IP timeout, so new connections are established faster
on the surviving nodes and listeners.
Whenever a new connection request is made, the SCAN listener listening on a SCAN IP address
and the SCAN port is contacted on the client's behalf.
Because all services on the cluster are registered with the SCAN listener, the SCAN listener
replies with the address of the local listener (the node VIP address) on the least-loaded node (each
SCAN listener keeps updated cluster load statistics), and the connection is routed to that node.
SCAN:
Single Client Access Name (SCAN) is the virtual hostname (1 to 15 characters) provided to all clients
connecting to the cluster. It should be unique across the network domain.
SCAN is registered with DNS (or Grid Naming Service) with at least one and up to three IP
addresses from the same subnet as the public and VIP addresses.
These IP addresses should be resolved in round-robin fashion.
1. Each node that hosts a SCAN IP has a pair of SCAN listener and local listener. The 3 SCAN IP addresses are mapped
as follows:
o 3 nodes: Each node is mapped to one SCAN IP.
o More than 3 nodes: Only 3 nodes are mapped to SCAN IPs.
o 2 nodes: One node has two SCAN IPs and SCAN listeners, and the other node has a
single SCAN IP and SCAN listener.
2. The PMON process of each instance reports the current workload to the SCAN listener service (specified
in the REMOTE_LISTENER database parameter).
3. On a connect request, DNS resolves the SCAN name and returns the list of 3 SCAN IP addresses.
4. The client selects the first IP from the list and connects to the RAC; if that fails, it tries again with the
next IP.
5. Depending on the load, the SCAN listener redirects the request to the local listener of a lightly loaded
node. (Note that in a cluster of more than 3 nodes, this is how a request can still be routed to a
lightly loaded node via the SCAN listener service, even if that node does not have a SCAN IP mapped.)
6. All further communication happens directly through the local listener.
The benefit of using SCAN is that the network configuration files on the client computer do not need to
be modified when nodes are added to or removed from the cluster.
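The connection flow described in the numbered steps can be sketched as a small simulation. It is a simplified model under stated assumptions: the function name is hypothetical, reachability and load are passed in as plain data, and the example addresses in the test data come from a documentation-only address range.

```python
def connect_via_scan(scan_ips, reachable_scan_ips, node_load):
    """Return (scan_ip_used, target_node), or None if no SCAN listener answers.

    scan_ips            addresses in the order DNS returned them
    reachable_scan_ips  set of SCAN IPs whose listener currently answers
    node_load           mapping of node name to reported load (lower is lighter)
    """
    for ip in scan_ips:
        if ip not in reachable_scan_ips:
            continue  # this address failed, so the client tries the next one
        # The SCAN listener redirects to the local listener of the
        # least-loaded node, whether or not that node hosts a SCAN IP.
        target = min(node_load, key=node_load.get)
        return ip, target
    return None


if __name__ == "__main__":
    ips = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
    load = {"node1": 0.7, "node2": 0.2, "node3": 0.5, "node4": 0.1}
    print(connect_via_scan(ips, {"192.0.2.12", "192.0.2.13"}, load))
```

In the sample data the cluster has four nodes and only three SCAN IPs, yet the request can still be routed to the fourth node because routing depends on load, not on where the SCAN IPs are hosted.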
H. CACHE FUSION
Interconnects
The cluster interconnect is a critical private network used for communication between all
the nodes.
Network pings are performed by Cluster Synchronization Services ([Link]).
Nodes are connected to each other via a switch.
Traffic over the interconnect introduces new wait events.
Enhanced interconnect technology has helped Cache Fusion.
You can use OS-dependent methods such as bonding on Unix and teaming on Windows.
An OS-independent redundant interconnect is available from [Link] onwards (not on Windows).
Cache Fusion
Oracle introduced a framework for sharing data over the private interconnect between the
nodes, which previous versions used only for messaging. This protocol is Cache
Fusion.
Data blocks are shipped across the network in the same way as messages, replacing the most
expensive component of data transfer, disk I/O, with data sharing over the interconnect.
Cache coherency is the technique used to keep multiple copies of a block consistent between
different Oracle instances.
GCS implements cache coherency by using the Cache Fusion algorithm.
GES handles all non-Cache Fusion resource operations.
Cache Fusion addresses several types of concurrency as below:
o Concurrent Reads on Multiple Nodes
o Concurrent Reads and Writes on Different Nodes
o Concurrent Writes on Different Nodes
Request a block for a Modification
1. Instance 1 submits a request to GCS to modify the block.
2. The GCS transmits the request to the holder, i.e. instance 2.
3. Instance 2 receives the request message and the LMS process sends the block to instance 1.
4. On receipt of the block, instance 1 informs the GCS that it holds the block in exclusive mode.
Write a Block to Disk
1. Instance 2 sends a request to GCS to write the block to disk.
2. The GCS forwards the request to instance 1.
3. Instance 1 receives the request and writes the block to disk.
4. Instance 1 notifies GCS that the write operation is complete.
5. After receipt of the notification, GCS orders PI (Past Image) holders to discard their PIs.
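The two message flows above can be traced with a toy directory for a single block. This is a deliberately simplified sketch: the class and method names are hypothetical, it assumes the previous holder had modified the block and therefore keeps a past image when it ships the block, and it ignores lock modes, CR copies and messaging.

```python
class GlobalCacheDirectory:
    """Tracks, for one block, the current holder and the past image (PI) holders."""

    def __init__(self, holder):
        self.current_holder = holder
        self.past_image_holders = set()
        self.version = 1          # version held by the current holder
        self.on_disk_version = 0  # last version written to disk

    def request_for_modification(self, requester):
        """Ship the block from the current holder to the requester."""
        if requester == self.current_holder:
            return
        # Assumption: the previous holder had changed the block, so it keeps a PI.
        self.past_image_holders.add(self.current_holder)
        self.past_image_holders.discard(requester)
        self.current_holder = requester
        self.version += 1  # the requester now modifies the block

    def write_to_disk(self, requester):
        """Forward the write request to the current holder, then discard PIs."""
        writer = self.current_holder
        self.on_disk_version = self.version
        self.past_image_holders.clear()
        return writer


if __name__ == "__main__":
    directory = GlobalCacheDirectory(holder="instance 2")
    directory.request_for_modification("instance 1")
    print(directory.current_holder, directory.past_image_holders)
    print(directory.write_to_disk("instance 2"), directory.past_image_holders)
```

Whichever instance asks for the write, the instance holding the current version performs it, and only then are the past images discarded.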
Global Resource Directory (GRD)
GRD records information about current status of the data blocks, resources and enqueues.
GRD is managed and maintained by GES and GCS.
Each running instance stores a portion of the directory.
LMON recovers the GRD during instance recovery.
I. BACKGROUND PROCESS AND THEIR ROLES
1. LMSx – Lock Manager Server (GCS)
Primarily responsible for shipping blocks between the buffer caches of the instances.
Creates a CR image whenever there is a cross-instance call for a dirty block.
LMS must also check constantly with the LMD background process (the GES process) for the lock
requests placed by the LMD process.
Parameter: GCS_SERVER_PROCESSES, up to 36 as of 10.2, min. cpu_count/2
2. LMON – Lock Monitor Process (GES)
The LMON process manages the global locks and resources.
LMON handles the reconfiguration of locks and resources when an instance joins or leaves the cluster
(LMON generates trace files during reconfiguration).
LMON also provides cluster group services.
3. LMD – Lock Manager Daemon
The LMD process performs global lock deadlock detection, both local and remote (GES).
It also monitors for lock conversion timeouts.
It maintains the lock queues and traverses the GES structures.
4. LCK – Lock Process
Manages instance resource requests and cross-instance calls for shared resources.
During instance recovery, it builds a list of invalid lock elements and validates lock elements.
5. DIAG – Diagnostic Daemon
This background process was introduced in Oracle 10g as part of the new enhanced
diagnosability framework.
Regularly monitors the health of the instance.
Also checks for instance hangs and deadlocks.
J. ORACLE RAC TROUBLESHOOTING
WHICH LOG TO CHECK AND WHERE
For each component, the entries below give the command to check its process, its functionality and its log file location.
1. Cluster Ready Services Daemon (CRSD) Log
Check process: ps -ef | grep crs | grep -v grep
Functionality: This process is responsible for the start, stop, monitoring and failover of resources. It maintains the OCR and also restarts resources when a failure occurs.
Log file location: Grid_home/log/host_name/crsd
2. Cluster Synchronization Services (CSS)
Check process: ps -ef | grep -v grep | grep css
Functionality: Monitors node hangs (via oprocd functionality), monitors OCSSD process hangs (via oclsomon functionality) and monitors vendor clusterware (via vmon functionality). This is a multi-threaded process that runs with elevated priority.
Log file location: Grid_home/log/host_name/cssd
3. Cluster Time Synchronization Service (CTSS)
Check process: ps -ef | grep ctss | grep -v grep
Functionality: Provides time management in a cluster for Oracle Clusterware.
Log file location: Grid_home/log/host_name/ctssd
4. Multicast Domain Name Service Daemon (MDNSD)
Check process: ps -ef | grep -v grep | grep dns
Functionality: Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.
Log file location: Grid_home/log/host_name/mdnsd
5. Oracle Grid Naming Service (GNS)
Check process: ps -ef | grep -v grep | grep gns
Functionality: Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
Log file location: Grid_home/log/host_name/gnsd
6. Oracle High Availability Services Daemon (OHASD)
Check process: ps -ef | grep -v grep | grep has
Functionality: CRSD is applicable for RAC systems. For Oracle Restart and ASM, ohasd is used.
Log file location: Grid_home/log/host_name/ohasd
7. Event Manager (EVM) information generated by evmd
Check process: ps -ef | grep evm | grep -v grep
Functionality: Distributes and communicates some cluster events to all of the cluster members so that they are aware of the cluster changes.
Log file location: Grid_home/log/host_name/evmd
8. EVMD logger
Check process: ps -ef | grep evm | grep -v grep
Functionality: Started by [Link]; reads the configuration files, determines which events to subscribe to from EVMD, and runs user-defined actions for those events.
Log file location: Grid_home/log/host_name/evmd
9. Oracle RAC RACG
Check process: ps -ef | grep -v grep | grep oraagent
Functionality: Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
Log file location: The Oracle RAC high availability trace files are located in the following two locations: Grid_home/log/host_name/racg and $ORACLE_HOME/log/host_name/racg
10. Grid Interprocess Communication Daemon (GIPCD)
Check process: ps -ef | grep -v grep | grep gipc
Functionality: A support daemon that enables Redundant Interconnect Usage.
Log file location: Grid_home/log/host_name/gipcd
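The log locations listed above follow one pattern, Grid_home/log/host_name/component, so they can be built programmatically. The helper below is a minimal sketch: the function name is hypothetical, and the Grid home and host name used in the example are placeholders, not values from a real installation.

```python
from pathlib import PurePosixPath

# Sub-directory used by each component, as given in the listing above.
LOG_SUBDIRS = {
    "CRSD": "crsd",
    "CSSD": "cssd",
    "CTSSD": "ctssd",
    "MDNSD": "mdnsd",
    "GNSD": "gnsd",
    "OHASD": "ohasd",
    "EVMD": "evmd",
    "RACG": "racg",
    "GIPCD": "gipcd",
}


def component_log_dir(grid_home, host_name, component):
    """Build Grid_home/log/host_name/<component directory> for a known component."""
    subdir = LOG_SUBDIRS[component.upper()]
    return PurePosixPath(grid_home) / "log" / host_name / subdir


if __name__ == "__main__":
    # "/u01/app/11.2.0/grid" and "rac1" are example values only.
    print(component_log_dir("/u01/app/11.2.0/grid", "rac1", "cssd"))
```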
Summary
Important logs for cluster maintenance and troubleshooting various cluster-related problems:
1. alert_hostname.log: Each cluster node has its own alert log file, which records important and useful information about cluster startup, node eviction, cluster component start-up problems, and OCR/voting disk related information.
2. [Link]: See the component details above.
3. [Link]: See the component details above.
4. [Link]: See the component details above.
K. SRVCTL COMMANDS
SRVCTL is used to manage the following resources (components):
Component    Abbreviation   Description
asm          asm            Oracle ASM instance
database     db             Database instance
diskgroup    dg             Oracle ASM disk group
filesystem   filesystem     Oracle ASM file system
home         home           Oracle home or Oracle Clusterware home
listener     lsnr           Oracle Net listener
service      serv           Database service
ons, eons    ons, eons      Oracle Notification Services (ONS)
The available commands used with SRVCTL are,
Command    Description
add        Adds a component to the Oracle Restart configuration.
config     Displays the Oracle Restart configuration for a component.
disable    Disables management by Oracle Restart for a component.
enable     Re-enables management by Oracle Restart for a component.
getenv     Displays environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
modify     Modifies the Oracle Restart configuration for a component.
remove     Removes a component from the Oracle Restart configuration.
setenv     Sets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
start      Starts the specified component.
status     Displays the running status of the specified component.
stop       Stops the specified component.
unsetenv   Unsets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
Commands: srvctl add, srvctl modify, srvctl remove
Objects: instance, database, service, nodeapps
Comment: The OCR is modified.
Commands: srvctl relocate
Objects: service
Comment: You can relocate a service from one named instance to another named instance.
Commands: srvctl start, srvctl stop, srvctl status
Objects: instance, database, service, asm, nodeapps
Commands: srvctl disable, srvctl enable
Objects: instance, database, service, asm
Comment: enable = the resource must be restarted when the server restarts. disable = the resource must NOT be restarted when the server restarts (for example, while maintenance tasks are in progress).
Commands: srvctl config
Objects: database, service, asm, nodeapps
Comment: Lists configuration information from the OCR (Oracle Cluster Registry).
Commands: srvctl getenv, srvctl setenv, srvctl unsetenv
Objects: instance, database, service, nodeapps
Comment: srvctl getenv displays the environment variables stored in the OCR for the target. srvctl setenv allows these variables to be set. srvctl unsetenv allows these variables to be unset.
Frequently used commands
srvctl start database -d DBname
srvctl stop database -d DBname
srvctl start instance -d DBname -i INSTANCEname
srvctl stop instance -d DBname -i INSTANCEname
srvctl status database -d DBname
srvctl status instance -d DBname -i INSTANCEname
srvctl status nodeapps -n NODEname
srvctl enable database -d DBname
srvctl disable database -d DBname
srvctl enable instance -d DBname -i INSTANCEname
srvctl disable instance -d DBname -i INSTANCEname
srvctl config database -d DBname -> to get some information about the database from OCR.
srvctl getenv nodeapps
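When these commands are scripted, it helps to build the argument list in one place. The helper below is a sketch that only assembles the command line using the -d, -i and -n options shown in the list above; it does not run anything, and the function name and sample database names are hypothetical.

```python
import shlex


def srvctl_command(action, component, db=None, instance=None, node=None):
    """Assemble an srvctl argument list such as: srvctl start instance -d DB -i INST."""
    command = ["srvctl", action, component]
    if db:
        command += ["-d", db]
    if instance:
        command += ["-i", instance]
    if node:
        command += ["-n", node]
    return command


if __name__ == "__main__":
    # ORCL and ORCL1 are placeholder names.
    print(shlex.join(srvctl_command("status", "instance", db="ORCL", instance="ORCL1")))
```

On a cluster node the returned list could be passed to subprocess.run; keeping the command as a list avoids shell quoting problems with unusual database or node names.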
L. CRSCTL COMMANDS
Dual Environment CRSCTL Commands
crsctl add resource
crsctl add type
crsctl check css
crsctl delete resource
crsctl delete type
crsctl get hostname
crsctl getperm resource
crsctl getperm type
crsctl modify resource
crsctl modify type
crsctl setperm resource
crsctl setperm type
crsctl start resource
crsctl status resource
crsctl status type
crsctl stop resource
Oracle RAC Environment CRSCTL Commands
The commands listed in this section manage the Oracle Clusterware stack in an Oracle RAC environment,
which consists of the following,
Oracle Clusterware, the member nodes and server pools
Oracle ASM (if installed)
Cluster Synchronization Services
Cluster Time Synchronization Services
crsctl add crs administrator
crsctl add css votedisk
crsctl add serverpool
crsctl check cluster
crsctl check crs
crsctl check resource
crsctl check ctss
crsctl config crs
crsctl delete crs administrator
crsctl delete css votedisk
crsctl delete node
crsctl delete serverpool
crsctl disable crs
crsctl enable crs
crsctl get css
crsctl get css ipmiaddr
crsctl get nodename
crsctl getperm serverpool
crsctl lsmodules
crsctl modify serverpool
crsctl pin css
crsctl query crs administrator
crsctl query crs activeversion
crsctl query crs releaseversion
crsctl query crs softwareversion
crsctl query css ipmidevice
crsctl query css votedisk
crsctl relocate resource
crsctl relocate server
crsctl replace discoverystring
crsctl replace votedisk
crsctl set css
crsctl set css ipmiaddr
crsctl set css ipmiadmin
crsctl setperm serverpool
crsctl start cluster
crsctl start crs
crsctl status server
crsctl status serverpool
crsctl stop cluster
crsctl stop crs
crsctl unpin css
crsctl unset css
Oracle Restart Environment CRSCTL Commands
The commands listed in this section control Oracle High Availability Services,
crsctl check has
crsctl config has
crsctl disable has
crsctl enable has
crsctl query has releaseversion
crsctl query has softwareversion
crsctl start has
crsctl stop has
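The stack-level commands come in two parallel forms: "crs" in an Oracle RAC environment and "has" in an Oracle Restart environment. The sketch below picks the right form for a given environment. It only builds the argument list from the commands listed above; the function name and the environment labels "rac" and "restart" are hypothetical.

```python
# Actions that the lists above show for both "crs" and "has".
STACK_ACTIONS = {"check", "config", "disable", "enable", "start", "stop"}

# Keyword that follows the action in each environment.
STACK_TARGETS = {"rac": "crs", "restart": "has"}


def stack_command(action, environment):
    """Return the crsctl argument list for a stack-level action."""
    if action not in STACK_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if environment not in STACK_TARGETS:
        raise ValueError(f"unknown environment: {environment}")
    return ["crsctl", action, STACK_TARGETS[environment]]


if __name__ == "__main__":
    print(" ".join(stack_command("check", "rac")))
    print(" ".join(stack_command("check", "restart")))
```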