Aria Operations Adoption Guide
Adoption Guide
VMware Solution Adoption
You can find the most up-to-date technical documentation on the VMware website at:
[Link]
VMware, 2
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
[Link]
Copyright © 2023 VMware, Inc. All rights reserved. Copyright and trademark information.
Contents
Version History
Document Overview
Solution Adoption Overview
Solution Objectives
Technology Operational Processes
Solution Activities
Best Practices for Deploying vRealize Operations Manager
Analytics Nodes
Witness Nodes
Cloud Proxy
Cloud Proxy and Telegraf Agents
Management Packs and Adapters
Deployment Formats
Scalability Considerations
High Availability Considerations
Continuous Availability Considerations
Cluster Management
Fault Domains
Witness Node
Analytics Nodes
Collector Group
Adapter and Management Packs Considerations
Hardware Requirements for Analytics Nodes, Witness Nodes, and Cloud Proxy
Port Requirements for vRealize Operations Manager
Consumption Activities
Troubleshooting Workbench Home Page
About vRealize Operations Reference for Metrics, Properties, and Alerts
Monitoring Objects in Your Managed Environment by Using
Appendix A: Process Routine Mapping
Version History
Consultant Note Amend the text as needed and address the consultant and/or
architect notes. After completion, the notes can be removed. This text provides
sample guidance information, configuration recommendations for specific services, or
customer-specific configurations.
Document Overview
This adoption guide document is created by VMware Professional Services to support
the subsequent operations of the VMware solution at Customer.
Solution Adoption Overview
Adoption of any solution involves the incorporation of the technology into existing business
processes. The creation, training, and verification of adopted processes are required to ensure
that the original business aims are achieved after the solution is implemented.
VMware has created a simple methodology to assist with this, which the diagram below
illustrates.
Using this methodology, Customer will have details on many operational processes that
can be used after the solution is implemented.
Solution Objectives
During the scoping of the solution with Customer, VMware identified the following
business objectives:
IT Outcomes
Centralized dashboards
IT Capabilities
Technology Operational Processes
The following table illustrates the list of processes included in this document and how they
relate to the IT Outcomes and IT Capabilities shown.
Consultant Note Review the Consumption processes that have been inserted below. Based
upon the process description map each process to one or more IT Outcomes and IT
Capabilities listed above. The aim is to create the linkage between the processes provided
with this engagement to the pre-sales discussion that determined the Capabilities and IT
Outcomes the customer wanted to develop.
Solution Activities
The following sections provide details of the operational use cases that contribute towards
the development of the IT Outcomes and IT Capabilities that the technology is intended to
support.
Analytics Nodes
Analytics nodes consist of a primary node, primary replica node, and data nodes.
Note The master node is now referred to as the primary node. The master replica node is
now referred to as the primary replica node.
Deploy analytics nodes in the same vSphere Cluster except when activating
Continuous Availability.
Deploy analytics nodes with the same disk size on storage of the same type.
When activating Continuous Availability, separate analytics nodes into fault domains
based on their physical location.
Depending on the size and performance requirements for analytics nodes, apply
Storage DRS Anti-Affinity rules to ensure that nodes are on separate datastores.
Set Storage DRS to manual for all vRealize Operations Manager analytics nodes.
If you deploy analytics nodes into a highly consolidated vSphere cluster, configure the
resource reservation to ensure optimal performance. Ensure that the virtual CPU to
physical CPU ratio is not negatively impacting the performance of analytics nodes by
validating CPU ready time and CPU co-stop.
Analytics nodes have a high number of vCPUs to ensure performance of the analytics
computation that occurs on each node. Monitor CPU Ready time and CPU Co-Stop to
ensure that analytics nodes are not competing for CPU capacity.
If the sizing guideline provides several configurations for the same number of objects,
use the configuration that has the fewest nodes. For example, if the number of objects
being collected is 120,000, configure the cluster with four extra-large nodes instead of 12
large nodes.
Deploy an even number of analytics nodes to activate Continuous Availability. If the current
configuration is an odd number of analytics nodes, deploy an extra analytics node to
create an even pairing.
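The two sizing rules above (prefer the configuration with the fewest nodes, and keep an even node count for Continuous Availability) can be sketched as follows. This is only an illustration: the per-node capacities in `ASSUMED_CAPACITY` are hypothetical values inferred from the 120,000-object example, and the function names are not part of any product API. Use the sizing guidelines (KB 2093783) for real limits.

```python
import math

# Hypothetical per-node object capacities, inferred only from the
# 120,000-object example above; real limits come from KB 2093783.
ASSUMED_CAPACITY = {"large": 10_000, "extra_large": 30_000}


def nodes_needed(objects: int, capacity_per_node: int) -> int:
    """Smallest node count whose combined capacity covers the objects."""
    return math.ceil(objects / capacity_per_node)


def fewest_nodes(objects: int) -> tuple[str, int]:
    """Pick the node size that needs the fewest nodes for the object count."""
    options = {
        size: nodes_needed(objects, capacity)
        for size, capacity in ASSUMED_CAPACITY.items()
    }
    size = min(options, key=options.get)
    return size, options[size]


def even_pairing(node_count: int) -> int:
    """Round an odd node count up by one so nodes can be paired evenly."""
    return node_count + (node_count % 2)


size, count = fewest_nodes(120_000)
print(size, count)       # extra_large 4
print(even_pairing(5))   # 6
```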
Witness Nodes
A witness node is required when continuous availability is activated to manage the analytics
nodes in the fault domains. vRealize Operations Manager can have only one witness node in
its cluster.
Deploy the witness node in a different cluster separate from the analytics nodes.
Cloud Proxy
Using cloud proxies in vRealize Operations Manager, you can collect and monitor data from
your remote data centers. You can deploy one or more cloud proxies in vRealize Operations
Manager to create a one-way communication between your remote environment and
vRealize Operations Manager. The cloud proxies work as one-way remote collectors and
upload data from the remote environment to vRealize Operations Manager. Cloud proxies
can support multiple vCenter Server accounts.
Ensure that Cloud Proxy supports your operating system platform. The most recent
versions of Windows and Linux operating systems are supported.
System times must be synchronized between cloud proxy, end point VMs, the vCenter
Server, ESX host, and vRealize Operations Manager. To ensure synchronized time, use
Network Time Protocol (NTP).
Ensure that all the prerequisites are met. For more information, see Prerequisites.
Disable UAC on end point VMs before installing the Telegraf agent. If you cannot do this
due to security restrictions, see KB article 70780 for a workaround script.
Ensure that a version of VMware Tools later than 10.2 is installed on the end point
VM on which you want to deploy the Telegraf agent.
To deploy Telegraf agents onto end point VMs, ensure that the following prerequisites
are met for the user account being used for deployment:
An administrator account
For more information, see User Account Prerequisites in the vRealize Operations
Configuration Guide.
Utilize collector groups to separate data collection into fault domains when
continuous availability is activated.
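The VMware Tools prerequisite above (a version later than 10.2) can be checked with a simple version comparison. The sketch below assumes versions are reported as dotted numeric strings such as "10.3.5" and treats "10.2" and "10.2.0" as the same release; both the string format and the function names are assumptions for illustration, not a VMware API.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '10.3.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pad(version: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so '10.2' and '10.2.0' compare as equal."""
    return version + (0,) * (width - len(version))


def tools_version_ok(installed: str, threshold: str = "10.2") -> bool:
    """True when the installed version is strictly later than the threshold."""
    installed_v = parse_version(installed)
    threshold_v = parse_version(threshold)
    width = max(len(installed_v), len(threshold_v))
    return pad(installed_v, width) > pad(threshold_v, width)


print(tools_version_ok("10.3.5"))  # True
print(tools_version_ok("10.2.0"))  # False
```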
Deployment Formats
Deploy vRealize Operations Manager with the same vRealize Operations Manager vApp
version for the following node types:
Primary
Primary Replica
Data
Witness
See the vRealize Operations vApp Deployment and Configuration Guide for more
information.
Scalability Considerations
Configure your initial deployment of vRealize Operations Manager based on the anticipated
use.
For more information about sizing, see the KB article vRealize Operations Manager
Sizing Guidelines (KB 2093783).
Analytics Nodes
Analytics nodes consist of a primary node, a primary replica node, and data nodes.
If you deploy analytics nodes in a configuration other than large, you can reconfigure the
vCPU and memory. It is recommended to scale up the analytics nodes in the cluster
before scaling out the cluster with additional nodes. vRealize Operations Manager
supports various node sizes.
To maintain a supported configuration, data nodes deployed in the cluster must be the
same node size.
For more information about increasing storage, see the topic, Add Data Disk Space to
a vRealize Operations vApp Node. You cannot modify the disks of virtual machines
that have a snapshot. You must remove all snapshots before you increase the disk size.
To maintain a supported configuration, analytics nodes deployed in the cluster must be
the same node size.
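The same-size rule for analytics nodes lends itself to a quick inventory check. The following is a minimal sketch that assumes you already have a mapping of node name to node size; the node names, size labels, and function name are hypothetical.

```python
def sizes_in_use(nodes: dict[str, str]) -> list[str]:
    """Return the distinct node sizes found in the cluster, sorted."""
    return sorted(set(nodes.values()))


def is_supported(nodes: dict[str, str]) -> bool:
    """A cluster is treated as supported here when all nodes share one size."""
    return len(set(nodes.values())) <= 1


# Hypothetical inventory: node name -> deployed size.
cluster = {"primary": "large", "replica": "large", "data-1": "medium"}
print(is_supported(cluster))   # False
print(sizes_in_use(cluster))   # ['large', 'medium']
```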
Witness Node
vRealize Operations Manager provides a single witness node size regardless of the cluster
size, because the witness node neither collects nor processes data.
Remote Collectors
vRealize Operations Manager supports two sizes for remote collectors, standard and
large. The maximum number of resources is based on the aggregate resources that are
collected for all adapters on the remote collector. In a large-scale vRealize Operations
Manager monitored environment, you might experience a slow-responding UI and
metrics that are slow to display.
Cloud Proxy
vRealize Operations Manager supports two sizes for Cloud Proxy, small and large.
The maximum number of resources is based on the aggregate resources that are
collected for all adapters on the Cloud Proxy. In a large-scale vRealize Operations
Manager monitored environment, you might experience a slow-responding UI and
metrics that are slow to display. Determine the areas of the environment in which the
latency is greater than 20 milliseconds and install a Cloud Proxy in those areas.
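The 20-millisecond latency guidance can be applied to a set of measurements with a simple filter. This sketch assumes you have already measured latency per site by some other means; the site names and the function name are hypothetical.

```python
LATENCY_THRESHOLD_MS = 20.0  # threshold stated in the guidance above


def sites_needing_cloud_proxy(latency_ms: dict[str, float]) -> list[str]:
    """Return the sites whose measured latency exceeds the threshold, sorted."""
    return sorted(
        site for site, ms in latency_ms.items() if ms > LATENCY_THRESHOLD_MS
    )


# Hypothetical measurements: site name -> latency in milliseconds.
measured = {"hq": 3.5, "branch-east": 42.0, "branch-west": 20.0, "dr-site": 75.2}
print(sites_needing_cloud_proxy(measured))  # ['branch-east', 'dr-site']
```

A site at exactly 20 milliseconds is not flagged, because the guidance refers to latency greater than 20 milliseconds.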
Cluster Management
Clusters consist of a primary node, a primary replica node, and data nodes.
recovery solution. When you activate High Availability, information is stored
(duplicated) in two different analytics nodes within the cluster. This doubles the
system's compute and capacity requirements. If either the primary node or the primary
replica node is permanently lost, then you must deactivate, and then reactivate High
Availability to reassign the primary replica role to an existing node. This process, which
includes a hidden cluster rebalance, can take a long time.
Analytics Nodes
Analytics nodes consist of a primary node, primary replica node, and data nodes.
When you activate High Availability, you protect vRealize Operations Manager from data
loss when only a single node is lost. If two or more nodes are lost, there may be
permanent data loss. Deploy each analytics node to separate hosts to reduce the chance
of data loss if a host fails. You can use DRS anti-affinity rules to ensure that the vRealize
Operations Manager nodes remain on separate hosts.
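Whether analytics nodes actually sit on separate hosts can be verified from a placement listing. The sketch below assumes a mapping of node name to ESXi host name that you export yourself; all names are hypothetical, and the check does not replace DRS anti-affinity rules.

```python
from collections import defaultdict


def shared_hosts(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return hosts that run more than one analytics node, with those nodes."""
    by_host: dict[str, list[str]] = defaultdict(list)
    for node, host in placement.items():
        by_host[host].append(node)
    return {
        host: sorted(nodes) for host, nodes in by_host.items() if len(nodes) > 1
    }


# Hypothetical placement: analytics node -> host it currently runs on.
placement = {"primary": "esx-01", "replica": "esx-02", "data-1": "esx-01"}
print(shared_hosts(placement))  # {'esx-01': ['data-1', 'primary']}
```

An empty result means every node is on its own host, which is the outcome the anti-affinity rule is meant to enforce.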
Collector Group
In vRealize Operations Manager, you can create a collector group. A collector group is a
collection of nodes (cloud proxies or analytics nodes). You can assign adapters to a
collector group, rather than assigning an adapter to a single node.
Note A collector group must contain the same type of nodes. You cannot mix cloud
proxies and analytics nodes in a collector group.
If the node running the adapter fails, the adapter is automatically moved to another
node in the collector group.
Assign all normal adapters to collector groups, and not to individual nodes. Hybrid
adapters require a two-way communication between the adapter and the monitored
endpoint.
For more information about adapters, see Adapter and Management Packs
Considerations.
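The failover behavior described above, where an adapter moves to another node in its collector group when its node fails, can be modeled in a few lines. This is a conceptual sketch only: the round-robin choice of surviving node is an assumption, because this guide does not describe how the product selects the target node, and all names are hypothetical.

```python
def fail_over(
    assignments: dict[str, str], group: list[str], failed: str
) -> dict[str, str]:
    """Return new adapter-to-node assignments after one group node fails."""
    survivors = [node for node in group if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the collector group")
    result = dict(assignments)
    moved = 0
    for adapter, node in assignments.items():
        if node == failed:
            # Assumed policy: spread moved adapters round-robin over survivors.
            result[adapter] = survivors[moved % len(survivors)]
            moved += 1
    return result


group = ["cp-a", "cp-b", "cp-c"]
before = {"vcenter-1": "cp-a", "vcenter-2": "cp-b"}
print(fail_over(before, group, failed="cp-a"))
# {'vcenter-1': 'cp-b', 'vcenter-2': 'cp-b'}
```

An adapter pinned to a single node has no survivors to move to, which is why the guidance prefers collector groups over individual node assignments.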
Cluster Management
Clusters consist of a primary node, a primary replica node, a witness node, and data nodes.
When you activate Continuous Availability, information is stored (duplicated) in two different
analytics nodes within the cluster but stretched across fault domains. Due to sizing
requirements, continuous availability requires doubling the system’s compute and capacity
requirements.
If either the primary node or primary replica node is permanently lost, then you must replace
the lost node, which will become the new primary replica node. If it is necessary to have the
new primary replica node as the primary node, then you can take the current primary node
offline and wait until the primary replica node is promoted to the new primary node. Then
bring the former primary node back online and it will be the new primary replica node.
Fault Domains
Fault domains consist of analytics nodes, separated into two zones.
A fault domain consists of one or more analytics nodes grouped according to their physical
location in the data center. When configured, two fault domains allow vRealize Operations
Manager to tolerate failures of an entire physical location and failures from resources
dedicated to a single fault domain.
Witness Node
The witness node is a member of the cluster but is not one of the analytics nodes.
To activate CA within vRealize Operations Manager, deploy the witness node in the cluster.
The witness node neither collects nor stores data.
The witness node serves as a tiebreaker when a decision must be made regarding
availability of vRealize Operations Manager when the network connection between the two
fault domains is lost.
Analytics Nodes
Analytics nodes consist of a primary node, primary replica node, and data nodes.
When you activate continuous availability, you protect vRealize Operations Manager from
data loss if an entire fault domain is lost. If node pairs are lost across fault domains, there
may be permanent data loss.
Deploy analytics nodes, within each fault domain, to separate hosts to reduce the chance
of data loss if a host fails. You can use DRS anti-affinity rules to ensure that the vRealize
Operations Manager nodes remain on separate hosts.
Collector Group
In vRealize Operations Manager, you can create a collector group. A collector group is a
collection of nodes (Cloud Proxies or analytics nodes). You can assign adapters to a collector
group, rather than assigning an adapter to a single node.
Note A collector group must contain the same type of nodes. You cannot mix Cloud Proxies
and analytics nodes in a collector group.
When activating continuous availability, collector groups can be created to collect data
from adapters within each fault domain.
Collector groups do not have any correlation with fault domains. The function of a
collector group is to collect data and provide it to the analytics nodes. vRealize
Operations Manager then decides how to store the data.
If the node running the adapter collection fails, the adapter is automatically moved to
another node in the collector group.
Theoretically, you can install collectors anywhere, provided that the networking requirements
are met. However, from a failover perspective, it is not recommended to put all the
collectors within a single fault domain. If all the collectors are directed to a single fault
domain, vRealize Operations Manager stops receiving data if a network outage
affects that fault domain.
Assign all normal adapters to collector groups, and not to individual nodes. Hybrid
adapters require a two-way communication between the adapter and the monitored
endpoint.
For more information about adapters, see Adapter and Management Packs Considerations.
Normal Adapters
VMware vSphere
Hybrid Adapters
Hybrid adapters require a two-way communication between the adapter and the
monitored endpoint.
You must deploy hybrid adapters to a dedicated remote collector. Configure only one
hybrid adapter type for each remote collector. You cannot configure hybrid adapters as
part of a collector group. For example, two vRealize Operations for Published
Applications adapters can exist on the same node, and two vRealize Operations for
Horizon adapters can exist on the same node, but a vRealize Operations for Published
Applications adapter and a vRealize Operations for Horizon adapter cannot exist on the
same node.
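The rule that each remote collector hosts only one hybrid adapter type can be expressed as a small validation. The sketch assumes a mapping of remote collector name to the list of hybrid adapter types configured on it; the collector names and type labels are hypothetical.

```python
def hybrid_violations(collectors: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return collectors that host more than one hybrid adapter type."""
    return {
        name: sorted(set(types))
        for name, types in collectors.items()
        if len(set(types)) > 1
    }


# Hypothetical layout: remote collector -> hybrid adapter types configured on it.
layout = {
    "rc-1": ["published_apps", "published_apps"],  # two adapters, one type: allowed
    "rc-2": ["published_apps", "horizon"],         # two different types: not allowed
}
print(hybrid_violations(layout))  # {'rc-2': ['horizon', 'published_apps']}
```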
For information about the components to install on each server profile in your deployment,
and the required hardware specifications, see the KB article vRealize Operations Manager
Sizing Guidelines (KB 2093783).
CPU requirements are 2.0 GHz minimum. 2.4 GHz is recommended. Storage requirements are
based on the maximum supported resources for each node.
vRealize Operations Manager has a high CPU requirement. In general, the more physical CPU
resources that you assign to the analytics cluster, the better the performance. The cluster
performs better if the nodes stay within a single socket.
Port Requirements for vRealize Operations Manager
The most up-to-date technical information about ports can be found on Ports and Protocol.
Consumption Activities
This section contains standard consumption procedures for the capabilities being deployed
as a part of the service.
From the Quick Start page, click Workbench in the Troubleshoot section.
The Troubleshooting Workbench home page displays a search bar, a list of active
troubleshooting sessions, and recent searches. You can open a session to find potential
evidence for your problems.
All troubleshooting workbench sessions that are active in the current login are displayed in
the Active Troubleshooting section of the Troubleshooting Workbench home page.
Changes that you make to the scope, time, or potential evidence in the troubleshooting
workbench page are not saved when you log out. The next time you log in to vRealize
Operations, the sessions that were earlier under Active Troubleshooting are displayed
under Recent Searches.
You can start the Troubleshooting Workbench with an alert in context from the alert
information page, or you can search for an object and start the Troubleshooting Workbench
to investigate known or unknown issues related to the object.
To start the Troubleshooting Workbench with an alert in context, in the menu, click
Troubleshoot > Alerts. Click an alert from the alert list and click Launch
Workbench from the Potential Evidence tab.
To start the Troubleshooting Workbench with an alert in context, in the menu, click
Environment, then select a group, custom data center, application, or inventory object.
Click the object and then the Alerts tab. Click Launch Workbench from the Potential
Evidence tab.
To investigate known or unknown issues with an object in context, search for the object or
click Environment to locate the object and click Troubleshoot on the top.
You look for potential evidence of a problem within a specific scope and time range.
The Selected Scope control on the left of the Troubleshooting Workbench page is
where you vary the scope. You can vary the scope in the following ways:
You can select only the object that you are investigating, or include several upstream
and downstream relationships by increasing the scope. As you increase the scope,
more objects are displayed in the inventory tree.
You can select a custom scope to include objects of your choice. Click Custom to
open an interactive window where you use the pointer to visually rearrange your
objects, view relationships and add peers to modify the relationships. To see details
about the object, place the pointer for a few seconds above the object. You can reset a
custom scope to start all over again.
You can use the drop-down menu to narrow down the type of objects displayed.
The default time range starts two hours and thirty minutes before the alert was triggered when
the context is alert-based, or one hour before the current time when the context is object-based.
You can select a different time range, up to seven days, using the date and time controls.
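The default start of the time range, and the seven-day limit on a custom range, can be worked through with standard date arithmetic. This sketch only restates the rules in the paragraph above; the function names are hypothetical and it does not call any vRealize Operations API.

```python
from datetime import datetime, timedelta
from typing import Optional

ALERT_LOOKBACK = timedelta(hours=2, minutes=30)  # alert-based context
OBJECT_LOOKBACK = timedelta(hours=1)             # object-based context
MAX_RANGE = timedelta(days=7)                    # largest selectable range


def default_start(now: datetime, alert_triggered: Optional[datetime] = None) -> datetime:
    """Start of the default range for an alert-based or object-based context."""
    if alert_triggered is not None:
        return alert_triggered - ALERT_LOOKBACK
    return now - OBJECT_LOOKBACK


def range_allowed(start: datetime, end: datetime) -> bool:
    """True when the range is non-negative and no longer than seven days."""
    return timedelta(0) <= end - start <= MAX_RANGE


now = datetime(2023, 1, 10, 12, 0)
print(default_start(now))                               # 2023-01-10 11:00:00
print(default_start(now, datetime(2023, 1, 10, 9, 0)))  # 2023-01-10 06:30:00
print(range_allowed(now - timedelta(days=8), now))      # False
```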
The potential evidence is based on Events, Property Changes, and Anomalous Metrics,
which are displayed on the right of the Troubleshooting Workbench page in the
Potential Evidence tab. Information in these sections is displayed as cards.
Events
Displays events, based on a change in the metrics. Events for metrics that have
breached the usual behavior, and major events that have occurred within the selected
scope and time are displayed. The cards are based on dynamic thresholds for a metric,
which are calculated based on historical and incoming data.
Property Changes
Displays important configuration changes that occurred within the selected scope and
time. Both single and multiple property changes are displayed. For multiple property
changes, you can view the latest and previous changes.
Anomalous Metrics
Displays metrics that have shown drastic changes within the selected scope and time. Ranks
the results based on the degree of change. The most recent anomalous metric based on a
time-sliced comparison in the current time range is given the highest weight.
You can explore more details about any of the cards displayed in the Troubleshooting
Workbench by clicking the card pop-out option. You can close a card and it is no longer
displayed in the Troubleshooting Workbench. To load the cards again, click Go in the Time
Range.
When you pin a metric, it appears in the Metrics tab of the Troubleshooting Workbench.
You can perform further investigation on the metric in the Metrics tab. You can compare
the pinned metrics with other metrics displayed in the tab. You can close the pinned
metrics and browse other metrics for specific objects.
Similarly, the Alerts and Events tabs are where you investigate the potential evidence
further. You can filter and group alerts. If you want to focus on the alerts for a specific
object in your selected scope, you can clear all the alerts and then click the object in the
scope.
Intended Audience
This information is intended for anyone who wants to install and configure vRealize
Operations by using a virtual appliance deployment. The information is written for
experienced virtual machine administrators who are familiar with enterprise management
applications and data center operations.
Note All unit conversions in vRealize Operations are based on 1024 factor.
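As a worked example of the 1024 factor, the sketch below converts between storage units. The unit labels and the helper name are assumptions chosen for illustration; only the factor of 1024 comes from the note above.

```python
FACTOR = 1024  # conversion factor stated in the note above
UNITS = ["B", "KB", "MB", "GB", "TB"]  # assumed unit labels, smallest first


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a value between units using a factor of 1024 per step."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * FACTOR ** steps


print(convert(1, "GB", "MB"))      # 1024
print(convert(2048, "KB", "MB"))   # 2.0
print(convert(1, "TB", "GB"))      # 1024
```

With a decimal factor of 1000, 1 GB would be reported as 1000 MB, so results differ by roughly 2.4 percent per step.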
Metric Definitions in vRealize Operations
Metric definitions provide an overview of how the metric value is calculated or derived.
If you understand the metric, you can better tune vRealize Operations to display results
that help you to manage your environment.
Property Definitions in vRealize Operations
Properties are attributes of objects in the environment. You use properties in symptom
definitions. You can also use properties in dashboards, views, and reports.
vRealize Operations uses adapters to collect properties for target objects in your
environment. Property definitions for all objects connected through the vCenter adapter
are provided. The properties collected depend on the objects in your environment.
vRealize Operations generates Object Type Classification and Subclassification properties for every object. You
can use object type classification properties to identify whether an object is an adapter
instance, custom group, application, tier, or a general object with property values
ADAPTER_INSTANCE, GROUP, BUSINESS_SERVICE, TIER, or GENERAL, respectively.
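The mapping between object type classification values and object kinds can be held in a lookup table, for example when post-processing exported property data. The dictionary below restates the five values listed above; the function name and the "unknown" fallback are illustrative assumptions.

```python
# Classification property value -> kind of object, as listed above.
OBJECT_TYPE_CLASSIFICATION = {
    "ADAPTER_INSTANCE": "adapter instance",
    "GROUP": "custom group",
    "BUSINESS_SERVICE": "application",
    "TIER": "tier",
    "GENERAL": "general object",
}


def describe(classification: str) -> str:
    """Return the object kind for a classification value, or 'unknown'."""
    return OBJECT_TYPE_CLASSIFICATION.get(classification, "unknown")


print(describe("BUSINESS_SERVICE"))  # application
print(describe("SOMETHING_ELSE"))    # unknown
```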
Alert definitions are a combination of symptoms and recommendations that identify problem
areas in vRealize Operations and generate alerts on which you act for those areas.
When your customers experience performance problems and call you to resolve the problem,
the data that vRealize Operations collects and processes is presented to you in graphical forms. You can then
compare and contrast objects, understand the relationship between objects, and determine
the root cause of problems.
A generated alert notifies you when objects in your environment are experiencing problems.
If you resolve the problem based on the alert before your customers notice, then you avoid
service interruptions.
You can investigate the problems that generate alerts or that result in calls by using the
Alerts, Events, Details, and Environment tabs. If you find the root cause of the problem,
you might be able to resolve the problem by running an action. The actions change objects in
the target system, for example, the VMware vCenter Server® system, from vRealize Operations.
Appendix A: Process Routine Mapping
The following table provides the full mapping of process routines for all IT Outcomes related
to the VMware solution.
IT Outcomes
Centralized dashboards
The following table provides the full mapping of all IT Capabilities linked to the VMware
solution and the process routines that contribute to the delivery of the IT Capability.
IT Capabilities