Tuesday, April 22, 2014

Veritas Cluster Server Notes

Definition of a Cluster

A clustered environment includes multiple components configured such that if one
component fails, its role can be taken over by another component to minimize or
avoid service interruption.

The term cluster, simply defined, refers to multiple independent systems or
domains connected into a management framework for increased availability.
Clusters have the following components:
• Up to 32 systems—sometimes referred to as nodes or servers
Each system runs its own operating system.
• A cluster interconnect, which allows for cluster communications
• A public network, connecting each system in the cluster to a LAN for client
access
• Shared storage (optional), accessible by each system in the cluster that needs to
run the application
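
A quick way to see the systems in a cluster and the service groups they are running is the hastatus command. The sketch below uses hypothetical node and group names (node01, node02, websg); the output is illustrative and trimmed:

    # Summarize cluster membership and service group state
    hastatus -sum

    -- SYSTEM STATE
    -- System          State          Frozen
    A  node01          RUNNING        0
    A  node02          RUNNING        0

    -- GROUP STATE
    -- Group      System    Probed    AutoDisabled    State
    B  websg      node01    Y         N               ONLINE
    B  websg      node02    Y         N               OFFLINE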

Definition of Service Group

A service group is a virtual container that enables VCS to manage an application
service as a unit. The service group contains all the hardware and software
components required to run the service, which enables VCS to coordinate failover
of the application service resources in the event of failure or at the administrator’s
request.
A service group is defined by these attributes:
• The cluster-wide unique name of the group
• The list of the resources in the service group, usually determined by which
resources are needed to run a specific application service
• The dependency relationships between the resources
• The list of cluster systems on which the group is allowed to run
• The list of cluster systems on which you want the group to start automatically
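
These attributes map directly onto the cluster configuration file. Below is a minimal main.cf sketch of a failover service group, using hypothetical names (websg, webip, webmnt, webdg, node01, node02); a complete main.cf also contains the include, cluster, and system definitions, and the exact agent attributes vary by platform and version. SystemList is the list of systems the group may run on (with failover priorities), and AutoStartList is where it starts automatically:

    group websg (
        SystemList = { node01 = 0, node02 = 1 }
        AutoStartList = { node01 }
        )

        IP webip (
            Device = eth0
            Address = "192.168.1.10"
            NetMask = "255.255.255.0"
            )

        Mount webmnt (
            MountPoint = "/web"
            BlockDevice = "/dev/vx/dsk/webdg/webvol"
            FSType = vxfs
            FsckOpt = "-y"
            )

        webip requires webmnt

The last line is a resource dependency: the IP address is brought online only after the file system is mounted.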

Service Group Types
Service groups can be one of three types:
• Failover
This service group runs on one system at a time in the cluster. Most application
services, such as database and NFS servers, use this type of group.
• Parallel
This service group runs simultaneously on more than one system in the cluster.
This type of service group requires an application that can be started on more
than one system at a time without threat of data corruption.
• Hybrid (4.x)
A hybrid service group is a combination of a failover service group and a
parallel service group used in VCS 4.x replicated data clusters (RDCs), which
are based on VERITAS Volume Replicator. This service group behaves as a
failover group within a defined set of systems, and a parallel service group
within a different set of systems. RDC configurations are described in the
VERITAS Disaster Recovery Using VVR and Global Cluster Option course.

Definition of a Resource
Resources are VCS objects that correspond to hardware or software components,
such as the application, the networking components, and the storage components.
VCS controls resources through these actions:
• Bringing a resource online (starting)
• Taking a resource offline (stopping)
• Monitoring a resource (probing)
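
These actions correspond to the hares command-line interface. A short sketch with hypothetical resource and system names:

    # Bring a resource online on a specific system
    hares -online webip -sys node01

    # Take the resource offline
    hares -offline webip -sys node01

    # Ask the agent to monitor (probe) the resource immediately
    hares -probe webip -sys node01

    # Display the current state of the resource
    hares -state webip
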
Resource Categories
• Persistent
– None
VCS can only monitor persistent resources—they cannot be brought online
or taken offline. The most common example of a persistent resource is a
network interface card (NIC), because it must be present but cannot be
stopped. FileNone and ElifNone are other examples.
– On-only
VCS brings the resource online if required, but does not stop it if the
associated service group is taken offline. NFS daemons are examples of
on-only resources. FileOnOnly is another on-only example.
• Nonpersistent, also known as on-off
Most resources fall into this category, meaning that VCS brings them online
and takes them offline as required. Examples are Mount, IP, and Process.
FileOnOff is a test agent example of this category.

Resource Dependencies
Resources depend on other resources because of application or operating system
requirements. Dependencies are defined to configure VCS for these requirements.
Dependency Rules
These rules apply to resource dependencies:
• A parent resource depends on a child resource. For example, a Mount resource (parent) depends on a Volume resource (child); this dependency reflects the operating system requirement that a file system cannot be mounted unless the underlying volume is available.
• Dependencies are homogenous. Resources can only depend on other
resources.
• No cyclical dependencies are allowed. There must be a clearly defined
starting point.
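
On a running cluster, dependencies can be defined and inspected with hares; the parent is given first and the child second. A sketch with hypothetical resource names (the configuration must be open for writing, for example with haconf -makerw):

    # Make the Mount resource (parent) depend on the Volume resource (child)
    hares -link webmnt webvol

    # Display the dependency tree of the resource
    hares -dep webmnt

    # Remove the dependency again
    hares -unlink webmnt webvol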

Agents: How VCS Controls Resources
Agents are processes that control resources. Each resource type has a
corresponding agent that manages all resources of that resource type. Each cluster
system runs only one agent process for each active resource type, no matter how
many individual resources of that type are in use.
Agents control resources using a defined set of actions, also called entry points.
The four entry points common to most agents are:
• Online: Resource startup
• Offline: Resource shutdown
• Monitor: Probing the resource to retrieve status
• Clean: Killing the resource or cleaning up as necessary when a resource fails to
be taken offline gracefully
The difference between offline and clean is that offline is an orderly termination
and clean is a forced termination. In UNIX, this can be thought of as the difference
between exiting an application and sending the kill -9 command to the
process.
Each resource type needs a different way to be controlled. To accomplish this, each
agent has a set of predefined entry points that specify how to perform each of the
four actions. For example, the startup entry point of the Mount agent mounts a
block device on a directory, whereas the startup entry point of the IP agent uses the
ifconfig command to set the IP address on a unique IP alias on the network
interface.
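
Conceptually, those two startup entry points come down to operations like the following. This is a simplified sketch with hypothetical device, directory, and interface names; the real agents add validation, error handling, and platform-specific logic:

    # Roughly what the Mount agent's online entry point does (Solaris syntax;
    # Linux uses mount -t vxfs)
    mount -F vxfs /dev/vx/dsk/webdg/webvol /web

    # Roughly what the IP agent's online entry point does on Linux:
    # configure the address on an IP alias of the interface
    ifconfig eth0:1 192.168.1.10 netmask 255.255.255.0 up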
VCS provides both predefined agents and the ability to create custom agents.

Cluster Communication
VCS requires a cluster communication channel between systems in a cluster to
serve as the cluster interconnect. This communication channel is also sometimes
referred to as the private network because it is often implemented using a
dedicated Ethernet network.
VERITAS recommends that you use a minimum of two dedicated communication
channels with separate infrastructures—for example, multiple NICs and separate
network hubs—to implement a highly available cluster interconnect. Although
recommended, this configuration is not required.
The cluster interconnect has two primary purposes:
• Determine cluster membership: Membership in a cluster is determined by
systems sending and receiving heartbeats (signals) on the cluster interconnect.
This enables VCS to determine which systems are active members of the
cluster and which systems are joining or leaving the cluster.
In order to take corrective action on node failure, surviving members must
agree when a node has departed. This membership needs to be accurate and
coordinated among active members—nodes can be rebooted, powered off,
faulted, and added to the cluster at any time.
• Maintain a distributed configuration: Cluster configuration and status
information for every resource and service group in the cluster is distributed
dynamically to all systems in the cluster.
Cluster communication is handled by the Group Membership Services/Atomic Broadcast (GAB) mechanism and the Low Latency Transport (LLT) protocol.

Low-Latency Transport
VERITAS uses a high-performance, low-latency protocol for cluster communications. LLT is designed to meet the high-bandwidth, low-latency needs of VERITAS Cluster Server, VERITAS Cluster File System, and Oracle Cache Fusion traffic in Oracle RAC configurations. LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet and has several major functions:
• Sending and receiving heartbeats over network links
• Monitoring and transporting network traffic over multiple network links to
every active system
• Balancing cluster communication load over multiple links
• Maintaining the state of communication
• Providing a nonroutable transport mechanism for cluster communications
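
You can verify that LLT sees every peer over every configured link with lltstat. The node, link, and address values below are hypothetical and the output is trimmed:

    # Show LLT node status with link-level detail
    lltstat -nvv

    LLT node information:
        Node         State    Link   Status   Address
       * 0 node01    OPEN
                              eth1   UP       00:0C:29:AA:AA:01
                              eth2   UP       00:0C:29:AA:AA:02
         1 node02    OPEN
                              eth1   UP       00:0C:29:BB:BB:01
                              eth2   UP       00:0C:29:BB:BB:02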

Group Membership Services/Atomic Broadcast (GAB)
GAB provides the following:
• Group Membership Services: GAB maintains the overall cluster
membership by way of its Group Membership Services function. Cluster
membership is determined by tracking the heartbeat messages sent and
received by LLT on all systems in the cluster over the cluster interconnect.
Heartbeats are the mechanism VCS uses to determine whether a system is an
active member of the cluster, joining the cluster, or leaving the cluster. If a
system stops sending heartbeats, GAB determines that the system has departed
the cluster.
• Atomic Broadcast: Cluster configuration and status information are
distributed dynamically to all systems in the cluster using GAB’s Atomic
Broadcast feature. Atomic Broadcast ensures all active systems receive all
messages for every resource and service group in the cluster.
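
GAB membership can be checked with gabconfig. Port a is GAB membership itself and port h is the VCS engine (had); in the illustrative output below, "membership 01" means that nodes 0 and 1 are members (the gen values are arbitrary examples):

    # Display current GAB port memberships
    gabconfig -a

    GAB Port Memberships
    ===============================================================
    Port a gen   a36e0003 membership 01
    Port h gen   a36e0006 membership 01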

The Fencing Driver
The fencing driver prevents multiple systems from accessing the same Volume
Manager-controlled shared storage devices in the event that the cluster
interconnect is severed. In a two-node cluster, for example, if the cluster interconnect fails, each system stops receiving heartbeats from the other system.
GAB on each system determines that the other system has failed and passes the
cluster membership change to the fencing module.
The fencing modules on both systems contend for control of the disks according to
an internal algorithm. The losing system is forced to panic and reboot. The
winning system is now the only member of the cluster, and it fences off the shared
data disks so that only systems that are still part of the cluster membership (only
one system in this example) can access the shared storage.
The winning system takes corrective action as specified within the cluster
configuration, such as bringing service groups online that were previously running
on the losing system.
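
When I/O fencing is configured, its mode and the current membership can be displayed with vxfenadm. The output below is illustrative and trimmed, with hypothetical node names:

    # Show the I/O fencing mode and cluster membership
    vxfenadm -d

    I/O Fencing Cluster Information:
    ================================
     Fencing Protocol Version: 201
     Fencing Mode: SCSI3
     Fencing SCSI3 Disk Policy: dmp
     Cluster Members:
       * 0 (node01)
         1 (node02)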

The High Availability Daemon
The VCS engine, also referred to as the high availability daemon (had), is the
primary VCS process running on each cluster system.
HAD tracks all changes in cluster configuration and resource status by
communicating with GAB. HAD manages all application services (by way of
agents) whether the cluster has one or many systems.
Building on the knowledge that the agents manage individual resources, you can
think of HAD as the manager of the agents. HAD uses the agents to monitor the
status of all resources on all nodes.
This modularity between had and the agents allows for efficiency of roles:
• HAD does not need to know how to start up Oracle or any other applications
that can come under VCS control.
• Similarly, the agents do not need to make cluster-wide decisions.
This modularity allows a new application to come under VCS control simply by
adding a new agent—no changes to the VCS engine are required.
On each active cluster system, HAD notifies all the other cluster systems of changes to the configuration or status.
In order to ensure that the had daemon is highly available, a companion daemon,
hashadow, monitors had and if had fails, hashadow attempts to restart it.
Likewise, had restarts hashadow if hashadow stops.
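
On a healthy system you should see both daemons, and you can start and stop the engine with hastart and hastop. A sketch (ps output trimmed; paths are the default VCS install locations):

    # Both the engine and its watchdog should be running
    ps -ef | grep -E 'had|hashadow' | grep -v grep
        root  2101     1  ...  /opt/VRTSvcs/bin/had
        root  2103     1  ...  /opt/VRTSvcs/bin/hashadow

    # Start the VCS engine on this system
    hastart

    # Stop the engine on this system only, leaving the applications it
    # manages running
    hastop -local -force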

Maintaining the Cluster Configuration
HAD maintains configuration and state information for all cluster resources in
memory on each cluster system. Cluster state refers to tracking the status of all
resources and service groups in the cluster. When any change to the cluster
configuration occurs, such as the addition of a resource to a service group, HAD
on the initiating system sends a message to HAD on each member of the cluster by
way of GAB atomic broadcast, to ensure that each system has an identical view of
the cluster.
Atomic means that all systems receive updates, or all systems are rolled back to the
previous state, much like a database atomic commit.
The in-memory cluster configuration is built from the main.cf file on disk when HAD is not yet running on any cluster system and there is therefore no configuration in memory. When you start VCS on the first cluster system, HAD builds the in-memory configuration on that system from the main.cf file.
Changes to a running configuration (in memory) are saved to disk in main.cf
when certain operations occur.
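
The usual pattern for changing a running configuration is to open it for writing, make the changes, and then dump it back to main.cf. A sketch with hypothetical group and resource names:

    # Make the in-memory configuration writable
    haconf -makerw

    # Example change: add a Process resource to an existing service group
    hares -add webproc Process websg

    # Write the in-memory configuration to main.cf and close it again
    haconf -dump -makero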

Networking
VERITAS Cluster Server requires a minimum of two heartbeat channels for the
cluster interconnect, one of which must be an Ethernet network connection. While
it is possible to use a single network and a disk heartbeat, the best practice
configuration is two or more network links.
Loss of the cluster interconnect results in downtime and, in nonfencing environments, can result in a split-brain condition.
For a highly available configuration, each system in the cluster must have a
minimum of two physically independent Ethernet connections for the cluster
interconnect:
• Two-system clusters can use crossover cables.
• Clusters with three or more systems require hubs or switches.
• You can use layer 2 switches; however, this is not a requirement.
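
The interconnect links are defined per system in /etc/llttab. The sketch below shows a typical Linux-style file with two dedicated NICs; the node name, cluster ID, and interface names are hypothetical, and the exact link syntax differs by platform and VCS version:

    # /etc/llttab (sketch)
    # Node name of this system; must match an entry in /etc/llthosts
    set-node node01
    # Cluster ID, unique for each cluster on the same network
    set-cluster 101
    # Two dedicated private links
    link eth1 eth1 - ether - -
    link eth2 eth2 - ether - -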

Shared Storage
VCS is designed primarily as a shared data high availability product; however, you
can configure a cluster that has no shared storage.
For shared storage clusters, consider these requirements and recommendations:
• One HBA minimum for nonshared disks, such as system (boot) disks
To eliminate single points of failure, it is recommended to use two HBAs to
connect to the internal disks and to mirror the system disk.
• One HBA minimum for shared disks
– To eliminate single points of failure, it is recommended to have two HBAs to connect to shared disks and to use dynamic multipathing software, such as VERITAS Volume Manager DMP.
– Use multiple single-port HBAs or SCSI controllers rather than multiport interfaces to avoid single points of failure.
• Shared storage on a SAN must reside in the same zone as all of the nodes in the
cluster.
• Data residing on shared storage should be mirrored or protected by a hardware-based RAID mechanism.
• Use redundant storage and paths.
• Include all cluster-controlled data in your backup planning and
implementation. Periodically test restoration of critical data to ensure that the
data can be restored.
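
With VERITAS Volume Manager DMP in place, you can confirm that each shared disk is reachable over more than one path. The DMP node name below is hypothetical:

    # List the disks known to Volume Manager
    vxdisk list

    # Show the subpaths behind one DMP device to confirm path redundancy
    vxdmpadm getsubpaths dmpnodename=emc0_0010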







