
Commit a6ce2d8

Spaceman1984, sureshanaparti and shwstppr authored

Host ha and multiple management server internal load balancing (#223)

* Cleanup
* Added host HA detail
* Added multiple management servers internal load balancing
* Shortened FSM state descriptions
* Spelling error
* Moved note about algorithms
* Removed heading
* Added agent name
* Simplified explanation
* Changed states to uppercase
* Update source/adminguide/reliability.rst

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
1 parent 0085b0f commit a6ce2d8

File tree

1 file changed: +174 additions, -16 deletions

source/adminguide/reliability.rst
@@ -61,25 +61,82 @@ still available but the system VMs will not be able to contact the
 management server.
 
 
-HA-Enabled Virtual Machines
----------------------------
+Multiple Management Servers Support on agents
+---------------------------------------------
 
-The user can specify a virtual machine as HA-enabled. By default, all
-virtual router VMs and Elastic Load Balancing VMs are automatically
-configured as HA-enabled. When an HA-enabled VM crashes, CloudStack
-detects the crash and restarts the VM automatically within the same
-Availability Zone. HA is never performed across different Availability
-Zones. CloudStack has a conservative policy towards restarting VMs and
-ensures that there will never be two instances of the same VM running at
-the same time. The Management Server attempts to start the VM on another
-Host in the same cluster.
+In a CloudStack environment with multiple management servers, an agent can
+be configured, based on an algorithm, to select which management server to
+connect to. This can serve as an internal load balancer or provide high
+availability. An administrator is responsible for setting the list of
+management servers and choosing a sorting algorithm using global settings.
+The management server is responsible for propagating these settings to the
+connected agents (running inside the Secondary Storage Virtual Machine, the
+Console Proxy Virtual Machine, or on the KVM hosts).
 
-HA features work with iSCSI or NFS primary storage. HA with local
-storage is not supported.
+The three global settings that need to be configured are the following:
+
+- host: a comma-separated list of management server IP addresses
+- indirect.agent.lb.algorithm: the algorithm for the indirect agent LB
+- indirect.agent.lb.check.interval: the preferred host check interval for
+  the agent's background task that checks and switches to the agent's
+  preferred host
+
+These settings can be configured from the global settings page in the UI or
+using the updateConfiguration API call.
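As an illustrative sketch, the same settings could also be applied from the command line with the CloudMonkey CLI (`cmk`), assuming it is installed and pointed at the management server; the addresses and interval below are hypothetical values, not defaults:

```bash
# Hypothetical management server addresses; adjust for your environment.
cmk update configuration name=host value=10.1.1.10,10.1.1.11,10.1.1.12
cmk update configuration name=indirect.agent.lb.algorithm value=roundrobin
cmk update configuration name=indirect.agent.lb.check.interval value=300
```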
+
+The indirect.agent.lb.algorithm setting supports the following algorithm
+options:
 
+- static: use the list of management server IP addresses as provided.
+- roundrobin: evenly spread hosts across management servers, based on the
+  host's id.
+- shuffle: pseudo-randomly sort the list (this is not recommended for
+  production).
 
-HA for Hosts
-------------
+.. note::
+   The 'static' and 'roundrobin' algorithms strictly check the order of the
+   comma-separated management server addresses, whereas the 'shuffle'
+   algorithm checks only their content, not their order.
+
+Any changes to the global settings `indirect.agent.lb.algorithm` and
+`host` do not require restarting the management server(s) or the agents.
+A change in these global settings will be propagated to all connected
+agents.
+
+The comma-separated management server list is propagated to agents in the
+following cases:
+
+- An addition of an agent (including the ssvm and cpvm system VMs).
+- Connection or reconnection of an agent to a management server.
+- After an administrator changes the 'host' and/or the
+  'indirect.agent.lb.algorithm' global settings.
+
+On the agent side, the 'host' setting is saved in its properties file as:
+`host=<comma separated addresses>@<algorithm name>`.
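For example, on an agent configured with three management servers and the roundrobin algorithm, the relevant `agent.properties` entries might look like this (the addresses and interval are illustrative values):

```properties
# Propagated management server list and load-balancing algorithm
host=10.1.1.10,10.1.1.11,10.1.1.12@roundrobin
# Optional per-agent override of the preferred host check interval
host.lb.check.interval=300
```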
+
+From the agent's perspective, the first address in the propagated list is
+considered the preferred host. A background task that periodically checks
+and switches back to the preferred host can be activated by configuring
+`indirect.agent.lb.check.interval`, which is a cluster-level global setting
+in CloudStack; administrators can also override this per agent by setting
+'host.lb.check.interval' in the `agent.properties` file.
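The preferred-host behaviour can be sketched as a small decision function; `choose_connection` is a hypothetical helper for illustration, not CloudStack's actual implementation:

```python
def choose_connection(current, ms_list, is_reachable):
    """Decide which management server an agent should be connected to.

    The first address in the propagated list is the preferred host; the
    background task switches back to it once it is reachable again.
    """
    preferred = ms_list[0]
    if current != preferred and is_reachable(preferred):
        return preferred  # switch back to the preferred management server
    return current        # otherwise keep the current connection
```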
+
+When an agent receives a host and algorithm combination, the host-specific
+background check interval is also sent, and the background task is
+dynamically reconfigured without the need to restart the agent.
+
+To make this clearer, consider an environment with 3 management servers
+(A, B and C) and 3 KVM agents.
+
+With 'host' = 'A,B,C', the agents will receive lists depending on the
+'indirect.agent.lb.algorithm' value:
+
+'static': each agent receives the list 'A,B,C'
+'roundrobin': the first agent receives 'A,B,C', the second agent
+receives 'B,C,A', and the third agent receives 'C,A,B'
+'shuffle': each agent receives the list in a random order.
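The example above can be sketched in Python; `agent_ms_list` is a hypothetical helper illustrating the three algorithms, not CloudStack's actual code:

```python
import random

def agent_ms_list(ms_hosts, algorithm, agent_index):
    """Return the management server list an agent would receive.

    ms_hosts: ordered list of management server addresses (the 'host' setting).
    agent_index: the agent's position in the ordering (e.g. based on host id).
    """
    if algorithm == "static":
        return list(ms_hosts)                    # order exactly as provided
    if algorithm == "roundrobin":
        shift = agent_index % len(ms_hosts)      # rotate by agent position
        return ms_hosts[shift:] + ms_hosts[:shift]
    if algorithm == "shuffle":
        shuffled = list(ms_hosts)
        random.shuffle(shuffled)                 # pseudo-random order
        return shuffled
    raise ValueError("unknown algorithm: %s" % algorithm)

# The three agents from the example above, using roundrobin:
# agent 0 -> ['A', 'B', 'C'], agent 1 -> ['B', 'C', 'A'], agent 2 -> ['C', 'A', 'B']
```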
+
+HA-Enabled Virtual Machines
+---------------------------
 
 The user can specify a virtual machine as HA-enabled. By default, all
 virtual router VMs and Elastic Load Balancing VMs are automatically
@@ -96,7 +153,7 @@ storage is not supported.
 
 
 Dedicated HA Hosts
-~~~~~~~~~~~~~~~~~~
+------------------
 
 One or more hosts can be designated for use only by HA-enabled VMs that
 are restarting due to a host failure. Setting up a pool of such
@@ -126,6 +183,107 @@ that you want to dedicate to HA-enabled VMs.
    a crash.
 
 
+HA-Enabled Hosts
+----------------
+
+The user can specify a host as HA-enabled. In the event of a host
+failure, attempts will be made to recover the failed host by first
+issuing OOBM commands. If the host recovery fails, the host will be
+fenced and placed into maintenance mode. To restore the host to normal
+operation, manual intervention is then required.
+
+Out-of-band management is a requirement for HA-enabled hosts and has to
+be configured on all intended participating hosts
+(see `"Out of band management" <hosts.html#out-of-band-management>`_).
+
+Host-HA can be configured granularly at the host, cluster, or zone level.
+In a large environment, some hosts in a cluster can be HA-enabled and
+some not.
+
+Host-HA uses a state machine design to manage the operations of recovering
+and fencing hosts. The current status of a host is reported when querying
+a specific host.
+
+Periodic health investigations are performed on HA-enabled hosts to
+monitor for any failures. Specific thresholds can be set for failed
+investigations; only when a threshold is exceeded will the host transition
+to a different state.
+
+Host-HA uses both health checks and activity checks to make decisions on
+recovering and fencing actions. Once it determines that a host is in a
+faulty state (health checks failed), it runs activity checks to find out
+whether there is any disk activity on the VMs running on that host.
+
+The HA Resource Management Service manages the check/recovery cycle,
+including periodic execution, concurrency management, persistence, back
+pressure, and clustering operations. Administrators associate a provider
+with a partition type (e.g. the KVM HA Host provider with clusters) and
+may override the provider on a per-partition (i.e. zone, cluster, or pod)
+basis. The service operates on all resources of the type supported by the
+provider contained in a partition. Administrators can also enable or
+disable HA operations globally or on a per-partition basis.
+
+Only one (1) HA provider per resource type may be specified for a
+partition. Nested HA providers by resource type are not supported (e.g. a
+pod specifying an HA resource provider for hosts and a contained cluster
+specifying an HA resource provider for hosts). The service is designed to
+be opt-in, whereby only resources with a defined provider and HA enabled
+will be managed.
+
+For each resource in an HA partition, the HA Resource Management Service
+maintains and persists a Finite State Machine composed of the following
+states:
+
+- AVAILABLE - The feature is enabled and Host-HA is available.
+- SUSPECT - Health checks are failing for the host.
+- CHECKING - Activity checks are being performed.
+- DEGRADED - The host is passing the activity check ratio and is still
+  providing service to the end user, but it cannot be managed from the
+  CloudStack management server.
+- RECOVERING - The Host-HA framework is trying to recover the host by
+  issuing OOBM jobs.
+- RECOVERED - The Host-HA framework has recovered the host successfully.
+- FENCING - The Host-HA framework is trying to fence the host by issuing
+  OOBM jobs.
+- FENCED - The Host-HA framework has fenced the host successfully.
+- DISABLED - The feature is disabled for the host.
+- INELIGIBLE - The feature is enabled, but the host cannot be managed
+  successfully by the Host-HA framework (OOBM is possibly not configured
+  properly).
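The states above can be modelled as a small state machine. The sketch below is illustrative only: the event names and the transition subset are assumptions inferred from the prose, not CloudStack's actual implementation (the authoritative model is in the Host HA design document):

```python
from enum import Enum

class HAState(Enum):
    """Host-HA FSM states described above."""
    AVAILABLE = "Available"
    SUSPECT = "Suspect"
    CHECKING = "Checking"
    DEGRADED = "Degraded"
    RECOVERING = "Recovering"
    RECOVERED = "Recovered"
    FENCING = "Fencing"
    FENCED = "Fenced"
    DISABLED = "Disabled"
    INELIGIBLE = "Ineligible"

# Hypothetical event names and transition subset, inferred from the prose.
TRANSITIONS = {
    (HAState.DISABLED, "ha_enabled"): HAState.AVAILABLE,
    (HAState.AVAILABLE, "health_check_failed"): HAState.SUSPECT,
    (HAState.SUSPECT, "activity_check_started"): HAState.CHECKING,
    (HAState.CHECKING, "activity_detected"): HAState.DEGRADED,
    (HAState.CHECKING, "no_activity_detected"): HAState.RECOVERING,
    (HAState.RECOVERING, "recovery_succeeded"): HAState.RECOVERED,
    (HAState.RECOVERING, "recovery_failed"): HAState.FENCING,
    (HAState.FENCING, "fencing_succeeded"): HAState.FENCED,
}

def next_state(state, event):
    """Return the next state for an event; stay put if the event is
    undefined for the current state."""
    return TRANSITIONS.get((state, event), state)
```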
+
+When HA is enabled for a partition, the HA state of all contained
+resources will be transitioned from DISABLED to AVAILABLE. Based on the
+state models, the following failure scenarios and their responses will be
+handled by the HA resource management service:
+
+- Activity check operation fails on the resource: provide a semantic in
+  the activity check protocol to express that an error occurred while
+  performing the activity check, and a reason for the failure (e.g. unable
+  to access the NFS mount). If the maximum number of activity check
+  attempts has not been exceeded, the activity check will be retried.
+
+- Slow activity check operation: after a configurable timeout, the HA
+  resource management service abandons the check. The response to this
+  condition is the same as a failure to recover the resource.
+
+- Traffic flood due to a large number of resource recoveries: the HA
+  resource management service must limit the number of concurrent recovery
+  operations permitted, to avoid overwhelming the management server with
+  resource status updates as recovery operations complete.
+
+- Processor/memory starvation due to a large number of activity check
+  operations: the HA resource management service must limit the number of
+  concurrent activity check operations permitted per management server, to
+  prevent checks from starving other management server activities of
+  scarce processor and/or memory resources.
+
+- A SUSPECT, CHECKING, or RECOVERING resource passes a health check before
+  the state action completes: the HA resource management service refreshes
+  the HA state of the resource before the transition. If it does not match
+  the expected current state, the result of the state action is ignored.
+
+For further information on the inner workings of Host HA, refer to the
+design document at
+`https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
+<https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA>`_
+
 Primary Storage Outage and Data Loss
 ------------------------------------
 
