This section assumes a basic understanding of screen on CentOS.
If your current SSH session gets disconnected, do the following:
List the screen sessions using the following command:
# screen -ls
Reattach to the last screen session using the following command:
# screen -r XXXX
where XXXX is the session ID of the last screen session.
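For example, a minimal illustrative sequence, assuming a hypothetical session named ha-ops with ID 12345:
# screen -S ha-ops    (start a named session before running a long operation)
# screen -ls          (after a disconnect, list the sessions; shows, for example, 12345.ha-ops)
# screen -r 12345     (reattach to that session)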
From version 5.1.0 onwards, the process for configuring HA has been simplified, i.e., the join-cluster operation is now a single-step operation that no longer requires you to first export the configuration details of the active primary node to a configuration file named ha.conf and then copy the ha.conf file to the node that you want to configure as a secondary node.
Important: You cannot join nodes to an HA cluster in parallel in version 5.1.0; you can only join nodes to an HA cluster sequentially.
Process for configuring HA for version 5.1.0 and later:
Use the FortiSOAR™ Admin CLI (csadm) to configure HA for your FortiSOAR™ instances. For more information, see the FortiSOAR™ Admin CLI chapter.
Connect to your VM as a root user and run the following command:
# csadm ha
Then, to join a node to the HA cluster, run one of the following commands (use the second form to specify the SSH key of the primary node):
# csadm ha join-cluster --status <active, passive> --role <primary, secondary> --primary-node <DNS_Resolvable_Primary_Node_Name>
# csadm ha join-cluster --status <active, passive> --role <primary, secondary> --primary-node <DNS_Resolvable_Primary_Node_Name> --primary-node-ssh-key <Path_To_Pem_File>
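For example, an illustrative invocation that joins a node as an active secondary node (the hostname fsr-primary.example.com is hypothetical):
# csadm ha join-cluster --status active --role secondary --primary-node fsr-primary.example.com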
Note: The list-nodes command does not display a node that is in the process of joining the cluster; the newly added node is displayed in the list-nodes output only after it has been added to the HA cluster.
Note: If you add a new node (using the join-cluster operation) to a cluster that has some connectors installed, you must manually reinstall on the new node the connectors that were present on the existing node.
Alternative process that can be followed to configure HA:
Connect to your VM as a root user and run the following command:
# csadm ha
Use the csadm ha export-conf command to export the configuration details of the active primary node to a configuration file named ha.conf.
Copy the ha.conf file from the active primary node to the node that you want to configure as a secondary node.
Whitelist the hostnames of the secondary nodes on the active primary node using the following command:
# csadm ha whitelist --nodes
Specify the hostnames as comma-separated values after the --nodes argument. Whitelisting adds entries for these nodes to the pg_hba.conf file.
On the node that you want to add to the cluster, run the following command:
# csadm ha join-cluster --status <active, passive> --role <primary, secondary> --conf <location of the ha.conf file>
For example:
# csadm ha join-cluster --status passive --role secondary --conf tmp/ha.conf
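As an end-to-end sketch of the alternative process, with hypothetical hostnames and an assumed ha.conf location:
On the active primary node:
# csadm ha export-conf
# scp /tmp/ha.conf csadmin@fsr-node2.example.com:/tmp/ha.conf
# csadm ha whitelist --nodes fsr-node2.example.com
On the node being added:
# csadm ha join-cluster --status passive --role secondary --conf /tmp/ha.conf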
Note: If you run the csadm ha join-cluster command without whitelisting the hostnames of the secondary nodes, you will get an error such as Failed to verify....
Note: The list-nodes command does not display a node that is in the process of joining the cluster; the newly added node is displayed in the list-nodes output only after it has been added to the HA cluster.
Note: If you add a new node (using the join-cluster operation) to a cluster that has some connectors installed, you must manually reinstall on the new node the connectors that were present on the existing node.
Important
In the case of an HA cluster, proxy settings get replicated only for FortiSOAR™ services on the secondary/passive nodes. OS services and commands such as 'yum', 'curl', or 'wget' do not honor the proxy settings of the primary node. Therefore, to configure proxy settings on a secondary node, you can either configure them when the FortiSOAR Configuration Wizard runs on the first login of the 'csadmin' user (using SSH), or use the csadm network {set-https-proxy|set-http-proxy|set-no-proxy} command.
Certain operations, such as takeover and join-cluster, might take a long time to run; therefore, you must ensure that your SSH session does not time out by entering screen mode. For more information, see Handling session timeouts.
You can get help for the csadm ha command and its subcommands using the --help parameter.
Note
It is recommended that you perform operations such as join-cluster and leave-cluster sequentially. For example, when you are adding nodes to a cluster, add the nodes one after the other rather than in parallel.
The following table lists all the subcommands that you can use with the csadm ha command:
Subcommand | Brief Description |
---|---|
list-nodes | Lists all the nodes that are available in the cluster with their respective node names and IDs, status, role, and a comment that contains information about which nodes have joined the specific HA cluster and the primary server. You can filter nodes by a specific status, role, etc. For example, if you want to retrieve only those nodes that are active, use the following command: csadm ha list-nodes --active, or if you want to retrieve secondary active nodes, use the following command: csadm ha list-nodes --active --secondary. Note: The list-nodes command will not display a node that is in the process of joining the cluster, i.e., it will display the newly added node only after it has been added to the HA cluster. |
export-conf | Exports the configuration details of the active primary node to a configuration file named ha.conf. For more details on export-conf, see the Process for configuring HA section. |
whitelist | Whitelists the hostnames of the secondary nodes in the HA cluster on the active primary node. For more details on whitelist , see the Process for configuring HA section. Important: Ensure that incoming TCP traffic from the IP address(es) [xxx.xxx.xx.xxx] of your FortiSOAR™ instance(s) on port(s) 5432, 9200, and 6379 is not blocked by your organization's firewall. |
join-cluster | Adds a node to the cluster with the role and status you have specified. For more details on join-cluster , see the Process for configuring HA section. |
firedrill | Tests your disaster recovery configuration. You can perform a firedrill on a secondary (active or passive) node only. Running the firedrill suspends replication to the node's database and sets it up as a standalone node pointing to its local database. Because the firedrill is primarily performed to ensure that database replication is set up correctly, it is not applicable when the database is externalized. Important: The node on which a firedrill is being performed will have its schedules and playbooks stopped, i.e., celerybeatd will be disabled on this node. This is done intentionally so that configured schedules or playbooks do not run while the node is in firedrill mode. Once you have completed the firedrill, ensure that you perform a restore to get the node back into replication mode. |
restore | Restores the node back to its original state in the cluster after you have performed a firedrill. That is, csadm ha restore restores the node that was converted to an active primary node during the firedrill back to its original state of a secondary node. The restore command discards all activities, such as record creation, performed during the firedrill, since that data is assumed to be test data. This command restores the database from the content backed up prior to the firedrill. |
takeover | Performs a takeover when your active primary node is down. Therefore, you must run the csadm ha takeover command on the secondary node that you want to configure as your active primary node. |
list-commands | Lists all pending, in-progress, or failed commands that were propagated across the cluster nodes. You can filter this command for a specific nodeID or state. For example, if you want to retrieve a list of failed commands, use the following command: csadm ha list-commands --status failed. In the case of failed commands, you must check the reason for the failure and re-run the failed command manually after resolving the error. |
leave-cluster | Removes a node from the cluster and the node goes back to the state it was in before joining the cluster. |
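For example, an illustrative firedrill cycle on a secondary node:
# csadm ha firedrill
Verify that the node functions correctly as a standalone node against its local database, and then return it to replication mode:
# csadm ha restore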
It is recommended that you perform the firedrill operation at regular intervals to ensure that the passive node can take over successfully when required.
Once you have added a secondary node, the License Manager page displays information about the nodes in the cluster.
To install a connector on an HA cluster, upload the .tgz file of the connector on all the nodes within the HA cluster. When you upload the .tgz file on all the nodes, you must ensure that you select the Delete all existing versions checkbox. You must also ensure that you have uploaded the same version of the connector to all the nodes.
Use the csadm ha takeover command to perform a takeover when your active primary node is down. Run this command on the secondary node that you want to configure as your active primary node.
From version 5.1.0 onwards, takeover is a single-step operation, i.e., you do not need to manually reconfigure all the nodes in the cluster to point to the new active primary node; the takeover operation reconfigures the nodes to point to the new active primary node during the process.
However, if during takeover you specify no at the Do you want to invoke ‘join-cluster’ on other cluster nodes? prompt, or if any nodes are not reachable, then you will have to reconfigure all the nodes (or the nodes that were not reachable) in the cluster to point to the new active primary node using the csadm ha join-cluster command.
In the case of an internal database cluster, when the failed primary node comes online after the takeover, it still considers itself the active primary node, with all its services running. In the case of an external database cluster, when the failed primary node comes online after the takeover, it detects its status as "Faulted" and disables all its services. In both cases, run the csadm ha join-cluster command to point all the nodes to the new active primary node. For details on join-cluster, see Process for configuring HA.
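For example, an illustrative recovery sequence after a primary node failure (run each command on the node indicated):
# csadm ha list-nodes
On a surviving secondary node; the failed primary is shown as Faulted.
# csadm ha takeover
On the secondary node that is to become the new active primary node.
# csadm ha join-cluster --status active --role secondary --primary-node <New_Primary_Node_Name>
On any node that was not reachable during the takeover.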
You can tune the following configurations:
max_wal_senders = 10
wal_keep_segments = 320
Note: The max_wal_senders and wal_keep_segments attributes are applicable when the database is internal.
Every secondary/passive node needs one WAL sender process on the primary node, which means that the above setting supports a maximum of 10 secondary/passive nodes.
If you have more than 10 secondary/passive nodes, you need to edit the value of the max_wal_senders attribute in the /var/lib/pgsql/12/data/postgresql.conf file on the primary node and restart the PostgreSQL server using the following command:
systemctl restart postgresql-12
Note: You might find multiple occurrences of the max_wal_senders attribute in the postgresql.conf file. Always edit the last occurrence of the max_wal_senders attribute in the postgresql.conf file.
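For example, a minimal sketch of locating and updating the attribute (the new value of 20 is hypothetical):
# grep -n 'max_wal_senders' /var/lib/pgsql/12/data/postgresql.conf
Note the line number of the last occurrence, edit that line (for example, to max_wal_senders = 20), and then restart PostgreSQL:
# systemctl restart postgresql-12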
The wal_keep_segments attribute is set to 320, which means that the secondary nodes can lag behind by a maximum of 5 GB (320 WAL segments × 16 MB per segment ≈ 5 GB). If the lag is more than 5 GB, replication will not work properly, and you will need to reconfigure the secondary node by running the join-cluster command.
Also note that settings changes made in any configuration file on an instance, such as changing the log level, apply only to that instance. Therefore, if you want to apply the changed setting to all the nodes, you have to make those changes across all the cluster nodes.
The clustered instances should be fronted by a TCP Load Balancer such as HAProxy, and clients should connect to the cluster using the address of the proxy.
The following steps describe how to install HAProxy on a CentOS Virtual Machine:
# yum install haproxy
In the /etc/haproxy/haproxy.cfg file, add the load-balancing policy (a sample policy is sketched after these steps).
Open the HAProxy port in the firewall:
sudo firewall-cmd --zone=public --add-port=<portspecifiedwhilebindingHAProxy>/tcp --permanent
sudo firewall-cmd --reload
Restart haproxy using the following command:
# systemctl restart haproxy
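The policy itself appears as an image in the source document; the following is a minimal illustrative TCP policy for haproxy.cfg, assuming two FortiSOAR™ nodes with hypothetical hostnames and HTTPS on port 443 (keep the stock global and defaults sections in place):
frontend fortisoar_front
    bind *:443
    mode tcp
    default_backend fortisoar_nodes
backend fortisoar_nodes
    mode tcp
    balance roundrobin
    server node1 fsr-node1.example.com:443 check
    server node2 fsr-node2.example.com:443 check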
When you have initiated a publish for any module management activity, and you are accessing your HA cluster (with one or more active secondary nodes) using HAProxy, you might observe the following behaviors:
This issue occurs when you are performing join-cluster on any node and the node gets stuck at the service restart step, specifically at the PostgreSQL restart.
Resolution
Terminate the join-cluster process and retry join-cluster using the additional parameter --fetch-fresh-backup.
If your primary node is halted due to a system crash or other such events, and a new cluster is formed with the other nodes in the HA cluster, the list-nodes command on the other nodes will display the primary node in the Faulted state. Since the administrator has triggered the takeover on the other cluster nodes, the administrator will be aware of the faulted primary node. Also note that even after the halted primary node resumes, it still remains the primary node of its own cluster; therefore, after the resume, the list-nodes command on that node will display it as Primary Active.
Resolution
To fix the HA cluster so that only one node is the primary active node, do the following:
On the faulted primary node, run leave-cluster, which will remove this node from the HA cluster.
Run the join-cluster command to join this node to the HA cluster with the new primary node.
You are unable to join a node to an HA cluster using the join-cluster command when you have enabled a proxy through which clients should connect to the HA cluster.
Resolution
Run the following commands on your primary node:
# sudo firewall-cmd --zone=trusted --add-source=<CIDR> --add-port=<ElasticSearchPort>/tcp --permanent
# sudo firewall-cmd --reload
For example,
# sudo firewall-cmd --zone=trusted --add-source=64.39.96.0/20 --add-port=9200/tcp --permanent
# sudo firewall-cmd --reload