Recently a Customer Asked
“How can I simulate a failure of the Active half of an Active/Passive Cluster to test the Failover to the Passive half?”
In this blog post we explore the best practices for simulating a failure of the Active cluster in an Active/Passive Composite Cluster.
Firstly, the Documentation
Here is the online page for that procedure:
https://docs.continuent.com/tungsten-clustering-6.1/operations-composite-failover.html
Secondly, the Summary
Below are the simple steps needed to simulate failure and then invoke a failover from Active to Passive:
shell> ssh tungsten@{node_on_the_passive_side}
shell> cctrl -multi
cctrl> use {composite_service_here}
cctrl> datasource {primary_cluster_here} fail
cctrl> failover
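For repeatable test runs, the same sequence can be generated by a small helper and reviewed before anything is executed. This is only a sketch: the function name `failover_commands` is hypothetical, the `use` step is taken from the detailed example below, and the `y` line assumes the expert-level confirmation prompt shown there. Piping the result into `cctrl -multi` assumes that cctrl accepts commands on standard input in your version, which is not confirmed here, so that line is left commented out.

```shell
#!/usr/bin/env bash
# Print the cctrl command sequence for a simulated composite failover.
# Arguments (both are placeholders chosen for this sketch):
#   $1  composite service name, e.g. global_active_passive
#   $2  cluster currently holding the Primary role, e.g. emea
failover_commands() {
  local composite="$1" primary="$2"
  printf 'use %s\n' "$composite"            # select the composite service
  printf 'datasource %s fail\n' "$primary"  # mark the Primary cluster as failed
  printf 'y\n'                              # answer the expert-level prompt
  printf 'failover\n'                       # promote the passive cluster
}

# Review the generated sequence first:
#   failover_commands global_active_passive emea
# Then, on a node in the passive cluster, and only if your cctrl
# accepts piped input (an assumption, not verified here):
#   failover_commands global_active_passive emea | cctrl -multi
```

Keeping the command generation separate from execution means the sequence can be inspected, or pasted by hand into an interactive session, before it touches a live cluster.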
Next, the Detailed Example
Now let’s demonstrate the procedure, step-by-step, to simulate failure and then invoke a failover from Active to Passive:
Where you run the commands is critical. You must start by logging in to a node on the passive, “working” side (db16 in this example), not on the cluster you are about to fail.
Run the `cctrl -multi` command, and use the composite service:
tungsten@db16-demo:/home/tungsten # cctrl -multi
Tungsten Clustering 7.0.0 build 573
emea: session established, encryption=true, authentication=true
[LOGICAL] / > ls
global_active_passive
emea
usa
[LOGICAL] / > use global_active_passive
[LOGICAL] /global_active_passive > ls
COORDINATOR
emea:COORDINATOR
usa:COORDINATOR
...
DATASOURCES:
+---------------------------------------------------------------------------------+
|emea(composite master:ONLINE) |
|STATUS [OK] [2021/11/03 04:14:11 PM UTC] |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|usa(composite slave:ONLINE) |
|STATUS [OK] [2021/11/03 04:10:22 PM UTC] |
+---------------------------------------------------------------------------------+
Signal the Managers that the Primary cluster is “broken”:
[LOGICAL] /global_active_passive > datasource emea fail
WARNING: This is an expert-level command:
Incorrect use may cause data corruption
or make the cluster unavailable.
Do you want to continue? (y/n)> y
COMPOSITE DATA SOURCE 'emea' IS NOW IN THE FAILED STATE
[LOGICAL] /global_active_passive > ls
COORDINATOR
emea:COORDINATOR
usa:COORDINATOR
...
DATASOURCES:
+---------------------------------------------------------------------------------+
|emea(composite master:FAILED(MANUALLY-FAILED)) |
|STATUS [CRITICAL] [2021/11/03 04:21:56 PM UTC] |
|REASON[MANUALLY-FAILED] |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|usa(composite slave:ONLINE) |
|STATUS [OK] [2021/11/03 04:10:26 PM UTC] |
+---------------------------------------------------------------------------------+
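When the test is scripted, it helps to confirm the state change from the listing rather than by eye. The sketch below reads saved `ls` output on standard input and pulls out the name, role, and state of each composite data source. The function names `composite_states` and `expect_state` are hypothetical, and the pattern assumes the `name(composite role:STATE...)` line layout shown in the listings above.

```shell
#!/usr/bin/env bash
# Print "name role state" for each composite data source found in
# cctrl `ls` output read from standard input. Matches lines such as:
#   |emea(composite master:FAILED(MANUALLY-FAILED)) |
composite_states() {
  sed -n -E 's/^\|([A-Za-z0-9_]+)\(composite ([a-z]+):([A-Z]+).*/\1 \2 \3/p'
}

# Succeed only if data source $1 is reported in state $2.
expect_state() {
  local name="$1" want="$2"
  composite_states |
    awk -v n="$name" -v w="$want" '$1 == n && $3 == w { ok = 1 } END { exit !ok }'
}

# Example, with the listing saved to a file named ls-output.txt:
#   expect_state emea FAILED < ls-output.txt && echo "emea is marked FAILED"
```

Only the first word of the state is captured, so `FAILED(MANUALLY-FAILED)` is reported as `FAILED`.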
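Move the Primary role to the currently passive, “working” cluster by issuing the `failover` command at the same prompt:
[LOGICAL] /global_active_passive > failover
A side benefit of running in AUTOMATIC mode is that the Managers will also recover the “broken” cluster.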
Finally, the Wrap-Up
In this blog post we demonstrated the best practice for simulating a failure of the Active cluster in an Active/Passive Composite Cluster.
If you’re a customer and have any questions, please feel free to reach out to support. If you’re not a customer and would like to learn more about this, please contact us.