Continuent Blog: How To Simulate a MySQL Cluster Site Failure with Tungsten

Recently a Customer Asked

“How can I simulate a failure of the Active half of an Active/Passive Cluster to test the Failover to the Passive half?”

In this blog post we explore the best practices for simulating a failure of the Active cluster in an Active/Passive Composite Cluster.

Firstly, the Documentation

Here is the online page for that procedure:
https://docs.continuent.com/tungsten-clustering-6.1/operations-composite-failover.html

Secondly, the Summary

Below are the simple steps needed to simulate failure and then invoke a failover from Active to Passive:

shell> ssh tungsten@{node_on_the_passive_side}
shell> cctrl -multi
cctrl> datasource {primary_cluster_here} fail
cctrl> failover

Next, the Detailed Example

Now let’s demonstrate the procedure, step-by-step, to simulate failure and then invoke a failover from Active to Passive:

Note

Child cluster emea contains nodes db13-15, and child cluster usa has nodes db16-18.

Where you run the commands from is critically important. You must start by logging in to a node on the passive, “working” side, i.e. db16 in this example.
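To make the "log in on the passive side" rule concrete, here is a minimal Python sketch that picks a login host from the cluster layout described in the Note. The `CLUSTER_NODES` mapping and the `pick_login_node` helper are illustrative names only, not part of Tungsten, and the host names are simply the db13-15 and db16-18 ranges written out.

```python
# Hypothetical helper, not a Tungsten API: choose a node that lives
# outside the cluster that is about to be marked as failed.
CLUSTER_NODES = {
    "emea": ["db13", "db14", "db15"],  # active cluster in this example
    "usa": ["db16", "db17", "db18"],   # passive cluster in this example
}


def pick_login_node(active_cluster, cluster_nodes=CLUSTER_NODES):
    """Return the first node of a cluster other than the active one."""
    if active_cluster not in cluster_nodes:
        raise ValueError(f"unknown cluster: {active_cluster!r}")
    for name, nodes in cluster_nodes.items():
        if name != active_cluster and nodes:
            return nodes[0]
    raise ValueError("no passive cluster with nodes available")


print(pick_login_node("emea"))  # db16
```

With emea as the active cluster, the helper returns db16, which matches the node used in the walkthrough.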

Run the `cctrl -multi` command, and use the composite service:

tungsten@db16-demo:/home/tungsten # cctrl -multi
Tungsten Clustering 7.0.0 build 573
emea: session established, encryption=true, authentication=true
[LOGICAL] / > ls
global_active_passive
  emea
  usa

[LOGICAL] / > use global_active_passive 
[LOGICAL] /global_active_passive > ls

COORDINATOR
   emea:COORDINATOR
   usa:COORDINATOR
...

DATASOURCES:
+---------------------------------------------------------------------------------+
|emea(composite master:ONLINE)                                                    |
|STATUS [OK] [2021/11/03 04:14:11 PM UTC]                                         |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|usa(composite slave:ONLINE)                                                      |
|STATUS [OK] [2021/11/03 04:10:22 PM UTC]                                         |
+---------------------------------------------------------------------------------+
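If you script this drill, it can help to confirm the starting state before failing anything. The following is a rough Python sketch that parses a DATASOURCES listing like the one above into a dictionary. It assumes the text has already been captured somehow, that the line layout matches this example, and the `parse_composite_datasources` name is made up for illustration.

```python
import re

# Matches lines such as "|emea(composite master:ONLINE)   |".
# The layout is assumed from the example output in this post.
DATASOURCE_LINE = re.compile(r"^\|(\w+)\(composite (\w+):([^)(]+)")


def parse_composite_datasources(ls_output):
    """Map each composite data source name to its (role, state) pair."""
    result = {}
    for line in ls_output.splitlines():
        match = DATASOURCE_LINE.match(line.strip())
        if match:
            name, role, state = match.groups()
            result[name] = (role, state)
    return result


sample = """\
DATASOURCES:
+---------------------------------------------+
|emea(composite master:ONLINE)                |
|STATUS [OK] [2021/11/03 04:14:11 PM UTC]     |
+---------------------------------------------+
+---------------------------------------------+
|usa(composite slave:ONLINE)                  |
|STATUS [OK] [2021/11/03 04:10:22 PM UTC]     |
+---------------------------------------------+
"""

print(parse_composite_datasources(sample))
# {'emea': ('master', 'ONLINE'), 'usa': ('slave', 'ONLINE')}
```

Both clusters report ONLINE here, so the drill starts from a healthy composite cluster.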

Signal the Managers that the Primary cluster is “broken”:

[LOGICAL] /global_active_passive > datasource emea fail

WARNING: This is an expert-level command:
Incorrect use may cause data corruption
or make the cluster unavailable.

Do you want to continue? (y/n)> y
COMPOSITE DATA SOURCE 'emea' IS NOW IN THE FAILED STATE

[LOGICAL] /global_active_passive > ls

COORDINATOR
   emea:COORDINATOR
   usa:COORDINATOR
...

DATASOURCES:
+---------------------------------------------------------------------------------+
|emea(composite master:FAILED(MANUALLY-FAILED))                                   |
|STATUS [CRITICAL] [2021/11/03 04:21:56 PM UTC]                                   |
|REASON[MANUALLY-FAILED]                                                          |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|usa(composite slave:ONLINE)                                                      |
|STATUS [OK] [2021/11/03 04:10:26 PM UTC]                                         |
+---------------------------------------------------------------------------------+
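Before moving the Primary role, a scripted drill might want to confirm that the state shown above is what it expects: the composite master is FAILED and a passive cluster is still ONLINE. The Python sketch below shows one way to express that check. It assumes the listing has already been reduced to a name-to-(role, state) mapping, and `ready_for_failover` is an illustrative name, not anything the Managers use internally.

```python
def ready_for_failover(states):
    """Return the ONLINE passive clusters if the single active cluster is FAILED.

    `states` maps a composite data source name to a (role, state) tuple.
    An empty list means the drill should not proceed.
    """
    masters = [name for name, (role, _) in states.items() if role == "master"]
    if len(masters) != 1 or states[masters[0]][1] != "FAILED":
        return []
    return sorted(
        name
        for name, (role, state) in states.items()
        if role == "slave" and state == "ONLINE"
    )


# State taken from the listing above, after `datasource emea fail`.
after_fail = {"emea": ("master", "FAILED"), "usa": ("slave", "ONLINE")}
print(ready_for_failover(after_fail))  # ['usa']
```

An empty result, for example when emea is still ONLINE, is a signal to stop and inspect the cluster by hand rather than continue.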

Move the Primary role to the currently passive, “working” cluster with the failover command:

[LOGICAL] /global_active_passive > failover

As a side benefit of being in AUTOMATIC mode, the Managers will also recover the “broken” cluster!

Finally, the Wrap-Up

In this blog post we demonstrated the best practice for simulating a failure of the Active cluster in an Active/Passive Composite Cluster.

If you’re a customer and have any questions, please feel free to reach out to support. If you’re not a customer and would like to learn more about this, please contact us.

Published In

Categories:

24/7/365 Support, Database Administration

Series:

MySQL High Availability (HA) & Disaster Recovery (DR)

Tags:

MySQL, MariaDB, testing, site-level failure

Author

Eric M. Stone

COO and VP of Product Management

Eric is a veteran of fast-paced, large-scale enterprise environments with 40 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to world-wide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMBs.
