“When you’re millions of miles from home, it’s hard to install an operating system update - but not impossible. From the dawn of the Space Age through the present, NASA has relied on redundant hardware to make up for physical defects, wear and tear, sudden failures, or even the effects of cosmic rays.” - Glenn Fleishman, Author and Technology Journalist
While my team is yet to support an application that goes beyond the stratosphere, we have experience in other unique WAN environments such as cruise liners and military; in other words, the rocketship design-approach resonates with us. Continuent has been developing and providing software and support for mission-critical MySQL databases for a long time (since 2004). Our goal is to remove the worry and manual overhead typically involved in ensuring MySQL availability especially geo-distributed or global MySQL databases. We do that by designing fully-managed clusters of clusters.
A Redundancy of Redundancies
For multi-site, multi-region, or geo-distributed MySQL/MariaDB clusters, organizations with business-critical MySQL databases like Saas and Fintech typically require what’s called a "composite" cluster, or a "cluster of clusters."
The managed composite “cluster of clusters” may span over:
- geographic locations
- cloud providers (multi-cloud)
- on-premises hardware with cloud-based instances (hybrid-cloud)
- along with any other combination you can think of ;-}
In a composite cluster, two or more managed clusters work together, specifically through three unique, synergistic processes:
- Replication pipelines via Tungsten Replicator both within and between clusters are fully managed.
- Cluster orchestration via Tungsten Managers is both locally and globally aware.
- Application-Database connections are routed optimally via the intelligent Tungsten Proxies, which are also multi-site aware.
Active/Passive
You may choose the Active/Passive design, as shown in the graphic below, where all writes are directed to a single Primary.
In a simple Active/Passive clustering topology, the secondary cluster, also known as a “Disaster Recovery cluster” or “DR cluster” shown on the right, has a “Relay” which may be promoted as the new Primary with a single command (or click with Tungsten Dashboard, the freely included GUI).
That means - it takes a single command to activate the DR site; gracefully, without disruption to the service or lost connections, writes get rerouted from the Primary node to the Relay node in the Secondary cluster.
Some people ask why we do not allow automatic switches over to a different site or region. A human must trigger the switch by design - it is deliberate to prevent flapping. If this switch feature were automatic, any network blip might trigger a switch, and issues that require human attention to be resolved could result in repetitive switching back and forth between sites. So, while a site switch could easily be automated, by design it is not - and for good reason.
Active/Active
Heads of technology, infrastructure, site reliability and database operations at mid-to-large SaaS, Telco and Fintech organizations sometimes ask us for Active/Active multi-site clustering for their MySQL databases.
Active/Active means you may have two or more database nodes simultaneously serving writes and no command is needed to reroute connections if a cluster goes down. As shown in the graphic below, there are two or more Primaries serving writes.
This is great for applications with a high write-to-read ratio and applications serving a geographically-distributed user base. It brings Primary database servers physically closer to the end users and improves application responsiveness.
It also means - in the event of a local disaster or data center outage, you have continuous database operations. Downtime is prevented.
Naturally, the risk with Active/Active is the potential for data conflicts, especially in a high transaction environment. If, for example, the same row of data is updated at the same time in two sites - how does the database system decide in which order to commit the writes?
Active/Active versus Active/Passive
There are pros and cons to each multi-site or multi-region design - it depends on many factors, and our highly experienced team will discuss, help design and test yours. We actually have three ways to go:
- Composite Active/Passive (CAP) with one active site and one or more passive sites. The active site contains the lone Primary node accepting writes, and all other sites have a special Relay node, which does not accept client writes. If the primary site fails, the `failover` command must be manually issued to promote another site to be a primary.
- Composite Active/Active (CAA) with multiple active sites — if an active site fails, all traffic to that site is automatically forwarded to a different, available active site. This provides faster failover, but it is subject to potential conflicts due to having multiple Primaries.
- Dynamic Active/Active (DAA) configured to route all traffic to one active site — if the configured active site fails, all traffic is automatically routed to a different, available active site. This is also a faster failover, and is handled by the connectivity layer level, and is less subject to potential conflicts.
If, as in the DAA option, your Active/Active cluster behaves in an Active/Passive manner by only writing in one place at a time, you may automatically reroute traffic to the other site with no risk of conflict. This results in less administration for the cluster layer. Learn more in this blog: "De-Mystifying Tungsten Cluster Topologies, Part 3: CAP vs. CAA vs. DAA."
Whatever design you choose, all Continuent solutions are fully-integrated, fully-tested for mission-critical and business-critical use cases, and our team backs it up with 24/7 enterprise support.
So to conclude this blog, Glenn Fleishman said: “NASA’s obsessive focus on software testing to find and remove bugs, plus a strategy to allow software to recover in the worst of circumstances...” is key to keeping a service going. And that is what we do for organizations running business-critical and mission-critical MySQL.
For more information about active/active vs active passive or anything else, feel free to contact us.
Additional Resources:
https://www.continuent.com/products/tungsten-clustering/subscriptions
https://www.continuent.com/resources/blog/dig-and-de-mystify-tungsten-cluster-topologies
https://www.continuent.com/products/tungsten-clustering/topologies
Comments
Add new comment