We often hear that asynchronous replication creates a potential data loss window, and that it is therefore not an acceptable solution for certain types of applications.
Yes, this is true. In theory, that is.
But in practice, given how we have engineered the Tungsten Cluster solution, it effectively does not happen in real life. Let’s explore how Tungsten Cluster addresses the “data loss window” and why our customers do not worry about data loss, even in their busiest environments.
A Quick Review of Asynchronous and Synchronous Replication
Asynchronous replication is used by native MySQL replication and by Tungsten Replicator. With asynchronous replication, replication happens in the background, independently of the application. All replication is handled by a separate, non-blocking thread (or threads, if using parallel replication), so the primary node is free to handle application requests. It is therefore quite fast, and it places negligible load on the database node and application servers.
Synchronous replication, on the other hand, blocks the commit of the transaction until at least one other node has a copy of that transaction and, in some cases, has applied it. This is also called a two-phase commit.
This additional step of sending the transaction and waiting for an acknowledgement can (read: will) cause performance issues: the database is blocked, which in turn blocks the application, until the primary database node receives an acknowledgement from a replica. The resulting slowdowns and poor user experience are why synchronous replication is almost never used, or at least should not be used, over long geographic distances; the replication lag is just too great.
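To make the difference concrete, here is a minimal, self-contained sketch in plain Python (not Tungsten code): the queue stands in for the replication stream, and the acknowledgement wait stands in for the synchronous round trip.

    import queue
    import threading
    import time

    replication_stream = queue.Queue()

    def replica():
        # Background applier: pulls events off the stream, "applies" them,
        # and acknowledges only when the committer asked for an ack.
        while True:
            event, ack = replication_stream.get()
            time.sleep(0.05)  # simulated network transfer + apply time
            if ack is not None:
                ack.set()

    def commit_async(event):
        # Asynchronous: enqueue for the background thread and return at once.
        replication_stream.put((event, None))

    def commit_sync(event):
        # Synchronous: block until the replica acknowledges the transaction.
        ack = threading.Event()
        replication_stream.put((event, ack))
        ack.wait()  # commit latency now includes the replica round trip

    threading.Thread(target=replica, daemon=True).start()

    t = time.time(); commit_async("txn-1")
    print(f"async commit returned in {time.time() - t:.3f}s")  # effectively 0
    t = time.time(); commit_sync("txn-2")
    print(f"sync commit returned in {time.time() - t:.3f}s")   # ~0.1s here

Stretch that 0.05-second sleep into a cross-continent round trip, and the problem with synchronous replication over distance becomes obvious.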
Want to get a deeper understanding of replication technologies? If so, have a look at this blog post: https://www.continuent.com/resources/blog/comparing-replication-technologies-mysql-clustering-part-1
Understanding the Internals
Tungsten Replicator, which is used by Tungsten Cluster, is a high-performance replication tool for MySQL (and other targets). The replication pipeline is composed of several stages, detailed below; a simplified sketch of the flow follows the two lists.
Extract:
- Read events from MySQL binary logs into memory
- Write those events plus metadata into the THL (Transaction History Log) on disk
- Serve THL to replicas
Apply:
- Read THL from an upstream node, writing to local disk
- Read THL from disk into a memory-based queue
- Apply events from memory to the local replica
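As a mental model only, and not Tungsten’s actual implementation, the following minimal Python sketch pictures the pipeline as independent stages connected by in-memory queues; the queue and stage names here are illustrative.

    import itertools
    import queue
    import threading
    import time

    def stage(name, inbound, outbound, work):
        # Each stage runs in its own thread: read, transform, hand off.
        # A slow downstream stage never blocks the stages upstream of it.
        def run():
            while True:
                outbound.put(work(inbound.get()))
        threading.Thread(target=run, name=name, daemon=True).start()

    binlog_q = queue.Queue()  # events extracted from the MySQL binary log
    thl_q    = queue.Queue()  # events plus metadata, as stored in the THL
    apply_q  = queue.Queue()  # in-memory queue feeding the applier

    seqno = itertools.count()  # stand-in for global transaction IDs
    stage("extract", binlog_q, thl_q, lambda e: {"seqno": next(seqno), "event": e})
    stage("thl-to-q", thl_q, apply_q, lambda rec: rec)

    def applier():
        # Final stage: drain the queue and apply to the local replica.
        while True:
            print("applied", apply_q.get()["event"])

    threading.Thread(target=applier, daemon=True).start()

    for i in range(3):
        binlog_q.put(f"txn-{i}")  # new "binlog" events enter the pipeline
    time.sleep(0.2)  # let the daemon threads drain before the demo exits

The real stage names reported by trepctl (such as the q-to-thl stage mentioned below) follow this same read-transform-hand-off pattern.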
By understanding the stages, we can address potential data loss scenarios at each stage.
Please visit this blog post for more information about stages: https://www.continuent.com/resources/blog/mastering-tungsten-replicator-series-understanding-pipelines-and-stages
Closing the Data Loss Window
Tungsten Replicator reads binary logs directly from the disk, which makes it FAST.
There is almost never any discrepancy between what has been written to the binary log and what is in the THL on the primary. You can see this for yourself using trepctl perf: the latency at this stage (q-to-thl) is usually very close to 0. If, however, there is a failover and there is a discrepancy, it’s easy to see what it is: just run tungsten_find_orphaned to get a list of transactions that didn’t make it to the THL on the former primary. You can then decide what to do with these transactions.
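For example, you can run the following on the primary (both commands are covered in the docs linked in the notes below; output and options vary by version):

    shell> trepctl perf            # per-stage latency; the q-to-thl figure is the one to watch
    shell> tungsten_find_orphaned  # after a failover, lists binlog transactions missing from the THL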
What if data got written to the THL on the primary, but the THL didn’t get transferred to the replica?
First, during a failover, Tungsten Cluster will select the replica that is most up to date; you can be assured a replica that is lagging behind the others will NOT be selected for promotion. Note too that each replica is aware of THL it has not yet received (the cluster can track this because Tungsten Cluster uses global transaction IDs, or GTIDs). During promotion of a replica, Tungsten Cluster downloads all of the THL from the source before putting the new primary online. Since replication is independent of the MySQL database server, a crash of MySQL does not affect Tungsten Replicator; we can still read from the binary logs and serve THL. This almost always closes the data loss window at this stage. If, however, we cannot download the THL, we can once again run tungsten_find_orphaned on the failed node to see what those events were and decide what to do next.
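To compare positions across nodes, a quick sketch (the thl options shown here are the common ones; check the documentation for your version):

    shell> trepctl status | grep appliedLastSeqno  # last transaction applied on this node
    shell> thl index                               # local THL files and their seqno ranges
    shell> thl list -last                          # most recent event stored in the local THL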
Finally, if the cluster needs to promote a replica and the THL is already on the replica, it will simply process all outstanding transactions before promoting the replica to primary and bringing it online. Usually this takes one or two seconds, even on a busy system, and it prevents data loss; Tungsten Cluster takes every precaution to avoid data corruption.
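You can observe the same catch-up rule during a planned switch using cctrl (db2 here is just an example host name):

    shell> cctrl
    cctrl> ls             # role and replication latency of each datasource
    cctrl> switch to db2  # db2 is promoted only after applying its outstanding THL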
Summary
Although data loss during failover is a legitimate concern, Continuent addresses it by examining the failure modes at each stage of replication and mitigating each one, greatly reducing the risk of data loss and database corruption.
We’ve focused on this issue because asynchronous replication is the only practical method for reliably deploying geo-distributed active/active and active/passive clusters, or any cluster that spans a considerable distance, including deployments with a DR site.
Asynchronous replication is fast and powerful, and it provides the best user experience. That’s why Continuent has invested in it: we have made it faster and reduced the risk of data loss and corruption.
NOTES:
- Previous blog on this subject: https://www.continuent.com/resources/blog/mysql-clustering-asynchronous-synchronous-replication
- Blog post on Pipelines and Stages: https://www.continuent.com/resources/blog/mastering-tungsten-replicator-series-understanding-pipelines-and-stages
- Docs on failover behavior: https://docs.continuent.com/tungsten-clustering-6.1/manager-failover-behavior.html
- The tungsten_find_orphaned command docs: https://docs.continuent.com/tungsten-clustering-6.1/cmdline-tools-tungsten_find_orphaned.html