Some industry-critical core business applications must provide uninterrupted 24×7 service. This requires a highly available database that avoids any single point of failure at the database node. The choice of high-availability architecture for the database directly determines the availability level of the business application services it supports. Critical core services in most industries demand extremely high availability, i.e., 99.999%.
Oracle has successively introduced several high-availability architectures, including High Availability (HA), Active Data Guard (ADG), Real Application Clusters (RAC), and Oracle GoldenGate (OGG). These architectures can be used individually or in combination, and many financial institutions have built extremely highly available business services in a "two cities, three data centers" topology by combining several of them.
In the context of localized substitution, domestic databases need to reach, or even surpass, Oracle's level of high-availability technology in order to gradually replace it for key core services across thousands of industries. At present, most domestic databases have built complete high-availability architectures benchmarked against Oracle. Taking GBASE's general-purpose GBase 8s database as an example, it offers a full high-availability technology stack, benchmarked against Oracle, to guarantee highly available services for industry business systems.
GBase 8s Active/Standby Clusters
GBase 8s provides active/standby clusters based on real-time shipping and replay of redo logs. Data transfer between the primary and standby nodes supports multiplexed connections, which reduces network resource consumption and improves communication efficiency between the servers.
Active/standby clusters come in two forms: HAC and RHAC.
HAC is a typical one-primary, one-standby cluster mode. Depending on network conditions, it can run in fully synchronous, near-synchronous, or asynchronous mode, closely matching the three protection modes of Oracle ADG: maximum protection, maximum availability, and maximum performance.
In fully synchronous mode, after a transaction completes on the primary node but before it commits, the transaction's redo log must be copied to the standby node; the standby replays the log on receipt, and only after the replay finishes do the primary and standby commit together. The biggest advantage of this mode is transaction reliability: as long as a transaction succeeds, its redo log is guaranteed to have been written on both nodes.
In near-synchronous mode, after a transaction completes on the primary node, its redo log is sent to the standby node over the network; as soon as the standby acknowledges receipt of the redo log, the transaction commits on the primary.
In asynchronous mode, the redo log is sent to the standby node after the transaction completes, and the primary commits without waiting for any reply from the standby. However, at each checkpoint the consistency of the primary and standby checkpoints must be verified: the database allows the two nodes to diverge between checkpoints, but their checkpoints themselves must be consistent.
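The three modes differ only in what the primary waits for before committing. The following is a minimal, hypothetical Python sketch of that distinction; the class and mode names are illustrative and do not reflect the actual GBase 8s implementation.

```python
from enum import Enum

class SyncMode(Enum):
    FULL_SYNC = "full_sync"  # commit waits until the standby has replayed the log
    NEAR_SYNC = "near_sync"  # commit waits only for the standby to receive the log
    ASYNC = "async"          # commit does not wait for the standby at all

class Standby:
    def __init__(self):
        self.received = []   # redo records received over the network
        self.replayed = []   # redo records already replayed

    def receive(self, record):
        self.received.append(record)

    def replay(self, record):
        self.replayed.append(record)

class Primary:
    def __init__(self, standby, mode):
        self.standby = standby
        self.mode = mode
        self.committed = []

    def commit(self, txn_redo):
        # The redo log is always shipped; the mode decides what commit waits for.
        self.standby.receive(txn_redo)
        if self.mode is SyncMode.FULL_SYNC:
            # Block until the standby has replayed, then commit on both sides.
            self.standby.replay(txn_redo)
        # NEAR_SYNC: receipt is enough; the standby replays later.
        # ASYNC: ship and return immediately; consistency is enforced at checkpoints.
        self.committed.append(txn_redo)
```

In this model a full-sync commit guarantees the standby has replayed the transaction, while an async commit only guarantees the record was handed to the network layer, which is why async mode must reconcile the two nodes at checkpoints.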
For the actual deployment of HAC clusters, it is recommended that servers have dual NICs, with north-south NICs for service traffic and east-west NICs for redo log synchronization.
Features of HAC clusters:
1. Simple deployment with data redundancy;
2. Transparent database access for applications;
3. Failover completes within 30 seconds;
4. Flexible choice of data synchronization modes;
5. Both the primary and standby nodes are readable and writable.
Usage scenarios of HAC clusters:
1. Deployment within the same cabinet, the same data center, or the same city; network latency is the main factor to consider for HAC clusters.
2. Data volumes not exceeding 10 TB, mainly due to the available disk capacity of a single server.
The RHAC cluster is another form of GBase 8s cluster. It supports one primary with multiple standbys and is mainly used for long-distance transmission and remote disaster recovery. RHAC uses a typical asynchronous transfer mechanism: the primary node accepts and processes transactions and continuously ships their redo logs to the target node, without waiting for the target node to acknowledge receipt or for the target node's checkpoint; the target node returns the redo log position to the primary only after it has replayed the log.
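The key point of the RHAC mechanism is that commits never wait on the remote target; the target reports back only the redo position it has replayed. A minimal sketch of that flow, with illustrative names (not the actual GBase 8s code), might look like:

```python
class RhacPrimary:
    """Ships redo records to a remote target without waiting on it (async)."""
    def __init__(self):
        self.log = []        # redo log; the index serves as the log position
        self.acked_pos = -1  # last position the target confirmed as replayed

    def commit(self, record):
        # Commit returns immediately; no round trip to the remote target.
        self.log.append(record)
        return len(self.log) - 1

    def on_ack(self, pos):
        # The target reports the redo position it has replayed; everything
        # up to acked_pos can be considered applied remotely.
        self.acked_pos = max(self.acked_pos, pos)

class RhacTarget:
    """Remote disaster-recovery node that replays shipped redo records."""
    def __init__(self, primary):
        self.primary = primary
        self.replayed = []

    def pull_and_replay(self):
        # Replay any records beyond what has been applied, then report position.
        for pos in range(len(self.replayed), len(self.primary.log)):
            self.replayed.append(self.primary.log[pos])
            self.primary.on_ack(pos)
```

Because the primary never blocks on the target, long network latency only delays how far `acked_pos` lags behind the log tail; it does not slow down local commits, which is what makes this mode suitable for long-distance disaster recovery.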
Features of RHAC clusters:
1. Replication is driven by the database redo log; when a standby node recovers, its redo can be automatically caught up from the last checkpoint, ensuring data integrity and consistency between the primary and standby nodes;
2. Minimal impact on primary node performance;
3. Transparent access for applications;
4. Failover completes within 30 seconds;
5. Both the primary and standby nodes are readable and writable.
Scenarios for RHAC clusters:
1. Long-distance transmission or remote disaster recovery.
2. The latency and bandwidth requirements of the network are relatively relaxed.
The following figure shows the HAC and RHAC deployment architectures:
Connection Manager (CM)
Connection Manager (CM) is a database cluster management component bundled with GBase 8s. It serves two functions: routing application client connections to GBase 8s, and detecting cluster faults and performing failover.
This component can be deployed separately or together with the database, and multiple CMs can be deployed to avoid a single point of failure.
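CM's two roles (connection routing and fault discovery/failover) can be sketched in a few lines of Python. This is a simplified, hypothetical model for illustration only; the real CM component works at the network connection level and is configured declaratively.

```python
class ConnectionManager:
    """Routes clients to the current primary; redirects after a failover."""
    def __init__(self, nodes):
        # nodes: {name: {"role": "primary" | "standby", "up": bool}}
        self.nodes = nodes

    def route(self):
        # Prefer a healthy primary node.
        for name, n in self.nodes.items():
            if n["role"] == "primary" and n["up"]:
                return name
        # Fault discovered: promote a healthy standby and route to it.
        for name, n in self.nodes.items():
            if n["role"] == "standby" and n["up"]:
                n["role"] = "primary"
                return name
        raise RuntimeError("no available database node")
```

Because clients always connect through `route()`, a primary failure is absorbed inside CM and the application never needs to know which physical node it is talking to; deploying several CM instances removes CM itself as a single point of failure.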
GBase 8s Shared-Storage Database Cluster
For large-scale business systems, two-node or multi-node database clusters based on shared storage are generally used to support highly available services. The GBase 8s shared-storage high-availability cluster, SSC, uses shared disks to achieve node-level high availability and stores only one copy of the data, making effective use of hardware resources and avoiding duplicate data storage. Shared storage supports both disk arrays and distributed storage, and the I/O devices can be raw devices or shared file systems.
The cluster uses a primary-coordinated, peer-to-peer management model. Queries access the local cache with no network overhead, giving good linear scalability; a cluster can contain up to 16 nodes, all of which can read and write. When the primary node fails, a secondary node can be promoted to primary, ensuring high system availability.
Features of SSC clusters:
1. RAC-like technology based on shared storage, with strong data consistency;
2. Transparent access for applications;
3. All nodes in the cluster can read and write;
4. Cluster failover completes within 30 seconds;
5. For business scenarios with less than 100 TB of data, the most cost-effective cluster solution compared with distributed databases.
SSC cluster usage scenarios:
1. Business scenarios where the data volume exceeds 10 TB but is less than 100 TB.
2. Business scenarios in which the single-node capability or the processing capability of the active/standby cluster does not meet the requirements.
The deployment architecture is shown in the following figure:
For key core business systems with stricter SLAs, GBase 8s also has a high-availability solution with three data centers in two locations. The deployment architecture is shown in the following figure:
As shown in the figure above, the primary data center uses a 4-node SSC cluster, with one HAC node deployed in the intra-city disaster recovery data center and one RHAC node deployed in the remote disaster recovery data center, forming a complete "two cities, three data centers" deployment.
Applications access the database cluster through CM and are unaware of the cluster's deployment details, which also improves the security of the database.
When the primary node fails, one SSC node is automatically promoted to primary to take over services; the other nodes automatically reconnect to the new primary and resynchronize without manual intervention. The failure of a standby node does not affect access to the cluster or the continuous operation of services.
GBase 8s Real-Time Data Exchange and Sharing Cluster
GBase 8s has a built-in table-level real-time data synchronization capability, benchmarked against OGG.
The characteristics of this capability are as follows:
1. The cluster can have two or more nodes, each readable and writable;
2. Per-table synchronization supports both one-way and two-way synchronization;
3. Synchronization of complete rows or of selected fields;
4. Full synchronization, incremental synchronization, and resumable transmission;
5. Fast data comparison;
6. When one node fails, the business system can switch to another node within 1 second, maximizing business continuity;
7. One-click deployment;
8. Table replication can be used to connect two database clusters, giving the clusters an ADG-style capability.
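Items 3 and 4 above (full-row vs. field-level sync, and incremental/resumable transfer) can be illustrated with a short sketch. The function below is a hypothetical simplification for rows modeled as dictionaries; it is not the GBase 8s synchronization API.

```python
def sync_table(source_rows, columns=None, since_id=None):
    """Sketch of one-way, table-level sync: copy full rows or only
    selected fields, optionally resuming from a watermark key."""
    rows = source_rows
    if since_id is not None:
        # Incremental / resumable sync: only rows past the last
        # synchronized key are transferred.
        rows = [r for r in rows if r["id"] > since_id]
    if columns is None:
        # Full-row synchronization.
        return [dict(r) for r in rows]
    # Field-level synchronization: ship only the selected columns.
    return [{c: r[c] for c in columns} for r in rows]
```

Field-level sync is useful when, for example, sensitive columns must not leave a node, while the watermark parameter shows how a transfer interrupted mid-stream can resume from the last confirmed key instead of restarting from scratch.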
Scenarios for using this capability:
1. Field or unattended environments where intervention in the database must be minimal;
2. Scenarios requiring very short database switchover times;
3. Real-time data exchange and sharing across provincial, municipal, and county levels;
4. Real-time data exchange and sharing between parallel organizations.