"Data cannot be lost, business cannot be stopped" is the demand of key core business systems, which puts forward high requirements for the high availability of databases. In order to take the lead and serve key business scenarios, China databases must have strong high-availability capabilities and relatively complete high-availability solutions to ensure business continuity to the greatest extent. What are the mainstream database HA solutions? How do enterprises choose the right HA solution? How to better ensure business continuity?
In response to the above problems, Cui Zhiwei, general manager of the GBase 8s product management department of NTU General Motors, recently accepted an interview with ITPUB, a database technology community, and he said: The pursuit of business continuity is endless, and the current domestic database market has a rich database high-availability architecture, and enterprises need to choose the appropriate high-availability solution according to the needs of business scenarios.
The evolution of the database high-availability architecture
High Availability (HA) refers to the ability of a system to keep running and providing services in the face of faults such as hardware failures, software crashes, and network problems. An HA design aims to minimize the time during which the system cannot provide service, using redundancy and automatic failover to avoid single points of failure and keep the system stable and continuously available.
In the early days, high availability was mostly provided by cluster software at the operating-system level, which managed hardware and software, databases included. IBM's HACMP clustering software, for example, was used mainly on the AIX operating system to provide high availability and failover so that business-critical applications kept running.
In the 1990s, with the rise of the Internet and enterprises' growing dependence on data, high availability became a key factor in the design of database systems. Redundancy and failover mechanisms began to be built into databases themselves to improve reliability and availability.
With the development of hardware technologies such as storage and networking, and the evolution of business requirements, the database high-availability architecture has evolved from stand-alone to active/standby clusters, to shared storage clusters, and then to distributed clusters.
Take Oracle, an industry leader, as an example: it has successively launched High Availability (HA), Active Data Guard (ADG), Real Application Clusters (RAC), and Oracle GoldenGate (OGG). These high-availability architectures can be used individually or in combination, and over the years they have provided stable and reliable high-availability services for the critical core systems of financial institutions and telecom operators.
For example, Oracle's active/standby solution was known in its early days as Data Guard (DG): the primary node handled reads and writes, while the standby node could not serve queries. Oracle 11g introduced Active Data Guard (ADG), which lets the standby node serve queries, offloading the primary node and further improving system availability. Data Guard also offers three data protection modes, maximum protection, maximum availability, and maximum performance, for users to choose from flexibly.
The release of Oracle 9i in 2000 was a major milestone with the introduction of Real Application Clusters (RAC) technology. RAC is a typical shared-everything architecture, that is, a shared storage cluster technology, which allows multiple instances running on separate servers to share a single database, thus providing high availability and load balancing. If one node fails, the other nodes continue to provide services, ensuring the continuity of the system. As RAC has matured, more and more large enterprises have chosen it for their core databases.
As domestic products replace foreign ones, the technology stack of domestic databases has changed, but the business requirements for database high availability will not weaken; they will only rise. Domestic databases need to match Oracle in high-availability technology, or even surpass it, in order to gradually take over critical core business systems in various industries.
At present, most domestic databases have built their own complete high-availability architectures modeled on Oracle's. GBASE's GBase 8s database, for example, has a complete high-availability technology stack benchmarked against Oracle to keep business systems highly available.
For example, GBase 8s primary/standby clusters come in two forms, HAC and RHAC. HAC is a one-primary, one-standby cluster mode that can run fully synchronous, near-synchronous, or asynchronous depending on network conditions, corresponding to the three Oracle ADG modes of maximum protection, maximum availability, and maximum performance. RHAC supports one primary with multiple standbys and uses asynchronous transmission.
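The practical difference between the three transmission modes is when the primary is allowed to acknowledge a commit. The following is a minimal Python sketch of that idea only; the names `SyncMode` and `commit_outcome` are hypothetical, and the behavior is modeled on the general semantics of the Oracle protection modes mentioned above, not on GBase 8s internals.

```python
from enum import Enum


class SyncMode(Enum):
    FULL_SYNC = "fully synchronous"  # analogous to maximum protection
    NEAR_SYNC = "near-synchronous"   # analogous to maximum availability
    ASYNC = "asynchronous"           # analogous to maximum performance


def commit_outcome(mode: SyncMode, standby_acked: bool) -> dict:
    """Say whether the primary acknowledges a commit, and whether that
    commit could be lost if the primary failed immediately afterwards.

    standby_acked: True if the standby confirmed receipt of the log
    record before the primary's wait expired (an assumption of this model).
    """
    if mode is SyncMode.FULL_SYNC:
        # Never acknowledge a commit the standby has not confirmed.
        return {"committed": standby_acked, "may_lose_on_failover": False}
    if mode is SyncMode.NEAR_SYNC:
        # Wait for the standby when it answers in time; otherwise fall
        # back to asynchronous shipping instead of blocking the business.
        return {"committed": True, "may_lose_on_failover": not standby_acked}
    # ASYNC: acknowledge at once; the log is shipped in the background.
    return {"committed": True, "may_lose_on_failover": True}


if __name__ == "__main__":
    for mode in SyncMode:
        for acked in (True, False):
            print(mode.value, "| standby acked:", acked, "|",
                  commit_outcome(mode, acked))
```

The trade-off this illustrates is that the fully synchronous mode gives up availability (a commit can be refused) to guarantee no loss, while the other two modes keep accepting commits at the cost of a possible loss window.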
The GBase 8s shared storage high-availability cluster, SSC (benchmarked against Oracle RAC), uses shared disks to achieve node-level high availability. Only one copy of the data is stored, multiple nodes can write and read, hardware resources are used efficiently, and duplicate data storage is avoided.
Among centralized databases, RAC-style shared storage clusters are technically very difficult and are regarded as an Everest to be climbed, so the breakthroughs domestic databases have made in RAC-like clusters have undoubtedly opened the door to high-end scenarios. However, although many domestic databases offer RAC-like clusters, few truly support multi-node writes and reads: in many of them the standby nodes are read-only, and cluster throughput is lower than the processing capacity of a single machine.
Cui Zhiwei pointed out that all IT technology ultimately has to serve business development. The high-availability architecture of domestic databases has already gone from nothing to something. On the road from something to excellent, vendors will keep improving their high-availability solutions while working to shorten active/standby switchover time, improve business continuity, and raise cluster throughput.
How Do I Choose a Suitable HA Solution?
Some people might say that since business continuity is so important, enterprises should simply adopt the highest-level HA architecture available, and that money is no object. In reality, even a financially strong institution such as a bank has to weigh its options repeatedly when building high availability and cannot act on impulse, because high-availability solutions are very expensive. Whether for intra-city or remote disaster recovery, data center rooms and networks require a large amount of real money.
Cui Zhiwei explained that each high-availability solution has its own characteristics, and he suggested that enterprises choose among them according to the needs of their business scenarios, taking into account their own budget, data center rooms, and network conditions.
For example, the common high-availability solution of active/standby clusters is suitable for business scenarios with small data volumes and less stringent requirements for data consistency.
On the one hand, the redundancy of active/standby clusters requires multiple data replicas to be stored, which incurs additional storage costs. On the other hand, after a fault occurs, it takes a certain amount of time for data synchronization and active/standby switchover. If the data volume reaches tens of terabytes, the storage cost will be high and the time window for primary/standby switchover will be increased. In addition, it is not easy to ensure strong data consistency between two nodes for data synchronization in active/standby clusters, and many domestic databases use read/write splitting plug-ins to make up for this shortcoming.
Shared storage clusters solve the problems of storage cost and strong data consistency, and are suitable for business scenarios with large data volume and high requirements for strong data consistency.
Shared storage clusters use the least hardware and the fewest database copies to achieve high performance and maintain business continuity, which makes them a cost-effective high-availability solution. However, shared storage clusters are complex and have a high technical threshold, requiring database administrators and developers to have relatively strong professional skills. They also place stringent requirements on hardware and database software. To achieve strong data consistency, synchronizing information between nodes requires very high network bandwidth: the Oracle RAC heartbeat network, for example, generally starts at 10 Gigabit, and Oracle's appliance even uses a dedicated 40 Gb high-speed network for its internal heartbeat.
Cui Zhiwei pointed out that in business scenarios where the data volume is not very large, most government and enterprise customers choose the active/standby cluster solution, whereas in financial scenarios that require strong data consistency, and in scenarios with large data volumes (more than 10 TB to 20 TB), they choose the shared storage cluster solution.
GBase 8s provides relatively complete HA solutions for enterprises to choose from, such as the active/standby clusters HAC and RHAC, the shared storage cluster SSC, and real-time data synchronization with ER.
• GBase 8s HAC clusters are recommended for services that are sensitive to network latency, and should be deployed within the same city or the same data center. For long-distance transmission and remote disaster recovery, RHAC clusters can be used. Because long-distance links offer less bandwidth and higher network latency, RHAC optimizes bandwidth usage through an asynchronous checkpointing mechanism and data compression.
• GBase 8s shared storage high-availability cluster SSC is a RAC-like technology that uses shared storage to ensure strong data consistency. The cluster uses a master-controlled peer-to-peer management mode and supports up to 16 nodes, all of which can read and write. When the primary node fails, a secondary node can be promoted to primary to keep the system highly available. For data volumes below 100 TB, shared storage clusters are the most cost-effective high-availability solution compared with distributed databases. For businesses with more than 100 TB of data, a distributed database may be more suitable than a centralized one.
• GBase 8s real-time data exchange and sharing cluster (benchmarked against OGG) has built-in real-time data synchronization at table granularity. It is used mostly in data exchange and sharing scenarios, such as real-time exchange among provincial, city, and county systems, and among peer units such as shopping malls and supermarkets.
Enterprises can deploy active/standby HA or shared storage clusters separately, or combine them to build higher-level HA solutions, such as dual data centers in the same city (SSC+HAC) and three data centers in two cities (SSC+HAC+RHAC).
GBase 8s solution for three data centers in two cities
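The selection guidance in this section can be condensed into a small rule-of-thumb helper. This is only an illustrative Python sketch: the function name `recommend_ha` and its parameters are hypothetical, the 10 TB cut-off is taken from the lower end of the range quoted above, and a real decision would also weigh budget, data center rooms, and network conditions.

```python
def recommend_ha(data_tb: float, strong_consistency: bool,
                 remote_dr: bool) -> list[str]:
    """Return the cluster building blocks suggested by the article's
    rules of thumb (thresholds are assumptions drawn from the text)."""
    if data_tb > 100:
        # Above roughly 100 TB a distributed database may fit better.
        return ["distributed database"]

    if strong_consistency or data_tb > 10:
        # Large data volume or strict consistency: shared storage cluster.
        components = ["SSC"]
    else:
        # Smaller, less strict workloads: one-primary, one-standby cluster.
        components = ["HAC"]

    if remote_dr:
        # Long-distance disaster recovery uses asynchronous RHAC.
        components.append("RHAC")
    return components


if __name__ == "__main__":
    print(recommend_ha(5, strong_consistency=False, remote_dr=False))
    print(recommend_ha(50, strong_consistency=False, remote_dr=True))
    print(recommend_ha(200, strong_consistency=True, remote_dr=True))
```

Under these assumptions, a 50 TB system that needs remote disaster recovery maps to SSC plus RHAC, while a small system with relaxed consistency needs maps to HAC alone.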
At present, the GBase 8s HA cluster solution can achieve RPO=0 and RTO<30s, and in real business scenarios, the active/standby switchover can be completed in about 10-15 seconds.
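To make the two metrics concrete: RPO is the span of committed work that the standby had not yet received when the primary failed, and RTO is the time from the failure until service is restored. The Python sketch below computes both from timestamps and checks them against the targets quoted above; all function names and sample figures are hypothetical and are not measurements of GBase 8s.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def measure_failover(last_commit_on_primary: float,
                     last_commit_on_standby: float,
                     failure_at: float,
                     service_restored_at: float) -> tuple[float, float]:
    """Return (rpo_seconds, rto_seconds) for one failover event.
    All arguments are timestamps in seconds on a common clock."""
    rpo = last_commit_on_primary - last_commit_on_standby
    rto = service_restored_at - failure_at
    return rpo, rto


def meets_targets(rpo: float, rto: float,
                  rpo_target: float = 0.0, rto_target: float = 30.0) -> bool:
    """Check the targets from the text: RPO = 0 and RTO < 30 s."""
    return rpo <= rpo_target and rto < rto_target


def yearly_availability(failovers_per_year: int,
                        seconds_per_failover: float) -> float:
    """Fraction of the year the service is up if every outage lasts
    exactly one failover (a deliberately simplified assumption)."""
    downtime = failovers_per_year * seconds_per_failover
    return 1.0 - downtime / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical event: standby fully caught up, switchover in 12.5 s.
    rpo, rto = measure_failover(100.0, 100.0, 101.0, 113.5)
    print(rpo, rto, meets_targets(rpo, rto))
    print(yearly_availability(failovers_per_year=4, seconds_per_failover=15))
```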
With its outstanding capabilities, GBase 8s has served many key core businesses in industries such as finance, rail transit, energy, and government, and has won the trust of customers in key industries.
For example, the State Grid dispatching cloud platform, with more than 50 TB of business data, successfully replaced Oracle with the SSC+RHAC high-availability solution. By building a remote disaster recovery solution spanning thousands of kilometers, it achieved second-level data synchronization between local and remote active-active clusters with read/write splitting. The database is completely transparent to applications, and the longest continuous run has exceeded 600 days.
The second phase of the Shenzhen Metro CLC and Internet ticketing management system deployed the SSC+HAC mode at the main production cloud center and at each station to provide high-availability clustering. It has run stably at a concurrency of 4,000 for more than one hour, with tens of millions of records and millisecond-level query response.
The core system of a commercial bank in a southwestern city adopted the SSC+HAC high-availability solution and has run online continuously for more than 760 days since the database was replaced, which fully reflects the stability of GBase 8s.
Summary: The pursuit of business continuity knows no bounds
In the future, GBase 8s will continue to optimize and improve its high-availability solutions. Planned work includes finer-grained resource management and control, for example locking only the tables involved in affected transactions after the primary node fails, so that other tables remain open for transactions; session persistence and transaction persistence; and data sharding within shared storage clusters to reduce conflicts.
"Ensuring the best possible business continuity for our customers, so that no database failure of any kind is ever perceived by them, is our ultimate goal," Cui Zhiwei said. GBASE will continue to build a database that users can trust. A growing number of real user scenarios have made GBase 8s more stable and reliable, hundreds of SSC clusters have gone into production, and customers' recognition has given him great confidence.
In the era of the digital economy, the pursuit of business continuity is endless. With limited resources, however, users and vendors need to work together to achieve better results in both high availability and business continuity.
"On the user side, leaders and technical teams should set reasonable high-availability switchover targets, break the rigid binding between critical systems and high-availability capabilities, and design high-availability targets around the real characteristics of each business system. On the database vendor side, vendors should optimize the high availability of their products for users' application scenarios, rather than always competing on unreliable failover times." To borrow this passage as a closing: with the joint efforts of users and vendors, may system high availability keep improving and business continuity be ever better guaranteed.