Analysis of open-source real-time databases for industry

Author: Everyone is a Product Manager
With the industrial Internet developing rapidly, large numbers of devices such as monitoring equipment and sensors have been deployed at production sites. The real-time data these devices generate reflects equipment status and production progress, and processing and analyzing that data requires database support.

In industry, the production, testing, and operation phases can generate large volumes of time-stamped sensor data, which is typical time series data. Such data is mainly collected or generated by real-time monitoring, inspection, and analysis equipment in manufacturing, electric power, chemicals, engineering operations, and other industries. Its typical characteristics are that writes far outnumber reads and that the volume is very large.

1. Needs and pain points of time series databases for the industrial Internet

The main issues can be summarized as follows:

  • Low write throughput: The write throughput of a single machine is low, making it difficult to handle the write pressure of tens of millions of time series data points.
  • High storage cost: Time series data compresses poorly, so a large amount of machine resources is required.
  • High maintenance cost: A stand-alone system requires databases and tables to be sharded manually at the upper layer, which is expensive to maintain.
  • Poor query performance: Queries are slow, especially aggregation and analysis over massive volumes of real-time data.
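The maintenance-cost problem above stems from sharding databases and tables by hand. A common alternative in time series databases is to partition automatically, routing each point by a hash of its series identifier and by a fixed time window. The sketch below is a minimal illustration under assumed parameters. `NUM_NODES`, `CHUNK_SECONDS`, `partition_key`, and `route` are hypothetical names and are not part of any specific product.

```python
import zlib
from collections import defaultdict

# Hypothetical cluster parameters, for illustration only.
NUM_NODES = 4          # number of storage nodes
CHUNK_SECONDS = 3600   # each partition holds one hour of data


def partition_key(series_id: str, ts: int) -> tuple:
    """Return (node, time_chunk) for a point, with no manual sharding rules."""
    # crc32 is deterministic across processes, unlike str hash().
    node = zlib.crc32(series_id.encode("utf-8")) % NUM_NODES
    chunk = ts // CHUNK_SECONDS
    return node, chunk


def route(points):
    """Group (series_id, ts, value) tuples by their target partition."""
    partitions = defaultdict(list)
    for series_id, ts, value in points:
        partitions[partition_key(series_id, ts)].append((series_id, ts, value))
    return partitions


points = [
    ("pump-1/temp", 0, 20.5),
    ("pump-1/temp", 10, 20.7),
    ("pump-1/temp", 3600, 21.0),  # next hour, so a new time chunk
]
for key, rows in sorted(route(points).items()):
    print(key, len(rows))
```

The sketch uses `zlib.crc32` instead of Python's built-in `hash()` so that routing stays stable across processes, because Python randomizes string hashing per run. Adding nodes would require a rebalancing scheme such as consistent hashing, which this sketch leaves out.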

Features that need to be supported:

  • Stable functionality
  • Efficient data writes
  • Efficient queries on both the latest and historical data
  • Cloud-ready deployment
  • Private (on-premises) deployment
  • Linear scalability
  • High availability
  • Easy integration with big data platforms
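Serving both latest-value and historical queries efficiently usually means keeping two access paths: a cache of the newest point per series, and a time-ordered history that supports range scans. The following toy in-memory sketch assumes that points for each series arrive in timestamp order. `SeriesStore` and its methods are hypothetical names used only for illustration.

```python
import bisect


class SeriesStore:
    """Toy store: per-series sorted history plus a last-value cache.

    Assumes each series receives points in increasing timestamp order.
    """

    def __init__(self):
        self._ts = {}     # series_id -> list of timestamps (sorted)
        self._vals = {}   # series_id -> list of values, aligned with _ts
        self._last = {}   # series_id -> (ts, value) of the newest point

    def write(self, series_id, ts, value):
        self._ts.setdefault(series_id, []).append(ts)
        self._vals.setdefault(series_id, []).append(value)
        self._last[series_id] = (ts, value)

    def latest(self, series_id):
        """Latest-data query: a single dictionary lookup."""
        return self._last.get(series_id)

    def query_range(self, series_id, start, end):
        """Historical query: points with start <= ts < end, via binary search."""
        ts = self._ts.get(series_id, [])
        vals = self._vals.get(series_id, [])
        lo = bisect.bisect_left(ts, start)
        hi = bisect.bisect_left(ts, end)
        return list(zip(ts[lo:hi], vals[lo:hi]))


store = SeriesStore()
for t in range(0, 100, 10):
    store.write("valve-7/pressure", t, t / 10)
print(store.latest("valve-7/pressure"))              # (90, 9.0)
print(store.query_range("valve-7/pressure", 20, 50))  # [(20, 2.0), (30, 3.0), (40, 4.0)]
```

Out-of-order writes would need `bisect.insort` (or a merge step) instead of `append`. Real systems also persist and compress the history.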

2. Data source requirements

From the data-source perspective, designers can assess whether a time series database suits the target application system by checking the following:

  • The total volume of data is huge
  • Data arrives at very high rates, either in bursts or continuously
  • The number of data sources is very large
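The three criteria above can be combined into a rough sizing estimate. The numbers in the example are assumptions chosen for illustration and are not figures from the article: 10,000 devices, 50 metrics each, one sample per second, and 16 bytes per uncompressed point (an 8-byte timestamp plus an 8-byte float).

```python
SECONDS_PER_DAY = 86_400


def ingest_estimate(num_sources, metrics_per_source, samples_per_second, bytes_per_point):
    """Rough ingestion sizing from the three data-source criteria.

    Returns (points per second, points per day, raw bytes per day).
    """
    points_per_second = num_sources * metrics_per_source * samples_per_second
    points_per_day = points_per_second * SECONDS_PER_DAY
    raw_bytes_per_day = points_per_day * bytes_per_point
    return points_per_second, points_per_day, raw_bytes_per_day


# Assumed workload: 10,000 devices x 50 metrics x 1 Hz, 16 bytes per raw point.
pps, ppd, bpd = ingest_estimate(10_000, 50, 1, 16)
print(f"{pps:,} points/s, {ppd:,} points/day, {bpd / 1e9:.1f} GB/day raw")
# 500,000 points/s, 43,200,000,000 points/day, 691.2 GB/day raw
```

Even this moderate assumed fleet produces half a million writes per second and roughly 691 GB of raw data per day. At that scale, write throughput and compression become the deciding factors.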

3. Architecture

Introducing a time series database reduces the number of components and the complexity of the architecture, lowers storage costs, speeds up real-time service responses, reduces staffing requirements, and frees up capacity for business innovation.
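The reduction in components comes from handling storage, message delivery, caching, and stream computation in one write path instead of in separate systems. The sketch below is a hypothetical single-process illustration of that idea. `IntegratedTSDB` and `RunningMean` are invented names, and a real system would add persistence, concurrency control, and networking.

```python
from collections import defaultdict


class IntegratedTSDB:
    """Hypothetical one-process write path: storage, cache, and pub/sub together."""

    def __init__(self):
        self.storage = []       # stand-in for the persistent time series store
        self.cache = {}         # last-value cache: series_id -> newest point
        self.subscribers = []   # stand-in for a message queue's consumers

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, point):
        """point is a (series_id, timestamp, value) tuple."""
        self.storage.append(point)
        self.cache[point[0]] = point
        for callback in self.subscribers:   # push to consumers immediately
            callback(point)


class RunningMean:
    """Stream-computing stand-in: incremental mean per series."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def __call__(self, point):
        series_id, _, value = point
        self.count[series_id] += 1
        self.total[series_id] += value

    def mean(self, series_id):
        return self.total[series_id] / self.count[series_id]


db = IntegratedTSDB()
agg = RunningMean()
db.subscribe(agg)
for t, v in [(0, 1.0), (1, 2.0), (2, 6.0)]:
    db.write(("motor-3/current", t, v))
print(db.cache["motor-3/current"], agg.mean("motor-3/current"))
# ('motor-3/current', 2, 6.0) 3.0
```

In a conventional stack, each of these three responsibilities would typically be a separately deployed service, such as a database, a message broker, and a cache.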

4. Benefits and value

High performance: it can support millions of concurrent writes and 10,000 concurrent reads, and performance remains high under a large number of aggregate queries. High availability: it supports cluster deployment and horizontal scaling with no single point of failure, providing a foundation for stable operation of the production environment. Low cost: the database has low hardware requirements and a high data compression ratio, saving at least 70% of hardware resources on average. High integration: by making full use of the characteristics of time series data, it combines message queuing, stream computing, and caching, which greatly simplifies the architecture. Ease of use: it uses SQL, which is easy to learn and supports complex queries, reducing development difficulty and O&M pressure.
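The article does not say how the compression ratio is achieved. One widely used technique for timestamps is delta-of-delta encoding, popularized by Facebook's Gorilla time series store. Regularly sampled timestamps produce long runs of zeros, and zeros encode in very little space. The sketch below is a byte-oriented simplification that uses zigzag and variable-length integers instead of Gorilla's bit-level scheme. It is not claimed to be what any particular product uses.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -(u >> 1) - 1


def varint(u: int) -> bytes:
    """Encode a non-negative int in 7-bit groups; high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_timestamps(ts):
    """First timestamp whole, then the first delta, then deltas of deltas."""
    out = bytearray()
    for i, t in enumerate(ts):
        if i == 0:
            value = t
        elif i == 1:
            value = t - ts[0]
        else:
            value = (t - ts[i - 1]) - (ts[i - 1] - ts[i - 2])
        out += varint(zigzag(value))
    return bytes(out)


def decode_timestamps(data):
    values, acc, shift = [], 0, 0
    for b in data:                     # split the byte stream back into varints
        acc |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(unzigzag(acc))
            acc, shift = 0, 0
    ts = []
    for i, v in enumerate(values):     # undo the delta-of-delta transform
        if i == 0:
            ts.append(v)
        elif i == 1:
            ts.append(ts[0] + v)
        else:
            ts.append(2 * ts[-1] - ts[-2] + v)
    return ts


regular = [1_700_000_000 + 10 * i for i in range(1000)]
encoded = encode_timestamps(regular)
print(len(encoded), "bytes vs", 8 * len(regular), "raw")  # 1004 bytes vs 8000 raw
```

For 1,000 timestamps spaced 10 seconds apart, the encoding needs 5 bytes for the first timestamp, 1 byte for the first delta, and 1 byte for each of the 998 zero delta-of-deltas. Irregular timestamps still round-trip correctly but compress less. Values (as opposed to timestamps) need a different scheme, such as XOR-based float encoding.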

5. System analysis

A real-time database is one in which data and transactions have timing characteristics or explicit timing constraints. It is designed around the nature of real-time data and the way that data is used, and some of these properties are not available in standard relational databases. In this system, based on the structure and functional characteristics of the real-time database, the design is divided into two parts: the real-time database structure design and the real-time database management program design.
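Real-time database literature commonly formalizes timing constraints on data as validity intervals. A value is absolutely valid while its age stays within an absolute validity interval. A set of values is relatively consistent when their timestamps lie within a relative validity interval of one another. The article does not say which constraints its design uses, so the following is only an illustrative sketch. `Sample`, `absolutely_valid`, and `relatively_consistent` are hypothetical names, and the interval values in the usage lines are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    value: float
    timestamp: float  # seconds since an arbitrary epoch


def absolutely_valid(sample, now, avi):
    """Fresh enough to use: age does not exceed the absolute validity interval."""
    return now - sample.timestamp <= avi


def relatively_consistent(samples, rvi):
    """All samples were taken within rvi seconds of one another."""
    stamps = [s.timestamp for s in samples]
    return max(stamps) - min(stamps) <= rvi


temperature = Sample(value=82.5, timestamp=100.0)
pressure = Sample(value=1.3, timestamp=103.0)
print(absolutely_valid(temperature, now=104.0, avi=5.0))        # True (age 4 s)
print(absolutely_valid(temperature, now=106.0, avi=5.0))        # False (age 6 s)
print(relatively_consistent([temperature, pressure], rvi=2.0))  # False (3 s apart)
```

A control decision that combines temperature and pressure would check both conditions before acting on the values.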

6. Benefits and value

  • High performance: It can support millions of concurrent writes and tens of thousands of concurrent reads, and performance stays high even under heavy aggregate-query workloads
  • High availability: It supports cluster deployment and horizontal scaling with no single point of failure, providing a foundation for stable operation of the production environment
  • Low cost: The database has low hardware requirements and a high data compression ratio, saving at least 70% of hardware resources on average
  • High integration: Built-in message queuing, stream computing, and caching greatly simplify the architecture
  • Easy to get started: Database operations use SQL, which is easy to learn and supports complex queries, reducing development difficulty and O&M pressure
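The sketch below illustrates the kind of SQL aggregation the last point refers to. It uses Python's built-in `sqlite3` module as a stand-in because the article does not name a product or SQL dialect. The `readings` table, its columns, the demo data, and the five-minute (300 s) bucket size are all assumptions. Dedicated time series databases typically provide their own windowing syntax in place of the integer-division bucketing shown here.

```python
import sqlite3


def make_demo_db():
    """In-memory table of readings; schema and data are illustrative assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, ts INTEGER, value REAL)")
    rows = [("boiler-1", t, t / 30) for t in range(0, 600, 30)]  # one reading per 30 s
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def bucketed_stats(conn, start, end):
    """Count, average, and maximum per device per 300 s bucket in [start, end)."""
    query = """
        SELECT device,
               (ts / 300) * 300 AS bucket,   -- integer division floors ts to the bucket start
               COUNT(*), AVG(value), MAX(value)
        FROM readings
        WHERE ts >= ? AND ts < ?
        GROUP BY device, bucket
        ORDER BY device, bucket
    """
    return conn.execute(query, (start, end)).fetchall()


conn = make_demo_db()
for row in bucketed_stats(conn, 0, 600):
    print(row)
# ('boiler-1', 0, 10, 4.5, 9.0)
# ('boiler-1', 300, 10, 14.5, 19.0)
```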

7. Industry application

As the industrial Internet develops rapidly, large numbers of equipment sensors and monitoring systems have been deployed at industrial production sites, and the real-time data they provide reflects equipment status and production progress.

Going forward, the database is expected to provide more advanced capabilities such as stream computing, query analysis, and monitoring with early warning. These would supply the data foundation for visual operations and maintenance, predictive maintenance, and remote intelligent management of products. The goals are to reduce personnel and time costs, accelerate the deep integration of industrialization and informatization, promote the transformation and upgrading of the complex heavy-equipment manufacturing industry, and generate social and economic benefits.
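Monitoring and early warning are often built as rules evaluated over a sliding window of the incoming data stream. The sketch below shows one such rule. `WindowAlarm`, the window size, and the threshold are hypothetical and are not features of any named database. The rule raises an alarm when the mean of the last `size` readings exceeds `threshold`. It stays silent until the window is full, so that a few start-up readings cannot trigger it on their own.

```python
from collections import deque


class WindowAlarm:
    """Hypothetical early-warning rule over a sliding window of readings."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)
        self.threshold = threshold
        self.total = 0.0   # running sum of the values currently in the window

    def update(self, value):
        """Add one reading; return True if the alarm condition holds."""
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # this value is about to be evicted
        self.window.append(value)
        self.total += value
        if len(self.window) < self.window.maxlen:
            return False                   # not enough history yet
        return self.total / len(self.window) > self.threshold


alarm = WindowAlarm(size=3, threshold=80.0)
readings = [70.0, 75.0, 78.0, 85.0, 90.0, 60.0]
print([alarm.update(r) for r in readings])
# [False, False, False, False, True, False]
```

Keeping a running sum makes each update constant-time regardless of window size, which matters when the rule runs on every incoming point.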

This article was originally published by @Nate on Everyone is a Product Manager. Reproduction without the author's permission is prohibited.
