
Exploration of Xiaohongshu graph database on distributed parallel queries

Author: DataFunTalk

Reading guide: As a community product, Xiaohongshu covers many areas of lifestyle communities and stores a huge number of social network relationships. To address the pain points of social scenarios and the challenges of implementing distributed parallel queries, we developed REDgraph, a graph database system for ultra-large-scale social networks, which greatly improves the system's query efficiency and performance.

This sharing consists of five parts:

1. Background

2. Analysis of the original architecture

3. Distributed parallel query implementation

4. Summary and outlook

5. Q&A session

Guest Speaker | Li Ningrui (Xiaohongshu alias: Zaixing), Distributed Database Architect at Xiaohongshu

Edited and organized | Zhang Jindong

Content proofreading | Li Yao

Produced by | DataFun

01

Background

1. Introduction to graph databases


The concept of a graph database will not be elaborated here; instead, we compare it with several other NoSQL products. A graph database is itself a NoSQL store, while the other NoSQL products, such as key-value, wide-column, document, and time-series stores, each have their own characteristics. As the axes on the left of the figure above show, moving from KV to wide-column, document, and graph, data correlation and query complexity keep increasing. The first three, KV, wide-column, and document, focus mainly on the richness within a single record and do not address the relationships between records; graph databases, on the other hand, focus precisely on handling those relationships. Graph databases are therefore mainly suitable for business scenarios that require deep-link or multi-dimensional relationship queries.


Let's take a concrete example and compare a graph database with a relational database. The figure shows a common table structure in a social network, with four tables: a user table, a friendship table, a like table, and a note-detail table. To query the details of the notes liked by Tom's friends, you might need a lengthy SQL statement containing three joins: first join the user table with the friendship table to get all of Tom's friends; then join that intermediate result with the like table to determine which notes Tom's friends have liked; finally, join the resulting temporary table with the note-detail table to obtain the full content of those notes.

Join operations in relational databases are complex and consume a lot of CPU, memory, and IO. We can gain some performance through careful design, such as creating indexes on the join columns to reduce the share of scan operations, but the cost of doing so is relatively high: every new scenario requires new indexes, and it takes considerable effort to decide how to write the join conditions in SQL, which table to use as the driving table, and so on.

With a graph database, it is much simpler. First, model the graph: create two types of vertices, users and notes, and two types of edges, friendship (user-to-user edges) and likes (user-to-note edges). Once this data is stored in the graph database, it logically forms a mesh structure in which the associations are already explicit. For querying, the Gremlin statement in the figure above retrieves the needed information in just four lines. The first line, g.V().has('name', 'Tom'), locates the Tom vertex; the first out clause finds Tom's friends; the second out clause finds the notes those friends liked, iterating over the outer green vertices, which are the note vertices; finally, their content attributes are read. Compared with a relational database, the graph database's query statement is clearly more concise and easier to understand.

In addition, a graph database has a more significant advantage: it stores vertices and their relationships as first-class citizens, so adjacency access and relationship traversal are extremely efficient, and query time does not increase significantly even as the data keeps growing.
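
To make the contrast concrete, here is a minimal Python sketch of the same "notes liked by Tom's friends" query expressed as an adjacency-list traversal instead of joins. The data, field names, and function are hypothetical and only illustrate why adjacency access needs no join step.

```python
# Hypothetical in-memory adjacency lists; a real graph database keeps an
# analogous structure on disk, with vertices and edges as first-class citizens.
friend_edges = {"Tom": ["Alice", "Bob"]}                 # user -> friends
like_edges = {"Alice": ["note1"], "Bob": ["note2"]}      # user -> liked notes
note_details = {"note1": "hiking tips", "note2": "cafe review"}

def notes_liked_by_friends(user: str) -> list[str]:
    """Three traversal steps, no joins: user -> friends -> liked notes -> content."""
    results = []
    for friend in friend_edges.get(user, []):        # hop 1: Tom's friends
        for note_id in like_edges.get(friend, []):   # hop 2: notes they liked
            results.append(note_details[note_id])    # read the content attribute
    return results

print(notes_liked_by_friends("Tom"))   # ['hiking tips', 'cafe review']
```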

2. Usage scenarios of graph databases in Xiaohongshu


Xiaohongshu is a lifestyle-sharing platform for young people, where users record moments of their lives through short videos, pictures, and more. Within Xiaohongshu, graph databases are used in a wide variety of scenarios; the following are examples of online, near-line, and offline use.

The first example is social real-time recommendation. Xiaohongshu has a typical community character: users can like, post, follow others, repost information, and so on. For example, if I visit a user's profile page and stay there for a while, the system infers that I am interested in that user, and that user may also attract the attention of others. The system can then recommend that user's other followers, and the users they follow, to me: since we share common interests, I may also be interested in whom they follow. This is a simple real-time recommendation mechanism.

The second example is the community risk control mechanism. The Xiaohongshu community rewards creators of high-quality notes and videos, but this also gives bonus hunters an opening: they publish low-quality posts or notes, share them in mutual-like groups, or forward them to relatives and friends to like and repost, disguising them as so-called high-quality notes in order to defraud the platform of rewards. The community business team has offline algorithms that analyze existing data to identify which users and notes are cheating, marking them as red vertices in the graph. In the near-line scenario, the system then computes, for each vertex, the number or proportion of cheating users it has been exposed to within a multi-hop neighborhood; if that exceeds a threshold, the user is marked as a potential risk (the yellow vertices) and preventive measures are taken.

The third example is offline task scheduling. Big data platforms often run large numbers of offline tasks with complex dependencies, and scheduling them reasonably is a thorny problem. Graph structures are well suited to this kind of problem: topological sorting or other algorithms can identify the most depended-upon tasks and reason backwards from them.

3. Business dilemmas

Xiaohongshu uses graph databases in scenarios such as social networking, risk control, and offline task scheduling, but has run into many challenges in practice. Here we briefly introduce a pain point from the real-time recommendation scenario.


As shown in the figure, A can reach F within three hops, so F represents "friends" or "content" that A may be interested in, and A and F form a recommendable association. If this recommendation can be made immediately, it effectively improves the user experience and retention. However, because REDgraph's capabilities were incomplete in some respects, the business has so far only used one-hop and two-hop queries rather than three-hop; the risk control scenario is similar.

The specific latency requirement is that the P99 of three-hop queries be under 50 ms for social recommendation and under 200 ms for risk control. This is the main problem REDgraph faces.

So why are one and two hops feasible while three hops and above are hard to achieve? We analyzed the difficulty and feasibility based on how graph database workloads differ from those of other types of systems.


First, concurrency: OLTP workloads have high concurrency, whereas OLAP concurrency is relatively low. A three-hop graph query still serves an online scenario and its concurrency is relatively high, so it is closer to OLTP.

Second, computational complexity: query statements in OLTP scenarios are relatively simple, typically involving at most one or two joins, so their complexity is low. OLAP is designed specifically for computation and is naturally complex. A three-hop graph query sits between the two: it does not require as much computation as OLAP, but it accesses far more data than typical OLTP, so its complexity is medium.

Third, data timeliness: OLTP has high freshness requirements and must respond accurately in real time based on the latest data, whereas OLAP does not; early OLAP databases usually offered T+1 freshness. Since we serve online scenarios, we have some timeliness requirements, but they are not strict: making recommendations from a state that is an hour or ten minutes old has no serious consequences. We therefore classify it as moderately time-sensitive.

Finally, the cost of query failure: an OLTP query is cheap, so a failure is cheap; an OLAP query consumes a lot of computing resources, so a failure is expensive. Graph queries are more like OLTP, but they do access a large amount of data, so we classify the failure cost as medium.

To sum up, a three-hop graph query has OLTP-level concurrency but far larger data access and computational complexity than typical OLTP, which is why it is hard to use in online scenarios. Fortunately, its requirements on data freshness are not strict and it can tolerate some query failures, so there is room to optimize it.

As mentioned earlier, the primary goal for three-hop queries at Xiaohongshu is to reduce latency. Some systems trade a little latency for a large gain in throughput, which is unacceptable for Xiaohongshu's business: if throughput falls short, the cluster can be scaled out to compensate, but if latency is high, the feature simply cannot be used.

02

Analysis of the original architecture

This section details the problems in the original architecture and how we optimized them.

1. REDgraph overall architecture


The overall architecture of REDgraph, shown in the figure above, is similar to that of popular NewSQL systems such as TiDB: storage and compute are separated, and the storage is shared-nothing. There are three types of nodes: the meta-server, which manages meta information; the query-server, which processes user query requests; and the store-server, which stores the data.

2. REDgraph's graph sharding method


Graph sharding answers the question: given a huge graph on the scale of tens to hundreds of billions of edges, how should it be sliced and stored across a distributed cluster? In industry there are two typical sharding strategies: edge-cut and vertex-cut.

Edge-cut sharding is vertex-centric: each vertex is hashed by its ID and routed to a specific shard, and each edge is stored in two copies on disk, one co-located with its source vertex and one with its destination vertex. In the figure above, the three vertices A, B, and C are hashed to their shards; the outgoing edge from A to C is placed on the same node as A, the outgoing edge from B to C is placed with B, and the last shard holds C together with its incoming edges, i.e., the two edges pointing from A and B to C.
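
Below is a minimal sketch of this edge-cut placement, assuming a simple hash-modulo routing function; the shard count and data layout are illustrative, not REDgraph's actual implementation.

```python
NUM_SHARDS = 3

def shard_of(vertex_id: str) -> int:
    # Hash the vertex ID to pick its owning shard (illustrative only).
    return hash(vertex_id) % NUM_SHARDS

def place_edge(src: str, dst: str, shards: list) -> None:
    # Each edge is written twice: as an out-edge alongside the source vertex
    # and as an in-edge alongside the destination vertex.
    shards[shard_of(src)].setdefault(src, {"out": [], "in": []})["out"].append(dst)
    shards[shard_of(dst)].setdefault(dst, {"out": [], "in": []})["in"].append(src)

shards = [dict() for _ in range(NUM_SHARDS)]
place_edge("A", "C", shards)
place_edge("B", "C", shards)
# All of C's in-edges now sit on C's shard, so fetching C's neighbors is a local read.
```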

Vertex-cut sharding, in contrast, is edge-centric: the edges are partitioned, so a vertex may hold multiple copies across the cluster.

Both schemes have pros and cons. The advantage of edge-cut is that each vertex and its adjacent edges are stored in the same shard, so queries for a vertex's neighbors have excellent access locality. The disadvantage is that load easily becomes uneven, and because vertex degrees are not evenly distributed, hot spots arise. Vertex-cut, on the other hand, balances load better, but because each vertex is split into multiple parts spread across machines, it is more prone to consistency and synchronization issues.

As an online graph query system, REDgraph chooses the edge-cut scheme.

3. Optimization Solution 1.0


We had already implemented some optimizations, which we can call Optimization 1.0. The main consideration at the time was meeting user needs quickly, so the solution included: first, customized algorithms for common query patterns, which skip the tedious parsing, validation, optimization, and generic execution steps and process the request directly for speed; second, limiting fan-out during neighbor expansion, i.e., capping the number of neighbors expanded at each vertex, to mitigate super-nodes and reduce latency; and third, improving the push-down of operators such as filter, sample, and limit, so they are executed in the storage layer as much as possible to reduce network bandwidth consumption. We also applied optimizations such as reading from follower replicas, separating read and write threads, and increasing the garbage collection frequency.
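
To illustrate the push-down idea, here is a minimal sketch in which filter, sample, and limit are applied inside the storage node before results cross the network; the function and parameter names are hypothetical.

```python
import random

def get_neighbors_pushdown(adjacency, vid, edge_type,
                           predicate=None, sample=None, limit=None):
    """Apply filter/sample/limit at the storage layer so that only the
    surviving neighbors are serialized and sent back to the query layer."""
    neighbors = adjacency.get((vid, edge_type), [])
    if predicate is not None:
        neighbors = [n for n in neighbors if predicate(n)]    # filter push-down
    if sample is not None and len(neighbors) > sample:
        neighbors = random.sample(neighbors, sample)          # sample push-down
    if limit is not None:
        neighbors = neighbors[:limit]                         # limit push-down
    return neighbors

adjacency = {("Tom", "friend"): ["Alice", "Bob", "Carol", "Dave"]}
print(get_neighbors_pushdown(adjacency, "Tom", "friend", limit=2))  # ['Alice', 'Bob']
```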

However, these optimizations have one thing in common: each is local and fragmented, so they generalize poorly. For the first, if a user needs a new query pattern, the previously written algorithm no longer fits and a new one has to be written. The second no longer applies if the business needs the complete neighbor set of a vertex. The third cannot push anything down if those operators do not appear in the query at all. Because the commonality is low, we needed a more general optimization strategy that reduces duplicated effort.

4. Thinking about new solutions


The figure above shows a profile of a three-hop query that took nearly one second. We found that the number of records produced per hop fans out by more than 200x from the first hop to the second and by more than 20x from the second to the third; in the end, the number of rows to be processed jumps from 66 to 450,000, an astonishing growth rate. We also found that the third-hop operators dominate the whole query: their time in the query layer accounts for more than 80% of the total.

So how do we optimize it? Database performance optimization options fall into three main categories: storage layer optimization, query plan optimization, and execution engine optimization.

Since most of the time is spent in the query layer, we focused there. Query plan optimization is an endless project: users may write all kinds of query statements producing all kinds of operators, and it is hard to find a general, convergent solution that covers every case. The execution engine, by contrast, admits a relatively fixed optimization scheme, so we chose to optimize the execution engine first.

The core of a graph database is its multi-hop query execution framework. Because the data volume and the amount of computation are both large, query times are long, so we drew on the ideas of MPP databases and other computing engines and proposed a distributed parallel query solution.


The original multi-hop query execution process is shown in the figure above. Suppose we want to query the three-hop neighbor IDs of vertex 933, i.e., retrieve all the vertices in the blue circle. After processing by the query layer, the execution plan on the right is generated. START marks the beginning of the plan and performs no actual work. The GetNeighbor operator actually queries the neighbors of vertices, for example finding A and B from 933. The subsequent Project, InnerJoin, and Project operators convert, process, and trim the data structures of the intermediate results so that the whole computation can proceed smoothly. It is precisely these subsequent operators that account for a large share of the latency.


The physical execution of these operators is shown in the figure above. After executing START, the Query Server sends the request to one of the Store Servers, which fetches the neighbor information and returns it to the query layer. The query layer deduplicates and otherwise processes the data, then sends requests again, this time to the other two Store Servers, to obtain the second-degree neighbors; the results come back to the query layer to be aggregated and deduplicated, and so on.

Throughout this process we observed three problems. First, the operators in the blue boxes all run serially: the next one cannot start until the previous computation has finished, and for large data volumes serial execution is clearly no match for parallel execution. Second, there is a synchronization point inside the Query Server, marked in red on the left ("wait for all responses to return"): the query server must wait for responses from all storage nodes before it can proceed. If one storage node holds more data or is overloaded and responds slowly, a lot of time is wasted waiting, so we considered removing this synchronous wait. Finally, results from the storage layer have to be sent back up to the query layer for simple processing before being sent down again, which adds unnecessary forwarding cost. If the Store Servers could forward results themselves, one network round trip could be avoided.

The corresponding strategy has three points: execute operators in parallel, remove the synchronization point, and let the Store Servers forward results directly. Based on this, we proposed the following redesign.


First, the Query Server hands the entire execution plan, together with the initial data needed to execute it, to the Store Servers, and each Store Server then drives the execution itself. Taking Store Server 1 as an example, when it finishes its first-hop query it forwards the results to the appropriate Store Servers according to the partitions the result IDs belong to. Each Store Server can then continue the subsequent operations independently, so the whole execution is parallelized, with no synchronization points and no extra forwarding.
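
Here is a minimal sketch of this self-driving step, reusing the hash-based shard routing from the earlier sketch; names such as drive_next_hop and forward_to are hypothetical and only show how results are regrouped by owning partition before the next hop.

```python
from collections import defaultdict

NUM_SHARDS = 3

def shard_of(vertex_id: str) -> int:
    return hash(vertex_id) % NUM_SHARDS

def forward_to(shard_id: int, plan, vids) -> None:
    # Stand-in for the RPC that ships the plan and start vertices to a peer Store Server.
    print(f"shard {shard_id} receives {len(vids)} start vertices for the next hop")

def drive_next_hop(plan, hop_results) -> None:
    """After finishing a hop locally, group the result vertex IDs by their owning
    shard and send plan + data directly to those Store Servers, instead of routing
    everything back through the Query Server."""
    by_shard = defaultdict(list)
    for vid in hop_results:
        by_shard[shard_of(vid)].append(vid)
    for shard_id, vids in by_shard.items():
        forward_to(shard_id, plan, vids)

drive_next_hop(plan=None, hop_results=["A", "B", "C", "D"])
```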

Note that the white boxes on the right side of the figure are shorter than those on the left: as the data flows downward it is distributed across partitions, so each partition necessarily receives less than the total volume the Query Server would have received.

After each part is driven independently there is no waiting and no extra forwarding; the Query Server only needs to collect the results from each Store Server in the final step, aggregate and deduplicate them, and return them to the client. The overall time is therefore significantly shorter than in the original model.

03

Distributed parallel query implementation

The implementation of distributed parallel query involves several key elements. Let's take a look at some of these details.

1. How to ensure no regression for 1-2 hop queries


The first question is how to ensure that the changes do not cause regressions for the original 1-2 hop queries. When making new changes and optimizations inside a company, it is important to carefully assess whether they might degrade the existing solution; we do not want a new scheme that both fails to deliver benefits and breaks the old system. The architecture therefore stays largely consistent with the original. A new layer, called the execution layer, is inserted inside the Store Server; it has network interconnection capability and is mainly responsible for forwarding in distributed queries. The Query Server layer remains largely unchanged.

In this way, when a user's execution plan is received, a different processing path can be chosen according to the number of hops: 1-2 hop queries still use the original process, since it already meets the business requirements, while distributed queries are used for 3 hops and above.

2. How to stay compatible with the existing execution framework


The second key issue is how to stay compatible with the original execution framework: the distributed transformation should not require large changes to existing code, but achieve its goal with minimal adjustment. Specifically, a routing operator is inserted before operators that need to switch partition access (such as GetNeighbor). There are three kinds: Forward, Converge, and Merge. Forward means the data must be forwarded to another node for processing because the current node cannot continue with it. Converge is added as the last step of the execution plan and indicates that the final result should be sent back to the node that originally received the user request. After Converge, a Merge operator is also added on that node to aggregate the received results before returning them to the client. With this modification we only had to implement these three operators, without touching any other operators or the network layer, making the transformation very lightweight. While rewriting execution plans we also made some additional optimizations, such as pushing down operators like GroupBy and OrderBy.
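
A minimal sketch of this plan rewriting, with operators represented as plain strings; the exact placement rules in REDgraph may differ, so treat this as an illustration of the idea rather than the actual implementation.

```python
def rewrite_plan(operators, partition_switching=("GetNeighbor",)):
    """Insert routing operators into a plan: a Forward before every operator that
    must run on the partition owning the data, then Converge + Merge at the end so
    results flow back to the node that received the user request."""
    rewritten = []
    for op in operators:
        if op in partition_switching:
            rewritten.append("Forward")     # hand off to the shard that owns the data
        rewritten.append(op)
    rewritten.append("Converge")            # route final results to the origin node
    rewritten.append("Merge")               # aggregate results before replying
    return rewritten

plan = ["START", "GetNeighbor", "Project", "InnerJoin", "GetNeighbor", "Project"]
print(rewrite_plan(plan))
```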

3. How to deal with hot spots


The third question is how to deal with hot spots, or duplicate IDs. Once execution is driven by the Store Servers themselves, the following situation arises: suppose edge AC and edge BC are located on two different Store Servers, and both are performing single-hop queries. The machine on the left may finish its query over AC faster, find C, and forward the result to the machine that owns C, which then executes GetNeighbor from C. The machine on the right, although slightly behind, also finds C and will likewise ask for C's neighbors.

If nothing is done, the intermediate node will query C's neighbors twice, wasting resources. The optimization is simple: add a NeighborCache on each storage node. It is essentially a map: whenever a read request arrives, first check whether C's neighbors are already in the map; if so, use them, otherwise read the storage layer and backfill the NeighborCache afterwards. Each entry lives for only a very short time. The time can be short because the interval between the left and right nodes sending their requests is certainly small, nowhere near seconds, otherwise the business could not tolerate it anyway, so each NeighborCache entry only needs to survive for a few seconds before being deleted automatically. The time must be short because the key combinations of the map, Vid + edgeType, are extremely numerous, and if entries are not cleaned up promptly, memory can easily blow up. In addition, when the query layer reads data from the disk store and backfills the NeighborCache, it also performs memory checks to avoid OOM.
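
Below is a minimal sketch of such a NeighborCache; the TTL, capacity, and structure are illustrative assumptions, not REDgraph's actual parameters.

```python
import time

class NeighborCache:
    """A small TTL cache keyed by (vid, edge_type)."""
    def __init__(self, ttl_seconds=2.0, max_entries=100_000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries            # crude memory guard
        self._entries = {}                        # (vid, edge_type) -> (expiry, neighbors)

    def get(self, vid, edge_type):
        entry = self._entries.get((vid, edge_type))
        if entry is None:
            return None
        expiry, neighbors = entry
        if time.monotonic() > expiry:             # expired: drop and report a miss
            del self._entries[(vid, edge_type)]
            return None
        return neighbors

    def put(self, vid, edge_type, neighbors):
        if len(self._entries) >= self.max_entries:
            return                                # skip backfill rather than risk OOM
        self._entries[(vid, edge_type)] = (time.monotonic() + self.ttl, neighbors)

def get_neighbors(cache, storage, vid, edge_type):
    cached = cache.get(vid, edge_type)
    if cached is not None:
        return cached                             # duplicate request served from cache
    neighbors = storage.get((vid, edge_type), []) # otherwise read the storage layer
    cache.put(vid, edge_type, neighbors)          # backfill for the near-future duplicate
    return neighbors
```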

4. How to do load balancing


The fourth problem is load balancing, which has two parts: storage balancing and computation balancing.

Storage balancing is genuinely difficult in an edge-cut graph store, because keeping every vertex together with its neighbors is precisely the advantage a graph database has over other databases, and also the price it has to pay. There is no complete solution; when the problem really occurs, the practical answer is to expand the cluster so that the hash distribution of data becomes more even and multiple hot spots are less likely to land on the same machine. For example, in a relatively large risk-control cluster, the difference between the highest and lowest disk usage is no more than 10%, so the problem is not as serious as one might imagine.

Another mitigation is to clean up expired data in the storage layer promptly; doing so quickly also reduces some of the imbalance.

As for computation balancing: the storage layer uses a three-replica policy, and if the business can accept weakly consistent reads (in practice most can), we can check which of the three replicas carries the lightest load and forward the request to it, balancing the load as much as possible. In addition, as mentioned earlier, caching hot results also helps: as long as hot spots are processed fast enough, computational imbalance is hardly visible.
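
A minimal sketch of replica selection for weakly consistent reads; the load metric here is a hypothetical number, whereas a real system would use measured CPU, queue depth, or latency.

```python
def pick_replica(replicas):
    """Pick the lightest-loaded replica for a weakly consistent read."""
    return min(replicas, key=lambda r: r["load"])

replicas = [
    {"node": "store-1", "load": 0.72},
    {"node": "store-2", "load": 0.35},
    {"node": "store-3", "load": 0.58},
]
print(pick_replica(replicas)["node"])   # store-2
```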

5. How to do process control


The next question is flow control. When execution is driven by the Store Servers, the Driver is only involved in the first stage; the subsequent steps are handed over to and controlled by the workers. How, then, does the Driver know which stage is currently executing and when its own final stage can run? One option is for every worker to send a response back to the Driver after receiving a request or after forwarding one, so that the Driver records the progress of every node. That would work.

However, that design is heavy: the Driver does not need to know the specific state of every node, only whether it is allowed to execute. So in the engineering implementation we took a lighter approach: each stage generates 32-bit reqIds and sends them to an ACKer (confirmer), which records them as a single 32-bit integer. Stage 1 also receives the reqId sent by Stage 0, and after its internal processing it XORs that received reqId with the three reqIds it generated itself and sends the XOR result to the ACKer. Because the XOR of two identical numbers is 0, when the same reqId (0010 in the example) is XORed in again, its contribution becomes 0, which means Stage 0 has been fully processed. All subsequent stages follow the same pattern; when the ACKer's value returns to 0, the whole execution has completed, i.e., Stages 0 through 3 have all been acknowledged, and Stage 4 can be executed, thereby driving the process.
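
A minimal Python sketch of this XOR-based acknowledgement, under the assumption that every piece of work is tagged with a random 32-bit reqId and each stage XORs the id it received with the ids it fans out; the class and function names are illustrative.

```python
import random

class Acker:
    """Tracks outstanding reqIds as one running XOR; the value returns to 0 only
    when every generated reqId has been cancelled out by its acknowledgement."""
    def __init__(self):
        self.value = 0

    def ack(self, xor_of_ids: int) -> None:
        self.value ^= xor_of_ids

    def all_done(self) -> bool:
        return self.value == 0

def new_req_id() -> int:
    return random.getrandbits(32)             # a 32-bit reqId, as in the talk

acker = Acker()

# Stage 0 generates one reqId for the work it sends to Stage 1.
r0 = new_req_id()
acker.ack(r0)                                  # acker now holds r0

# Stage 1 processes r0, fans out three new reqIds to Stage 2, and acknowledges
# by XOR-ing the received id with the ids it generated.
r1a, r1b, r1c = new_req_id(), new_req_id(), new_req_id()
acker.ack(r0 ^ r1a ^ r1b ^ r1c)                # r0 cancels out: Stage 0 is done

# Stage 2 branches finish without fanning out further work.
for r in (r1a, r1b, r1c):
    acker.ack(r)

print(acker.all_done())                        # True -> the final stage may run
```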

Another important issue is end-to-end timeout self-checking. If, say, one node in Stage 2 or Stage 3 runs too long, the other nodes must not keep waiting, because the client has already timed out. We therefore placed instrumentation points in the internal execution logic of each operator to check whether it has exceeded the user-side time limit; once it has, the operator terminates its own execution immediately, destroying the query quickly and avoiding wasted resources.
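
A minimal sketch of such an instrumentation point; the exception type and where it is checked are assumptions for illustration.

```python
import time

class OperatorTimeout(Exception):
    """Raised by an operator to terminate itself once the client deadline passes."""

def check_deadline(start_ts: float, limit_ms: float) -> None:
    # Called periodically inside an operator's processing loop.
    if (time.monotonic() - start_ts) * 1000 > limit_ms:
        raise OperatorTimeout("query exceeded the client time limit, self-destructing")

start = time.monotonic()
for _ in range(1000):
    check_deadline(start, limit_ms=50)   # abort promptly instead of keeping peers waiting
```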

That's it for some of the key designs.

6. Performance testing

After the transformation was completed, we ran performance tests. Using the SNB data generator provided by the LDBC organization, we generated an SF100-scale social network graph with 300 million vertices and 1.8 billion edges, and examined the query performance of one-hop, two-hop, three-hop, and four-hop queries.


According to the results, for one-hop and two-hop queries the performance of the native and distributed query paths is essentially the same, with no regression. From three hops onward, distributed queries achieve a 50% to 60% improvement over native queries; for example, distributed queries in the Max degree scenario have latency under 50 milliseconds, and with Max degree or Limit constraints latency stays under 200 milliseconds. Although the benchmark dataset differs from our actual business data, both belong to the social-network domain, so the results still have reference value.


Four-hop queries, whether native or distributed, have latencies ranging from seconds to more than ten seconds. Because the amount of data involved in a four-hop query is too large, reaching hundreds of thousands or even millions of records, distributed parallel querying alone is not enough, and other strategies are required. Even so, our improvements still achieve a 50% to 70% gain over the original query path, which is still very effective.

04

Summary and outlook


Drawing on the ideas of MPP, we reworked REDgraph's original execution process at the framework level and proposed a fairly general distributed parallel query scheme for graphs. With these improvements, three-hop queries that were previously infeasible for the business are now achievable, which is a major breakthrough; experiments also confirm an efficiency improvement of about 50%.

As Xiaohongshu's DAU continues to rise, business data is growing toward the trillion scale, and the demand for multi-hop queries will become increasingly strong. The solution therefore still has room for optimization, realistic prospects for deployment, and concrete application scenarios, so we will keep working on improving REDgraph's query capabilities. In addition, although the scheme was implemented on a graph database, its ideas also have reference value for other online storage systems with similar multi-hop query requirements; other products can draw on this solution to design efficient execution frameworks that fit their own needs.

Finally, we sincerely invite like-minded engineers with a passion for technology to join our team. We especially recommend two channels: scan the QR code above to join the WeChat group and discuss graph database topics, or follow Xiaohongshu's technical account REDtech, which publishes technical articles from time to time; everyone is welcome to follow and share.

05

Q&A session

Q: What is the size of the LDBC SF100 dataset selected for the tests mentioned earlier? Also, while the distributed approach can improve performance, the distributed implementation may introduce message communication overhead, which could hurt the test results.

A: The three-hop result sets are basically on the order of hundreds of thousands of rows.

As for the message communication introduced by distribution, it is indeed a cost, but in our scenario it is not the most serious problem at the moment. The amount of data produced at each hop, especially the third, is huge, and the time the computation operators spend processing it far exceeds the communication time. In a multi-hop query, the intermediate results of the first and second hops are not large: one hop yields only tens to hundreds of rows and two hops perhaps tens of thousands, but the third hop, whose results actually enter the final computation, jumps to hundreds of thousands, so the communication overhead is very small by comparison.

On the communication side we also have some optimizations, such as opening a small window (for example 5 milliseconds) on the sending side to aggregate requests destined for the same target vertex, which reduces the number of communication requests.
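
A minimal sketch of such a send-side aggregation window; the 5 ms window, the BatchingSender name, and the send function are illustrative assumptions.

```python
import threading, time
from collections import defaultdict

class BatchingSender:
    """Buffers outgoing requests per target for a short window, then sends them
    as a single batch; send_batch stands in for the real RPC layer."""
    def __init__(self, send_batch, window_ms=5):
        self.send_batch = send_batch
        self.window = window_ms / 1000.0
        self.lock = threading.Lock()
        self.buffers = defaultdict(list)          # target -> pending requests

    def submit(self, target, request):
        with self.lock:
            first = not self.buffers[target]
            self.buffers[target].append(request)
        if first:                                  # first request opens the window
            threading.Timer(self.window, self._flush, args=(target,)).start()

    def _flush(self, target):
        with self.lock:
            batch, self.buffers[target] = self.buffers[target], []
        if batch:
            self.send_batch(target, batch)

sender = BatchingSender(lambda target, batch: print(f"send {len(batch)} requests to {target}"))
for vid in ["A", "B", "C"]:
    sender.submit("store-2", {"get_neighbors": vid})
time.sleep(0.05)   # after the 5 ms window: "send 3 requests to store-2"
```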

That's all for this sharing, thank you.

