
What exactly is a data mesh, and can it really replace data warehouses and data lakes?

The concept of a data mesh was coined by Zhamak Dehghani, who pointed out in her seminal 2019 article, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh": "Traditional centralized data management models cannot adapt to rapidly changing business needs, while a data mesh manages data in a distributed way, allowing business units to own and manage their data while enabling cross-departmental data sharing through standardized APIs and self-service platforms." In 2022, Dehghani's book, Data Mesh: Delivering Data-Driven Value at Scale, was published, detailing the design principles, implementation methods, and technical architecture of data mesh, driving the concept to further popularity.


Data mesh first appeared in Gartner's 2022 Hype Cycle for Data Management, positioned in the "Innovation Trigger" stage. Although this was its first appearance, Gartner predicted that data mesh would become obsolete before reaching the "Plateau of Productivity", as shown in the following chart:

[Figure: Gartner Hype Cycle for Data Management, 2022]

Gartner's view has been controversial. Some experts argue that it is too biased toward vendors and technologies and ignores actual business needs; others say that data mesh will continue to grow, but will be broken down into smaller components that are absorbed into other emerging data tools.

To better understand data mesh, I read Dehghani's original book, Data Mesh: Delivering Data-Driven Value at Scale, and found that I had underestimated it: data mesh is not a simple concept but an entire architectural system built around it. So I wrote these reading notes to share with everyone. The notes are divided into five parts, consistent with the book's table of contents:

1. What is a data mesh?
2. Why choose a data mesh?
3. How to design a data mesh architecture?
4. How to design the data product architecture?
5. How do I get started with implementing a data mesh?

After reading this English book, my takeaway is that proposing a new concept is not difficult; explaining it clearly, and interpreting it systematically across an entire book, shows that the author has really thought the matter through. That is what learning by doing looks like. Let's see how the proposer of data mesh did it.

1. What is a data mesh?

In today's digital age, data has become a core asset for businesses. However, as data continues to grow in size and complexity, traditional centralized approaches to data management face unprecedented challenges. Data mesh is a revolutionary data management paradigm that offers a new perspective on how to view and process analytical data in large-scale, complex environments.

In essence, a data mesh is a decentralized sociotechnical approach. It is not just a technology architecture but a shift in mindset that touches organizational structure, culture, and technology. At its core, data mesh decentralizes data responsibilities to the business domains that know the data best, while using standardization and automation to ensure overall consistency and interoperability. The following diagram shows the multidimensional technological and organizational changes of data mesh compared with earlier analytical data management approaches.

[Figure: dimensions of change from earlier analytical data management approaches to data mesh]

1. Organizationally, it shifts from centralized data ownership by specialists who run the data platform technology to a decentralized model, pushing ownership of and accountability for data back to the business domains where the data is produced or used.
2. Architecturally, it shifts from collecting data into monolithic warehouses and lakes to a distributed mesh of data products accessed through standardized protocols.
3. Technically, it shifts from solutions that treat data as a by-product of running pipeline code to solutions that treat data and the code that maintains it as one living, autonomous unit.
4. Operationally, it shifts data governance from a top-down, centralized operating model with manual intervention to a federated model that embeds computational policies in the nodes of the mesh.
5. In principle, it shifts our value system from treating data as an asset to be collected to treating data as a product that serves and satisfies its users (both inside and outside the organization).
6. In terms of infrastructure, it shifts from two sets of fragmented, point-to-point integrated infrastructure services (one for data and analytics, the other for applications and operational systems) to a well-integrated infrastructure that serves both operational and data systems.

A data mesh is built on four interrelated principles that together form the basis of its theory and practice:

1. Domain ownership: this principle assigns responsibility for data to the business domain closest to it. Borrowing from Domain-Driven Design (DDD), it logically decomposes data along business domains, with each domain team managing and sharing the data it knows best. This improves the fidelity and timeliness of data and adapts better to business change.
2. Data as a product: applying product thinking to data management is a major innovation of data mesh. Each dataset is treated as a "product" that must be discoverable, understandable, and trustworthy. This mindset shift leads teams to focus on the quality and user experience of their data, increasing its overall value.
3. Self-serve data platform: to support decentralized data management, a data mesh needs a powerful self-serve platform. The platform must simplify creating and using data products and lower the technical barrier so that generalist technologists can participate in data work.
4. Federated computational governance: while decentralizing data responsibilities, data mesh recognizes the importance of global consistency. The federated computational governance model balances domain autonomy and global interoperability by automating policy enforcement through computation.

The four principles are collectively necessary and sufficient. They reinforce one another, and each responds to new challenges that may arise from the others. The diagram below shows the interplay between these principles.

[Figure: interplay of the four data mesh principles]

2. Why choose a data mesh?

Today's problems stem from yesterday's solutions. Analytical data architecture has gone through three generations of evolution, and Dehghani specifically points out the problems of the technology-partition-driven approaches that came before:

1. Data warehouse architecture: the earliest centralized data management approach, extracting data from business systems, then transforming and loading it into a centralized warehouse. While it solves data silos, it tends to become complex and hard to maintain over time.
2. Data lake architecture: born to meet the needs of big data and machine learning, it retains data in its raw form. But it carries the risk of becoming a "data swamp", where data quality and usability become challenges.
3. Multimodal cloud architecture: combines the advantages of the previous two generations and leverages the cloud. But the challenges posed by organizational complexity remain fundamentally unaddressed.

Although these three generations of architecture keep advancing technically, they share several limitations:

1. Monolithic: architecture, technology, and organizational structure tend to be centralized, making it hard to cope with business complexity and change.
2. Centralized data ownership: while data silos are consolidated, ownership moves ever farther from the data source, hurting data quality and responsiveness.
3. Technology orientation: architecture design focuses too much on technical functions and ignores the natural boundaries of business domains.

The data mesh learns from past solutions and works to address their shortcomings. It reduces the points of centralization that act as coordination bottlenecks, and finds a new way to decompose the data architecture without slowing the organization down with synchronization. It closes the gap between where data originates and where it is consumed, removing the accidental complexity that arises between the two planes. The goal of a data mesh is to enable organizations to capture the value of data at scale, using data not only to improve and optimize the business but to reshape it, in three ways:


1. Respond to change gracefully

  • Align business, technology, and analytical data

    In a data mesh, each business domain assumes ownership of and responsibility for its analytical data, because the people closest to the data are best positioned to understand what analytical data exists and how it should be interpreted. Domain ownership results in a distributed data architecture in which data artifacts (datasets, code, metadata, and data policies) are maintained by their corresponding domains.

  • Close the gap between analytical and operational data

    Data mesh proposes to close the gap, and shorten the feedback loop, between the analytical and operational planes by sharing data as a product. It connects these two planes under a new structure: a network of data products and applications connected peer-to-peer.

  • Localize data changes to the business domain

    A data mesh localizes change within domains and empowers them to model data autonomously based on their deep understanding of the business, without central coordination over a single, shared canonical model.

  • Reduce the accidental complexity of pipelines and data copying

    Data mesh addresses the complexity of traditional centralized data exchange by introducing a new architectural unit, the data product quantum, which carries a clear set of contracts and guarantees for each of its access modes (SQL, files, events, etc.) and enforces access control and policies on each interface at access time. A data quantum encapsulates the code that transforms and maintains its data; the data pipeline is decomposed and becomes an internal implementation detail of the quantum. Data quanta share data with one another without an intermediary pipeline (see the sketch below).
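To make this concrete, here is a minimal, hypothetical Python sketch of a data quantum exposing several output ports, each with its own contract and an access policy evaluated at access time. All class and method names are illustrative assumptions, not an API from the book:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class OutputPort:
    """One access mode of a data quantum (e.g. SQL, files, events)."""
    name: str
    schema: dict                    # the port's data contract
    policy: Callable[[str], bool]   # per-port access policy, checked at access time

@dataclass
class DataQuantum:
    """Smallest independently deployable unit: data + transformation code + policies."""
    domain: str
    ports: dict = field(default_factory=dict)

    def add_port(self, port: OutputPort) -> None:
        self.ports[port.name] = port

    def read(self, port_name: str, consumer: str):
        port = self.ports[port_name]
        if not port.policy(consumer):   # policy enforced at the interface, not in a pipeline
            raise PermissionError(f"{consumer} may not access port {port_name}")
        return self._serve(port)

    def _serve(self, port: OutputPort):
        # the transformation logic lives inside the quantum, replacing the external pipeline
        return {"port": port.name, "schema": port.schema, "rows": []}

# A 'customers' domain quantum with two access modes and different policies per port.
quantum = DataQuantum(domain="customers")
quantum.add_port(OutputPort("sql", {"customer_id": "string"},
                            policy=lambda c: c.endswith("@internal")))
quantum.add_port(OutputPort("files", {"format": "parquet"}, policy=lambda c: True))
print(quantum.read("files", consumer="analyst@partner"))
```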

2. Maintain agility in growth

  • Eliminate centralized and monolithic bottlenecks

    Data mesh takes a peer-to-peer approach to serving and consuming data. This architecture enables consumers to directly discover and consume data from source data products.

  • Reduce the coordination of data pipelines

    Existing architectures are decomposed along technical lines, i.e., pipeline stages such as ingestion, processing, and serving. This style of architectural decomposition requires heavy coordination across those capabilities whenever a new data source or new use case is delivered. Data mesh moves away from technical partitioning of data management toward domain-oriented partitioning. Domain-oriented data products develop and evolve independently of other data products, and this decomposition reduces the coordination needed to deliver results.

  • Reduce the coordination of data governance

    Centralized data governance and its highly manual processes inhibit the flexibility of data sharing. Data mesh reduces governance coordination friction in two ways: first, by automating policies and embedding them as code in every data product; second, by delegating the core responsibility for governance to the data product owners in each domain.


3. Improve data ROI

  • Abstract technical complexity with a data platform

    Data mesh critically examines the existing technology landscape and reimagines technology solutions as a developer-centric platform serving data product developers and users. It is designed to remove the need for data specialists and enable generalists to develop data products.

  • Embed product thinking everywhere

    Data mesh shifts our mindset from thinking of data as an asset to thinking of data as a product. It has changed the way we measure success, moving from the volume of data to the satisfaction of data users.

  • Push the boundaries

    The data quantum provides a set of interfaces that allow anyone with the appropriate access to discover and use data products, regardless of their physical location.

3. How to design a data mesh architecture?

Starting from the four core principles, the author gradually constructs the overall architectural framework of the data mesh, showing the derivation from concept to practice.

1. Domain-oriented analytical data sharing interfaces. The principle of domain ownership extends the boundaries of a domain, requiring each domain to control both its operational data and its analytical data, and to provide an analytical data sharing interface. This breaks the separation between domains and analytics in traditional data architecture and lays the foundation for governing data at the source.


2. The data product as an architectural quantum. Each data product is designed as an "architectural quantum": the smallest architectural unit that can be deployed and managed independently. It has high functional cohesion, i.e., it performs a specific analytical transformation and securely shares the result as domain-oriented analytical data. It contains all the structural components needed to perform its function: transformation code, data, metadata, the policies that govern the data, and its infrastructure dependencies. This design greatly improves the autonomy and reusability of data products.


3. The multi-plane data platform. The data mesh platform adopts a multi-plane architecture, mainly comprising three planes (a sketch of the layering follows the list):

  • Data infrastructure (utility) plane: atomic services that provision and manage physical resources such as storage, pipeline orchestration, and compute.
  • Data product experience plane: higher-level abstractions that operate directly on a data product, enabling producers and consumers to create, access, and secure data products, among other operations on a single data product.
  • Mesh experience plane: services that operate on the network of connected data products, such as searching for data products and observing the data lineage between them.
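Here is a minimal, illustrative Python sketch of the three planes as layered interfaces; every class and method name is an assumption made for the example, not an API from the book:

```python
class UtilityPlane:
    """Atomic infrastructure services: storage, compute, orchestration."""
    def provision_storage(self, size_gb: int) -> str:
        return f"bucket-{size_gb}gb"

class DataProductExperiencePlane:
    """Operations on a single data product; abstracts the utility plane below."""
    def __init__(self, utility: UtilityPlane):
        self.utility = utility
        self.products: dict[str, dict] = {}

    def create_data_product(self, name: str) -> dict:
        # the developer declares intent; the plane handles physical provisioning
        product = {"name": name, "storage": self.utility.provision_storage(10)}
        self.products[name] = product
        return product

class MeshExperiencePlane:
    """Operations across the network of data products; aggregates the plane below."""
    def __init__(self, product_plane: DataProductExperiencePlane):
        self.product_plane = product_plane

    def search(self, keyword: str) -> list[str]:
        return [n for n in self.product_plane.products if keyword in n]

utility = UtilityPlane()
products = DataProductExperiencePlane(utility)
mesh = MeshExperiencePlane(products)
products.create_data_product("customer-behavior")
print(mesh.search("customer"))   # -> ['customer-behavior']
```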

The mesh experience plane depends on the interfaces of the data product experience plane, which it aggregates, while the data product experience plane depends on the interfaces of the underlying utility plane, which it abstracts. This layered design optimizes the user experience while making efficient use of underlying resources, and the planes interact with one another through APIs, staying well decoupled.

4. Embedded computational policies. The data mesh adopts a distributed governance model, embedding policies (such as access control, encryption, and privacy protection) into each data product as code. The platform provides a unified control interface, but policy enforcement happens in the runtime context of each data product. This ensures consistent governance while avoiding the performance bottlenecks of centralized enforcement. The data mesh architecture introduces several logical components to manage data product policies as code:

  • Data product sidecar: a process that implements policy enforcement and other cross-cutting concerns that must be standardized across the mesh. It is provided by the platform and deployed and run as a unit with the data product.
  • Data product compute container: the platform's way of encapsulating execution policies into a deployable unit together with the data product. For brevity, the author sometimes calls it a data container.
  • Control port: a standard set of interfaces for managing and controlling the policies of a data product.

The following diagram shows these logical components; a sketch of how they fit together follows the figure. Ideally, the platform provides and standardizes the domain-agnostic components, such as sidecars, control port implementations, and input and output ports.
[Figure: logical components for policy as code: data product sidecar, compute container, and control port]
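As a thought experiment, here is a minimal Python sketch of a sidecar enforcing policies pushed through the control port for the data product it is deployed with. All names are hypothetical:

```python
class Sidecar:
    """Platform-provided process deployed as a unit with each data product."""
    def __init__(self):
        self.policies = {}

    def configure(self, policy_name: str, rule):
        # the control port pushes policy updates here; governance writes them once
        self.policies[policy_name] = rule

    def enforce(self, request: dict) -> bool:
        # every policy must pass for the request to proceed
        return all(rule(request) for rule in self.policies.values())

class DataProductContainer:
    """Deployable unit bundling the data product with its sidecar."""
    def __init__(self, name: str, sidecar: Sidecar):
        self.name, self.sidecar = name, sidecar

    def serve(self, request: dict):
        if not self.sidecar.enforce(request):
            raise PermissionError("policy violation")
        return f"{self.name}: data served to {request['user']}"

sidecar = Sidecar()
# a residency policy defined centrally but enforced locally, inside the node
sidecar.configure("region", lambda r: r.get("region") == "eu")
container = DataProductContainer("orders", sidecar)
print(container.serve({"user": "alice", "region": "eu"}))
```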

5. Designing the multi-plane platform around user journeys. The ultimate purpose of the platform is to serve cross-functional domain teams so they can deliver or consume data products. The data mesh ecosystem has several major high-level roles, including data product developers, data product consumers, data product owners, data governance members, data platform product owners, and data platform developers. The following diagram exemplifies a data product developer's journey in creating and operating a data product: for a source-aligned data product that gets its data from an operational system, the data product developer works closely with the source application developers. Together they design and implement how the application shares its operational data as input to the data product. Note that these people belong to the same domain team.

[Figure: a data product developer's journey]

The following diagram illustrates how the platform interfaces are designed to support data product development:

[Figure: platform interfaces supporting data product development]

The following diagram illustrates how a data infrastructure platform supports data product delivery:

[Figure: the data infrastructure platform supporting data product delivery]

4. How to design the data product architecture?

Data products are at the heart of a data mesh and require an efficient, flexible, and scalable architecture.

1. The essence of data products. Data products are the basic building blocks of the data mesh. A data product is not just a collection of data but an autonomous entity that can independently manage, process, and serve data. The first task in designing a data product architecture is to understand its essential characteristics:

  • Autonomy: Data products can operate independently and do not rely on central control.
  • Domain-oriented: Each data product represents a specific business domain.
  • Discoverability: Users can easily find and understand data products.
  • Composability: Different data products can be combined to create new value.

2. Design approach: affordance-oriented design. The core idea of the data product architecture is based on affordances. An affordance is a way in which a data product can interact with its users (people or systems). The key affordances include:

  • Serve data: provide diverse ways to access data.
  • Consume data: ingest data from a variety of sources.
  • Transform data: process and analyze data.
  • Discover and understand: make it easy for users to find and understand the data.
  • Compose data: combine data from different sources.
  • Manage the lifecycle: manage the entire lifecycle of a data product.
  • Observe and debug: monitor the health of the data product.
  • Govern: ensure the proper use and management of the data.

This design approach ensures that data products can adapt to change, scale easily, and continue to create value.

3. Core function design. (1) Serving data. Serving data is the primary function of a data product, and its design should follow these principles (a sketch follows the list):

  • Multimodal access: provide a variety of data formats and access methods, such as APIs and file downloads, to meet the needs of different users.
  • Immutability: once created, data should not be modified. This ensures reproducible analyses and consistent data.
  • Bitemporality: record both the actual time an event occurred and the time it was processed. This is essential for time-series analysis and data provenance.
  • Read-only access: users can only read data, never modify it directly, preserving data integrity.
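The immutability and bitemporality principles can be made concrete with a small sketch; the record type and field names below are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)          # frozen=True: records are immutable once created
class BehaviorEvent:
    customer_id: str
    behavior_type: str
    event_time: datetime         # when the behavior actually happened
    record_time: datetime        # when the data product processed/recorded it

event = BehaviorEvent(
    customer_id="c-1001",
    behavior_type="page_view",
    event_time=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    record_time=datetime.now(timezone.utc),
)
# Keeping both timestamps makes "as of" queries possible: you can reconstruct
# exactly what the data product knew at any past record_time.
print(event)
```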

Case in point: consider a customer behavior data product. It can provide both API access in JSON format and file downloads in CSV format. The data contains the customer ID, the type of behavior, the time it occurred, and the time it was recorded. This design lets data scientists perform real-time analysis via the API, while marketing teams download CSV files for offline analysis.

(2) Consuming data. A data product needs to obtain data from various sources. Design considerations include (a sketch follows the list):

  • Multi-source support: data can come from operational systems, other data products, and external APIs.
  • Cross-environment consumption: enable data acquisition across environments, such as cloud and on-premises.
  • Input ports: design standardized input interfaces for easy management and expansion.
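A minimal sketch of standardized input ports; the sources and field names are invented for illustration:

```python
from abc import ABC, abstractmethod

class InputPort(ABC):
    """Uniform ingestion interface, regardless of the source behind it."""
    @abstractmethod
    def ingest(self) -> list[dict]: ...

class CrmPort(InputPort):
    def ingest(self) -> list[dict]:
        return [{"source": "crm", "customer_id": "c-1001"}]

class ErpPort(InputPort):
    def ingest(self) -> list[dict]:
        return [{"source": "erp", "order_id": "o-42"}]

def consume_all(ports: list[InputPort]) -> list[dict]:
    # adding a new source means adding a port, not rewriting the pipeline
    return [row for port in ports for row in port.ingest()]

print(consume_all([CrmPort(), ErpPort()]))
```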

Example: a sales data product may need data from CRM systems, ERP systems, and marketing platforms. With a unified input port design, it can easily ingest and integrate data from these disparate sources.

(3) Transforming data. Data transformation is a key part of a data product's value-add. Design considerations include (a sketch follows the list):

  • Flexibility: support both programmatic (e.g., Python, Java) and declarative (e.g., SQL) transformations.
  • Extensibility: allow new transformation logic to be added easily.
  • Version control: version the transformation logic to ensure traceability.
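A small sketch of such a mixed declarative-plus-programmatic transformation, with a version tag for traceability; it uses sqlite3 for the SQL step and a simple rule as a stand-in for a real model:

```python
import sqlite3

TRANSFORM_VERSION = "1.2.0"   # version-controlled transformation logic

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("c-1", 120.0), ("c-1", 80.0), ("c-2", 15.0)])

# step 1: declarative aggregation in SQL
rows = conn.execute(
    "SELECT customer_id, SUM(amount) AS total FROM transactions GROUP BY customer_id"
).fetchall()

# step 2: programmatic segmentation in Python (stand-in for an ML model)
def segment(total: float) -> str:
    return "high_value" if total >= 100 else "standard"

segments = [{"customer_id": cid, "total": total, "segment": segment(total),
             "transform_version": TRANSFORM_VERSION} for cid, total in rows]
print(segments)
```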

Example: a customer segmentation data product may require complex analysis combining transaction history, customer attributes, and behavioral data. It can use SQL for the initial aggregation and then a Python machine learning model for the segmentation.

4. Discoverability and composability design. (1) Discoverability is key to ensuring users can easily find and understand data products. Design considerations include:

  • Metadata management: provide detailed data dictionaries, data lineage, and quality metrics.
  • Search: implement powerful search supporting keywords, tags, and semantic queries.
  • Samples and documentation: provide sample code and detailed usage documentation.

Example: design a data catalog with a detailed landing page for each data product, containing a description of the data, sample data, usage guides, and quality metrics. Users can quickly find the data product they need through search.

(2) Composability. Data products should be easy to combine with other data products to create new value. Design considerations include:

  • Standardized interfaces: Define standard data exchange formats and protocols.
  • Semantic interoperability: Use a common data model and terminology.
  • Version compatibility: Ensure that different versions of data products can work together.

Case study: consider combining a customer data product with a transaction data product to create a customer lifetime value data product. With standardized interfaces and a common customer ID, the two products can be seamlessly integrated to generate higher-value insights.

5. Management, governance, and observability design. (1) Lifecycle management uses a data product manifest to describe and manage the entire lifecycle of a data product. The manifest should contain (a sketch follows the list):

  • Basic information: name, version, owner, etc.
  • Data model: Detailed data structures and relationships.
  • Access control: Define who can access data and how.
  • Quality SLA: A committed data quality metric.
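A hypothetical manifest covering these four areas might look like the following, expressed here as a Python dict (in practice it would likely be a YAML document); all names and values are invented for illustration:

```python
manifest = {
    "info": {                      # basic information
        "name": "customer-behavior",
        "version": "2.1.0",
        "owner": "customer-domain-team",
    },
    "model": {                     # data model
        "customer_id": "string",
        "behavior_type": "string",
        "event_time": "timestamp",
    },
    "access": {                    # access control
        "read": ["analytics-group", "marketing-group"],
        "write": [],               # read-only for consumers
    },
    "sla": {                       # committed quality metrics
        "freshness_minutes": 15,
        "completeness_pct": 99.5,
    },
}
print(manifest["sla"])
```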

(2) Data governance: embed governance rules directly into data products to ensure data is used correctly. Design considerations include:

  • Embedded policies: Encode policies such as data access and privacy protection directly into data products.
  • Audit trail: Record all data access and usage.
  • Compliance checks: Automatically check compliance with relevant regulations.

(3) Observability: design comprehensive monitoring and diagnostic capabilities (a sketch follows the list):

  • Logs: detailed records of data processing and access activity.
  • Metrics: real-time monitoring of key performance indicators.
  • Traces: trace the flow of data through the entire system.
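A toy sketch of these three outputs, with an automatic alert when quality drops below an SLA threshold; the metric names and threshold are assumptions for the example:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("customer-behavior")

metrics = {"quality_score": 0.0, "latency_ms": 0.0}
QUALITY_ALERT_THRESHOLD = 0.95   # assumed SLA

def process_batch(rows: list[dict], trace_id: str):
    start = time.perf_counter()
    log.info("trace=%s ingesting %d rows", trace_id, len(rows))   # logs + trace id
    valid = [r for r in rows if r.get("customer_id")]
    metrics["quality_score"] = len(valid) / max(len(rows), 1)     # quality metric
    metrics["latency_ms"] = (time.perf_counter() - start) * 1000  # latency metric
    if metrics["quality_score"] < QUALITY_ALERT_THRESHOLD:        # automatic alert
        log.warning("trace=%s quality %.2f below SLA", trace_id,
                    metrics["quality_score"])

process_batch([{"customer_id": "c-1"}, {}], trace_id="t-001")
print(metrics)
```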

Example: design a dashboard for each data product that displays data quality metrics, usage, and processing latency in real time. When an anomaly is detected, an alert fires automatically with detailed diagnostic information.

6. Summary of design principles

  • Decentralization: avoid single points of control and improve system resilience.
  • Standardization: ensure interoperability without sacrificing flexibility.
  • Temporality: respect the time dimension of data and support time-series analysis.
  • Scalability: the design should make it easy to add new features and data sources.
  • Autonomy: data products should manage and monitor themselves.
  • Immutability: ensure data consistency and traceability.
  • Multimodality: support multiple access modes to meet the needs of diverse users.

Designing the data product architecture is key to effective data management and use. By adopting an affordance-oriented design approach and following the principles above, organizations can create a flexible, scalable, value-driven data ecosystem. Such an architecture not only improves the availability and trustworthiness of data, but also adapts better to rapidly changing business needs.

5. How do I get started with implementing a data mesh?

Launching a data mesh is a complex, ongoing process that requires change across technology, business, and organizational culture. By integrating data mesh into the overall data strategy, adopting a business-driven execution framework, driving organizational and cultural change, and developing a sound migration strategy, organizations can gradually establish a flexible, scalable data management architecture that supports data-driven innovation and decision-making.

1. Data mesh as the core of the data strategy. The first step in launching a data mesh is to incorporate it into the enterprise's overall data strategy. Data mesh should not be treated as an isolated technology project but as a key component of realizing data-driven business value. Before starting, assess your organization's readiness along the following dimensions:

  • Organizational complexity: the number and diversity of data sources and use cases
  • Data-oriented strategy: whether data is treated as a strategic differentiator
  • Executive buy-in: whether there is commitment and investment at the executive level
  • Data technology maturity: whether data technology is considered a core competency
  • Culture of innovation: whether the organization is an early adopter
  • Engineering practices: whether modern software engineering practices are in place
  • Domain-oriented organization: whether teams are already organized by business domain

If your organization scores moderate or high in these areas, you have a good foundation for adopting a data mesh.

2. A business-driven execution framework. Data mesh implementation should take a business-driven approach, tying the technical implementation closely to concrete business value. (1) Identify high-value use cases: choose business use cases that can demonstrate value quickly as the starting point, following principles such as:

  • Start with complementary use cases
  • Understand and prioritize the data consumer and provider personas
  • Start with use cases that depend least on missing platform features
  • Establish long-term ownership and budgets for platform services and data products

These use cases should demonstrate the benefits of a data mesh, such as cross-domain data integration and real-time analytics. (2) End-to-end, iterative execution: each iteration covers the complete flow from business requirements analysis through data product development to platform capability building, enabling continuous value delivery and rapid feedback. (3) An evolutionary execution model: data mesh implementation should follow a multi-phase evolutionary model:

  • Exploration phase: select one or two domains to pilot, establishing basic concepts and practices
  • Expansion phase: extend the successful experience to more domains and improve platform capabilities
  • Extraction phase: optimize and consolidate existing achievements to realize benefits at scale

The following diagram illustrates the evolution of data as a product at different stages of development. Let me break down the characteristics of each stage one by one:

[Figure: evolution of data products across the adoption phases]

(1) Exploration/bootstrapping phase

  • Number of data products: small
  • Data product affordances: essential affordances only
  • Development focus: set standards and sensible practices
  • Data product type: low risk (security, reliability)
  • Data product archetype: majority source-aligned

At this phase, only a small number of data products are created, mainly implementing essential affordances. Developers focus on establishing standards and sensible practices. The data products chosen are generally low-risk and primarily source-aligned, to ensure security and reliability.

(2) Expansion/scale-up phase

  • Number of data products: large and rapidly growing
  • Data product affordances: most affordances needed for rapid development
  • Development focus: support diversity
  • Data product type: higher risk
  • Data product archetype: all types, including aggregates

At this phase, the number of data products grows rapidly, and affordances become richer to support rapid development. The development focus shifts to supporting diversity, including higher-risk data products. Data product types extend to all archetypes, including aggregates.

(3) Extraction/sustain phase

  • Number of data products: stabilizing
  • Data product affordances: all affordances designed for resilience
  • Development focus: optimize data products
  • Data product type: legacy systems
  • Data product archetype: majority consumer-aligned

At this phase, the number of data products stabilizes, and all affordances are designed for resilience. The development focus shifts to optimizing existing data products. Integration of legacy systems begins, and data products are mainly consumer-aligned. At each phase, fitness functions should be used to assess progress (a sketch follows the list). These can include:

  • Domain ownership: the number of domains developing data products and the growth rate of data product usage
  • Data as a product: data product usage, user satisfaction, and data quality metrics
  • Self-serve platform: data product development lead time and the adoption rate of platform services
  • Federated computational governance: the coverage of automated policies and the number of cross-domain data connections
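Fitness functions can themselves be expressed as code; the metrics and targets below are invented purely for illustration:

```python
fitness_targets = {
    "domains_publishing_products": 5,      # domain ownership
    "avg_user_satisfaction": 4.0,          # data as a product (1-5 scale)
    "dev_cycle_days_max": 14,              # self-serve platform
    "automated_policy_coverage": 0.8,      # federated computational governance
}

observed = {
    "domains_publishing_products": 7,
    "avg_user_satisfaction": 4.2,
    "dev_cycle_days_max": 10,
    "automated_policy_coverage": 0.6,
}

for metric, target in fitness_targets.items():
    # "_max" metrics pass when the observed value is at or below the target
    if metric.endswith("_max"):
        passed = observed[metric] <= target
    else:
        passed = observed[metric] >= target
    print(f"{metric}: {'PASS' if passed else 'FAIL'} "
          f"(observed {observed[metric]}, target {target})")
```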

3. Organizational change and culture. Launching a data mesh is not only a technological change but a profound transformation of organization and culture. (1) Cultivate a data culture and promote the core values a data mesh requires:

  • Shared responsibility for data: each team is responsible for its own data
  • Cross-boundary data connectivity: encourage cross-domain data sharing and integration
  • User-centricity: design data products around the needs of data consumers
  • Autonomy with collaboration: balance local autonomy and global consistency
  • Embracing change: build data products that are resilient, durable, and independent
  • Automation first: increase the speed and quality of data sharing through automation

(2) Adjust the organizational structure, reshaping it using the Team Topologies approach:

  • Domain data product teams: stream-aligned teams responsible for developing and maintaining the data products of a specific business domain
  • Data platform team: a platform team providing the self-serve platform that supports data product development and operation
  • Federated governance team: an enabling team that develops global policies and standards and coordinates cross-domain collaboration

Domain data product teams are responsible for the end-to-end delivery of their data products and are treated as stream-aligned teams; they share their data products as a service with other teams. The data mesh platform team provides its platform capabilities as a service to the data product teams. The governance team acts partly as an enabling team supporting the platform and data product teams, and sometimes collaborates with the platform team. (3) Introduce new roles to support the data mesh:

  • Data Product Owner: Responsible for the vision, roadmap, and value delivery of the data product
  • Domain data product developer: responsible for the technical implementation and operation and maintenance of data products
  • Platform Product Owner: Responsible for the user experience and feature prioritization of the data platform

At the same time, existing roles need to adjust; for example, the chief data officer may shift from directly managing data to an enabling, strategic-guidance role. (4) Invest in skills development to improve the data literacy of all employees:

  • Basic data analysis and visualization training for everyone
  • Provide technical personnel with professional training such as data modeling and data quality management
  • Provide training to managers on data strategy and data governance

Create new career paths so that more generalist technologists can participate in developing and using data products.

4. Migration strategy. For most organizations, starting a data mesh means migrating from an existing data architecture such as a data warehouse or data lake. This requires careful planning: (1) Avoid long-term coexistence with the centralized architecture: the goal of data mesh is to eliminate centralized bottlenecks, so it should not coexist with the existing centralized data architecture indefinitely. (2) Leverage existing technologies: until technologies designed specifically for data mesh emerge, existing data technologies can be used, configured in ways that support autonomous, distributed data products. (3) Connect directly to source systems: during migration, bypass the existing data lake or warehouse and build data products directly from source systems. This enables genuine domain ownership and shortens the distance between source and consumer. (4) Migrate in atomic, evolutionary steps: each step should reduce technical debt and architectural entropy, for example by creating new data products, migrating existing consumers, and retiring old tables, files, and pipelines.

Epilogue

At this point, all five parts of the book have been covered, and I benefited a great deal from reading it. Each part of the original book is very rich; limited by space, I could only touch on the highlights, so if you are interested in the details, go read the original.

As for whether a data mesh can replace the data warehouse or data lake: I think Dehghani underestimated the dimensional complexity of data compared with application functionality, the impact of the network, the difficulty of coordination, the enormous influence of corporate culture, and the technical challenges of distributed, converged analytics. Dehghani's idea of letting the people closest to the business own the data is very reasonable, but domain autonomy can also be achieved through tenants on a unified data lake, which balances centralization and decentralization; quite a few organizations in China have adopted this form, just not as thoroughly as a data mesh. In my view, data mesh sets requirements for infrastructure and automated governance that current vendors cannot yet deliver. But one thing I strongly agree with is that OLTP teams and OLAP teams need to be fully integrated, and very few companies can do that today. With the advent of AIGC, however, the integration of OLTP and OLAP is becoming a trend, and any AI application team needs at least a professional corpus data engineer. I have made some attempts to integrate the IT and data teams I am responsible for, and the benefits have been substantial. In the past there was no force driving the two teams together; now AI seems to provide it.