Reading guide: This article shares Ping An's practice in data security compliance management. Data security compliance management is one part of overall data management, and the two share the same philosophy and ideas. The title also mentions "model empowerment," so issues around applying large models will be discussed as well.
The main content consists of three parts:
1. Ping An's data management value proposition
2. Application of large models in data security compliance scenarios
3. Q&A
Guest speaker | Zhang Sensen, Senior Manager, Ping An Technology
Edited by | Xu Qian
Proofread by | Li Yao
Produced by | DataFun
01
Ping An Data Management Value Proposition
First, let's introduce the background of data security work and Ping An's data management value proposition.
In 2021, China promulgated the Personal Information Protection Law and the Data Security Law, which set new requirements for data security compliance at the national level and pointed the way toward efficient data use in the future. With the establishment of the National Data Bureau, the requirements around data are likely to rise even further.
In response to the promulgation of these national laws, Ping An Group established a data management department responsible for carrying national strategy into practice.
Data management work did not begin with the introduction of these laws, however; it has been continuously refined and has evolved over time.
The evolution of Ping An's data management mainly includes three eras:
- The first era was the information age, when the main tasks were data quality improvement and regulatory reporting. Ping An has a large number of data applications across the insurance and financial industries whose data must be submitted to regulators.
- The second era centered on building a data management system around data assets. Not only the group level but each subsidiary as well carried out its own data asset management and data value mining.
- The third era is a group-wide data management system grounded in data compliance: on the premise of meeting external regulators' data security requirements, we build a quality management system that ensures efficient data circulation and raises the bar for asset operations.
Over the course of this development, we have had to resolve several pain points one by one:
- Data responsibility and capability assessment: covering the construction of data strategy, assessment, organization, capability, systems, and domains.
- Measurement and operation of data value: covering the transformation, quantification, display, and operation of data value.
- A scientific and complete assurance system: sound planning, system construction, operation, and safeguards for the entire compliance system.
The entire compliance system includes two parts: compliant data assurance and efficient data exchange.
- First is assuring data compliance, covering customer privacy agreements, entrusted-sharing agreements, authorization and authentication management, and so on. Under this system, we produce an overall, unified interpretation of national laws and then conduct assessments to achieve a collaborative mechanism.
- On that compliance foundation, to ensure efficient data interaction we have built unified data asset management and classification and grading, and we promote data interaction and unified asset value evaluation.
The business framing of data management, spanning external regulation, enterprise enablement, and asset governance, includes the following layers:
- The lowest layer is technology platform support, including the data asset management platform, the operation monitoring platform, the data encryption and sharing platform, and the customer authorization management platform.
- Above that is enterprise enablement, including the implementation of data work, daily monitoring, and special review work. On this basis, data compliance assurance and data asset management are achieved.
- At the top is interfacing with external regulators, formerly the China Banking and Insurance Regulatory Commission and now the National Financial Regulatory Administration, along with the banking and securities regulators relevant to individual subsidiaries. There must be corresponding organizational processes and systems, as well as assessment indicators, for external disclosure.
The above diagram shows the data management solution Ping An has built over the past 10 years: 1 platform, 4 types of rule libraries, 5 services, and 6 types of customers.
- The 1 platform is the data governance platform, covering asset, operations, encryption, and authorization management.
- The 4 types of rule libraries are the compliance rule library, the compliance knowledge base, the data asset library, and the system tool library.
- The 5 services are providing the data management mechanism, data compliance assurance, data asset management, data operation management, and technical tool implementation.
- The 6 types of customers are business executives, data managers, business staff, IT staff, legal staff, and finance staff.
We face the governance and management of more than a dozen professional companies (subsidiaries) of Ping An Group, so this is not the same as data management for a single company. Once our work is completed, it is distributed to each professional company, which then carries out the corresponding implementation and provides feedback.
02
Application of Large Models in Data Security Compliance Scenarios
This section describes Ping An's application of large models to data security compliance scenarios.
Our application scenarios mainly include data compliance management, data asset management, and data capability assessment. In these scenarios we are also experimenting with large models. The LLM tasks involved mainly include classification, summarization, evaluation, question answering, and SQL generation.
Model layering includes a base layer, a decision layer, and an execution layer. The base layer mainly handles Q&A and enriches the knowledge base. At the decision layer, the model is retrained, fine-tuned, and combined with a knowledge graph to support specialized scenarios such as compliance review and pre-review, material summarization, content judgment, and maturity assessment. At the execution layer, tools and capabilities are pushed down; we are now also trying to let large models make decisions, drive tools to perform the corresponding interpretation, and then evaluate based on those results.
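To make the layering concrete, here is a minimal sketch of how a decision layer might route tasks either to base-layer knowledge-base Q&A or to execution-layer tools. All class, function, and route names are hypothetical illustrations, not Ping An's actual implementation:

```python
# Minimal sketch of the three-layer idea: a decision layer that routes tasks
# either to base-layer knowledge-base Q&A or to execution-layer tools.
# All names here are hypothetical, not Ping An's actual code.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    kind: str      # e.g. "qa", "compliance_review", "maturity_assessment"
    payload: str   # the text or material to process


def kb_answer(question: str) -> str:
    """Base layer: answer from the knowledge base (stubbed)."""
    return f"[KB answer for: {question}]"


def run_tool(name: str, material: str) -> str:
    """Execution layer: invoke a concrete tool (stubbed)."""
    return f"[{name} result on {len(material)} chars of material]"


# Decision layer: map each task kind to the layer that handles it.
ROUTES: Dict[str, Callable[[Task], str]] = {
    "qa": lambda t: kb_answer(t.payload),
    "compliance_review": lambda t: run_tool("compliance_checker", t.payload),
    "maturity_assessment": lambda t: run_tool("maturity_scorer", t.payload),
}


def decide(task: Task) -> str:
    handler = ROUTES.get(task.kind)
    if handler is None:
        raise ValueError(f"no route for task kind: {task.kind}")
    return handler(task)


if __name__ == "__main__":
    print(decide(Task("qa", "What does the PIPL require for consent?")))
    print(decide(Task("compliance_review", "Draft privacy policy text...")))
```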
The overall technical architecture is shown in the figure above, built around a multi-modal large model. On the input side, it covers text extraction and PDF-to-image conversion; text vectors are stored and retrieved through a vector index and then passed to the model side via instruction routing. On the model side, Ping An has its own GPT and the professional companies have their own GPTs, so multiple GPTs coexist. On top of this multi-GPT setup, instructions are processed and prompts and jobs are distributed. Finally, on the application side, the model's results are assembled and formatted for data extraction, driving the three businesses mentioned above and establishing a knowledge application center, a capability application center, and an indicator application center.
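As a rough, hedged sketch of this ingest-index-retrieve-route-assemble flow (the embedding is a toy stand-in and the model endpoints are stubs; none of the names come from Ping An's stack):

```python
# Rough sketch of the ingest -> index -> retrieve -> route -> assemble flow.
# Embedding and model calls are stubbed; in practice these would be real
# services (the multi-GPT endpoints mentioned in the talk).
import math
from typing import List, Tuple


def embed(text: str) -> List[float]:
    """Toy embedding: normalized character-frequency vector (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorIndex:
    """Minimal in-memory vector store for retrieval context."""
    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def route_to_gpt(instruction: str, context: List[str]) -> str:
    """Instruction routing: pick a model endpoint by task keyword (stubbed)."""
    endpoint = "compliance-gpt" if "compliance" in instruction else "general-gpt"
    prompt = instruction + "\n\nContext:\n" + "\n".join(context)
    return f"[{endpoint} response to {len(prompt)} prompt chars]"


if __name__ == "__main__":
    index = VectorIndex()
    index.add("Privacy agreements must be re-signed after policy updates.")
    index.add("Cross-border transfers require a security assessment.")
    ctx = index.search("compliance check for data export")
    print(route_to_gpt("compliance review of export request", ctx))
```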
Data compliance scenarios mainly involve the following tasks: first, issuing the policy system; next, running compliance checks against that system; after inspection, sending results to the professional companies, which declare the relevant information; evaluating based on those declarations; then carrying out risk monitoring; and, in response to recently issued national laws and regulations, conducting special inspections such as PIA assessments and cross-border data transfer assessments.
Among these tasks, issuing the policy system is relatively difficult: compliance work starts from national legislation, much of which is not detailed, so enterprises must keep exploring to find the direction of their work. The workload is therefore large, time-consuming, and hard. Interpreting laws and regulations sometimes requires not only an understanding of the law but also a deep understanding of the technology involved, so the all-around demands on people are high.
Compliance inspection work often lacks ready-made guidance; we need to turn people's subjective experience and judgment into more standardized, engineered guidance and apply it throughout the compliance interpretation process.
As for information declaration, a single check may involve hundreds of submitted materials. Even for data we receive through interfaces, which is already formatted, the form and state of the data may not be fully consistent from one submission to the next, so additional cleaning may be needed, and the review workload is very large.
Risk monitoring requires calculating a large number of indicators and reporting them to group leadership, while also providing the professional companies with risk guidance.
Finally, there are the special assessments, such as the PIA. Ping An reviews outbound (cross-border) data, which is also a very heavy workload. With the help of large models, we can consolidate the entire knowledge base and, through multi-modal methods, carry out intelligent auditing plus anomaly monitoring and alerting, greatly improving work efficiency.
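As an illustration of what rule-based anomaly monitoring on outbound transfers could look like — the record fields, sensitive categories, and thresholds below are invented for the example, not taken from Ping An's rules:

```python
# Toy sketch of anomaly monitoring for outbound (cross-border) data transfers:
# flag transfers that involve sensitive categories or lack an approved PIA.
# Record fields and rules are illustrative only.
from typing import Dict, List

SENSITIVE_CATEGORIES = {"id_number", "biometric", "health", "financial_account"}


def audit_transfer(record: Dict) -> List[str]:
    """Return a list of alert messages for one outbound-transfer record."""
    alerts = []
    touched = SENSITIVE_CATEGORIES & set(record.get("fields", []))
    if touched:
        alerts.append(f"sensitive fields in outbound transfer: {sorted(touched)}")
    if not record.get("pia_approved", False):
        alerts.append("no approved PIA on file for this transfer")
    if record.get("row_count", 0) > 100_000:
        alerts.append("unusually large transfer volume")
    return alerts


if __name__ == "__main__":
    transfer = {
        "dest_country": "SG",
        "fields": ["name", "id_number", "purchase_history"],
        "pia_approved": False,
        "row_count": 250_000,
    }
    for msg in audit_transfer(transfer):
        print("ALERT:", msg)
```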
Our asset management resembles that of the professional companies in that data moves from collection to governance, to inventory, to use. However, our asset management focuses more on the use of assets, facilitating compliant data exchange between professional companies.
In the data collection stage, the main work is promoting the use and standardization of DataOps tools across the group's professional companies.
In the data governance stage, data standard management and data quality management must be done well. We do not focus on the details of each professional company's naming conventions and the like; rather, we focus on whether the end result conforms to that company's own standards and guidance.
Asset inventory mainly focuses on data classification and grading, with emphasis on the investigation and auditing of key data. Data accountability must also be in place, so that if data is leaked or other risks emerge, the responsible person can be traced.
Finally comes asset use. Every use of assets must pass through an approval chain, with an interaction application and a compliance review. Sensitive data cannot be used directly; the application may need to be remediated through masking or encryption.
Throughout this process, the most important work includes using AI analysis for classification and grading, and enforcing data accountability through knowledge graphs.
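A minimal sketch of AI-assisted classification and grading, assuming cheap pattern rules run first with a large-model fallback for ambiguous fields; the patterns and grade labels are examples, not Ping An's actual taxonomy:

```python
# Illustrative sketch of AI-assisted classification and grading: cheap
# pattern rules first, with an LLM fallback for ambiguous fields.
# Patterns and grade labels are examples only.
import re
from typing import Optional

RULES = [
    (re.compile(r"^\d{17}[\dXx]$"), ("personal_id", "L4")),   # PRC ID-number shape
    (re.compile(r"^1[3-9]\d{9}$"), ("phone_number", "L3")),   # mainland mobile shape
    (re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"), ("email", "L2")),
]


def classify_by_rules(value: str) -> Optional[tuple]:
    for pattern, label in RULES:
        if pattern.match(value):
            return label
    return None


def classify_by_llm(field_name: str, sample: str) -> tuple:
    """Fallback: ask a large model to classify the field (stubbed here)."""
    return ("unclassified", "L1")


def classify_field(field_name: str, sample: str) -> tuple:
    return classify_by_rules(sample) or classify_by_llm(field_name, sample)


if __name__ == "__main__":
    print(classify_field("cust_id", "11010519900307123X"))   # -> personal_id, L4
    print(classify_field("contact", "13912345678"))          # -> phone_number, L3
    print(classify_field("note", "renewal due next month"))  # -> unclassified, L1
```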
The third scenario is data capability evaluation based on large models. As data management and data governance keep being strengthened inside enterprises, digitalization has become an essential part of corporate development. The China Academy of Information and Communications Technology (CAICT) also conducts a large number of reviews. Before such reviews, many professional companies were not clear about their current level, so we run preliminary internal evaluations, including DCAM assessments, security capability assessments, assessments of the current state of data, and so on.
Evaluation work is complex and massive and cannot be completed by manual review alone, so we review the content against the knowledge base and put forward rectification suggestions.
That's all for this sharing, thank you.
03
Q&A
Q1: How is large-model-based classification and grading implemented, and how effective is it? In compliance management, how do you understand compliance risks, and what rule libraries are there? What role does the large model play in compliance management?
A1: To answer the first question: the financial industry has professional guidelines. We start by going to a professional company, implementing the practice there, and then consolidating the classification and grading results. Professional companies with better metadata governance can quickly identify high-risk data through knowledge graphs and similar methods, prevent high-risk problems from erupting, and then process the remaining data by grade. That is how we do it today.
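The knowledge-graph idea can be sketched as sensitivity propagation over a lineage graph — everything derived from a high-risk source is itself flagged. The graph and table names below are invented for illustration:

```python
# Toy sketch of using a lineage/knowledge graph to surface high-risk data:
# sensitivity propagates downstream from tagged source tables via BFS.
# Graph edges and table names are invented for illustration.
from collections import deque
from typing import Dict, List, Set

# downstream lineage: source table -> tables derived from it
LINEAGE: Dict[str, List[str]] = {
    "customer_master": ["marketing_view", "claims_wide"],
    "claims_wide": ["agent_report"],
    "marketing_view": [],
    "agent_report": [],
    "product_dim": ["marketing_view"],
}

HIGH_RISK_SOURCES: Set[str] = {"customer_master"}  # e.g. tables holding PII


def propagate_risk(sources: Set[str]) -> Set[str]:
    """BFS downstream: everything derived from a high-risk table is high-risk."""
    risky, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for child in LINEAGE.get(node, []):
            if child not in risky:
                risky.add(child)
                queue.append(child)
    return risky


if __name__ == "__main__":
    print(sorted(propagate_risk(HIGH_RISK_SOURCES)))
    # ['agent_report', 'claims_wide', 'customer_master', 'marketing_view']
```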
As for the large model, we are currently trying to use it to identify high-risk data, though data arriving later may pose greater challenges for our knowledge base.
Ping An's large model work is not done by a single team. Ping An builds its own large model: that team works on the L1 layer while we work on the vertical domain. We are effectively the requirement owners; they incorporate our requirements while building the large model for the overall security system, tune it, and deliver it to us.
In the future, we will consider engineering the knowledge-base construction process, combining parameter tuning and fine-tuning, so that the compliance-domain large model can connect with the vertical model in the financial domain.
Q2: How do you test the large model's results? How do you coordinate between the model and people, so as to reduce labor input while still ensuring accuracy?
A2: At the group level the scenario is somewhat simpler, because the data assets are reported by the professional companies and their cleanliness is guaranteed, so the review process using the large model is relatively simple. But this approach cannot be used as-is inside the professional companies. In the final step of building the knowledge base, you should write test cases and then verify against them. For example, the quality verification results of traditional data governance or metadata processes should be aligned with the large model's results. That is what we want to do next, and we are considering turning it into a platform, so the whole alignment process only requires checking in the platform whether the result is OK; if it is, that function of the large model is fixed and becomes a focus going forward.
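The case-based alignment described here might look roughly like the following sketch: run a fixed case set through both the traditional governance checks and the large model, and only "fix" (accept) the capability when agreement clears a bar. Both verdict functions are stubs and the threshold is an example value:

```python
# Sketch of case-based alignment: compare the large model's review verdicts
# against the traditional governance pipeline's verdicts on a fixed case set,
# and only accept the capability when agreement is high.
from typing import Callable, Dict, List

Case = Dict[str, str]  # {"input": ..., "expected": ...}


def traditional_verdict(case: Case) -> str:
    """Ground truth from the existing data-governance checks (stubbed)."""
    return case["expected"]


def llm_verdict(case: Case) -> str:
    """Verdict from the large model under test (stubbed)."""
    return "pass" if "compliant" in case["input"] else "fail"


def agreement_rate(cases: List[Case],
                   baseline: Callable[[Case], str],
                   candidate: Callable[[Case], str]) -> float:
    hits = sum(1 for c in cases if baseline(c) == candidate(c))
    return hits / len(cases)


if __name__ == "__main__":
    cases = [
        {"input": "privacy policy compliant with PIPL", "expected": "pass"},
        {"input": "shares raw ID numbers with vendor", "expected": "fail"},
        {"input": "compliant masking applied before export", "expected": "pass"},
    ]
    rate = agreement_rate(cases, traditional_verdict, llm_verdict)
    print(f"agreement: {rate:.0%}")
    if rate >= 0.95:  # example acceptance bar
        print("capability accepted; freeze this model function")
    else:
        print("keep human review in the loop")
```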
Q3: For the indicator calculations in the risk monitoring module mentioned in the talk, how are they displayed, where does the output go, and who are the specific users? The data presumably comes from the underlying business systems of Ping An's professional companies, and the group presumably maintains an overall database built from it. So where does the enablement of the professional companies come in? What is the value of the output data, externally and internally?
A3: That's a very good question. Everyone talks about indicators, especially risk indicators, which management pays the most attention to. The final display is a large screen or dashboard showing, for example, whether personal information protection is in place, whether privacy agreements have been signed, whether the compliance process for asset interaction is being followed, and so on. From more than 70 metrics we ultimately condensed a dozen or so, which are displayed on the dashboard for management.
The second level helps the leaders of each professional company understand how far their compliance work has progressed — for example, whether their apps have switched to the updated agreements and what their classification and grading status is — so that each company knows where it stands.
The third level serves the operational staff who actually do data management. They care more about how the results reflect their own work, such as whether a particular exchanged material carries an actual risk and whether it needs rectification.
In effect, the bottom level focuses on individual points, the middle level on broader areas, and the leadership view is the most three-dimensional.
Q4: The data security section mentions data accountability. How do you balance data accountability against data usage efficiency? For example, some business data may have assigned responsible persons, but data analysts and data modelers, who may have little contact with the business, still need to use the data and may have to go through certification, approval, and other processes. How do you keep work efficient?
A4: That's a very good question. Data accountability is very hard to solve, and no industry best practice fully resolves it. The "Twenty Data Measures" policy proposes separating data roles, giving producers and owners different responsibilities. We assign each piece of data an owner, who is responsible for whether the data is shared and for its compliance. As for processors: if company A shares data with company B, company B has the right to use it. Behind every instance of data processing there is a specific scenario, and the data owner is responsible only for whether the data should be used in that scenario. We handle accountability and compliance in this scenario-driven way.
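A minimal sketch of this scenario-driven model, under the assumption that approval attaches to a (dataset, scenario) pair rather than to each individual analyst; all names are illustrative:

```python
# Sketch of scenario-driven accountability: each dataset has an owner, and
# usage is approved per (dataset, scenario) pair rather than per analyst,
# so downstream users only need a scenario grant. Names are illustrative.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    name: str
    owner: str
    approved_scenarios: Set[str] = field(default_factory=set)


def approve_scenario(ds: Dataset, scenario: str, approver: str) -> None:
    """Only the data owner may approve a usage scenario."""
    if approver != ds.owner:
        raise PermissionError(f"{approver} is not the owner of {ds.name}")
    ds.approved_scenarios.add(scenario)


def may_use(ds: Dataset, scenario: str) -> bool:
    """Any analyst may use the data if the *scenario* is approved."""
    return scenario in ds.approved_scenarios


if __name__ == "__main__":
    claims = Dataset("claims_2023", owner="company_A_data_owner")
    approve_scenario(claims, "actuarial_modeling", "company_A_data_owner")
    print(may_use(claims, "actuarial_modeling"))  # True: no per-user approval
    print(may_use(claims, "marketing_outreach"))  # False: scenario not approved
```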
That's all for this sharing, thank you.