
Four questions to understand how a bank's data team builds a data system

Author | InfoQ

Guest | Xu Xiaolei, Head of Business Intelligence, Credit Card Center, China Guangfa Bank

Editor | Gao Yuxian

As a new factor of production in the digital economy era, the importance of data has become a general consensus. Even so, many misunderstandings about data persist in the industry. Many enterprises mistake data resources for data assets, believing that "having data" is equivalent to "using data well"; yet without a clear data system and a data asset operation strategy, it is difficult to fully realize the value of data.

To deliver end-to-end service from data collection and analysis through to management decision-making, and to transform data resources into data assets, more and more enterprises are setting up data-related departments, hoping to achieve continuous operation and value mining across the entire data asset life cycle. However, many non-Internet enterprises in China remain troubled by problems such as data system construction and data team management.

In a recent episode of "Super Connected Microphone", Xu Xiaolei, head of business intelligence at the credit card center of China Guangfa Bank, shared the center's practical experience and his personal insights on four topics: how to solve the challenges in building a data system, how data teams can collaborate efficiently with other teams, how to build a data team, and how to plan the data system of the future.

On August 16-17, the FCon Global Fintech Conference will be held in Shanghai. Under the theme "Technology-Driven, Wisdom Enlightens the Future: Stimulating the Endogenous Power of Digital Finance", the conference has invited experts from domestic and international financial institutions and fintech companies to share practical experience and in-depth insights. As special producer of the "Data Asset Operation and Data Intelligence Application" track, Xu Xiaolei will discuss how AIGC and related technologies can mine the value of data in banks' online operation scenarios and empower the business. More speech topics are being recruited; click the link to view the current schedule and submit a topic: https://fcon.infoq.cn/2024/shanghai/

The following content is based on the dialogue, edited for length:

How to achieve unified data standards and data quality?

InfoQ: Based on the experience of the credit card center of China Guangfa Bank, could you introduce the key stages of data system construction? What key problems need to be solved at each stage?

Xu Xiaolei: First, let me share our data environment and system construction. As a financial institution, we have online user platforms, such as our own app and WeCom, as well as offline channels and users. On this basis, we have accumulated a large amount of financial data: 120 million credit cards issued, and dozens of gigabytes of new data from online channels every month. Given this data volume and complexity, our data system construction is divided into several key stages:

1. Data governance framework and standards: Any systematic construction requires a clear and firm goal; without one, data efforts lose direction. We need to determine what data is necessary and what criteria it should meet. For example, a user profile may have hundreds or thousands of metrics, but we may only focus on 100 of them, not all. After defining the framework and standards for data governance, we build a data architecture, led by our R&D and technology teams, that is scalable, efficient, and secure enough to protect our customers' private data.

2. Product selection: We select appropriate data platform products to ensure that massive data is stored effectively and reliably, and can be accessed efficiently and accurately by upper-layer applications.

3. Data integration and management: This step matters more than the previous two because it is a continuous, routine process that constantly iterates and improves. Here we need to combine and map data from different subject areas to ensure consistency and accuracy.

4. Data analysis and application: At this stage, we are faced with the challenge of how to understand the business with data, and how to understand the data from the standpoint of the business. We use data analytics to derive business conclusions, and models to help us uncover deeper insights.

5. Audit and control: As a financial enterprise, we need to regularly audit and control data application and management to ensure compliance and security.

6. Data Operations: Maintain data on an ongoing basis to ensure that it is authentic and valid, and to keep it active.

InfoQ: Enterprise data system construction often involves linkage across different business links and segments. How do you unify standards in the data governance phase? And how do you ensure these standards are implemented smoothly?

Xu Xiaolei: We have set up a fintech committee that coordinates all data work. Within the committee there is an important department, the Decision Management Department, which manages the data definitions and data metrics of the entire credit card center. Its management scope covers seven aspects: metric name, business classification, type, business-level calculation logic, technical calculation logic, associated dimensions, and iteration/update cycle. Every change to a metric must go through a change management process to take effect, and these definitions form a public data dictionary in which people can view the definitions of the relevant metrics according to their work permissions.
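To make the shape of such a dictionary concrete, here is a minimal sketch of one entry as a Python structure; the field names mirror the seven aspects above but are illustrative assumptions, not the bank's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class MetricDefinition:
    """One entry in a public metric dictionary (field names are assumptions)."""
    name: str                  # metric name, e.g. "DAU"
    business_category: str     # business classification
    metric_type: str           # e.g. "count", "rate", "amount"
    business_logic: str        # business-level calculation logic
    technical_logic: str       # technical calculation logic, e.g. a SQL snippet
    dimensions: List[str] = field(default_factory=list)  # associated dimensions
    update_cycle: str = "daily"           # iteration/update cycle
    last_reviewed: Optional[date] = None  # stamped by the change-management process

dau = MetricDefinition(
    name="DAU",
    business_category="App Operations",
    metric_type="count",
    business_logic="Distinct users active in the app on a calendar day",
    technical_logic="SELECT COUNT(DISTINCT user_id) FROM app_events WHERE dt = :d",
    dimensions=["channel", "customer_segment"],
)
```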

The work of building a business metric system is usually led by business colleagues, because they understand the business needs. Once the metrics have been determined, data colleagues assist in confirming the calculation logic, such as determining how DAU will be computed. The technical team then implements these definitions in code, which is verified by the data colleagues and fed back to the business side.
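As a toy illustration of the kind of technical implementation involved, the snippet below computes DAU from an event log under one possible definition (a user counts if they produce at least one event that day); the table and column names are assumptions.

```python
import pandas as pd

# Illustrative event log; in practice this would come from the warehouse.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "event_date": pd.to_datetime(
        ["2024-06-01", "2024-06-01", "2024-06-01",
         "2024-06-01", "2024-06-02", "2024-06-02"]),
})

# One possible calculation logic: a user counts toward DAU if they
# produce at least one event on that calendar day.
dau = events.groupby("event_date")["user_id"].nunique()
print(dau)
# 2024-06-01    3
# 2024-06-02    1
```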

InfoQ: Data quality determines the effectiveness of data applications. What factors may affect data quality in China Guangfa Bank's credit card business scenarios, and how do you mitigate them?

Xu Xiaolei: Let me share some common data problems and our experience in dealing with them.

First, the diversity of data sources leads to inconsistent formats and standards; this is a data-source problem, which we typically address through data governance and standards.

Second, errors in data entry are a common problem. Not all data is automatically generated by systems; there is a lot of offline manual data entry. Human error is inevitable, so we reduce it with restrictive features, such as validating data on the entry platform.
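A minimal sketch of what such entry-side validation can look like, with fields and rules invented purely for illustration:

```python
import re
from datetime import date

def validate_entry(record: dict) -> list:
    """Return a list of validation errors for one manually entered record.
    The fields and rules are illustrative assumptions, not the bank's checks."""
    errors = []
    # Restrict free-form input: amounts must be positive numbers.
    if not isinstance(record.get("amount"), (int, float)) or record["amount"] <= 0:
        errors.append("amount must be a positive number")
    # Dates must not be in the future.
    if record.get("entry_date") and record["entry_date"] > date.today():
        errors.append("entry_date cannot be in the future")
    # IDs must match an expected pattern.
    if not re.fullmatch(r"[A-Z]{2}\d{8}", str(record.get("customer_id", ""))):
        errors.append("customer_id must match pattern XX00000000")
    return errors

print(validate_entry({"amount": -5, "entry_date": date(2024, 1, 1),
                      "customer_id": "GF123"}))  # two of the three rules fail
```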

Third, there is redundancy and duplication. Sometimes, due to network or data-source issues, the same record may be collected twice. We handle this through data governance and standards, which define uniqueness criteria for data in the ETL process.
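For instance, a governance-defined uniqueness criterion might be enforced during ETL roughly as follows; the column names and the keep-latest rule are assumptions:

```python
import pandas as pd

# Illustrative raw extract in which the same record was collected twice
# (e.g., after a network retry).
raw = pd.DataFrame({
    "txn_id":  ["T1", "T1", "T2"],
    "user_id": [101, 101, 102],
    "amount":  [250.0, 250.0, 80.0],
    "loaded_at": pd.to_datetime(
        ["2024-06-01 01:00", "2024-06-01 01:05", "2024-06-01 01:00"]),
})

# Uniqueness criterion: one row per txn_id, keeping the most recently
# loaded copy.
deduped = (raw.sort_values("loaded_at")
              .drop_duplicates(subset=["txn_id"], keep="last"))
print(deduped)
```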

Fourth, there is incomplete data. Problems with the upstream metadata system can cause data to be lost during transmission and acquisition; sometimes a field is blank or the content is incomplete. We deal with this after the fact by backfilling and correcting the data, and then fix the upstream process to avoid similar problems in the future.

Fifth, the timeliness of data is also important; sometimes the data is simply not up to date. Taking banks as an example, we may see a user's credit data from half a year ago. If you use six-month-old credit data to issue a credit card or approve installments, there may be errors, because the user's situation may have changed without our knowing in time. This is a very important flaw in data processing.

In the past, when we worked at Internet companies, we assumed data was highly time-sensitive. But in banking institutions, data timeliness can be T+3 to T+4: the data we see today is actually from three to four days ago. Yesterday's and the day before yesterday's data has not yet been aggregated, stabilized, or converged, and is still being calculated, so it cannot be used. Yet data users tend to assume that the metrics they see today are final, only to find the numbers have changed the next day.

Finally, there is data security and privacy. As a bank, we take these very seriously. We welcome external data coming in, but we must never let the bank's data leak out. We would also like to know how the bank's customers behave across the wider internet. To that end, in 2019 we partnered with a data company on federated modeling: after the data was matched, the company pushed their tags into our system, and we used the matched users to refine user profiles and develop targeted strategies. In the end, it turned out that much of the data was inaccurate, because we could not verify its accuracy; we could only say the tags corresponded to the users matched by the external company.

InfoQ: If dirty data is unavoidable, what are the ideas and methods used by the credit card center of China Guangfa Bank in terms of data quality monitoring?

Xu Xiaolei: When discussing the handling of dirty data, we first need to clarify what dirty data is and how much of it is acceptable. Dirty data generally refers to erroneous or incomplete data that arises during data transmission and analysis; for example, data may be lost or become inaccurate in transit for various reasons. Defining dirty data and agreeing on an acceptable percentage (e.g., no more than 0.1%) is critical and requires sufficient discussion and consensus within the team.
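Once that tolerance is agreed, the monitoring check itself can be simple. A sketch, using the 0.1% figure from the discussion above as an example of a negotiated threshold:

```python
def dirty_data_within_tolerance(total_rows: int, dirty_rows: int,
                                threshold: float = 0.001) -> bool:
    """Return True if the dirty-data ratio is within the agreed tolerance.
    The 0.1% default mirrors the figure discussed above; treat it as an
    example of a negotiated standard, not a universal constant."""
    rate = dirty_rows / total_rows if total_rows else 0.0
    print(f"dirty rate: {rate:.4%} (threshold {threshold:.2%})")
    return rate <= threshold

dirty_data_within_tolerance(1_000_000, 900)   # 0.09%, within tolerance
dirty_data_within_tolerance(1_000_000, 2500)  # 0.25%, breach: alert governance
```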

In our IT systems, dirty data is not caused by manual operations; it arises naturally during data transmission and processing. Although modern information systems usually have strong data governance norms, and a great deal of verification and repair is carried out during data extraction, dirty data may still exist. Controlling the amount of dirty data is a core KPI of our data governance team: if there is too much, upper-layer systems cannot use the data effectively.

Over the years, we have found that the proportion of dirty data in our datasets is very low, usually around 0.1%, which is negligible in our work and does not affect the results of analysis. An analogy: a two-meter-tall person stands out conspicuously in an open space, but in a crowd of one million people his height is no longer prominent and barely moves the average. Similarly, a small amount of dirty data, as long as its proportion stays within the acceptable range, has no significant impact on overall analysis.

InfoQ: Some industries lack the solid information foundation and sound data governance systems of the financial industry, which can leave them with a lot of uneven-quality data, i.e., dirty data. In such a situation, how do you distinguish normal data fluctuations from abnormal ones?

Xu Xiaolei: In our work, we often use several effective methods to determine whether data fluctuations are normal or abnormal.

The first method is simple comparison, a quick but not always accurate trick: when data changes, we compare it quarter-on-quarter, year-over-year, and against targets to judge whether the fluctuation is normal. This method is very common, but its limitation is that it may not catch all anomalies.

The second method is to use descriptive statistics and quadrant charts in Excel. By creating a quadrant chart, we can identify outliers in the graph. It is an intuitive, easy-to-implement approach that helps us quickly spot anomalies in the data.

The third approach is to build a model, such as a simple linear regression or a more complex decision model. With models, we can analyze the data more systematically and identify possible anomalous patterns.
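To give a flavor of the descriptive-statistics route in code, here is a sketch using the 1.5x IQR fence, a common outlier rule, as a stand-in for the chart-based inspection described above; the numbers are invented:

```python
import statistics

def iqr_outliers(values):
    """Flag outliers with the 1.5x IQR fence, a descriptive-statistics
    stand-in for visually inspecting a chart."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

daily_txn_counts = [980, 1010, 995, 1002, 1005, 990, 4300]  # last value: promo day
print(iqr_outliers(daily_txn_counts))  # [4300]: anomalous in the data,
                                       # yet expected by the business
```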

The approach we use most often is quadrant charts to quickly identify data anomalies. Importantly, an anomaly in the data does not necessarily mean an anomaly in the business. For example, on a big sale day like 11.11, unusually high transaction volume is expected; the business team actually wants the number as high as possible. So we need to understand and evaluate data anomalies in the context of the business.

How can data teams collaborate efficiently with business, technical, and other departments?

InfoQ: Business and data departments speak different languages. How can they better understand each other?

Xu Xiaolei: First, how to use data to understand the business. We usually start from marketing and operational strategy. For example, when we run a campaign during Chinese New Year, we might look at the conversion rate of similar past campaigns, say 5.1%, and this year aim to raise it to 6%. In such cases, A/B experiments are often conducted, with different schemes for different customers. But there is a problem: the A/B experiment may show a statistically significant improvement of 0.1 percentage points over the baseline, from 5.1% to 5.2%, yet that 0.1 points means little to the business. This is a typical conflict: the data proves significance, but the business does not recognize the gain.
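This tension is easy to reproduce: with large enough samples, tiny lifts become statistically significant. A sketch of a two-proportion z-test, with sample sizes invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)               # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# With ~2 million users per arm (an assumed scale), a lift from 5.1% to
# 5.2% is statistically significant yet tiny in absolute terms.
lift, p_value = two_proportion_z(102_000, 2_000_000, 104_000, 2_000_000)
print(f"lift = {lift:.4%}, p = {p_value:.6f}")  # p well below 0.05
```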

Another example: if we were running a short-video platform such as Douyin or Kuaishou, average time spent per user would be a key metric. We might use various algorithms and strategies to raise it from 90 minutes to 100 minutes. The data proves the change significant, but the business side does not recognize the improvement: a 10-minute gain on a 90-minute baseline is not impressive to them. That is the gap between data and business, and it takes a long time to bridge, because data teams tend to think in technical and algorithmic terms while the business focuses on practical results.

Second, understanding the data from the business side is also challenging. Take conversion rate as an example: any metric corresponds to a business model, an operation strategy, a target customer group, and a business process, and you must understand what lies behind them to really derive recommendations and direction from the data. A change in conversion rate can have many causes: the numerator rising, the denominator falling, the numerator rising faster than the denominator, and so on. The business side, however, cares more about problems in the conversion chain, the accuracy of the target customer group, and the effectiveness of the business model and strategy. This understanding requires deep, sustained alignment with the business.

InfoQ: How can we make collaboration or communication between different roles more efficient and smooth through organizational processes or institutional means?

Xu Xiaolei: This is really not a technical issue; it is a matter of organizational structure and collaboration. In recent years, we have been committed to digital transformation and digital empowerment, and on that premise we unify our technical team, middle office, front office, and channels, and ensure linkage between departments.

First, as mentioned earlier, we have an important department, the Decision Management Department, which is responsible for data metrics across the entire process, from raw data processing to metric output. Any addition, deletion, or modification must go through a standardized process, and the department gives final review of every change.

Second, we clarified the collaboration process: who is responsible for what, who acts first, who acts later, all defined and labeled through clear specifications. For example, now that traditional analysis methods can no longer meet complex business needs, we need to build models. In this process, the business unit's data team develops, builds, and tunes the model, while the systems or technology department deploys and maintains it and handles subsequent optimization.

Third, we have developed communication norms to ensure efficient communication. Our FinTech Committee regularly communicates with business teams and technical leaders to ensure that the production and operation process of data work is smooth.

In addition, we run a number of data empowerment activities, such as data empowerment competitions, data analyst talent development programs, and training camps for data and AI algorithms. Through these activities, business colleagues are brought in and a strong connection between business and data is established. Efficient linkage means the data side needs to understand the business's ideas and methods, and the business side needs to understand the data side's strategies and methods. That is why every July and August we hold a credit card center contest, where business teams propose projects that use data to reduce costs and increase efficiency. Judges, including business experts and data experts, review the submitted projects, select the best solutions, and publicize them for everyone to share and learn from.

InfoQ: Data teams tend to be limited in size. How can they efficiently meet the business's demand for large-scale data usage?

Xu Xiaolei: In meeting demand, we face two different types of requirements: metric requirements and day-to-day requirements.

1. Metric requirements. For metric requirements, such as proposals for new metrics, we conduct in-depth discussions to understand the purpose and business logic behind them. My predecessors told me that "what can be measured can be improved", which means we need to be clear about how a metric can be improved in order to realize the value of the data.

2. Day-to-day requirements. Here we face the challenge of insufficient manpower, so to manage requirements effectively we establish standards and processes. First, we set criteria for raising requirements, including the context, the specific content, and the desired output; only requirements raised according to these criteria are considered. Then we hold repeated discussions with the business department to ensure each requirement is accurate and reasonable. Next, we prioritize the requirements and make the priorities public to all stakeholders so that everyone shares the same understanding. Finally, we rank requirements by the business's own criteria and urgency, so the most pressing needs are handled first.

InfoQ: Does the data team interface with the business units as a whole, or do they work in groups? Which organization is more efficient?

Xu Xiaolei: Taking our company's app data team as an example, there are two main types of roles with different functions. The first is the BP-type data analyst, who reports to the leader of the data team but usually sits with the business team. This model puts data analysts closer to the business, so they better understand business needs and can provide targeted analysis support; it improves communication and collaboration between the data team and the business and keeps analytics tightly aligned with business goals.

The second group, about a third of the team, focuses on platform building: constructing and maintaining the data analytics platform and providing tools and methodological support so the data team can process and analyze data efficiently. Their work is foundational and critical, because the platforms and tools they provide directly affect the quality and efficiency of data analysis.

InfoQ: How do you avoid data analytics teams spending most of their time on data extraction or report development?

Xu Xiaolei: This phenomenon cannot be completely avoided. In fact, it is a reasonable and necessary part of data analysis: someone has to be responsible for data extraction and report maintenance, and not all needs can be met with existing reports.

The key is the positioning of the data team and the skills of its people. If the team is young and lacks deep business understanding, it is natural for members to spend more time on data extraction and report development early on. Especially when the business team is just beginning its digital transformation, the data team's main work at the from-scratch stage inevitably includes data extraction and report development.

As the data team matures, the distribution of work can become more balanced. For example, a small number of members (say 2~3 people) can be retained to handle ad hoc data extraction and maintain existing reports. If you have developed 100 reports, you might only need to add a few new ones each month or maintain the fields of existing ones. This frees the rest of the team for more exploratory, higher-value work.

InfoQ: Who makes up the Decision Management Department? And how are the roles and responsibilities of the data people embedded in business units divided?

Xu Xiaolei: The Decision Management Department is an independent back-office department responsible for managing the entire data process, including unified management of metrics, definition of calculation logic, data applications, the data platform, models, strategies, and so on. Why do business units also have data teams? Because data roles work much like HR: if a business team has no embedded data support, communication is limited. So people on the data team need to work closely with the business to develop business awareness.

There are divisions and differences among data team roles because business needs differ. In our current team there are three broad categories. The first is the data product manager, responsible for data products; the second is data analysts, at junior, intermediate, and senior levels, responsible for data analysis; the third is the data intelligence team, responsible for developing algorithms and models. Each role has further subdivisions: for example, a data product manager may handle event tracking and tag management for an online platform, a data analyst may do everything from simple data extraction to advanced exploratory analysis, and the data intelligence team develops models and algorithms to fit business needs.

InfoQ: How can data teams verify the effectiveness of their work once a business need has been met? Are there criteria or indicators for judging?

Xu Xiaolei: We take different verification methods according to different types of business needs.

First, for specific requirements, the goal is to improve a specific business metric. For example, if the business wants us to help improve a metric by 5% through data analytics, we track the achievement of that goal against business outcomes to assess the value of our work.

Second, for needs such as day-to-day data extraction or budget requests, our value shows in helping the business department pass financial review. If the analysis we provide enables the business to secure its budget, that proves the effectiveness of our work.

Third, for needs without a clear purpose, such as exploratory analysis, we proactively communicate with the business department. After delivering the data, we don't just send an email; we sit down next to them, discuss the analysis results, ask for their opinions, and discuss whether further work is needed. That kind of interaction is what brings out the value of a data team.

Once, to prove the value of the data team, I tried something interesting: I stopped publishing daily, weekly, and monthly reports for a week to see the reaction. Two days later, the heads of many business units began contacting me to ask why they had not received the daily report. This little experiment made me realize that although they may not always say it, they rely heavily on the reports we provide; it was a simple, effective way to validate the team's importance and value.

InfoQ: Now that the business can see the data, what is the value of data analysis?

Xu Xiaolei: If you take our company's structure as an example, the role of the data team is very clear. As organizational and social divisions of labor grow ever finer, business people often see only the data of the business they are responsible for and find it hard to fully understand other areas. A data team is like a data middle office that runs through the data flows of the whole organization.

For example, China Guangfa Bank has an app called "Discover Wonderful", a platform integrating multiple businesses, including installments, the mall, meal tickets, and more. The staff in charge of installments and the mall can only see their own data, such as the number of people taking out installments or the mall's transaction volume. As the data team, however, we can see the full picture: not only how many installment transactions a user makes in a month, but also what they buy in the mall. With this full-service perspective, I can suggest to the mall's operations manager: if 100,000 users have recently borrowed 10,000 yuan through the installment business, could you attract them to the mall to buy high-value items such as mobile phones? Such a recommendation is hard to make without the data team's holistic view.

Similarly, if I see a spike in the number of people buying a high-priced phone in the mall, I can feed that back to the installment business: can we attract these users, who have already spent heavily, to pay in installments? This cross-business, full-perspective collaboration is the core value of data teams. Working this way, data teams not only help the business understand user behavior better but also facilitate collaboration between businesses, creating greater value.

InfoQ: How do you get data such as feedback on business strategy, campaign effectiveness, or recommendation effect to flow back into the data system to drive the next business campaign?

Xu Xiaolei: This can be divided into two categories, one is automated and the other is manual.

Automated means the recommender system is inherently closed-loop: feedback flows back on its own. Whether positive or negative, it enters the recommender system and acts as a weighting signal for the next round of recommendations.
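At its simplest, that weighting signal might look like the sketch below; the items, actions, and update deltas are all invented for illustration, not a production recommender:

```python
# Item weights feeding the next round of recommendations.
item_weights = {"installment_offer": 1.0, "mall_coupon": 1.0}

# Illustrative feedback-to-adjustment mapping: positive and negative
# feedback both flow back as weight changes.
FEEDBACK_DELTA = {"click": +0.1, "purchase": +0.3, "dismiss": -0.2}

def record_feedback(item: str, action: str) -> None:
    """Fold one user action back into the item's recommendation weight."""
    item_weights[item] = max(0.0, item_weights[item] + FEEDBACK_DELTA[action])

record_feedback("mall_coupon", "purchase")
record_feedback("installment_offer", "dismiss")
print(item_weights)  # {'installment_offer': 0.8, 'mall_coupon': 1.3}
```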

The manual side means using this round's results as input to the next round's strategy. First, the data team must have sufficient status and influence; second, upper leadership must recognize and enforce this data-driven process; third, some basic principles must require business teams to think about data and customer segments when developing strategies.

How should data teams be built and planned in the AIGC era?

InfoQ: How is the data permission system of China Guangfa Bank's credit card center divided?

Xu Xiaolei: It is mainly divided by product, because data permissions have to be tied to specific data products. Typically, the most common data permissions revolve around a BI platform or a self-service analytics platform. In our company, the permission system is not complicated; it is generally determined by department, rank, and role. But a permission system alone is meaningless, because by itself it cannot achieve digital transformation and data empowerment. So while we built the permission system in a traditional way, we combined it with a data talent development plan.

For example, our current norm is this: under a company-wide data program, each department has a seed user with the most comprehensive permissions, who manages and maintains all the metrics and data the department uses. Then there are general users, basically junior and senior analysts, with different levels of permissions. Through this division, we manage data permissions better and ensure data is used safely and effectively.

InfoQ: Can you expand on the Data Talent Development Program?

Xu Xiaolei: The data analyst certification system and training program are part of it. First, we established a certification system for junior, intermediate, and senior data analysts.

Junior certification is conducted mainly through an online exam held once a month; candidates must pass it before registering for the intermediate exam.

Intermediate certification requires passing an exam drawn from an online question bank covering statistics, business knowledge, and other areas, and the exam must be taken on-site at a computer.

For senior data analysts, we have two branches: modeling analysts, who are responsible for the algorithm side, and business analysts, who focus on the business side. The selection process includes not only a written exam but also an important interview round to screen for the best talent.

Junior certification lets data analysts view data and reports on our self-service analytics platform and BI platform; deeper work, such as creating their own dashboards or tables, requires intermediate or advanced certification. Those who pass advanced certification gain additional access to perform exploratory analysis, write models, and solve complex business needs.
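Mapped to code, the tie between certification level and platform permissions might look like this; the permission names are illustrative assumptions:

```python
# Sketch of the certification-to-permission mapping described above.
CERT_PERMISSIONS = {
    "junior":       {"view_reports"},
    "intermediate": {"view_reports", "create_dashboards"},
    "advanced":     {"view_reports", "create_dashboards",
                     "exploratory_analysis", "write_models"},
}

def allowed(cert_level: str, action: str) -> bool:
    """Check whether an analyst's certification level grants an action."""
    return action in CERT_PERMISSIONS.get(cert_level, set())

print(allowed("junior", "create_dashboards"))  # False
print(allowed("advanced", "write_models"))     # True
```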

In addition, we have incorporated Geekbang's corporate training product, Geek Time, into our training program to build a knowledge and competence system for data analysts from T1 to T5. Our bank's data analysts are not all full-time data people; many of those taking the data analyst exam are business people. I think that is what is most valuable.

InfoQ: From a practitioner's perspective, what are the core competencies of a good data scientist or data analyst? How can they be nurtured and improved?

Xu Xiaolei: As the head of an enterprise data department who has interviewed and observed hundreds of data analysts, I have concluded that the competency requirements for data analysts divide by level and by category. Here is my overview of what a data analyst should have at different levels of experience:

  • Data analyst with 0-3 years of experience: For those new to the industry, what we value most is technical ability, i.e., whether the technical foundation is solid. This includes proficiency in common data analysis tools such as SQL, Python, and Excel. Mastery means being able to quickly understand an enterprise's data structures and quickly implement complex business requirements.
  • Data analyst with 3-5 or 3-7 years of experience: At this level, where technical skills are relatively mature, we look at whether the analyst truly understands the business behind the data. In interviews, I ask about a metric from their work (e.g., DAU) and expect them to explain in depth what it means and the business logic behind it.
  • Data analyst with 5-7 or 5-10 years of experience: At this level, we focus on mastery of complex business. Unlike the early stage's after-the-fact analysis, senior analysts need to predict future trends and risks of the business from the data, and help the enterprise make more targeted decisions.
  • Data Analyst with more than 10 years of experience: For analysts at this level, in addition to technical skills and business understanding, we value their way of thinking and strategic perspective. They should be able to look at data from a broader perspective, understand its impact on business strategy, and be able to provide valuable insights and recommendations.

InfoQ: How do you become a senior analyst in your job?

Xu Xiaolei: Becoming a senior analyst is not only about improving skills; it is about changing how you think. Junior and mid-level data analysts are typically concerned with how to perform tasks correctly, while senior analysts focus on choosing the right tasks.

The key shift is from focusing on how to do things right to focusing on why you are doing them. Senior analysts think more about where the business is headed, why an analysis is being done, and how its results affect business goals. They are no longer just executors and no longer need to pull and run the data themselves; their thinking sits at the business level, on trends, goals, and measurement, and those considerations feed back into the analysis.

I have been working for 17 years. Now I focus more on why I am doing an analysis than on how. That is the transformation you go through as a senior analyst: much of the front-line execution no longer needs to be handled by you. You think more about where the business is going, such as how the business went this year, what needs doing next year, why you are doing it, and why you track particular metrics. The focus is on aligning metrics with business goals, rather than just getting things done.

InfoQ: How do data teams plan in the AIGC era?

Xu Xiaolei: In the era of AIGC, or large models, the planning of data teams will differ mainly in improved efficiency, value, and depth of work. AIGC is an effective tool that helps data teams do their jobs better.

In the past, data teams relied heavily on deep business understanding and years of experience. Experience-based analysis has advantages, such as quick alignment with business needs, but also limitations; in particular, it can lead teams into the business's inertial thinking. Data analysts can become overconfident that things "should" be a certain way, when in reality that judgment is often wrong. The first advantage of AIGC is that it helps compensate for incomplete knowledge and inertial thinking.

The second advantage is efficiency. No matter how familiar someone is with the tools and data environment, data processing and analysis is time-consuming, and the human brain processes information at a limited speed. AIGC can complete such tasks quickly, and we only need to verify and challenge its process and results. For example, in tasks such as short-text classification, AIGC can help handle text that conventional methods classify poorly, improving accuracy and efficiency.
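A heavily hedged sketch of what such assistance could look like, where call_llm is a hypothetical stand-in for a privately deployed model endpoint (no real API is implied) and the categories are invented:

```python
CATEGORIES = ["billing question", "card activation", "fraud report", "other"]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an in-house large-model client."""
    raise NotImplementedError("replace with your privately deployed model")

def classify_ticket(text: str) -> str:
    """Ask the model for exactly one category, then sanity-check the answer.
    As stressed above, humans still verify and challenge AIGC's output."""
    prompt = (
        "Classify the customer message into exactly one of: "
        + ", ".join(CATEGORIES) + ".\n"
        f"Message: {text}\nCategory:"
    )
    answer = call_llm(prompt).strip().lower()
    return answer if answer in CATEGORIES else "other"
```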

In the AIGC era, data teams may not be much different in terms of structure, but there will be significant improvements in productivity and depth of analysis. This means that data teams can more effectively leverage advanced technologies like AIGC to take data analytics to the next level.

What will the data system of the future look like?

InfoQ: Big data has been in the spotlight for the last 10 years. Beyond business intelligence and recommender systems, are there other, more valuable systems that you see?

Xu Xiaolei: First, an analogy: 20 years ago we used to say that everyone was a product manager, but is that still the case? The answer is no. Second, if you no longer feel the presence of data, it has probably been fully integrated into your life. When something is no longer noticeable and makes no sound, it has become an indispensable, subtle part of your work and life. Conversely, if something still stands out, it has not really been integrated into your system.

For example, when I joined China Guangfa Bank, there was only one person on my team; I was the second. I tried to make my presence felt every day, showing my work by sending daily and weekly reports. Why? Because at that time my data work ran independently of, and parallel to, the business work. Now almost every business needs data support, and data and business have become one process on the same line. The more indispensable data becomes, the more important it is in the work.

InfoQ: Looking ahead, what are the further plans of the China Guangfa Bank Credit Card Center in terms of data system construction and data asset application practice? What is your personal focus this year?

Xu Xiaolei: First, I set a direction for my team: fully integrate with AIGC. We plan to incorporate large models into our work, but this is challenging. One of the biggest challenges is private deployment of large models. As a bank, we cannot have a large model deployed inside the bank that can also access external data, because that would create a risk of data breaches. In private deployment we may encounter problems such as degraded intelligence and unavailability, but we will work with the systems team to overcome them.

Second, as AIGC is integrated into my data team, the first thing I try to do is dispel the team's panic. They may worry about being replaced by AIGC, and I need to change their perspective so that they take full advantage of it.

Third, we will pursue breakthroughs at the business level. With existing analysis methods it is difficult to find innovation and breakthrough points among existing users, but I firmly believe everything is worth re-analyzing with data. Before re-analyzing, we need an empty-cup mentality: ask whether past practices still apply, and if not, look for new directions, strategies, and approaches. As decision-makers and the business's brain, it is our responsibility to provide answers to the business and the leadership.

Event Recommendations

FCon will officially open on August 16 with the theme "Technology-Driven, Smart Future: Unleashing the Endogenous Power of Digital Finance". If you are interested, click "Read the original article" for more details. For other questions, contact the ticketing staff at 13269078023, or scan the QR code above to add the conference's welfare officer and receive a welfare information package.

