
Large Language Models, How to Empower Testing? | TF136 Review

Author: CCFvoice

On June 27, 2024, CCF TF held its 136th event, themed "Large Language Models, How to Empower Testing?". The event was curated and hosted by the CCF TF Quality Engineering SIG, which invited senior technical experts from industry leaders such as Huawei, Baidu, and ByteDance to share in-depth insights and the latest practices on applying large language models to testing. The talks were substantive, the discussion lively, and audience feedback positive. The event was streamed live via Tencent Meeting and the CCF video account "China Computer Federation", drawing many professionals. This article reviews the highlights and insights from the event.

CCF TF

Expert talks from CCF TF events are archived in the CCF Digital Library [TF Album]; you are welcome to browse them there. The talks from this event will also be added soon.

In the era of large language models, the testing field enjoys advantages that few other areas of the software industry can match. The testing industry has accumulated a wealth of digital assets, including test strategies, test plans, test cases, automation scripts, and massive volumes of issue tickets and execution records. There are natural correspondences among these assets: in test automation scripts, for example, it is easy to obtain the mapping between a test step (its text description) and the test code that implements it. All of this provides a rich source corpus for LLM-empowered testing.
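As a minimal illustration of that step-to-code correspondence, the sketch below pairs step descriptions with the code that follows them in a pytest-style script. The `# step:` comment convention, the sample script, and the client function names are illustrative assumptions, not any vendor's actual format.

```python
import re

# Hypothetical convention: each test step is annotated with a "# step:"
# comment directly above the code that implements it.
SCRIPT = '''
def test_create_volume():
    # step: log in to the storage array as admin
    client = login("admin", "secret")
    # step: create a 10 GB volume named vol01
    client.create_volume(name="vol01", size_gb=10)
    # step: verify the volume appears in the volume list
    assert "vol01" in client.list_volumes()
'''

def extract_step_code_pairs(script: str) -> list[tuple[str, str]]:
    """Pair each '# step:' description with the code lines that follow it."""
    pairs, step, code = [], None, []
    for line in script.splitlines():
        m = re.match(r"\s*# step:\s*(.+)", line)
        if m:
            if step is not None:           # flush the previous pair
                pairs.append((step, "\n".join(code)))
            step, code = m.group(1), []
        elif step is not None and line.strip():
            code.append(line.strip())
    if step is not None:
        pairs.append((step, "\n".join(code)))
    return pairs

for step, code in extract_step_code_pairs(SCRIPT):
    print(f"{step!r} -> {code!r}")
```

Pairs like these, harvested at scale from an existing script library, are exactly the kind of corpus a fine-tuning or retrieval pipeline can consume.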

In addition, the emergence of LLMs has significantly lowered the threshold for applying AI to testing, bringing valuable opportunities for technology upgrades. Although AI-enabled testing has been developing for many years, the knowledge gap between AI experts and business experts has always been a major obstacle. LLMs simplify this: in many scenarios, a preliminary domain application can be built with simple prompts and no complex training, such as intelligently generating test data or suggesting and supplementing test points.
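A minimal sketch of this "simple prompt, no training" pattern follows; the prompt wording and the example requirement are illustrative assumptions, and the prompts would be sent to whatever LLM endpoint is available.

```python
# Sketch of prompt-only AI-assisted testing: no fine-tuning, no pipeline,
# just a well-shaped instruction sent to any available model endpoint.

TEST_DATA_PROMPT = """You are a test engineer. Generate five boundary-value
test inputs for a function that accepts a username (3-16 characters: letters,
digits, underscore). Return one input per line with a one-phrase rationale."""

def build_test_point_prompt(requirement: str) -> str:
    """Prompt for suggesting and supplementing test points for a requirement."""
    return (
        "List the test points a tester might miss for this requirement, "
        "including boundary and negative cases:\n" + requirement
    )

print(TEST_DATA_PROMPT)
print(build_test_point_prompt("Users can transfer up to 50,000 CNY per day."))
```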

In practice, we have also found that despite these favorable conditions, it is not easy for LLMs to deliver systematic business value in testing. Once you enter a specific business domain, many challenges arise: the high cost and hallucination risk of SFT, accurately capturing the user's real intent during RAG retrieval, the quality of the data corpus, and the gap between the expectations of managers and users and the actual capabilities of LLMs.

Practical Experience and Lessons of Large Model-Assisted Test Automation Code Generation

Gao Guangda, Chief Test Expert of Huawei Data Storage, shared "Practical Experience and Lessons of Large Model-Assisted Test Automation Code Generation". He first introduced the original motivation and business value of choosing LLM-assisted test automation code generation as the preferred entry point for LLM adoption, and then walked through the project's exploration process.

The first stage focused on completing the automated regression safety net for existing features. By cleaning the existing test automation scripts, the team obtained test step / test code pairs and fine-tuned an internal code model via SFT. Along the way they encountered many problems, such as defining and checking corpus quality standards and distinguishing business contexts. The resulting test-code model achieved good results when generating automation code for features similar to those in the training corpus. However, SFT has drawbacks: it is expensive, the training cycle is long, and for a new project the path from corpus submission through the model team to usable script generation takes too long to meet the need for rapid automation of new features. Through analysis, the project team identified completing the regression safety net for old features as the business scenario where the technology at this stage delivers real value.

To meet the automation needs of new features, RAG was adopted instead: new-feature corpus can be ingested into the knowledge base within minutes, and retrieval-augmented generation then supports rapid automated script writing for new business features. In this process, the team solved a series of problems, such as the model failing to follow RAG instructions and retrieval accuracy falling short of business requirements.
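A minimal sketch of this RAG flow, under stated assumptions: the knowledge base is an in-memory list of (step, code) pairs and retrieval is naive word overlap. A production system would use embedding search and a real model call, but the shape is the same.

```python
# Minimal RAG sketch: new-feature corpus entries are ingested as
# (step, code) pairs, and generation prompts are augmented with the
# best-matching examples. Retrieval here is naive word overlap.

KNOWLEDGE_BASE: list[tuple[str, str]] = []  # (step description, code snippet)

def ingest(step: str, code: str) -> None:
    """'Minute-level' ingestion: just append; no retraining required."""
    KNOWLEDGE_BASE.append((step, code))

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank corpus entries by word overlap with the query."""
    qw = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda e: len(qw & set(e[0].lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(new_step: str) -> str:
    """Assemble a retrieval-augmented prompt for the code-generation model."""
    examples = "\n\n".join(f"Step: {s}\nCode:\n{c}"
                           for s, c in retrieve(new_step))
    return (f"Given these examples from our automation library:\n\n{examples}"
            f"\n\nWrite test code for this new step:\nStep: {new_step}\nCode:")

ingest("create a 10 GB volume named vol01",
       'client.create_volume(name="vol01", size_gb=10)')
ingest("delete the volume vol01", 'client.delete_volume("vol01")')
print(build_prompt("create a 20 GB volume named vol02"))
```

Because ingestion is just an append to the knowledge base, new-feature corpus becomes usable immediately, which is exactly the contrast with SFT's long training cycle described above.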

At present, Huawei's LLM-assisted test automation code generation project has entered the practical stage, covering 60+ products and reaching more than 2,700 users. At the end of the talk, he shared some lessons learned: LLM-assisted testing projects require the cooperation of AI experts, business experts, and the tooling team, and none of the three can be missing; the technology will not be perfect all at once, so you must find business value points that match its current state; and unlike earlier deterministic projects, LLM projects require managing the expectations of both testers and their managers.


Technical Exploration and Paradigm of Large Model Testing

Chen Liushan, an engineering-efficiency expert at ByteDance, presented "Technical Exploration and Paradigm of Large Model Testing", introducing the overall planning approach for AI-assisted testing: building a full-link intelligent quality assurance system covering end-to-end R&D. For each R&D stage from requirements through development, testing, deployment, and O&M, corresponding LLM-assisted application scenarios are planned and developed.

A unified atomic service layer provides the common capabilities these scenarios require, such as requirements understanding and code understanding. Beneath it, a model layer handles access to different models, on top of which common techniques such as tuning, RAG, prompting, and automatic evaluation are optimized and assessed. In parallel, a unified data layer provides the data lake needed for model applications, built on R&D digitalization.

On top of this framework, he described the LLM application paradigm distilled from practice, including model capability optimization, model capability evaluation, and online evaluation, and elaborated on how to optimize model capability for different types of knowledge. He then used LLM-assisted unit test generation as a concrete example of the paradigm, covering tuning, RAG, prompt engineering, agent construction, the difficulties of choosing evaluation metrics, and the application's current deployment approach and business results. With both cases and summaries, the talk offers good reference value for building LLM-empowered testing applications.
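As a concrete illustration of the unit-test-generation scenario, the sketch below assembles a generation prompt from the source of the function under test. The sample function, the prompt wording, and the idea of feeding raw source as the "code understanding" input are illustrative assumptions; the model call and evaluation harness are omitted.

```python
import inspect

def discount_price(price: float, percent: float) -> float:
    """Apply a percentage discount, clamped to [0, 100]."""
    percent = max(0.0, min(100.0, percent))
    return round(price * (1 - percent / 100), 2)

def unit_test_prompt(func) -> str:
    """Build a unit-test-generation prompt from the function's source."""
    source = inspect.getsource(func)
    return (
        "Write pytest unit tests for the function below. Cover the normal\n"
        "path, boundary values (0% and 100%), and out-of-range inputs.\n\n"
        f"{source}"
    )

print(unit_test_prompt(discount_price))
```

In a layered system like the one described, this prompt assembly would live in the atomic service layer, while the model layer decides which model receives it and how the output is evaluated.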


Exploration and Practice of Manual Test Case Generation Driven by Large Models

Zhang Kepeng, a senior engineer at Baidu, shared "Exploration and Practice of Manual Test Case Generation Driven by Large Models", first explaining the overall drivers, goals, and approach of Baidu's AI-enabled testing project, and then analyzing in detail how the LLM-assisted test case generation scenario was developed.

Requirements are first classified by their input characteristics into short and long requirements. For clear, relatively simple requirements, test cases are generated directly by the large model in one pass. For relatively complex requirements, test points are extracted first and, after manual confirmation, test cases are then generated from those points. For long requirements, the large model decomposes the requirement, and the resulting pieces are handled like short requirements, either generating cases directly or going through test points. For the incomplete requirements common in Internet businesses, the large model helps identify what needs to be supplemented, and test cases are generated automatically once the requirement is refined.

Through case analyses of two different types of typical projects, he shared how to combine the specific characteristics of different businesses to identify value points and implement LLM-assisted test case generation, and analyzed the business changes and efficiency gains observed from the QA perspective before and after adoption. Finally, he shared the metric system established for LLM-assisted test case generation and the current business results: the approach has been deployed in 200+ products, the overall adoption rate is 40%, some teams using private-domain knowledge reach 60%, and generated cases can account for up to 50% of all cases. He closed with remaining technical problems and follow-up directions, including recognition of multimodal information such as rich text and tables.
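The routing logic described above can be sketched as follows; the length threshold, stage names, and the `is_complex` flag are illustrative assumptions rather than Baidu's actual implementation.

```python
# Sketch of the requirement-routing logic: short, clear requirements go
# straight to case generation; complex or long ones are first decomposed
# into test points, with a manual confirmation hook in between.

MAX_SHORT_LEN = 400  # characters; illustrative cutoff for a "short" requirement

def plan_generation(requirement: str, is_complex: bool) -> list[str]:
    """Return the ordered pipeline stages for one requirement."""
    if len(requirement) > MAX_SHORT_LEN:
        return ["decompose_requirement", "extract_test_points",
                "confirm_manually", "generate_cases"]
    if is_complex:
        return ["extract_test_points", "confirm_manually", "generate_cases"]
    return ["generate_cases"]  # one-shot generation for simple requirements

print(plan_generation("User can reset password via email link.", False))
print(plan_generation("x" * 1000, True))
```

The manual-confirmation stage is the key design choice: inserting a human check between test-point extraction and case generation keeps hallucinated test points from propagating into the final cases.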


In the interactive session, participants actively asked questions about the talks, such as "In unit test generation, how does the model determine whether a function's output value is correct, so as to produce the right assertion?", "If a function's execution result depends on an interface, how do you judge the correctness of an interface change?", and "What new capabilities do testers need as LLM-empowered testing is rolled out?". The speakers answered each question in detail in the context of their talks.

Upcoming Events

Event | Date | SIG | Topic | Format
TF137 | July 6 | Engineer Culture SIG | Engineers in the Age of AI | Offline (Beijing)
TF140 | July 18 | Algorithms & AI SIG | AI for Science | Online

About CCF TF

Founded in June 2017, CCF TF (Tech Frontier) aims to provide a top-level communication platform for engineers, serve computing professionals in industry, support the career development of technical practitioners, build a platform for sustained cooperation, and promote technical exchange between enterprises and between industry and academia. To date, 12 SIGs (Special Interest Groups) have been established, covering knowledge graphs, data science, intelligent manufacturing, architecture, security, intelligent devices and interaction, digital transformation and enterprise architecture, algorithms and AI, intelligent front-end, engineer culture, R&D efficiency, and quality engineering, providing rich frontline technical content.

Join the CCF

Join CCF as a member to enjoy more value-added activities and invest in your own technical growth.

Click on the link to learn more about membership benefits:

CCF Individual Membership Benefits
CCF Corporate Membership Benefits

Scan the QR code to join.

Follow the official accounts of CCF TF and CCF Business Headquarters; more exciting content is on the way!

Follow CCF TF for TF event information.

Follow CCF Business Headquarters to book meeting venues at a discount.

CCF Recommended

【Articles】

  • The 2024 TF events are officially launched! See the annual plan at a glance
