
Zhihu: Automated Screen-Recording Tests for App Launch Time


Background

Launch time is a core performance metric for apps. Zhihu already collects instrumentation ("buried point") data that can serve as a reference and requires no manual intervention, but the accuracy of that approach has been questioned: it can diverge from what users actually perceive. In the early stages of performance optimization, Zhihu used a screen-recording method: manually recording the phone screen with video software, then splitting the video into frames and tallying them. The results matched users' real perception, but the testing cost was extremely high. We therefore kept the instrumentation data as the longitudinal baseline and implemented an automated screen-recording test as an auxiliary reference.

Ideas

Since the goal is to automate the manual test, let's first lay out the manual procedure: record the screen manually → split the video into frames with a framing tool → manually pick the frames of the key nodes and note each frame's time in the video → compute the launch time as end-node time minus start-node time. To automate the whole flow, we can stitch together an automation solution for each step. Getting a machine-learning model to automatically recognize the key-node frames is the step that most affects the accuracy of the results.

The overall idea is shown in Figure 1 and can be divided into four parts.

Train the model: collect image samples, train an image-recognition model, verify its accuracy, supplement or adjust samples for scenes that are recognized poorly, and finally use a model with high accuracy for recognition.

Record the screen: launch the app with scripts and save the phone's frame sequence to a PC.

Process the data: recognize the frames, accurately identify the key-node images, and compute the launch time.

Close the loop: persist the data and hook the test into the MR (merge request) workflow.


Figure 1 Automated launch-time testing process

Train the model

The ultimate goal of selecting and training a model is to have it accurately recognize the key-node images. We continuously adjust and expand the training samples so that the trained model recognizes the key nodes more and more accurately. First collect samples for the key nodes, then pick a suitable model and train it on the samples, and finally correct and adjust as needed.

  • Sample collection

We divide the launch process into five key nodes: launch begins, the logo screen appears, the ad creative appears, the home-page frame finishes loading, and the home-page content finishes loading. We record the app's launch process and manually sort the image frames into five folders, one per key node. Figure 2 shows a schematic of the launch process.

  1. Each folder is named after its node.
  2. Each folder stores 50 screenshots, covering 2 phone models, for a total of 250 images across the five nodes.
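The resulting sample directory handed to TensorFlow's retraining tutorial would look roughly like this (the folder names below are illustrative, not Zhihu's actual labels; the tutorial expects one subfolder per label):

```
samples/
├── start/      # launch begins (50 screenshots, 2 phone models)
├── logo/       # logo screen appears
├── ad/         # ad creative appears
├── frame/      # home-page frame loaded
└── content/    # home-page content loaded
```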

Figure 2 Schematic diagram of the startup process

  • Model selection and training

For the machine-learning model we chose the popular TensorFlow, because it is simple to operate, requires no manual feature extraction, and achieves high recognition accuracy after training. The environment-setup process is described at https://www.tensorflow.org/hub/tutorials/image_retraining

After setting up the environment, start training on the samples. Each node must have at least 20 images, otherwise an error is reported.

The trained model is saved to /tmp/output_graph.pb, and the labels of the five key nodes are the five folder names.

  • Test the model's accuracy and adjust the samples

We can't use the trained model right away; we first have to check whether it recognizes images accurately, and if it doesn't, we supplement or adjust the training material until it meets our needs.

When we feed it a test image, the model reports the probability it assigns to each stage. As shown in Figure 3, the model assigns the highest probability, 98.35%, to the loading stage, so we take the image as recognized as the loading stage; and this image really is from the loading phase. The higher the probability, the more confident the model is, and we want the trained model to be confident about which node each image belongs to.


Figure 3 An example of test model accuracy

Next, we selected 20 new sets of screenshots (100 images in total) to test the trained model. These 20 sets cover different resolutions and phone models. In Figure 4, MIN is the minimum probability with which the model recognized an image as belonging to that node.


Figure 4 Probability statistics of model recognition at different stages

As Figure 4 shows, recognition accuracy is quite high for most images. Only in the "ad" stage do some images have a low recognition probability, and some are even misidentified as "start"; we suspect this is because the ad page closely resembles the phone's home screen.

In the code we set a recognition-probability threshold for each node, and a match is considered successful only when the recognition probability exceeds that threshold.
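The matching rule can be sketched as follows (a minimal illustration; the node names, threshold values, and the shape of the classifier result are assumptions, not Zhihu's actual code):

```python
# Per-node probability thresholds (hypothetical values for illustration).
THRESHOLDS = {
    "start": 0.90,
    "logo": 0.90,
    "ad": 0.80,      # ad creatives vary a lot, so a looser threshold
    "frame": 0.90,
    "content": 0.90,
}

def match_node(probabilities):
    """Given {node_name: probability} from the classifier, return the
    best-scoring node if it clears its threshold, else None (no match)."""
    node = max(probabilities, key=probabilities.get)
    if probabilities[node] >= THRESHOLDS[node]:
        return node
    return None
```

Frames that match no node (for example, mid-animation transition frames) simply fall through as `None` and are skipped.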

Screen recording

For screen recording we chose minicap, provided by STF, which can capture 30 to 40 screenshots per second on a mid-range phone. Because minicap transfers frames extremely fast, we save the binary stream directly as images and use each image's save timestamp in the launch-time calculation. The error of this method is the transfer time of the last key-node image, which is under 200 ms and therefore acceptable. Screen recording and app launch run at the same time, in our case on separate threads.

The recording steps are as follows:

  • After getting the minicap source code, we have to compile it ourselves to obtain the executable, and the binary differs by CPU architecture. Fortunately the official documentation is detailed and offers an "easy way" and a "hard way": with the easy way, you connect a real device to the computer and run the official script, which directly compiles the version that device needs. After installing it on the device, start minicap with an adb command. The -P parameter sets the screen resolution and image scaling; appropriate downscaling reduces both transfer time and recognition time. The following command starts minicap; add -t to check the result, where "OK" means it started successfully.
adb shell LD_LIBRARY_PATH=/data/local/tmp/minicap-devel /data/local/tmp/minicap-devel/minicap -P 1080x2340@1080x2340/0 -t           
  • After minicap starts, forward the device's socket to port 1717 on the local machine; running "nc localhost 1717" in a terminal then shows the binary stream minicap sends to the PC.
adb forward tcp:1717 localabstract:minicap
nc localhost 1717           
  • Create a TCP client locally that connects to port 1717, save the transmitted binary stream as images, and you have screenshots of the launch process.
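A minimal client sketch of that last step, assuming minicap's documented wire format (a 24-byte global header, then repeated frames of a 4-byte little-endian length followed by that many JPEG bytes); file saving and error handling are simplified:

```python
import io
import struct
import time

def read_exact(stream, n):
    """Read exactly n bytes from a stream (a socket makefile, or BytesIO)."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("minicap stream closed")
        buf += chunk
    return buf

def iter_frames(stream):
    """Yield (save_timestamp, jpeg_bytes) for each frame in a minicap stream."""
    header = read_exact(stream, 24)   # version, header size, pid, sizes, etc.
    header_size = header[1]
    if header_size > 24:              # skip extra bytes a newer version may add
        read_exact(stream, header_size - 24)
    while True:
        try:
            (frame_size,) = struct.unpack("<I", read_exact(stream, 4))
        except EOFError:
            return
        jpeg = read_exact(stream, frame_size)
        yield time.time(), jpeg       # timestamp at save time drives the math
```

In the real recorder the stream would come from something like `socket.create_connection(("localhost", 1717)).makefile("rb")`, and each `jpeg` would be written to disk with its timestamp recorded, since those timestamps are what the launch-time calculation uses.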

Data processing

The app is launched with an adb command, so there is no icon-darkening effect from a finger press; the launch start can only be inferred from the animation of the first page expanding out of the desktop. This also ignores the hardware-processing time of a real user's finger press, so the timestamp at which the adb command is issued is used as the launch start time.

The total launch time is computed from timestamps: the timestamp at which the first home-page image was saved minus the launch timestamp.

Since launch involves ad logic, ads will inevitably appear. If an ad frame is recognized during image recognition, that run's launch time is not counted.
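Putting the two rules above together (a sketch; the node labels, and the choice of home-page content as the end node, are illustrative assumptions):

```python
def launch_time(start_ts, frames):
    """Compute one run's launch time from classified frames.

    start_ts: timestamp when the adb launch command was issued.
    frames: list of (save_timestamp, node) in capture order, where node is
    one of "start", "logo", "ad", "frame", "content", or None (no match).
    Returns seconds until the first home-page content frame, or None if
    an ad appeared (such runs are discarded).
    """
    for ts, node in frames:
        if node == "ad":
            return None        # ad shown: this run's data is invalid
        if node == "content":
            return ts - start_ts
    return None                # home page never recognized
```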

To save time, each launch spawns a new thread to recognize the images saved in the background, so multiple threads recognize concurrently; with ten launches, the whole process takes about 12 minutes. The figure below shows logs from multiple threads recognizing images concurrently.
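The concurrency can be sketched with a thread pool (illustrative only; `classify_run` is a hypothetical stand-in for feeding one run's saved frames through the TensorFlow model):

```python
from concurrent.futures import ThreadPoolExecutor

def classify_run(run_dir):
    """Placeholder for recognizing all frames saved for one launch run.
    The real version would classify each image and compute the launch time."""
    return run_dir, 1.5  # (run identifier, measured launch time in seconds)

def classify_all(run_dirs, workers=4):
    """Recognize several runs' frames concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(classify_run, run_dirs))
```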


Figure 5 Identifying Process Logs

Even with the phone model and system held constant, the launch time of the same phone launching the same package fluctuates considerably. To make the test results more reliable and let the tool surface problems effectively, we kept optimizing. The main optimizations are:

  • Reduce ad appearances. Because runs that show an ad are discarded, we perform 10 cold launches right after installing the app, which lowers the probability of ads appearing in the subsequent test data.
  • Find the right number of launches. A larger sample gives higher accuracy, but since image recognition is time-consuming, the number of launches should not be too large.
  • Reduce error statistically. Once we have all the launch times, we drop the maximum and the minimum and average the rest.
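The last point is a simple trimmed mean; a sketch:

```python
def trimmed_mean(times):
    """Average the launch times after dropping one maximum and one minimum,
    damping outliers caused by run-to-run fluctuation."""
    if len(times) < 3:
        raise ValueError("need at least 3 runs to trim max and min")
    trimmed = sorted(times)[1:-1]
    return sum(trimmed) / len(trimmed)
```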

Closed-loop process

To catch the impact of code on launch time as early as possible, we need to monitor launch time before code is merged, so the screen-recording automation is hooked into CI and the launch-time test runs after each MR is packaged. If the launch time is higher than the historical average, the launch-time owner is @-mentioned on the MR to review the code. To make troubleshooting easier for developers, we also record the duration of each startup task during launch, so it is easy to find which task affected the launch time. The overall flow of the CI integration is shown in the figure below.


Figure 6 Closed-loop flowchart

The figure below shows the front-end page of the platform where test data is stored. From this page you can check the launch data for each MR's build, which makes troubleshooting convenient.


Figure 7 Data Presentation Diagram

Outlook

In summary, we combined automated screen recording with image recognition to test launch time automatically, and built a fully closed-loop process so that the gains from launch optimization are protected over the long term. However, the model used in current tests is still the one trained at the beginning. The follow-up plan is to add each test's key-frame images to the sample library, retrain periodically, and update the model, so that recognition becomes even more accurate.

Author: Yang Yang

Source: https://zhuanlan.zhihu.com/p/70696324
