laitimes

Zhihu Android Client Startup Optimization - Retrofit Agent

author:Flash Gene

background

In order to improve the user experience, the Zhihu mobile platform team has set improving the speed of app launch as a long-term and important OKR, so in the second quarter of this year, we focused on making a series of optimizations for the launch of the client.

Although we are still in the early stages of performance optimization, we have some lessons to share during the optimization process. So, today we're going to share one of the optimization experiences related to Retrofit.

Before you begin

When we do performance optimization, many people may struggle with how to test or quantify the effect of the final optimization. There is actually a lot of knowledge in this, and we usually use the information output of the system as an indicator, such as the well-known GPU rendering histogram as an indicator of whether the UI is stuck. Why do I need to choose the information output from the system? First, it is more convenient to collect information, and we don't need to write code to measure it ourselves, and secondly, we can also collect information from other apps, which is convenient for horizontal comparison with other competing products.

As for the startup speed, we will use the activity startup log printed by the system as an indicator:

107-11 15:09:32.519 1440-1502/? I/ActivityManager: Displayed com.zhihu.android/.app.ui.activity.MainActivity: +1s412ms (total +1s978ms)           

The meaning of log data can be seen in this answer on Stack Overflow, and we mainly take the time from the cold start of the app to the first activity display.

To get back to the point, before this optimization begins, in order to be able to compare it with the final data, we need to measure the launch time of the app before the optimization. By cold-starting the app multiple times and recording the total duration in the log, the average startup time of the app before optimization was 1.905s (data from OnePlus 3T):

Zhihu Android Client Startup Optimization - Retrofit Agent

Log output

Discover & Analyze Issues

We want to find out exactly what code is causing the startup to be slow, and we can't just review the code, because the execution efficiency of a piece of code is not only affected by various internal factors (CPU/IO operation intensive, lock...). ), and may also be affected by other external factors (system resource competition, GC...). ). So the best way to quickly and accurately identify problems is to get the code actually running and then profile the code at runtime.

Profiling is actually a big science, which requires the use of a large number of tools, including the Android system, SDK, and even some interfaces or tools provided by the IDE, and skillfully using these tools to analyze and find performance problems is a necessary skill for performance optimization. For today's Zhihu App launch of Profiling, I will briefly introduce some of the tools:

Method Tracing

Method tracing is to track all the methods that have been called by the app in a certain period of time, which is one of the common ways to measure the performance of code execution, and we can use it to find out how many methods have been called when the app is launched, and how long it has taken to do so.

This feature is provided by the Android system, and we can start and end the tracing by manually calling the android.os.Debug.startMethodTracing() and stopMethodTracing() methods in the code, and then the system will save the results of Tracing to the .trace file on the phone. For more information, please refer to the official documentation.

In addition, in addition to writing code to Trace, we also have more convenient ways. For example, this can also be done via the Method Tracer in the Android Studio Profiler. However, for the cold start of the app, we usually use the built-in Am command of the Android system to do it, because it can accurately start the trace when the app is launched:

# 启动指定 Activity,并同时进行采样跟踪
adb shell am start -n com.zhihu.android/com.zhihu.android.app.ui.activity.MainActivity --start-profiler /data/local/tmp/zhihu-startup.trace --sampling 1000           

When the app is cold started (the first activity has been drawn to the screen), run the following command to manually terminate the trace and pull the .trace file to the current directory of the machine:

# 终止跟踪
adb shell am profile stop
 
# 拉取 .trace 文件到本机当前目录
adb pull /data/local/tmp/zhihu-startup.trace .           

Once you have the .trace file, the next step is to visualize it. You can directly drag the .trace file into Android Studio to open it, but the current version of Android Studio does not do a good job of visualizing and interacting with .trace files, so it is not recommended. Here it is recommended to use the Traceview in Android Device Monitor to open it, please refer to the official documentation for details.

Screenshot after opening with Traceview:

Zhihu Android Client Startup Optimization - Retrofit Agent

Screenshot of Traceview

By default, all tracked methods will be sorted down according to the actual total time elapsed from most to least, and clicking on a method can see all its parent and child methods. By checking from top to bottom, you can see that the initUserInfo() method of the UserInfoInitialization class takes more than 600ms when the Zhihu App is launched:

Zhihu Android Client Startup Optimization - Retrofit Agent

initUserInfo() method

Zhihu Android Client Startup Optimization - Retrofit Agent

159 (0x68f8) method

Let's take a look at the code for this initUserInfo() method:

private void initUserInfo() {
    NetworkUtils.createService(ProfileService.class)
            .getSelf(AppInfo.getAppId())
            // ...
            .subscribe(response -> {
                // ...
            }, Debug::e);
}           

NetworkUtils.createService() is a method that we encapsulate ourselves, and internally calls Retrofit to get the dynamic proxy class of the ProfileService, while getSelf() is a method provided by the dynamic proxy class. As you can see from the traceview, 159 (0x68f8) is a proxy method generated at runtime, so you can tell that the time taken here is caused by calling getSelf().

Keep tracking:

Zhihu Android Client Startup Optimization - Retrofit Agent

responseBodyConverter() 方法

Zhihu Android Client Startup Optimization - Retrofit Agent

_newReader() method

As you can see, in the end, the main time spent was on a few methods of the Jackson library, which we use for Retrofit serialization and deserialization of data (e.g. Response Body).

The next step is to dig deeper into the cause. If you look at Retrofit's code, you can see that the first time you call one of Retrofit's dynamic proxy methods, Retrofit will create a new instance of ServiceMethod to store some data about that proxy method, including a converter to transform the Response Body. When creating a new Converter, an ObjectReader will be generated according to the corresponding Java Model class of the Body deserialization result, and if the corresponding Java Model class is particularly complex, it will take a long time to create a new ObjectReader (there will be a bunch of reflection operations inside). The Java Model returned by our getSelf() method is a class with a lot of fields, so the first call to the proxy method takes a lot of time.

Now that we've found the cause of the problem through Method Tracing, we can use another tool to see what was really going on with the app and system when the problem occurred.

Systrace

Although Method Tracing is a powerful tool for finding time-consuming methods, the execution of Method Tracing will seriously slow down the execution speed of the app, and even if you use sample tracking, the measured results will still be very different from the actual results, which can only be used as a reference. And Method Tracing is powerless to track down locks, GCs, lack of resources, and other factors. Therefore, we can use another tool that is highly recommended by Google, "Systrace", to track the actual runtime of the app. For more information, please refer to the official documentation.

The principle of Systrace is to capture some key information about the system and the app through the atrace tool that comes with the Android system, and then visualize it through the Chrome browser.

Next, we will use Systrace to trace the entire process of the Zhihu App executing the Retrofit proxy method at launch. First of all, we need to do some preparations:

  1. In order to get the result closer to the real thing, we need to enable App Tracing for the Release package, which can be found here.
  2. 给 Retrofit 的 loadServiceMethod() 方法添加 Tracing Section(PS:可以借助 JarFilterPlugin 来修改 Retrofit 的内部代码):
ServiceMethod<?, ?> loadServiceMethod(Method method) {
    Trace.beginSection("Retrofit:" + method.getName());
    // ...
    Trace.endSection();
    return result;
}           

3. 添加 Proguard 规则,保证 method.getName() 获取到正确的方法名:

-keepclassmembernames class * {
    @retrofit2.http.GET *;
    @retrofit2.http.POST *;
    @retrofit2.http.PUT *;
    @retrofit2.http.DELETE *;
}           

After making preparations, use the Gradle command to type out a live standard release package and install it on the phone, and then you can start tracking with Systrace:

systrace -a com.zhihu.android app view res am sched dalvik           

After you start tracking, cold start the app, wait until the first activity is visible, and then click Enter to end tracking. After the trace is finished, Systrace will save the trace results to a trace.html file in the current directory, and after opening it with Chrome:

Zhihu Android Client Startup Optimization - Retrofit Agent

Screenshot of Chrome

Through Chrome, you can intuitively see the various key sections of the app at launch. In addition, we can see the usage of the system CPU for each epoch and the threads executed on each core:

Zhihu Android Client Startup Optimization - Retrofit Agent

CPU usage and threads executed on each core

By selecting the thread status bar above a thread, we can view the running status of the thread in a certain period of time or section:

Zhihu Android Client Startup Optimization - Retrofit Agent

Thread state

As you can see from the above figure, in the section "Retrofit:getSelft", the UI Thread is basically in the Running state, which means that getSelft() performs CPU-intensive operations inside and rarely enters the waiting state. This also shows that the getSelft() method is time-consuming and not due to other reasons (e.g. lock waiting). And it should be noted that the actual time measured here is more than 250 milliseconds, while the measured time of Method Tracing is 650 milliseconds, which shows that the measurement results of Method Tracing are still very incorrect.

Then we can see that the loadServiceMethod() method of Retrofit is called more than once during startup:

Zhihu Android Client Startup Optimization - Retrofit Agent

Search results

And a lot of them are called on the UI thread, but these operations don't need to be on the UI thread, we need to find a way to throw all the loadServiceMethod() calls on other threads.

Work something out

The reason for the time has been identified, and the idea is to throw the loadServiceMethod() call on the non-UI thread. For example, for the getSelf() method, we can use the observable's defer() operator to put it on a non-UI thread and execute it asynchronously:

private void initUserInfo() {
    Observable.defer(() ->
            NetworkUtils.createService(ProfileService.class).getSelf(AppInfo.getAppId())
                    .subscribeOn(Schedulers.io())
    )
            .subscribeOn(Schedulers.single())
            // ...
            .subscribe(response -> {
                // ...
            }, Debug::e);
 
}           

This is a change for a single place, but keep in mind that this is not the only place that will call Retrofit's loadServiceMethod() when starting, and it would be a bit too much to add the defer() operator to everything. So is there a better way?

The answer is quadratic dynamic proxies:

public final class Net {
    public static <T> createService(Class<T> service) {
        // ...
        return createWrapperService(mRetrofit, service);
    }

    private static <T> T createWrapperService(Retrofit retrofit, Class<T> service) {
        return (T) Proxy.newProxyInstance(service.getClassLoader(),
               new Class<?>[]{service}, new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, @Nullable Object[] args)
                throws Throwable {
                if (method.getReturnType() == Observable.class) {
                    // 如果方法返回值是 Observable 的话,则包一层再返回
                    return Observable.defer(() -> {
                        final T service = getRetrofitService();
                        // 执行真正的 Retrofit 动态代理的方法
                        return ((Observable) getRetrofitMethod(service, method)
                                .invoke(service, args))
                                .subscribeOn(Schedulers.io());
                    })
                            .subscribeOn(Schedulers.single());
                }
                // 返回值不是 Observable 的话不处理
                final T service = getRetrofitService();
                return getRetrofitMethod(service, method).invoke(service, args);
           }
           // ...
        });
    }
}           

The idea is to create another layer of dynamic proxies, and then wrap the calls to the underlying Retrofit proxy methods into a new Observable and return them. The advantage of this change is that it can be done once and for all, so that all the calls do not need to be changed, and the amount of code modification is reduced.

After this optimization, we re-tested the boot time and came up with an optimized boot time of about 400ms. As you can see, this optimization was a success.

epilogue

This is the end of our optimization experience, and you can see that the "Discovery & Problem Analysis" chapter in our entire article takes up the most space, which actually reflects our normal situation when doing performance optimization, most of the time will be spent on Profiling, and when we encounter optimization bottlenecks, we are more likely to spend a lot of time analyzing the problem through various tools.

In fact, as mentioned above, Profiling is a university question, and this sharing only picks out some relatively simple but key methods and tools, and in fact, you will use more knowledge when conducting a profile, and sometimes you even need to write your own tools to assist in locating problems. I hope this article can play a role in throwing bricks and stones, so that you can understand some common ideas when doing startup optimization. In addition, this is just one of our experiences in optimizing the startup speed of the Zhihu Android client, and the Zhihu mobile platform team is still working hard to optimize the startup speed~ I hope to share more experiences with you in the future.

Due to the limited level of my own, if there are any mistakes and omissions, I welcome you to correct them.

作者:nekocode

Source: https://zhuanlan.zhihu.com/p/40097338