
How eBPF is Reimagining Observability Engineering (Part 1)

Author: Cloud and Cloud Sentient Beings

eBPF in observability: a review of how leading observability platforms such as Groundcover, Odigos, Grafana Beyla, Pixie, Cilium, and Apache SkyWalking use eBPF

Translated from The eBPF Effect, by Admin.

eBPF may look like an overnight success, but it has been around for a long time. As shown in last year's Unlocking the Kernel documentary, the eBPF interpreter was first incorporated into the Linux networking stack back in 2014. In recent years, Isovalent has been at the forefront of bringing the technology to a wider audience, particularly through the development of cutting-edge open-source products such as Cilium and Tetragon.

eBPF is a transformative technology because it allows applications to hook directly into the Linux kernel. This gives eBPF applications a clear view of network traffic while keeping a small footprint and scaling well. The potential for observability platforms is huge, because they can connect to the kernel without any kind of user-side instrumentation.


eBPF Overview
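To make that concrete, here is a minimal sketch (using the BCC Python bindings, and not taken from any of the products discussed below) of what hooking the kernel without instrumenting an application looks like: a tiny eBPF program is attached to the kernel's tcp_sendmsg function and counts calls per process, while the traced applications are never modified, restarted, or linked against an SDK.

```python
# Minimal sketch using the BCC Python bindings (https://github.com/iovisor/bcc).
# It attaches a small eBPF program to the kernel's tcp_sendmsg function and counts
# calls per process. The traced applications are never modified or restarted.
# Requires root and a kernel with eBPF support; runs for ten seconds, then prints.
import time
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(send_count, u32, u64);   // map: PID -> number of tcp_sendmsg calls

int trace_tcp_sendmsg(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *count = send_count.lookup_or_try_init(&pid, &zero);
    if (count) {
        (*count)++;
    }
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_sendmsg", fn_name="trace_tcp_sendmsg")

print("Counting tcp_sendmsg() calls per PID for 10 seconds...")
time.sleep(10)

for pid, count in b["send_count"].items():
    print(f"pid={pid.value} tcp_sendmsg calls={count.value}")
```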

In this roundup, we'll take a look at how some of the leading observability platforms are leveraging the power of eBPF in their tools. Strikingly, many of the early adopters of eBPF were newcomers to the observability market. This makes sense: newer stacks with no existing codebase to redesign are better placed to adopt this new technology than established vendors, especially those with large codebases and complex architectures.

Advantages

So, we know that eBPF is a powerful and revolutionary technology, but what are the practical advantages of using it in an observability platform? The first advantage of eBPF is that it is open source. It is a building block for an observability tool that carries no licensing fees, which significantly lowers the barrier to entry for new entrants in the observability space. It's also worth noting that all of the products listed in this review are OSS.

Second, eBPF (theoretically) eliminates the need to develop a client-side SDK. This is a win for both users and vendors. Developing an SDK that is compatible with multiple languages, platforms, and versions demands significant effort and money from the vendor. For end users, the integration process becomes smoother because cloud-native services no longer need to be instrumented. Third, eBPF applications are faster, more scalable, and less resource-intensive than SDK-based solutions.

However, as we know, every technical decision represents some kind of trade-off, and eBPF is no exception. There are some limitations and caveats to keep in mind.

Limitations

The first is that, currently, eBPF is a Linux-only technology. It's not cross-platform, although a Windows version is in development. Many eBPF solutions are described as "cloud-native", which usually boils down to running on Kubernetes, which in turn means running on Linux hosts. Many eBPF-based systems on the market assume that you are running your services in a Kubernetes cluster. If you're using other platforms, they might not be a good fit, and if you're running on a Windows network, the current generation of eBPF solutions won't work at all.

Similarly, an eBPF solution can't hook into serverless technologies such as Azure Functions or AWS Lambda, because you can't load programs into the Linux kernel in a serverless environment. The same limitation applies to platforms such as Azure Web Apps or AWS Elastic Beanstalk. While this is by no means a showstopper, it does mean that companies using these technologies will need a solution that supports telemetry acquisition through agents or pipelines as well as through eBPF.

Third, there are currently functional limitations to eBPF-based observability. eBPF is powerful, but it's not a magic wand. While eBPF is a fantastic enabler, writing robust, high-performance, and highly scalable eBPF programs requires a great deal of specialized engineering skill and knowledge. Not all eBPF programs are created equal: some only cover a small subset of languages, while others may have partial or incomplete coverage when capturing logs, metrics, and traces.

Now that you've seen some of the general principles and theory behind eBPF, it's time to look at how some of the leading observability solutions leverage its power. In this first part, we'll look at Pixie, the pioneer of eBPF in the observability space. In the second part, we'll investigate some of the other leading products on the market.

Pixie

It would be an oversight not to start this roundup with Pixie, the first tool (as far as we know) to take advantage of eBPF for observability. Pixie is an open-source project that New Relic contributed to the CNCF back in 2021, and in fact the project is still tightly integrated with New Relic. Like most eBPF-based tools, Pixie sets up eBPF probes that trigger on a number of kernel or user-space events.

When Pixie is deployed in a Kubernetes cluster, it installs eBPF kernel probes (kprobes) that are set to trigger on Linux networking system calls. When your application makes network-related system calls, such as send() and recv(), Pixie's eBPF probes sniff the data and send it to the Pixie Edge Module (PEM). In the PEM, the data is parsed according to the detected protocol and stored for querying. This is encapsulated in the following chart:


Chart: eBPF in Pixie
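As a rough illustration of the mechanism described above (and emphatically not Pixie's actual code), the sketch below wires an eBPF program to the sendto() system call via its tracepoint, copies the first bytes of each outgoing buffer, and hands them to a user-space reader that stands in for the PEM. The BCC Python bindings, the 64-byte capture size, and the use of a tracepoint rather than a raw kprobe are all simplifying choices made here.

```python
# Simplified, hypothetical illustration of the mechanism described above, not Pixie's
# actual code. An eBPF program fires on the sendto() syscall, copies the first bytes
# of the user buffer, and ships them to user space through a perf buffer, where a
# collector (standing in for the PEM) could detect the protocol and store records.
# Uses the BCC Python bindings; needs root and a kernel with bpf_probe_read_user()
# support (5.5 or later).
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

struct event_t {
    u32 pid;
    u32 len;
    char data[64];          // first bytes of the payload, enough to sniff the protocol
};
BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(syscalls, sys_enter_sendto) {
    struct event_t ev = {};
    ev.pid = bpf_get_current_pid_tgid() >> 32;
    ev.len = args->len;
    bpf_probe_read_user(&ev.data, sizeof(ev.data), (void *)args->buff);
    events.perf_submit(args, &ev, sizeof(ev));
    return 0;
}
"""

b = BPF(text=bpf_text)

def handle_event(cpu, data, size):
    ev = b["events"].event(data)
    # A real collector would parse HTTP, gRPC, MySQL, etc. here and store the result.
    print(f"pid={ev.pid} sent {ev.len} bytes, payload starts with {ev.data[:16]!r}")

b["events"].open_perf_buffer(handle_event)
print("Sniffing sendto() payloads; Ctrl-C to stop.")
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```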

Conceptually, the idea of "hooking into a kernel process" sounds simple. In practice, however, an observability system needs considerable technical sophistication. The full stack trace doesn't just sit in a neat little box waiting to be collected. Pixie recovers the stack trace by looking at the instruction pointer of the application on the CPU and then inspecting the stack to find the instruction pointers of all parent functions (frames). There are some complexities in traversing the stack to rebuild the trace, but the basic case is as follows: start with the leaf frame and use the frame pointer to find each parent frame in turn. Each stack frame contains a return address instruction pointer, which is recorded to build the entire stack trace.


Traverse the call stack
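The walk itself can be sketched in a few lines. The snippet below is a conceptual illustration in Python, chosen for readability; real eBPF profilers do this in kernel space with a bounded loop. It assumes the classic x86-64 frame layout, and the addresses in the toy stack are invented.

```python
# Conceptual sketch of the frame-pointer walk described above, written in Python for
# readability; a real eBPF profiler performs this loop in kernel space with a bounded
# iteration count. Assumes the classic x86-64 layout: each frame stores the caller's
# saved frame pointer at [fp] and the return address one word above it, at [fp + 8].

WORD = 8            # bytes per pointer on x86-64
MAX_DEPTH = 128     # eBPF programs must bound their loops, so real walkers cap the depth

def walk_stack(fp, read_word):
    """Return the list of return addresses reachable from a frame-pointer chain.

    fp        : frame pointer (RBP) of the leaf frame currently on the CPU
    read_word : callable(addr) -> int, reads one pointer-sized word of stack memory
    """
    trace = []
    for _ in range(MAX_DEPTH):
        if not fp:                            # a zero frame pointer ends the chain
            break
        trace.append(read_word(fp + WORD))    # saved return address of this frame
        fp = read_word(fp)                    # follow the saved RBP to the parent frame
    return trace

# Toy usage with made-up addresses standing in for the traced process's stack memory.
fake_stack = {
    0x7ffd100: 0x7ffd200, 0x7ffd108: 0x401a10,   # leaf frame: parent RBP, return into bar()
    0x7ffd200: 0x7ffd300, 0x7ffd208: 0x4015f0,   # bar() frame: parent RBP, return into foo()
    0x7ffd300: 0x0,       0x7ffd308: 0x401120,   # foo() frame: chain ends at main()
}
print([hex(addr) for addr in walk_stack(0x7ffd100, fake_stack.get)])
# ['0x401a10', '0x4015f0', '0x401120']
```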

Dynamic structured logging

Metrics capture and CPU profiling are more or less standard features of eBPF-based observability solutions. One feature that really makes Pixie stand out is dynamic logging, a game-changer for debugging applications. Typically, if you find that a feature in your application isn't working as expected and you need to add logging around it, you have to edit, recompile, and redeploy your code. Dynamic logging is an alpha feature in Pixie that lets users add logging to a function while it's running. This article shows how to add new logging to a binary using a simple script. The probe captures the function's parameters and writes the output to a table, as shown below.


Dynamic logging in Pixie
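The underlying mechanism can be approximated with a uprobe, the user-space counterpart of a kprobe. The sketch below is not Pixie's PxL-based feature, just a hand-rolled equivalent using the BCC Python bindings: it attaches a probe to a named function inside a running binary and logs its first argument on every call, without recompiling or redeploying anything. The binary path and the ComputePrice symbol are hypothetical placeholders.

```python
# Hand-rolled illustration of the underlying idea, not Pixie's PxL scripting interface.
# A uprobe (the user-space counterpart of a kprobe) is attached to a function inside an
# already running binary, and every call is logged with its first argument, with no
# recompile or redeploy of the target. The binary path and symbol name are hypothetical
# placeholders; the argument-register assumption holds for C-ABI functions on x86-64.
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

int log_call(struct pt_regs *ctx) {
    long arg0 = PT_REGS_PARM1(ctx);    // first argument per the System V calling convention
    bpf_trace_printk("ComputePrice called with arg0=%ld\n", arg0);
    return 0;
}
"""

b = BPF(text=bpf_text)
# Hypothetical target: the ComputePrice symbol in the /opt/app/server binary.
b.attach_uprobe(name="/opt/app/server", sym="ComputePrice", fn_name="log_call")

print("Logging every call to ComputePrice; Ctrl-C to stop.")
b.trace_print()   # stream the dynamically added 'log lines' from the kernel trace pipe
```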

This is an excellent example of the power of eBPF. In the second part of this article, we'll continue our roundup by looking at how eBPF is implemented in five other major systems:

  • Cilium
  • Groundcover
  • Odigos
  • Grafana Beyla
  • Apache SkyWalking

Stay tuned!
