Going faster in Linux with BPF
I ran into an article in the Linux Weekly newsletter that talked about BPF and got curious about what it was. BPF (short for Berkeley Packet Filter) is a technology that lets user-space programs run small, sandboxed programs inside the kernel, which is often a “faster” alternative to moving data back and forth between kernel space and user space. Whoa, that’s nuts; I didn’t even know such a thing existed!
BPF was originally designed to make network packet filtering faster. Why would you want network packet filtering? A basic example is a firewall. For example, if a user-space program wanted to ignore all packets to certain ports or from certain IPs, then it would be able to do so in an efficient manner and without having to change the kernel source code. You may also wonder, why is this technology novel or necessary – can’t I just do packet filtering in user-space? Unfortunately, at high throughput, packet filtering in user space is too slow because of the copying from kernel space to user space. Software-defined networking is an example of where this is particularly useful. Routers need to be able to process packets at very high speeds and want to minimize filtering overhead. The BPF technology allows these routers to do all the filtering in kernel space while specifying the filter criteria in user space. Let’s see if we can do something similar – let’s snoop on the NIC to see if we can detect whenever we receive a packet!
First, we’ll write a user-space program that will initialize the BPF function we want to write:
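A minimal sketch of what this loader could look like, assuming the bcc Python bindings are installed and that packets.c sits in the same directory. The calls used (BPF(src_file=...), load_func, attach_xdp, trace_print, remove_xdp) come from bcc; the constant names, the early-exit checks, and the cleanup step are assumptions added for illustration.

```python
"""packetfilter.py: load an XDP program with bcc and print its trace output."""
import os

DEVICE = "lo"               # loopback interface, as used in this post
SRC_FILE = "packets.c"      # C source that contains the BPF function
FUNC_NAME = "packetfilter"  # name of the function inside SRC_FILE


def main():
    """Attach the XDP program and stream its trace output until Ctrl-C."""
    try:
        # Imported lazily so the script exits cleanly where bcc is missing.
        from bcc import BPF
    except ImportError:
        print("bcc is not installed; cannot load the BPF program")
        return False
    if not os.path.isfile(SRC_FILE):
        print(f"{SRC_FILE} not found; nothing to load")
        return False

    b = BPF(src_file=SRC_FILE)            # compile the C source to BPF bytecode
    fn = b.load_func(FUNC_NAME, BPF.XDP)  # load the function as an XDP program
    b.attach_xdp(DEVICE, fn, 0)           # hook it onto the device
    try:
        b.trace_print()                   # print bpf_trace_printk output to stdout
    except KeyboardInterrupt:
        pass
    finally:
        b.remove_xdp(DEVICE, 0)           # always detach the program on exit
    return True


if __name__ == "__main__":
    main()
```

Detaching in the finally block is a design choice of this sketch: without it, the program would stay attached to the device after the script exits.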
This code loads a BPF source program written in C called packets.c. Within that source file, it looks for the function packetfilter and loads it as an XDP program (BPF.XDP). What’s that? XDP stands for eXpress Data Path, a hook that some NIC (network interface card) drivers support. The driver passes each RX (received) packet directly to the BPF function before the kernel does any software queuing or per-packet memory allocation. This is where we’ll get the packet processing speedup!
Once we’ve registered this function, we attach it to the relevant NIC device. In our case, that’s the “lo” device. This is the loopback device, which routes traffic from the machine back to itself. The addresses localhost and 127.0.0.1 both point at the “lo” device. Finally, we run b.trace_print(), which prints the output of our bpf_trace_printk calls in the packetfilter function to this program’s stdout.
Next, we need to create the packets.c file and write the packetfilter function.
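A minimal sketch of packets.c, assuming it is compiled by bcc (which supplies bpf_trace_printk and the kernel headers). To stay in the same language as the loader, the C source is held in a Python string and written to disk by a small helper. The helper name write_source, the trace message, and the KBUILD_MODNAME line (borrowed from bcc's own XDP examples) are illustrative assumptions.

```python
"""Write a minimal packets.c for the loader to compile."""
from pathlib import Path

# C source of the BPF program; bcc compiles it when the loader runs.
# The trace message text is a hypothetical placeholder.
PACKETS_C = r"""
#define KBUILD_MODNAME "packetfilter"
#include <uapi/linux/bpf.h>

// Called once for every packet received on the attached device.
int packetfilter(struct xdp_md *ctx) {
    bpf_trace_printk("got a packet\n");
    return XDP_PASS;  // hand the packet on to the normal network stack
}
"""


def write_source(path="packets.c"):
    """Write the C source to `path` and return the path that was written."""
    target = Path(path)
    target.write_text(PACKETS_C.lstrip())
    return target


if __name__ == "__main__":
    print(f"wrote {write_source()}")
```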
This is what a simple BPF function looks like. It accepts an xdp_md pointer, which describes the incoming packet. We call bpf_trace_printk for every packet we receive and return XDP_PASS. Other possible return values include XDP_DROP, which drops the packet, and XDP_TX, which sends the packet back out of the same NIC it arrived on. XDP_PASS simply lets the packet continue through the normal network stack, as if the BPF program were not there.
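To make the return values concrete, the firewall idea from earlier (ignore packets to certain ports or from certain IPs) can be modeled in plain Python. This is only a user-space sketch of the decision such a filter makes, not kernel code: the blocked port and address are hypothetical, and make_frame is a helper invented here to build test input. The action codes match the kernel's enum xdp_action.

```python
"""User-space model of an XDP-style verdict; not kernel code."""
import socket
import struct

# Action codes as defined in the kernel's enum xdp_action.
XDP_DROP, XDP_PASS = 1, 2

BLOCKED_PORTS = {23}             # hypothetical: block telnet
BLOCKED_SOURCES = {"10.0.0.66"}  # hypothetical source address

ETH_LEN = 14        # Ethernet header length in bytes
ETH_P_IP = 0x0800   # EtherType for IPv4
IPPROTO_TCP = 6


def verdict(frame: bytes) -> int:
    """Return XDP_DROP for blocked IPv4/TCP traffic, XDP_PASS otherwise."""
    if len(frame) < ETH_LEN + 20:
        return XDP_PASS                       # too short to hold an IPv4 header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS                       # not IPv4
    ihl = (frame[ETH_LEN] & 0x0F) * 4         # IPv4 header length in bytes
    proto = frame[ETH_LEN + 9]
    src = socket.inet_ntoa(frame[ETH_LEN + 12:ETH_LEN + 16])
    if src in BLOCKED_SOURCES:
        return XDP_DROP
    if proto != IPPROTO_TCP:
        return XDP_PASS
    tcp_off = ETH_LEN + ihl
    if len(frame) < tcp_off + 4:
        return XDP_PASS                       # TCP ports not present
    _sport, dport = struct.unpack_from("!HH", frame, tcp_off)
    return XDP_DROP if dport in BLOCKED_PORTS else XDP_PASS


def make_frame(src: str, dport: int) -> bytes:
    """Build a minimal Ethernet + IPv4 + TCP frame for trying out verdict()."""
    eth = b"\x00" * 12 + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, IPPROTO_TCP, 0,
                     socket.inet_aton(src), socket.inet_aton("127.0.0.1"))
    tcp = struct.pack("!HH", 12345, dport) + b"\x00" * 16
    return eth + ip + tcp


if __name__ == "__main__":
    print(verdict(make_frame("192.168.1.5", 80)))  # allowed source and port
    print(verdict(make_frame("192.168.1.5", 23)))  # blocked port
```

The length checks before each read mirror a habit that real XDP programs need as well, since they must check packet bounds before touching packet data.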
Let’s run our program now with sudo python packetfilter.py. For a program to send a BPF program to the kernel, it needs to have superuser privileges. Cool, we can actually see packets coming through!
Not only that, but in this function we can also modify the packet’s data. It’s all so powerful! But with great power comes great responsibility. With BPF, a user-space program can specify some code it wants to run. The kernel then uses a JIT (just-in-time) compiler to compile the code and run it! So does this mean that a user-space program can just execute arbitrary code in the kernel? That sounds like exactly the kind of fundamental security flaw that operating system developers would have guarded against! It turns out it’s not that simple. The kernel has a component called the BPF verifier, which checks the BPF code that the user-space program wants to execute before it is allowed to run. So...there’s a program that verifies a program? I didn’t know that was possible!
The BPF verifier ensures that BPF programs are safe and won’t crash the entire system. The first thing the BPF verifier checks is that the program’s control flow forms a DAG (directed acyclic graph), meaning that all jumps go forwards instead of backwards. By doing this, it can ensure that the program will eventually reach completion. (Newer kernels relax this to allow loops that the verifier can prove are bounded.) Additionally, loops may be flagged if they depend on information only known at runtime, like packet data. The kernel wouldn’t want an infinitely-looping program running! The second part of the verifier explores every possible execution path through the code by doing a depth-first search. It keeps track of the registers and their types along each path and uses that information to restrict which operations a BPF program is allowed to perform.
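The first check can be illustrated with a toy model. The sketch below is not the kernel's algorithm: it assumes a made-up four-opcode instruction set and only shows the idea of walking the program depth-first and rejecting any jump that goes backwards. Rejecting paths that run off the end and instructions that are never reached are extra rules of this toy.

```python
"""Toy model of a forward-only control-flow check; not the kernel's verifier."""

# Each instruction is (op, arg) in a made-up instruction set:
#   ("nop", None)   fall through to the next instruction
#   ("jmp", off)    unconditional jump to pc + 1 + off
#   ("jcc", off)    conditional jump: fall through, or go to pc + 1 + off
#   ("exit", None)  end of program


def successors(prog, pc):
    """Return the instruction indexes that can execute right after pc."""
    op, arg = prog[pc]
    if op == "exit":
        return []
    if op == "jmp":
        return [pc + 1 + arg]
    if op == "jcc":
        return [pc + 1, pc + 1 + arg]
    return [pc + 1]


def check(prog):
    """Depth-first walk from instruction 0. Returns (ok, reason)."""
    if not prog:
        return False, "empty program"
    seen = set()
    stack = [0]
    while stack:
        pc = stack.pop()
        if pc in seen:
            continue
        seen.add(pc)
        for nxt in successors(prog, pc):
            if nxt <= pc:
                return False, f"backward jump at {pc}"
            if nxt >= len(prog):
                return False, f"control leaves the program at {pc}"
            stack.append(nxt)
    if len(seen) != len(prog):
        return False, "unreachable instruction"
    return True, "ok"


if __name__ == "__main__":
    branchy = [("jcc", 1), ("nop", None), ("exit", None)]
    looping = [("nop", None), ("jmp", -2), ("exit", None)]
    print(check(branchy))
    print(check(looping))
```

Because every accepted edge points to a higher instruction index, any accepted program is acyclic by construction, which is why it must eventually reach exit.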
BPF is pretty sweet. If you’re curious to learn more, definitely check out this series of articles on Oracle’s Linux Blog.