Symbolic execution simplifies the analysis of BPF malware and eliminates a bottleneck in reverse engineering. This approach allows for the automatic reconstruction of “magic” packets to trigger backdoors.
The problem does not manifest immediately — until the analysis of BPF malware encounters the complexity of the filters themselves. The classic Berkeley Packet Filter operates as a compact virtual machine within the Linux kernel. It processes traffic at the bytecode level and can obscure behavior from user-space tools. This makes it a convenient platform for hidden backdoors. When a filter contains dozens or hundreds of branching instructions, manual reverse engineering becomes a bottleneck: the analyst is forced to reconstruct trigger conditions step by step, which can take hours or days.
The situation is complicated by the fact that malware uses “magic” packets. They remain inactive until they receive a strictly defined sequence of bytes. Such conditions are often encoded through chains of checks with offsets and values. Even with context, such as through LLM, the task reduces to enumerating possible execution paths. As the number of instructions increases, the complexity grows not linearly, but exponentially.
The solution is built around symbolic execution and formalizing the task as a system of constraints. Instead of executing code with specific input values, the input is represented as variables. The system then tracks which conditions must be met to reach an ACCEPT state. Here, Z3 is used — a theorem prover that can find values satisfying the given constraints. This translates the task from “understand all the code” to “solve the system of conditions.”
The trade-off here is evident. We lose intuitive understanding of the execution flow, but gain in speed and automation. This approach is particularly effective for deterministic systems, where the result entirely depends on the input data — as in the case of BPF filters.
The implementation begins with the decomposition of the BPF program. Each filter is a sequence of instructions with transitions. The system builds a path graph and searches for those that lead to ACCEPT. For this, a state queue is used, which stores:
- the current instruction
- the execution path
- accumulated conditions (boolean constraints)
When encountering a conditional jump, both branches are recorded, and the system tracks how many conditions need to be traversed to reach ACCEPT. This allows finding the shortest path — the minimal set of conditions for the filter to trigger.
Next, symbolic execution is activated. A simplified model of the machine with registers (in classic BPF, there are two) and state is created. Each instruction is translated into an equivalent Z3 expression. For example, arithmetic operations become expressions over bitvectors. Transition conditions turn into logical constraints.
After this, Z3 solves the problem: it finds the byte values of the packet that satisfy all the path constraints. In practice, this yields specific offsets and values. For example:
- IP version check (IPv4 or IPv6)
- protocol check (e.g., UDP)
- destination port check
This data is then converted into real network packets using scapy. As a result, not only is an execution path obtained, but a ready-made “magic” packet is generated that activates the backdoor.
Implementation challenges are mainly related to correctly modeling instructions and states. Although the BPF instruction set is limited, it is crucial to carefully account for side effects, registers, and execution order. It is also important to handle branches correctly to avoid losing valid paths.
The result is a radical reduction in analysis time. Tasks that previously took hours of manual parsing are now solved in seconds. Moreover, the system not only finds the path but also generates a reproducible artifact — a network packet. There are no precise performance metrics in the original data, but the qualitative effect is evident: the elimination of the bottleneck in analysis.
In a broader context, this reflects a trend toward automating reverse engineering through formal methods. Symbolic execution has long been applied in binary analysis, but its application to BPF is a pragmatic extension toward kernel-level logic. Especially in conditions where stealth and minimal footprint make traditional methods less effective.
The approach also opens up the possibility for proactive detection. The generated packets can be used for network scanning and identifying implants that would otherwise remain passive. This shifts analysis from reactive to exploratory.