Optimizing split learning through Service Function Chaining reduces latency by jointly managing placement and routing.
The problem in distributed AI arises not at the model level but at the intersection of computation and networking. Multi-hop split learning (MSL/MSI) partitions the model into segments and distributes them across nodes, so performance comes to depend on how the smashed data is routed. Unlike the classic client-server split, an additional variable emerges here: the path the data takes through multiple nodes. Ignoring this factor and optimizing the model or its placement in isolation yields unpredictable latency.
The authors use Service Function Chaining (SFC) as the foundational abstraction: each sub-model is treated as a network function, and the entire model as a chain. The architecture is built on an augmented network in which "imaginary" nodes are added to represent the sub-models. The problem is formulated as an ILP that jointly optimizes the model's cut points, the placement of sub-models, and the data-transmission routes. The objective is to minimize end-to-end latency, accounting for both computation (FLOPs, batch size) and data transfer (bandwidth, propagation delay). For practical use, a heuristic based on Block Coordinate Descent (BCD) is proposed, which alternately optimizes the split and the routing.
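The additive structure of that objective can be sketched in a few lines. The model below is a simplification, not the paper's exact formulation: segment compute time is FLOPs divided by node throughput, and each hop costs smashed-data size over bandwidth plus a propagation delay. All names and numbers are illustrative.

```python
def compute_latency(flops, throughput):
    """Time (s) to execute one sub-model on a node."""
    return flops / throughput

def link_latency(data_bits, bandwidth_bps, prop_delay):
    """Time (s) to ship smashed data over one hop."""
    return data_bits / bandwidth_bps + prop_delay

def chain_latency(segments, placement, links):
    """End-to-end latency of a split chain (toy model).

    segments:  list of (flops, out_data_bits) per sub-model
    placement: node throughputs (FLOP/s), one per segment
    links:     (bandwidth_bps, prop_delay) between consecutive nodes
    """
    total = 0.0
    for i, (flops, out_bits) in enumerate(segments):
        total += compute_latency(flops, placement[i])
        if i < len(segments) - 1:  # the last segment sends no smashed data
            bw, pd = links[i]
            total += link_latency(out_bits, bw, pd)
    return total

# Two-segment example: 1 GFLOP then 2 GFLOP, one 8 Mbit transfer in between.
latency = chain_latency(
    segments=[(1e9, 8e6), (2e9, 0.0)],
    placement=[1e9, 2e9],
    links=[(1e8, 0.01)],
)
print(latency)  # 1.0 + (0.08 + 0.01) + 1.0 = 2.09 s
```

The ILP searches over which segments exist (the cut points), which nodes host them (`placement`), and which paths connect them (`links`); this function only evaluates one candidate solution.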
A key insight is that latency is determined by the balance between computation and communication. Increasing the number of segments (K) reduces the per-node computational load but increases the volume of smashed data that must be transmitted. In the experiments, the optimal K does not grow monotonically with task size: for light tasks K=2 is best (essentially client-server), while heavier tasks favor K=3; increasing K further worsens latency because of network costs. The proposed BCD algorithm is also shown to achieve nearly the same latency as the ILP while scaling significantly better, which indicates practical applicability in real systems.
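The non-monotonicity in K can be reproduced with a toy model (my own illustration, not the paper's): a slow client keeps 1/K of the FLOPs, fast helper nodes take the rest, and each of the K-1 cuts adds one fixed-cost smashed-data transfer. The compute term shrinks with K while the communication term grows linearly, so the optimum lands at a small interior K.

```python
def toy_latency(K, F, client_tput, helper_tput, hop_cost):
    """Toy latency for K segments on a slow client plus fast helpers.

    F:           total model FLOPs
    client_tput: client throughput (FLOP/s) -- the bottleneck
    helper_tput: helper-node throughput (FLOP/s)
    hop_cost:    seconds per smashed-data transfer (size/bandwidth + delay)
    """
    client_time = (F / K) / client_tput            # client keeps 1/K of the work
    helper_time = (F * (K - 1) / K) / helper_tput  # helpers run the rest
    comm_time = (K - 1) * hop_cost                 # one transfer per cut
    return client_time + helper_time + comm_time

# Illustrative numbers: 10 GFLOP model, 1 GFLOP/s client, 10 GFLOP/s helpers,
# 1 s per hop (e.g. ~95 Mbit of smashed data over a 100 Mbit/s link).
F, c, h, hop = 1e10, 1e9, 1e10, 1.0
for K in range(1, 6):
    print(K, round(toy_latency(K, F, c, h, hop), 2))
# K=1: 10.0, K=2: 6.5, K=3: 6.0, K=4: 6.25, K=5: 6.8 -> minimum at K=3

best_K = min(range(1, 7), key=lambda K: toy_latency(K, F, c, h, hop))
```

With cheaper hops the optimum shifts toward larger K, and with expensive hops it collapses back to K=2: the same qualitative behavior the experiments report.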
For the industry, this means that split learning cannot be viewed as a purely ML problem. Architectural solutions must consider the network as an equal component. The SFC approach provides a clear model for integration with existing network orchestration practices. Joint optimization (split + placement + routing) proves to be a pragmatic choice, while separate strategies (compute-only or network-only) lead to increased latency. This is especially important for edge-cloud scenarios, where bandwidth and latency constraints are stricter than in centralized systems.
Information source
arXiv is the largest open preprint repository (operating since 1991 under the auspices of Cornell), where researchers quickly post working versions of papers. The materials are publicly accessible but do not undergo full peer review, so results should be treated as preliminary and, where possible, checked against updated versions or peer-reviewed journal publications (arxiv.org).