B2B Engineering Insights & Architectural Teardowns

Mid-path Network Analysis through A/B Route Comparison

Mid-path analysis reveals hidden routing and interconnection issues that are typically masked in traditional network measurements.

The problem arises not at the access level but deeper in the path, where autonomous systems interconnect. Traditional measurement tools smooth out variation and treat it as noise. As a result, degraded latency or throughput is attributed to the edge or to the user's environment. The effect is most visible when routes to “similarly close” servers yield different results, yet such discrepancies are rarely isolated correctly.

The key challenge is to separate the influence of the access ISP from the influence of the network between providers. Without this separation, any conclusion about performance remains partially blind. Daily load fluctuations and uneven test sampling distort the statistics further. Under such conditions, mid-path issues are either not recorded or misinterpreted.

The solution is built on controlled A/B comparisons, using data from Measurement Lab (M-Lab). The main idea is to compare the performance of users from one access ISP to different geographically close servers. Through uniform server selection, each server receives a statistically equivalent stream of tests. This eliminates biases related to clients, time, and local conditions.
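The uniform selection step can be illustrated with a minimal sketch. It is not M-Lab's actual server-selection code: the server names, the `assign_tests` helper, and the fixed seed are assumptions made for the example. The point is only that the choice ignores RTT and load, so each nearby server ends up with a comparable sample of clients.

```python
import random
from collections import Counter

def pick_server(nearby_servers, rng):
    """Pick one server uniformly at random, ignoring RTT and current load."""
    return rng.choice(nearby_servers)

def assign_tests(client_ids, nearby_servers, seed=0):
    """Assign each test to a server; the seed is fixed only for reproducibility."""
    rng = random.Random(seed)
    return {cid: pick_server(nearby_servers, rng) for cid in client_ids}

# Hypothetical server names for one metro area.
servers = ["server-a", "server-b"]
assignment = assign_tests(range(10_000), servers)
counts = Counter(assignment.values())
print(counts)  # each server should receive roughly half of the tests
```

A selector that preferred the lowest-RTT or least-loaded server would break this property, because the two servers would no longer see the same population of clients.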

If the metric distributions match, the mid-path can be considered “clean.” If not, the difference becomes a signal. This approach flips the traditional model: what was previously considered noise becomes the primary source of information. Metrics used include throughput and minimum RTT (minRTT). The former indicates bandwidth limitations or traffic shaping, while the latter points to inefficient routing or “hairpinning.”

The trade-off here is evident. The method requires strict control over the distribution of tests and does not work in systems where the server is selected based on RTT or current load. This excludes some popular measurement platforms but results in a cleaner signal.

The implementation is based on processing large volumes of NDT data in BigQuery. Sparse multidimensional histograms are used, where measurements are aggregated across three axes: server, ASN (access ISP), and metric value. This approach allows for the processing of millions of measurements in a single pass.
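A sparse histogram of this kind can be sketched in plain Python. The example below is an illustration rather than the study's BigQuery implementation: the log-spaced binning, the `BINS_PER_DECADE` resolution, and the sample rows (with ASNs taken from the documentation range) are all assumptions.

```python
import math
from collections import Counter

BINS_PER_DECADE = 10  # assumed resolution, not taken from the study

def log_bin(value, bins_per_decade=BINS_PER_DECADE):
    """Map a positive metric value to an integer log10-spaced bin index."""
    return math.floor(math.log10(value) * bins_per_decade)

def build_histogram(rows):
    """Aggregate (server, asn, value) rows into a sparse 3-axis histogram.

    Only non-empty cells are stored, keyed by (server, asn, bin index).
    """
    hist = Counter()
    for server, asn, value in rows:
        if value > 0:  # a log scale cannot represent zero or negative values
            hist[(server, asn, log_bin(value))] += 1
    return hist

# Hypothetical throughput samples in Mbit/s.
rows = [
    ("server-a", 64500, 95.0),
    ("server-a", 64500, 98.0),
    ("server-b", 64500, 20.0),
    ("server-b", 64501, 0.0),  # dropped by the value > 0 filter
]
hist = build_histogram(rows)
for key, count in sorted(hist.items()):
    print(key, count)
```

Because the dictionary holds only populated cells, memory grows with the number of distinct (server, ASN, bin) combinations rather than with the number of raw measurements.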

Two indicators are used to compare distributions:

  • Kolmogorov-Smirnov (KS) distance: captures differences in the shape of the distributions
  • geometric mean ratio: gives an interpretable percentage difference

KS distance is sensitive to any deviations but is complex to interpret. The geometric mean is easier to read but may obscure local anomalies. Using both metrics reduces the risk of false conclusions.
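Both indicators can be computed from raw samples with the standard library alone. The sketch below works on small in-memory lists, whereas the study works on aggregated histograms; the function names and sample values are assumptions chosen for illustration.

```python
import math
from bisect import bisect_right

def ks_distance(a, b):
    """Two-sample KS distance: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geo_mean_ratio(a, b):
    """Ratio of geometric means; 2.0 means a is typically twice b."""
    return geometric_mean(a) / geometric_mean(b)

# Hypothetical throughput samples (Mbit/s) from one ISP to two nearby servers.
server_a = [80.0, 90.0, 100.0, 110.0]
server_b = [40.0, 45.0, 50.0, 55.0]

d = ks_distance(server_a, server_b)
r = geo_mean_ratio(server_a, server_b)
print(f"KS distance: {d:.2f}, geometric mean ratio: {r:.2f}")
```

The KS distance is bounded between 0 and 1 and says nothing about direction, while the ratio reads directly as "server A is about twice as fast as server B", which is why the two are reported together.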

Practice shows two typical patterns. Large differences in throughput indicate overloaded interconnection or rate limiting. For example, a narrow “plateau” at a fixed speed value signals per-flow limitation. If the problem were in the aggregated load, the distribution would be more blurred.
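One rough way to spot such a plateau is to check how much of the distribution falls into a single narrow bin. This heuristic is an assumption made for illustration, not the method used in the study; the 1 Mbit/s bin width and the sample values are hypothetical.

```python
from collections import Counter

def plateau_share(throughputs_mbps, bin_width=1.0):
    """Return (start of the most populated bin, fraction of samples in it)."""
    bins = Counter(int(v // bin_width) for v in throughputs_mbps)
    bin_index, count = bins.most_common(1)[0]
    return bin_index * bin_width, count / len(throughputs_mbps)

# Hypothetical samples: one path capped near 25 Mbit/s, one with a broad spread.
capped = [25.1, 25.3, 25.0, 25.4, 25.2, 25.6, 12.0, 18.5, 25.7, 25.5]
spread = [12.0, 18.5, 22.1, 25.3, 31.0, 37.4, 44.8, 52.9, 61.3, 70.2]

capped_bin, capped_share = plateau_share(capped)
spread_bin, spread_share = plateau_share(spread)
print(f"capped: {capped_share:.0%} of tests near {capped_bin} Mbit/s")
print(f"spread: {spread_share:.0%} of tests in the most populated bin")
```

A high share in one bin is consistent with a per-flow limit, while a low share is closer to what aggregate congestion would produce. Any threshold separating the two would have to be tuned on real data.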

Differences in minRTT reveal routing issues. Even without throughput loss, elongated paths increase latency and degrade application responsiveness. In some cases, this is related to the lack of local peering and forced transit through remote nodes.

Results are aggregated into dashboards, where problematic regions and server pairs can be quickly identified. Note that the study does not report explicit quantitative improvements. The method is aimed at identifying anomalies rather than measuring their impact in absolute terms.

A further benefit is the ability to drill down to specific ISPs and routes. For some measurements, tcp-info, traceroute, and even packet capture data are available, allowing for deeper analysis. This creates a foundation for integration with other approaches, including BGP analysis and active probing tools.

In an industrial context, this is a pragmatic step. Instead of trying to “clean” data from noise, the system uses it as a signal. This approach better reflects the real behavior of the internet, where interconnection and routing policy are often more important than the state of the edge.

The main conclusion is that mid-path can no longer be considered a black box. With proper experimental setup, its influence becomes measurable and operationally useful.

Information source

arXiv (arxiv.org) is the largest open preprint repository, operating since 1991 under the auspices of Cornell, where researchers quickly post working versions of papers. The materials are publicly accessible but do not undergo full peer review, so results should be considered preliminary and, where possible, checked against updated versions or peer-reviewed journals.

