Rüdiger Birkner, USENIX NSDI 2020
Rüdiger Birkner, USENIX NSDI 2018
Timon Gehr, PLDI 2018
Olivier Tilmans, USENIX NSDI 2018
ACM SIGCOMM 2021. Online (August 2021).
Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short inpractice as they either only support a small set of reconfiguration scenarios or simply do not scale.
We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contributionis an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration.
We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.
USENIX NSDI 2021. Online (April 2021).
Network analysis and verification tools are often a godsend for network operators as they free them from the fear of introducing outages or security breaches. As with any complex software though, these tools can (and often do) have bugs. For the operators, these bugs are not necessarily problematic except if they affect the precision of the model as it applies to their specific network. In that case, the tool output might be wrong: it might fail to detect actual configuration errors and/or report non-existing ones.
In this paper, we present Metha, a framework that systematically tests network analysis and verification tools for bugs in their network models. Metha automatically generates syntactically- and semantically-valid configurations; compares the tool’s output to that of the actual router software; and detects any discrepancy as a bug in the tool’s model. The challenge in testing network analyzers this way is that a bug may occur very rarely and only when a specific set of configuration statements is present. We address this challenge by leveraging grammar-based fuzzing together with combinatorial testing to ensure thorough coverage of the search space and by identifying the minimal set of statements triggering the bug through delta debugging.
We implemented Metha and used it to test three well-known tools. In all of them, we found multiple (new) bugs in their models, most of which were confirmed by the developers.
Samuel Steffen, Timon Gehr, Petar Tsankov, Laurent Vanbever, Martin Vechev
ACM SIGCOMM 2020. New York, USA (August 2020).
Not all important network properties need to be enforced all the time. Often, what matters instead is the fraction of time / probability these properties hold. Computing the probability of a property in a network relying on complex inter-dependent routing protocols is challenging and requires determining all failure scenarios for which the property is violated. Doing so at scale and accurately goes beyond the capabilities of current network analyzers.
In this paper, we introduce NetDice, the first scalable and accurate probabilistic network configuration analyzer supporting BGP, OSPF, ECMP, and static routes. Our key contribution is an inference algorithm to efficiently explore the space of failure scenarios. More specifically, given a network configuration and a property phi, our algorithm automatically identifies a set of links whose failure is provably guaranteed not to change whether phi holds. By pruning these failure scenarios, NetDice manages to accurately approximate P(phi). NetDice supports practical properties and expressive failure models including correlated link failures.
We implement NetDice and evaluate it on realistic configurations. NetDice is practical: it can precisely verify probabilistic properties in few minutes, even in large networks.
USENIX NSDI 2020. Santa Clara, California, USA (February 2020).
Network verification and configuration synthesis are promising approaches to make networks more reliable and secure by enforcing a set of policies. However, these approaches require a formal and precise description of the intended network behavior, imposing a major barrier to their adoption: network operators are not only reluctant to write formal specifications, but often do not even know what these specifications are.
We present Config2Spec, a system that automatically synthesizes a formal specification (a set of policies) of a network given its configuration and a failure model (e.g., up to two link failures). A key technical challenge is to design a synthesis algorithm which can efficiently explore the large space of possible policies. To address this challenge, Config2Spec relies on a careful combination of two well-known methods: data plane analysis and control plane verification.
Experimental results show that Config2Spec scales to mining specifications of large networks (>150 routers).
Timon Gehr, Sasa Misailovic, Petar Tsankov, Laurent Vanbever, Pascal Wiesman, Martin Vechev
PLDI 2018. Philadelphia, Pennsylvania, USA (June 2018).
Network operators often need to ensure that important probabilistic properties are met, such as that the probability of network congestion is below a certain threshold. Ensuring such properties is challenging and requires both a suitable language for probabilistic networks and an automated procedure for answering probabilistic inference queries. We present Bayonet, a novel approach that consists of: (i) a probabilistic network programming language and (ii) a system that performs probabilistic inference on Bayonet programs. The key insight behind Bayonet is to phrase the problem of probabilistic network reasoning as inference in existing probabilistic languages. As a result, Bayonet directly leverages existing probabilistic inference systems and offers a flexible and expressive interface to operators. We present a detailed evaluation of Bayonet on common network scenarios, such as network congestion, reliability of packet delivery, and others. Our results indicate that Bayonet can express such practical scenarios and answer queries for realistic topology sizes (with up to 30 nodes).
USENIX NSDI 2018. Renton, Washington, USA (April 2018).
For an Internet Service Provider (ISP), getting an accurate picture of how its network behaves is challenging. Indeed, given the carried traffic volume and the impossibility to control end-hosts, ISPs often have no other choice but to rely on heavily sampled traffic statistics, which provide them with coarse-grained visibility at a less than ideal time resolution (seconds or minutes). We present Stroboscope, a system that enables fine-grained monitoring of any traffic flow by instructing routers to mirror millisecond-long traffic slices in a programmatic way. Stroboscope takes as input high-level monitoring queries together with a budget and automatically determines: (i) which flows to mirror; (ii) where to place mirroring rules, using fast and provably correct algorithms; and (iii) when to schedule these rules to maximize coverage while meeting the input budget. We implemented Stroboscope, and show that it scales well: it computes schedules for large networks and query sizes in few seconds, and produces a number of mirroring rules well within the limits of current routers. We also show that Stroboscope works on existing routers and is therefore immediately deployable.
USENIX NSDI 2018. Renton, Washington, USA (April 2018).
Today network operators spend a significant amount of time struggling to understand how their network forwards traffic. Even simple questions such as "How is my network handling Google traffic?" often require operators to manually bridge large semantic gaps between low-level forwarding rules distributed across many routers and the corresponding high-level insights. We introduce Net2Text, a system which assists network operators in reasoning about network-wide forwarding behaviors. Out of the raw forwarding state and a query expressed in natural language, Net2Text automatically produces succinct summaries, also in natural language, which efficiently capture network-wide semantics. Our key insight is to pose the problem of summarizing ("captioning") the network forwarding state as an optimization problem that aims to balance coverage, by describing as many paths as possible, and explainability, by maximizing the information provided. As this problem is NP-hard, we also propose an approximation algorithm which generates summaries based on a sample of the forwarding state, with marginal loss of quality. We implemented Net2Text and demonstrated its practicality and scalability. We show that Net2Text generates high-quality interpretable summaries of the entire forwarding state of hundreds of routers with full routing tables, in few seconds only.
ACM HotNets 2016. Atlanta, Georgia, USA (November 2016).
For Internet Service Provider (ISP) operators, getting an accurate picture of how their network behaves is challenging. Given the traffic volumes that their networks carry and the impossibility to control end-hosts, ISP operators are typically forced to randomly sample traffic, and rely on aggregated statistics. This provides coarse-grained visibility, at a time resolution that is far from ideal (seconds or minutes).
In this paper, we present Mille-Feuille, a novel monitoring architecture that provides fine-grained visibility over ISP traffic. Mille-Feuille schedules activation and deactivation of traffic-mirroring rules, that are then provisioned networkwide from a central location, within milliseconds. By doing so, Mille-Feuille combines the scalability of sampling with the visibility and controllability of traffic mirroring. As a result, it supports a set of monitoring primitives, ranging from checking key performance indicators (e.g., one-way delay) for single destinations to estimating traffic matrices in subseconds. Our preliminary measurements on existing routers confirm that Mille-Feuille is viable in practice.