The Networked Systems Group (NSG) is a research group in the Department of Information Technology and Electrical Engineering (D-ITET) at ETH Zürich led by Prof. Laurent Vanbever.
Our research interests are centered around complex network management problems, with the larger goal of making current and future networks (especially the Internet) easier to design, understand and operate. We are currently active in multiple areas including network programmability, data-driven networking, verification, routing, and security. Most of our projects are inherently multidisciplinary and tend to involve recent advances in programming languages, algorithmics, and machine learning.
We are active in several areas, a subset of which include:
Take a look at our research and publications pages for a full list.
Our flagship lecture is Communication Networks, offered in the Spring semester. We also offer a (master-level) seminar in the Spring semester, and a lecture on Advanced Topics in Communication Networks in the Fall semester. Check our courses page for more information.
Learn more about our recent research and teaching activities in our latest activity report (2022, 2021, 2020) and in our 1-page research statement.
Our group is recruiting PhD students. If you are interested, apply here.
2022 was one of our best years ever, on many accounts. Check out our activity report to see what our group has been up to and what is in the tank for us for 2023.
Our paper "Reducing P4 Language's Voluminosity using Higher-Level Constructs" has been accepted at EuroP4 2022! In this paper, we present O4, an extension of P4, that incorporates three higher-level constructs (arrays, loops, and factories) to reduce the voluminosity of P4 code.
Our paper entitled "Learning to Configure Computer Networks with Neural Algorithmic Reasoning" was accepted at NeurIPS 2022! In this paper, we explain how we can approximate routing computations using neural networks. Among other things, doing so allows us to efficiently "invert" these computations, enabling us to automatically synthesize configurations from their intended output. This synthesis problem is known to be hard: in fact, our recent ICNP 2022 paper shows that many instances of that problem are NP-hard/NP-complete. Having a way to approximate these computations allows us to "break" the inherent scalability barrier of solving these problems, at the price of accuracy. How to deal with this accuracy loss is among the many questions we want to look at next. Stay tuned!
Our group will have two papers at this upcoming ACM HotNets workshop! These two papers will mark our 10th and 11th HotNets papers since 2014.
Stay tuned to learn more about:
1) How we plan to build the next generation of network traffic generators by leveraging millions of code repositories hosted on code-sharing platforms such as GitHub;
and
2) How we intend to build generalizable machine learning (ML) models for predicting network traffic dynamics using the Transformer architecture.
As usual, you'll find the final version of the papers on our publications page in a couple of weeks.
We're thrilled to welcome two new post-docs in our team: Georgia Fragkouli (from EPFL) and Muoi Tran (from NUS)!
This summer has treated us well thus far, with no fewer than five paper acceptances! Stay tuned to learn more about:
As usual, all the papers will be made available on our publications page.
Our group will be well represented at this upcoming ACM SIGCOMM; we have three accepted papers! Stay tuned to learn more about: pulse-wave DDoS mitigations (ACC-Turbo), fast gray failure detection (FANcY), and buffer management in data centers (ABM). See you in Amsterdam!
Rüdiger Birkner, a fresh NSG PhD graduate, just won the Roger Needham PhD Award! The Roger Needham PhD award is an annual prize awarded to a PhD student from a European University whose thesis is considered to be an exceptional, innovative contribution to knowledge in the systems area.
Tobias Bühler, Romain Jacob, Ingmar Poese, Laurent Vanbever
USENIX NSDI 2023. Boston, MA, USA (April 2023).
Monitoring where traffic enters and leaves a network is a routine task for network operators. In order to scale with Tbps of traffic, large Internet Service Providers (ISPs) mainly use traffic sampling for such global monitoring. Sampling either provides a sparse view or generates unreasonable overhead. While sampling can be tailored and optimized to specific contexts, this coverage–overhead trade-off is unavoidable.
Rather than optimizing sampling, we propose to “magnify” the sampling coverage by complementing it with mirroring. Magnifier enhances the global network view using a two-step approach: based on sampling data, it first infers traffic ingress and egress points using a heuristic, then it uses mirroring to validate these inferences efficiently. The key idea behind Magnifier is to use negative mirroring rules; i.e., monitor where traffic should not go. We implement Magnifier on commercial routers and demonstrate that it indeed enhances the global network view with negligible traffic overhead. Finally, we observe that monitoring based on our heuristics also allows detecting other events, such as certain failures and DDoS attacks.
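The idea of negative mirroring rules can be illustrated with a minimal Python sketch. This is not Magnifier's implementation: the router names, the prefix-to-ingress table, and both helper functions are hypothetical, and the sampling-based inference step is assumed to have already run.

```python
from ipaddress import ip_address, ip_network

# Hypothetical output of the inference step: each source prefix is
# believed to enter the network at exactly one border router.
inferred_ingress = {
    ip_network("203.0.113.0/24"): "r1",
    ip_network("198.51.100.0/24"): "r2",
}
border_routers = ["r1", "r2", "r3"]


def negative_rules(inferred, routers):
    """For each router, list the prefixes that should NOT enter there."""
    return {
        router: [p for p, ingress in inferred.items() if ingress != router]
        for router in routers
    }


def invalidated(rules, router, src_ip):
    """Return the prefixes whose inference a packet seen at `router` contradicts."""
    addr = ip_address(src_ip)
    return [p for p in rules[router] if addr in p]


rules = negative_rules(inferred_ingress, border_routers)

# A packet from 203.0.113.7 arriving at r3 matches a negative rule: it would
# be mirrored, and the "enters at r1" inference would need to be revisited.
hits = invalidated(rules, "r3", "203.0.113.7")

# The same packet arriving at r1 is where we expected it: nothing is mirrored.
quiet = invalidated(rules, "r1", "203.0.113.7")
```

In this toy model, a correct inference never matches a negative rule, so mirrored traffic appears only when an inference is wrong.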
Thomas Wirtgen, Tom Rousseaux, Quentin De Coninck, Nicolas Rybowski, Randy Bush, Laurent Vanbever, Axel Legay, Olivier Bonaventure
USENIX NSDI 2023. Boston, MA, USA (April 2023).
Internet Service Providers use routers from multiple vendors that support standardized routing protocols. Network operators deploy new services by tuning these protocols. Unfortunately, while standardization is necessary for interoperability, this is a slow process. As a consequence, new features appear very slowly in routing protocols.
We propose a new implementation model for BGP, called xBGP, that enables ISPs to innovate by easily deploying BGP extensions in their multivendor network. We define a vendor-neutral xBGP API which can be supported by any BGP implementation and an eBPF Virtual Machine that allows executing extension code within these BGP implementations. We demonstrate the feasibility of our approach by extending both FRRouting and BIRD.
We demonstrate seven different use cases showing the benefits that network operators can obtain using xBGP programs. We propose a verification toolchain that enables operators to compile and verify the safety properties of xBGP programs before deploying them. Our testbed measurements show that the performance impact of xBGP is reasonable compared to native code.
Luca Beurer-Kellner, Martin Vechev, Laurent Vanbever, Petar Velickovic
NeurIPS 2022. New Orleans, LA, USA (November 2022).
We present a new method for scaling automatic configuration of computer networks. The key idea is to relax the computationally hard search problem of finding a configuration that satisfies a given specification into an approximate objective amenable to learning-based techniques. Based on this idea, we train a neural algorithmic model which learns to generate configurations likely to (fully or partially) satisfy a given specification under existing routing protocols. By relaxing the rigid satisfaction guarantees, our approach (i) enables greater flexibility: it is protocol-agnostic, enables cross-protocol reasoning, and does not depend on hardcoded rules; and (ii) finds configurations for much larger computer networks than previously possible. Our learned synthesizer is up to 490× faster than state-of-the-art SMT-based methods, while producing configurations which on average satisfy more than 92% of the provided requirements.
Tobias Bühler, Roland Schmid, Sandro Lutz, Laurent Vanbever
ACM HotNets 2022. Austin, Texas, USA (November 2022).
In theory, any network operator, developer, or vendor should have access to large amounts of live network traffic for testing their solutions. In practice, though, that is not the case. Network actors instead have to use packet traces or synthetic traffic, which is highly suboptimal: today's generated traffic is unrealistic. We propose a system for generating live application traffic leveraging massive codebases such as GitHub.
Our key observation is that many repositories have now become "orchestrable" thanks to the rise of container technologies. To showcase the practicality of the approach, we iterate through >293k GitHub repositories and manage to capture >74k traces containing meaningful and diverse network traffic. Based on this first success, we outline the design of a system, DYNAMO, which analyzes these traces to select and orchestrate open-source projects to automatically generate live application traffic matching a user's specification.
Alexander Dietmüller, Siddhant Ray, Romain Jacob, Laurent Vanbever
ACM HotNets 2022. Austin, Texas, USA (November 2022).
Generalizing machine learning (ML) models for network traffic dynamics tends to be considered a lost cause. Hence for every new task, we design new models and train them on model-specific datasets closely mimicking the deployment environments. Yet, an ML architecture called Transformer has enabled previously unimaginable generalization in other domains. Nowadays, one can download a model pre-trained on massive datasets and only fine-tune it for a specific task and context with comparatively little time and data. These fine-tuned models are now state-of-the-art for many benchmarks.
We believe this progress could translate to networking and propose a Network Traffic Transformer (NTT), a transformer adapted to learn network dynamics from packet traces. Our initial results are promising: NTT seems able to generalize to new prediction tasks and environments. This study suggests there is still hope for generalization through future research.
Tibor Schneider, Roland Schmid, Laurent Vanbever
IEEE ICNP 2022. Lexington, KY, USA (November 2022).
Configuration Synthesis promises to increase automation in network hardware configuration but is generally assumed to constitute a computationally hard problem. We conduct a formal analysis of the computational complexity of network-wide Configuration Synthesis to establish this claim formally. To that end, we consider Configuration Synthesis as a decision problem, whether or not the selected routing protocol(s) can implement a given set of forwarding properties.
We find the complexity of Configuration Synthesis heavily depends on the combination of the forwarding properties that need to be implemented in the network, as well as the employed routing protocol(s). Our analysis encompasses different forwarding properties that can be encoded as path constraints, and any combination of distributed destination-based hop-by-hop routing protocols. Many of these combinations yield NP-hard Configuration Synthesis problems; in particular, we show that the satisfiability of a set of arbitrary waypoints for any hop-by-hop routing protocol is NP-complete. Other combinations, however, show potential for efficient, scalable Configuration Synthesis.
Albert Gran Alcoz, Martin Strohmeier, Vincent Lenders, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Pulse-wave DDoS attacks are a new type of volumetric attack consisting of short, high-rate traffic pulses. Such attacks target the Achilles' heel of state-of-the-art DDoS defenses: their reaction time. By continuously adapting their attack vectors, pulse-wave attacks manage to render existing defenses ineffective.
In this paper, we leverage programmable switches to build an in-network DDoS defense effective against pulse-wave attacks. To do so, we revisit Aggregate-based Congestion Control (ACC): a mechanism proposed twenty years ago to manage congestion events caused by high-bandwidth traffic aggregates. While ACC proved efficient in inferring and controlling DoS attacks, it cannot keep up with the speed requirements of pulse-wave attacks.
We propose ACC-Turbo, a renewed version of ACC, which infers attack patterns by applying online-clustering techniques in the network and mitigates them by using programmable packet scheduling. By doing so, ACC-Turbo can infer attacks at line rate and in real-time; and rate-limit attack traffic on a per-packet basis.
We fully implement ACC-Turbo in P4 and evaluate it on a wide range of attack scenarios. Our evaluation shows that ACC-Turbo autonomously identifies DDoS attack vectors in an unsupervised manner and rapidly mitigates pulse-wave DDoS attacks. We also show that ACC-Turbo runs on existing hardware (Intel Tofino).
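To give a feel for what clustering packets online and then scheduling by cluster could look like, here is a small Python sketch. It is not the P4 implementation: the range-based cluster representation, the distance function, the two packet features (destination port, packet length), and ranking clusters by packet count are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    lo: list          # per-feature lower bounds
    hi: list          # per-feature upper bounds
    packets: int = 0  # packets absorbed so far

    def cost(self, point):
        """How much the ranges would have to grow to cover `point`."""
        return sum(
            max(l - x, 0) + max(x - h, 0)
            for x, l, h in zip(point, self.lo, self.hi)
        )

    def absorb(self, point):
        """Extend the ranges to include `point` and count the packet."""
        self.lo = [min(l, x) for l, x in zip(self.lo, point)]
        self.hi = [max(h, x) for h, x in zip(self.hi, point)]
        self.packets += 1


def assign(clusters, point, max_clusters=4):
    """Place one packet in a cluster as it arrives; return the cluster index."""
    costs = [c.cost(point) for c in clusters]
    if clusters and min(costs) == 0:
        idx = costs.index(0)                # already covered by a cluster
    elif len(clusters) < max_clusters:
        clusters.append(Cluster(list(point), list(point)))
        idx = len(clusters) - 1             # free slot: start a new cluster
    else:
        idx = costs.index(min(costs))       # otherwise grow the closest one
    clusters[idx].absorb(point)
    return idx


def priorities(clusters):
    """Rank clusters so that the busiest one is served last (rank 0 = first)."""
    order = sorted(range(len(clusters)), key=lambda i: clusters[i].packets)
    return {idx: rank for rank, idx in enumerate(order)}


clusters = []
benign = [(443, 120), (80, 900)]   # (destination port, packet length)
pulse = [(53, 1400)] * 50          # a short burst of near-identical packets
for pkt in benign + pulse:
    last = assign(clusters, pkt, max_clusters=3)
ranks = priorities(clusters)
```

Each packet is handled in a single pass with a bounded number of clusters, which is the property that makes this style of clustering plausible on a per-packet basis; deprioritizing the busiest cluster rather than dropping it is one possible mitigation policy.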
Edgar Costa Molero, Stefano Vissicchio, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Avoiding packet loss is crucial for ISPs. Unfortunately, malfunctioning hardware at ISPs can cause long-lasting packet drops, also known as gray failures, which are undetectable by existing monitoring tools.
In this paper, we describe the design and implementation of FANcY, an ISP-targeted system that detects and localizes gray failures quickly and accurately. FANcY complements previous monitoring approaches, which are mainly tailored for low-delay networks such as data center networks and do not work at ISP scale. We experimentally confirm FANcY’s capability to accurately detect gray failures in seconds, as long as only tiny fractions of traffic experience losses. We also implement FANcY in an Intel Tofino switch, demonstrating how it enables fine-grained fast rerouting.
Vamsi Addanki, Maria Apostolaki, Manya Ghobadi, Stefan Schmid, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Today’s network devices share buffer across queues to avoid drops during transient congestion and absorb bursts. As the buffer-per-bandwidth-unit in datacenter decreases, the need for optimal buffer utilization becomes more pressing. Typical devices use a hierarchical packet admission control scheme: First, a Buffer Management (BM) scheme decides the maximum length per queue at the device level and then an Active Queue Management (AQM) scheme decides which packets will be admitted at the queue level. Unfortunately, the lack of cooperation between the two control schemes leads to (i) harmful interference across queues, due to the lack of isolation; (ii) increased queueing delay, due to the obliviousness to the per-queue drain time; and (iii) thus unpredictable burst tolerance. To overcome these limitations, we propose ABM, Active Buffer Management which incorporates insights from both BM and AQM. Concretely, ABM accounts for both total buffer occupancy (typically used by BM) and queue drain time (typically used by AQM). We analytically prove that ABM provides isolation, bounded buffer drain time and achieves predictable burst tolerance without sacrificing throughput. We empirically find that ABM improves the 99th percentile FCT for short flows by up to 94% compared to the state-of-the-art buffer management. We further show that ABM improves the performance of advanced datacenter transport protocols in terms of FCT by up to 76% compared to DCTCP, TIMELY and PowerTCP under bursty workloads even at moderate load conditions.
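One way to read "accounts for both total buffer occupancy and queue drain time" is as a per-queue admission threshold that shrinks as the shared buffer fills up and as the queue drains more slowly. The Python sketch below illustrates that reading only; the function names, parameters, and the exact way the terms are combined are assumptions and should not be taken as the formula from the paper.

```python
def abm_threshold(alpha, total_buffer, occupied, n_congested, drain_rate, line_rate):
    """Illustrative per-queue limit (bytes) combining buffer and drain signals.

    alpha        -- hypothetical tuning knob for the queue's priority class
    total_buffer -- size of the shared buffer (bytes)
    occupied     -- bytes currently used across all queues
    n_congested  -- number of queues currently competing for the buffer
    drain_rate   -- rate at which this queue is being served (bits/s)
    line_rate    -- port capacity (bits/s)
    """
    remaining = max(total_buffer - occupied, 0)   # BM-style signal
    share = 1 / max(n_congested, 1)               # split among congested queues
    drain_factor = drain_rate / line_rate         # AQM-style signal
    return alpha * share * remaining * drain_factor


def admit(queue_len, packet_len, threshold):
    """Admit the packet only if the queue stays within its threshold."""
    return queue_len + packet_len <= threshold


# A queue served at half the line rate gets half the limit of one at full rate.
slow = abm_threshold(0.5, 1_000_000, 400_000, 2, 5e9, 10e9)
fast = abm_threshold(0.5, 1_000_000, 400_000, 2, 10e9, 10e9)
```

Scaling the limit by the drain rate ties the amount of buffered data to the time needed to drain it, which is the intuition behind bounding drain time.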
Roland Meier, Vincent Lenders, Laurent Vanbever
NDSS Symposium 2022. San Diego, CA, USA (April 2022).
Many large organizations operate dedicated wide area networks (WANs) distinct from the Internet to connect their data centers and remote sites through high-throughput links. While encryption generally protects these WANs well against content eavesdropping, they remain vulnerable to traffic analysis attacks that infer visited websites, watched videos or contents of VoIP calls from analysis of the traffic volume, packet sizes or timing information. Existing techniques to obfuscate Internet traffic are not well suited for WANs as they are either highly inefficient or require modifications to the communication protocols used by end hosts.
This paper presents ditto, a traffic obfuscation system adapted to the requirements of WANs: achieving high-throughput traffic obfuscation at line rate without modifications of end hosts. ditto adds padding to packets and introduces chaff packets to make the resulting obfuscated traffic independent of production traffic with respect to packet sizes, timing and traffic volume.
We evaluate a full implementation of ditto running on programmable switches in the network data plane. Our results show that ditto runs at 100 Gbps line rate and performs with negligible performance overhead up to a realistic traffic load of 70 Gbps per WAN link.
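The effect of padding plus chaff can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the data-plane implementation: the repeating size pattern, the single FIFO queue, and the `obfuscate` helper are hypothetical.

```python
from collections import deque
from itertools import cycle, islice


def obfuscate(real_sizes, pattern, slots):
    """Emit `slots` packets whose sizes follow `pattern`, whatever the input.

    Each output entry is (kind, wire_size, padding_bytes).
    """
    queue = deque(real_sizes)
    out = []
    for size in islice(cycle(pattern), slots):
        if queue and queue[0] <= size:
            real = queue.popleft()
            out.append(("real", size, size - real))   # pad up to the slot size
        else:
            out.append(("chaff", size, size))         # nothing fits: send chaff
    return out


pattern = [500, 1000, 1500]                 # hypothetical size pattern (bytes)
busy = obfuscate([400, 1200, 90], pattern, slots=6)
idle = obfuscate([], pattern, slots=6)

# What an observer sees on the wire is identical in both cases.
busy_sizes = [size for _, size, _ in busy]
idle_sizes = [size for _, size, _ in idle]
```

In this simplified model a real packet larger than every slot in the pattern would never be sent; a practical system has to size the pattern (or split packets) accordingly.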
Tibor Schneider, Rüdiger Birkner, Laurent Vanbever
ACM SIGCOMM 2021. Online (August 2021).
Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short in practice as they either only support a small set of reconfiguration scenarios or simply do not scale.
We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contribution is an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration.
We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.
Rüdiger Birkner *, Tobias Brodmann *, Petar Tsankov, Laurent Vanbever, Martin Vechev
USENIX NSDI 2021. Online (April 2021).
Network analysis and verification tools are often a godsend for network operators as they free them from the fear of introducing outages or security breaches. As with any complex software though, these tools can (and often do) have bugs. For the operators, these bugs are not necessarily problematic except if they affect the precision of the model as it applies to their specific network. In that case, the tool output might be wrong: it might fail to detect actual configuration errors and/or report non-existing ones.
In this paper, we present Metha, a framework that systematically tests network analysis and verification tools for bugs in their network models. Metha automatically generates syntactically- and semantically-valid configurations; compares the tool’s output to that of the actual router software; and detects any discrepancy as a bug in the tool’s model. The challenge in testing network analyzers this way is that a bug may occur very rarely and only when a specific set of configuration statements is present. We address this challenge by leveraging grammar-based fuzzing together with combinatorial testing to ensure thorough coverage of the search space and by identifying the minimal set of statements triggering the bug through delta debugging.
We implemented Metha and used it to test three well-known tools. In all of them, we found multiple (new) bugs in their models, most of which were confirmed by the developers.
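The last step, shrinking a failing configuration to the statements that matter, can be sketched with a compact variant of delta debugging. The Python code below is a generic illustration, not Metha's code: the configuration lines are made up, the `triggers_bug` oracle stands in for running the tool under test against the router software, and the sketch ignores the requirement that reduced configurations remain valid.

```python
def ddmin(items, triggers_bug):
    """Shrink `items` to a 1-minimal list that still triggers the bug."""
    assert triggers_bug(items), "the full input must trigger the bug"
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(len(items) // n, 1)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if triggers_bug(subset):          # one chunk alone is enough
                items, n, reduced = subset, 2, True
                break
            if triggers_bug(complement):      # this chunk can be dropped
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):               # already at single-item granularity
                break
            n = min(n * 2, len(items))        # try finer chunks
    return items


# Hypothetical configuration in which the bug needs two statements together.
config = [
    "router bgp 65001",
    "neighbor 10.0.0.2 remote-as 65002",
    "route-map RM permit 10",
    "match ip address prefix-list PL",
    "set local-preference 200",
    "ip prefix-list PL permit 10.1.0.0/16",
    "router ospf 1",
    "network 10.0.0.0 0.0.0.255 area 0",
]


def triggers_bug(statements):
    return ("route-map RM permit 10" in statements
            and "set local-preference 200" in statements)


minimal = ddmin(config, triggers_bug)
```

The result is 1-minimal: removing any single remaining statement makes the bug disappear, which is what makes the report actionable for the tool's developers.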
Thomas Wirtgen, Quentin De Coninck, Randy Bush, Laurent Vanbever, Olivier Bonaventure
ACM HotNets 2020. Chicago, Illinois, USA (November 2020).
Thanks to the standardization of routing protocols such as BGP, OSPF or IS-IS, Internet Service Providers (ISPs) and enterprise networks can deploy routers from various vendors. This protects them from vendor lock-in problems. Unfortunately, this also slows innovation since any new feature must be standardized and implemented by all vendors before being deployed.
We propose a paradigm shift that enables network operators to program the routing protocols used in their networks. We demonstrate the feasibility of this approach with xBGP. xBGP is a vendor neutral API that exposes the key data structures and functions of any BGP implementation. Each xBGP compliant implementation includes an eBPF virtual machine that executes the operator supplied programs. We extend FRRouting and BIRD to support this new paradigm and demonstrate the flexibility of xBGP with four different use cases. Finally, we discuss how xBGP could affect future research on future routing protocols.
Patrick Wintermeyer, Maria Apostolaki, Alexander Dietmüller, Laurent Vanbever
ACM HotNets 2020. Chicago, Illinois, USA (November 2020).
Programmable devices allow the operator to specify the data-plane behavior of a network device in a high-level language such as P4. The compiler then maps the P4 program to the hardware after applying a set of optimizations to minimize resource utilization. Yet, the lack of context restricts the compiler to conservatively account for all possible inputs -- including unrealistic or infrequent ones -- leading to sub-optimal use of the resources or even compilation failures. To address this inefficiency, we propose that the compiler leverages insights from actual traffic traces, effectively unlocking a broader spectrum of possible optimizations.
We present a system working alongside the compiler that uses traffic-awareness to reduce the allocated resources of a P4 program by: (i) removing dependencies that do not manifest; (ii) adjusting table and register sizes to reduce the pipeline length; and (iii) offloading parts of the program that are rarely used to the controller. Our prototype implementation on the Tofino switch automatically profiles the P4 program, detects opportunities and performs optimizations to improve the pipeline efficiency.
Our work showcases the potential benefit of applying profiling techniques used to compile general-purpose languages to compiling P4 programs.
Samuel Steffen, Timon Gehr, Petar Tsankov, Laurent Vanbever, Martin Vechev
ACM SIGCOMM 2020. New York, USA (August 2020).
Not all important network properties need to be enforced all the time. Often, what matters instead is the fraction of time / probability these properties hold. Computing the probability of a property in a network relying on complex inter-dependent routing protocols is challenging and requires determining all failure scenarios for which the property is violated. Doing so at scale and accurately goes beyond the capabilities of current network analyzers.
In this paper, we introduce NetDice, the first scalable and accurate probabilistic network configuration analyzer supporting BGP, OSPF, ECMP, and static routes. Our key contribution is an inference algorithm to efficiently explore the space of failure scenarios. More specifically, given a network configuration and a property phi, our algorithm automatically identifies a set of links whose failure is provably guaranteed not to change whether phi holds. By pruning these failure scenarios, NetDice manages to accurately approximate P(phi). NetDice supports practical properties and expressive failure models including correlated link failures.
We implement NetDice and evaluate it on realistic configurations. NetDice is practical: it can precisely verify probabilistic properties in a few minutes, even in large networks.
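The pruning idea can be illustrated on a toy model in Python. The sketch below is not NetDice's algorithm: it assumes hop-count shortest-path routing with a fixed tie-break instead of BGP/OSPF, independent link failures that all share one probability, and a five-link topology invented for the example. What it shares with the description above is that it only branches on links carrying the current forwarding path, since failing any other link cannot change the outcome.

```python
def simple_paths(links, src, dst, path=None):
    """Yield every loop-free path from src to dst over the given links."""
    path = path or (src,)
    if path[-1] == dst:
        yield path
        return
    for a, b in links:
        for u, v in ((a, b), (b, a)):
            if u == path[-1] and v not in path:
                yield from simple_paths(links, src, dst, path + (v,))


def route(links, src, dst):
    """Deterministic shortest path (fewest hops, then lexicographic), or None."""
    return min(simple_paths(links, src, dst), key=lambda p: (len(p), p), default=None)


def edges_of(path):
    """Links used by a path, in the same (sorted) form as the link set."""
    return [tuple(sorted(edge)) for edge in zip(path, path[1:])]


def prob_holds(links, p_fail, src, dst, phi, down=frozenset(), up=frozenset()):
    """P(phi holds), given that `down` links failed and `up` links are alive."""
    path = route(links - down, src, dst)
    if path is None:
        # No path even with every undecided link alive: the outcome is fixed.
        return 1.0 if phi(None) else 0.0
    hot = [e for e in edges_of(path) if e not in up]
    # Case 1: all hot links stay up. The path is then unchanged no matter
    # what happens to the remaining links, so they never need exploring.
    total = (1 - p_fail) ** len(hot) * (1.0 if phi(path) else 0.0)
    # Case 2: the first i hot links are up and the next one fails.
    for i, link in enumerate(hot):
        weight = (1 - p_fail) ** i * p_fail
        total += weight * prob_holds(
            links, p_fail, src, dst, phi, down | {link}, up | set(hot[:i])
        )
    return total


# Hypothetical topology: two disjoint a-d paths plus a link (d, e) that can
# never affect a -> d traffic and is therefore never branched on.
links = frozenset({("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("d", "e")})


def reachable(path):
    return path is not None


def via_b(path):
    return path is not None and "b" in path


p_reach = prob_holds(links, 0.1, "a", "d", reachable)
p_waypoint = prob_holds(links, 0.1, "a", "d", via_b)
```

With a failure probability of 0.1 per link, reachability holds unless both two-hop paths are broken, i.e. with probability 1 - (1 - 0.9^2)^2 = 0.9639, and the waypoint property holds only while the preferred path a-b-d is intact, i.e. with probability 0.9^2 = 0.81.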