The Networked Systems Group (NSG) is a research group in the Department of Information Technology and Electrical Engineering (D-ITET) at ETH Zürich led by Prof. Laurent Vanbever.
Our research interests are centered around complex network management problems, with the larger goal of making current and future networks (especially the Internet) easier to design, understand and operate. We are currently active in multiple areas including network programmability, data-driven networking, verification, routing, and security. Most of our projects are inherently multidisciplinary and tend to involve recent advances in programming languages, algorithmics, and machine learning.
We are active in several areas, a subset of which include:
Take a look at our research and publications pages for a full list.
Our flagship lecture is Communication Networks offered in the Spring semester. We also offer a (master-evel) seminar in the Spring semester, and a lecture on Advanced Topics in Communication Networks in the Fall semester. Check our courses page for more information.
Learn more about our recent research and teaching activities in our latest activity report (2021, 2020) and in our 1-page research statement.
Our group will be well represented at this upcoming ACM SIGCOMM; we have three accepted papers! Stay tuned to learn more about: pulse-wave DDoS mitigations (ACC-Turbo), fast gray failure detection (FANcY), and buffer management in data centers (ABM). See you in Amsterdam!
Rüdiger Birkner, a fresh NSG PhD graduate, just won the Roger Needham PhD Award! The Roger Needham PhD award is an annual prize awarded to a PhD student from a European University whose thesis is considered to be an exceptional, innovative contribution to knowledge in the systems area.
2021, wrapped! Check out our activity report to see what our group has been up to in terms of research and teaching activities. Spoiler alert: 2021 was busy, but fulfilling!
Curious about what happened in the Internet in 2021? Check out this print depicting 1 year worth of BGP routing announcements observed on 256 Internet routers. "Behind the scenes" information available here.
Laurent gave the introductory keynote at ACM CoNEXT 2021. The talk is entitled "The three tales of (correct) network operations" and summarizes our 10-year (and counting!) journey exploring the operational aspects of routing algorithms, focusing on three aspects in particular: verification, synthesis, and reconfiguration. You can find the slides here.
Tibor Schneider is one of this year's recipients of the ABB Research Award (ABB Forschungspreis) for his Master's Thesis entitled "Snowcap: Synthesizing Network-Wide Configuration Updates". He received the award at this year's "ETH Tag" on 20 November.
Laurent was selected for the 2021 ACM SIGCOMM "Rising Star" Award. The award is in recognition for outstanding research contributions, early in his career, toward improving the flexibility, correctness, and security of Internet routing. More information about the award and past recipients can be found online.
Maria Apostolaki (2nd PhD student to graduate from NSG) will join Princeton's Electrical and Computer Engineering Department (ECE) as a tenure-track assistant professor in the Fall of 2022! Huge congrats to her!!
BibTeX...
Albert Gran Alcoz, Martin Strohmeier, Vincent Lenders, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Pulse-wave DDoS attacks are a new type of volumetric attack consisting of short, high-rate traffic pulses. Such attacks target the Achilles' heel of state-of-the-art DDoS defenses: their reaction time. By continuously adapting their attack vectors, pulse-wave attacks manage to render existing defenses ineffective.
In this paper, we leverage programmable switches to build an in-network DDoS defense effective against pulse-wave attacks. To do so, we revisit Aggregate-based Congestion Control (ACC): a mechanism proposed twenty years ago to manage congestion events caused by high-bandwidth traffic aggregates. While ACC proved efficient in inferring and controlling DoS attacks, it cannot keep up with the speed requirements of pulse-wave attacks.
We propose ACC-Turbo, a renewed version of ACC, which infers attack patterns by applying online-clustering techniques in the network and mitigates them by using programmable packet scheduling. By doing so, ACC-Turbo can infer attacks at line rate and in real-time; and rate-limit attack traffic on a per-packet basis.
We fully implement ACC-Turbo in P4 and evaluate it on a wide range of attack scenarios. Our evaluation shows that ACC-Turbo autonomously identifies DDoS attack vectors in an unsupervised manner and rapidly mitigates pulse-wave DDoS attacks. We also show that ACC-Turbo runs on existing hardware (Intel Tofino).
Edgar Costa Molero, Stefano Vissicchio, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Vamsi Addanki, Maria Apostolaki, Manya Ghobadi, Stefan Schmid, Laurent Vanbever
ACM SIGCOMM 2022. Amsterdam, Netherlands (August 2022).
Roland Meier, Vincent Lenders, Laurent Vanbever
NDSS Symposium 2022. San Diego, CA, USA (April 2022).
Many large organizations operate dedicated wide area networks (WANs) distinct from the Internet to connect their data centers and remote sites through high-throughput links. While encryption generally protects these WANs well against content eavesdropping, they remain vulnerable to traffic analysis attacks that infer visited websites, watched videos or contents of VoIP calls from analysis of the traffic volume, packet sizes or timing information. Existing techniques to obfuscate Internet traffic are not well suited for WANs as they are either highly inefficient or require modifications to the communication protocols used by end hosts.
This paper presents ditto, a traffic obfuscation system adapted to the requirements of WANs: achieving high-throughput traffic obfuscation at line rate without modifications of end hosts. ditto adds padding to packets and introduces chaff packets to make the resulting obfuscated traffic independent of production traffic with respect to packet sizes, timing and traffic volume.
We evaluate a full implementation of ditto running on programmable switches in the network data plane. Our results show that ditto runs at 100 Gbps line rate and performs with negligible performance overhead up to a realistic traffic load of 70 Gbps per WAN link.
Tibor Schneider, Rüdiger Birkner, Laurent Vanbever
ACM SIGCOMM 2021. Online (August 2021).
Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short inpractice as they either only support a small set of reconfiguration scenarios or simply do not scale.
We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contributionis an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration.
We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.
Rüdiger Birkner *, Tobias Brodmann *, Petar Tsankov, Laurent Vanbever, Martin Vechev
USENIX NSDI 2021. Online (April 2021).
Network analysis and verification tools are often a godsend for network operators as they free them from the fear of introducing outages or security breaches. As with any complex software though, these tools can (and often do) have bugs. For the operators, these bugs are not necessarily problematic except if they affect the precision of the model as it applies to their specific network. In that case, the tool output might be wrong: it might fail to detect actual configuration errors and/or report non-existing ones.
In this paper, we present Metha, a framework that systematically tests network analysis and verification tools for bugs in their network models. Metha automatically generates syntactically- and semantically-valid configurations; compares the tool’s output to that of the actual router software; and detects any discrepancy as a bug in the tool’s model. The challenge in testing network analyzers this way is that a bug may occur very rarely and only when a specific set of configuration statements is present. We address this challenge by leveraging grammar-based fuzzing together with combinatorial testing to ensure thorough coverage of the search space and by identifying the minimal set of statements triggering the bug through delta debugging.
We implemented Metha and used it to test three well-known tools. In all of them, we found multiple (new) bugs in their models, most of which were confirmed by the developers.
Thomas Wirtgen, Quentin De Coninck, Randy Bush, Laurent Vanbever, Olivier Bonaventure
ACM HotNets 2020. Chicago, Illinois, USA (November 2020).
Thanks to the standardization of routing protocols such as BGP, OSPF or IS-IS, Internet Service Providers (ISP) and enterprise networks can deploy routers from various vendors. This prevents them from vendor-lockin problems. Unfortunately, this also slows innovation since any new feature must be standardized and implemented by all vendors before being deployed.
We propose a paradigm shift that enables network operators to program the routing protocols used in their networks. We demonstrate the feasibility of this approach with xBGP. xBGP is a vendor neutral API that exposes the key data structures and functions of any BGP implementation. Each xBGP compliant implementation includes an eBPF virtual machine that executes the operator supplied programs. We extend FRRouting and BIRD to support this new paradigm and demonstrate the flexibility of xBGP with four different use cases. Finally, we discuss how xBGP could affect future research on future routing protocols.
Patrick Wintermeyer, Maria Apostolaki, Alexander Dietmüller, Laurent Vanbever
ACM HotNets 2020. Chicago, Illinois, USA (November 2020).
Programmable devices allow the operator to specify the data-plane behavior of a network device in a high-level language such as P4. The compiler then maps the P4 program to the hardware after applying a set of optimizations to minimize resource utilization. Yet, the lack of context restricts the compiler to conservatively account for all possible inputs -- including unrealistic or infrequent ones -- leading to sub-optimal use of the resources or even compilation failures. To address this inefficiency, we propose that the compiler leverages insights from actual traffic traces, effectively unlocking a broader spectrum of possible optimizations.
We present a system working alongside the compiler that uses traffic-awareness to reduce the allocated resources of a P4 program by: (i) removing dependencies that do not manifest; (ii) adjusting table and register sizes to reduce the pipeline length; and (iii) offloading parts of the program that are rarely used to the controller. Our prototype implementation on the Tofino switch automatically profiles the P4 program, detects opportunities and performs optimizations to improve the pipeline efficiency.
Our work showcases the potential benefit of applying profiling techniques used to compile general-purpose languages to compiling P4 programs.
Samuel Steffen, Timon Gehr, Petar Tsankov, Laurent Vanbever, Martin Vechev
ACM SIGCOMM 2020. New York, USA (August 2020).
Not all important network properties need to be enforced all the time. Often, what matters instead is the fraction of time / probability these properties hold. Computing the probability of a property in a network relying on complex inter-dependent routing protocols is challenging and requires determining all failure scenarios for which the property is violated. Doing so at scale and accurately goes beyond the capabilities of current network analyzers.
In this paper, we introduce NetDice, the first scalable and accurate probabilistic network configuration analyzer supporting BGP, OSPF, ECMP, and static routes. Our key contribution is an inference algorithm to efficiently explore the space of failure scenarios. More specifically, given a network configuration and a property phi, our algorithm automatically identifies a set of links whose failure is provably guaranteed not to change whether phi holds. By pruning these failure scenarios, NetDice manages to accurately approximate P(phi). NetDice supports practical properties and expressive failure models including correlated link failures.
We implement NetDice and evaluate it on realistic configurations. NetDice is practical: it can precisely verify probabilistic properties in few minutes, even in large networks.
Thomas Holterbach, Tobias Bühler, Tino Rellstab, Laurent Vanbever
ACM SIGCOMM CCR 2020. Volume 50 Issue 2 (April 2020).
Each year at ETH Zurich, around 100 students collectively build and operate their very own Internet infrastructure composed of hundreds of routers and dozens of Autonomous Systems (ASes). Their goal? Enabling Internet-wide connectivity.
We find this class-wide project to be invaluable in teaching our students how the Internet infrastructure practically works. Among others, our students have a much deeper understanding of Internet operations alongside their pitfalls. Besides students tend to love the project: clearly the fact that all of them need to cooperate for the entire Internet to work is empowering.
In this paper, we describe the overall design of our teaching platform, how we use it, and interesting lessons we have learnt over the years. We also make our platform openly available.
Rüdiger Birkner, Dana Drachsler Cohen, Laurent Vanbever, Martin Vechev
USENIX NSDI 2020. Santa Clara, California, USA (February 2020).
Network verification and configuration synthesis are promising approaches to make networks more reliable and secure by enforcing a set of policies. However, these approaches require a formal and precise description of the intended network behavior, imposing a major barrier to their adoption: network operators are not only reluctant to write formal specifications, but often do not even know what these specifications are.
We present Config2Spec, a system that automatically synthesizes a formal specification (a set of policies) of a network given its configuration and a failure model (e.g., up to two link failures). A key technical challenge is to design a synthesis algorithm which can efficiently explore the large space of possible policies. To address this challenge, Config2Spec relies on a careful combination of two well-known methods: data plane analysis and control plane verification.
Experimental results show that Config2Spec scales to mining specifications of large networks (>150 routers).
Albert Gran Alcoz, Alexander Dietmüller, Laurent Vanbever
USENIX NSDI 2020. Santa Clara, California, USA (February 2020).
Push-In First-Out (PIFO) queues are hardware primitives which enable programmable packet scheduling by allowing to perfectly reorder packets at line rate. While promising, implementing PIFO queues in hardware and at scale is not easy: only hardware designs (not implementations) exist and they can only support about 1000 flows.
In this paper, we introduce SP-PIFO, a programmable packet scheduler which closely approximates the behavior of PIFO queues using strict-priority queues—at line rate, at scale, and on existing devices. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and available queues to minimize the scheduling errors. We present a mathematical formulation of the problem and derive an adaptation technique which closely approximates the optimal queue mapping without any traffic knowledge.
We fully implement SP-PIFO in P4 and evaluate it on real workloads. We show that SP-PIFO: (i) closely matches ideal PIFO performance, with as little as 8 priority queues; (ii) arbitrarily scales to large amount of flows and ranks; and (iii) quickly adapts to traffic variations. We also show that SP-PIFO runs at line rate on existing programmable data planes.
Roland Meier, Thomas Holterbach, Stephan Keck, Matthias Stähli, Vincent Lenders, Ankit Singla, Laurent Vanbever
ACM HotNets 2019. Princeton, NJ, USA (November 2019).
Traditional network control planes can be slow and require manual tinkering from operators to change their behavior. There is thus great interest in a faster, data-driven approach that uses signals from real-time traffic instead. However, the promise of fast and automatic reaction to data comes with new risks: malicious inputs designed towards negative outcomes for the network, service providers, users, and operators.
Adversarial inputs are a well-recognized problem in other areas; we show that networking applications are susceptible to them too. We characterize the attack surface of data-driven networks and examine how attackers with different privileges—from infected hosts to operator-level access—may target network infrastructure, applications, and protocols. To illustrate the problem, we present case studies with concrete attacks on recently proposed data-driven systems.
Our analysis urgently calls for a careful study of attacks and defenses in data-driven networking, with a view towards ensuring that their promise is not marred by oversights in robust design.
Thomas Holterbach, Edgar Costa Molero, Maria Apostolaki, Alberto Dainotti, Stefano Vissicchio, Laurent Vanbever
USENIX NSDI 2019. Boston, Massachusetts, USA (February 2019).
We present Blink, a data-driven system that leverages TCP-induced signals to detect failures directly in the data plane. The key intuition behind Blink is that a TCP flow exhibits a predictable behavior upon disruption: retransmitting the same packet over and over, at epochs exponentially spaced in time. When compounded over multiple flows, this behavior creates a strong and characteristic failure signal. Blink efficiently analyzes TCP flows to: (i) select which ones to track; (ii) reliably and quickly detect major traffic disruptions; and (iii) recover connectivity---all this, completely in the data plane. We present an implementation of Blink in P4 together with an extensive evaluation on real and synthetic traffic traces. Our results indicate that Blink: (i) achieves sub-second rerouting for large fractions of Internet traffic; and (ii) prevents unnecessary traffic shifts even in the presence of noise. We further show the feasibility of Blink by running it on an actual Tofino switch.
Maria Apostolaki, Gian Marti, Jan Müller, Laurent Vanbever
NDSS Symposium 2019. San Diego, CA, USA (February 2019).
Nowadays Internet routing attacks remain practically effective as existing countermeasures either fail to provide protection guarantees or are not easily deployable. Blockchain systems are particularly vulnerable to such attacks as they rely on Internet-wide communications to reach consensus. In particular, Bitcoin---the most widely-used cryptocurrency---can be split in half by any AS-level adversary using BGP hijacking.
In this paper, we present SABRE, a secure and scalable Bitcoin relay network which relays blocks worldwide through a set of connections that are resilient to routing attacks. SABRE runs alongside the existing peer-to-peer network and is easily deployable. As a critical system, SABRE design is highly resilient and can efficiently handle high bandwidth loads, including Denial of Service attacks.
We built SABRE around two key technical insights. First, we leverage fundamental properties of inter-domain routing (BGP) policies to host relay nodes: (i) in networks that are inherently protected against routing attacks; and (ii) on paths that are economically-preferred by the majority of Bitcoin clients. These properties are generic and can be used to protect other Blockchain-based systems. Second, we leverage the fact that relaying blocks is communication-heavy, not computation-heavy. This enables us to offload most of the relay operations to programmable network hardware (using the P4 programming language). Thanks to this hardware/software co-design, SABRE nodes operate seamlessly under high load while mitigating the effects of malicious clients.
We present a complete implementation of SABRE together with an extensive evaluation. Our results demonstrate that SABRE is effective at securing Bitcoin against routing attacks, even with deployments of as few as 6 nodes.