The Networked Systems Group (NSG) is a research group in the Department of Information Technology and Electrical Engineering (D-ITET) at ETH Zürich led by Prof. Laurent Vanbever. We are also part of the ETH ICE center.
Our research interests are centered around complex network management problems, with the larger goal of making current and future networks (especially the Internet) easier to design, understand and operate. We are currently active in multiple areas including network programmability, data-driven networking, verification, routing, and security. Most of our projects are inherently multidisciplinary and tend to involve recent advances in programming languages, algorithmics, and machine learning.
A few recent examples of practical systems we have built include: Blink, Config2Spec, Bayonet, Fibbing, iTAP, Net2Text, NetComplete, NetHide, SDX, SyNET, SDNRacer, SP-PIFO, Stroboscope, and SWIFT. We are also currently looking at the impact of routing attacks on systems overlays such as cryptocurrencies and anonymity networks. To learn about our work, please check out our research and publications pages.
Our flagship lecture is Communication Networks offered in the Spring semester. We also offer a lecture on Advanced Topics in Communication Networks in the Fall semester. We also offer a Seminar in Communication Networks. Check our courses page for more information.
Coralie Busse-Grawitz, one of our master students and future PhD student, is one of this year's recipients of the ABB Research Award (ABB Forschungspreis) for her Master's Thesis entitled "In-Network Inference with Random Forests". She received the award at this year's "ETH Tag" on 16 November.
For the second time after 2016, Laurent has been awarded the Golden Owl of the VSETH! The Owl is awarded by the VSETH, ETH Zurich's students association, and "honours lecturers who have provided exceptional teaching". Huge kudos to the teaching assistants from NSG without whom this wouldn't have been possible!
Stay tuned to learn about how adversarial traffic inputs can mislead “data-driven” networks.
Laurent got awarded an ERC Starting Grant. The grant (SyNET) aims at automatically generating correct network configurations rather than verifying them. This also means that our group will soon recruit PhD students and postdocs. Ping us if interested!
Starting from this fall, our group will offer a new seminar lecture. This year's topic will be "Learning, Reasoning and Control" in the context of communication networks. You can find more information about the seminar on our website: seminar-net.ethz.ch. Please note that the number of seats is limited to 24.
USENIX NSDI 2020. Santa Clara, California, USA (February 2020).
USENIX NSDI 2020. Santa Clara, California, USA (February 2020).
ACM HotNets 2019. Princeton, NJ, USA (November 2019).
Traditional network control planes can be slow and require manual tinkering from operators to change their behavior. There is thus great interest in a faster, data-driven approach that uses signals from real-time traffic instead. However, the promise of fast and automatic reaction to data comes with new risks: malicious inputs designed towards negative outcomes for the network, service providers, users, and operators.
Adversarial inputs are a well-recognized problem in other areas; we show that networking applications are susceptible to them too. We characterize the attack surface of data-driven networks and examine how attackers with different privileges—from infected hosts to operator-level access—may target network infrastructure, applications, and protocols. To illustrate the problem, we present case studies with concrete attacks on recently proposed data-driven systems.
Our analysis urgently calls for a careful study of attacks and defenses in data-driven networking, with a view towards ensuring that their promise is not marred by oversights in robust design.
USENIX NSDI 2019. Boston, Massachusetts, USA (February 2019).
We present Blink, a data-driven system that leverages TCP-induced signals to detect failures directly in the data plane. The key intuition behind Blink is that a TCP flow exhibits a predictable behavior upon disruption: retransmitting the same packet over and over, at epochs exponentially spaced in time. When compounded over multiple flows, this behavior creates a strong and characteristic failure signal. Blink efficiently analyzes TCP flows to: (i) select which ones to track; (ii) reliably and quickly detect major traffic disruptions; and (iii) recover connectivity---all this, completely in the data plane. We present an implementation of Blink in P4 together with an extensive evaluation on real and synthetic traffic traces. Our results indicate that Blink: (i) achieves sub-second rerouting for large fractions of Internet traffic; and (ii) prevents unnecessary traffic shifts even in the presence of noise. We further show the feasibility of Blink by running it on an actual Tofino switch.
NDSS Symposium 2019. San Diego, CA, USA (February 2019).
Nowadays Internet routing attacks remain practically effective as existing countermeasures either fail to provide protection guarantees or are not easily deployable. Blockchain systems are particularly vulnerable to such attacks as they rely on Internet-wide communications to reach consensus. In particular, Bitcoin---the most widely-used cryptocurrency---can be split in half by any AS-level adversary using BGP hijacking.
In this paper, we present SABRE, a secure and scalable Bitcoin relay network which relays blocks worldwide through a set of connections that are resilient to routing attacks. SABRE runs alongside the existing peer-to-peer network and is easily deployable. As a critical system, SABRE design is highly resilient and can efficiently handle high bandwidth loads, including Denial of Service attacks.
We built SABRE around two key technical insights. First, we leverage fundamental properties of inter-domain routing (BGP) policies to host relay nodes: (i) in networks that are inherently protected against routing attacks; and (ii) on paths that are economically-preferred by the majority of Bitcoin clients. These properties are generic and can be used to protect other Blockchain-based systems. Second, we leverage the fact that relaying blocks is communication-heavy, not computation-heavy. This enables us to offload most of the relay operations to programmable network hardware (using the P4 programming language). Thanks to this hardware/software co-design, SABRE nodes operate seamlessly under high load while mitigating the effects of malicious clients.
We present a complete implementation of SABRE together with an extensive evaluation. Our results demonstrate that SABRE is effective at securing Bitcoin against routing attacks, even with deployments of as few as 6 nodes.
ACM HotNets 2018. Redmond, WA, USA (November 2018).
One design principle of modern network architecture seems to be set in stone: a software-based control plane drives a hardware- or software-based data plane. We argue that it is time to revisit this principle after the advent of programmable switch ASICs which can run complex logic at line rate.
We explore the possibility and benefits of accelerating the control plane by offloading some of its tasks directly to the network hardware. We show that programmable data planes are indeed powerful enough to run key control plane tasks including: failure detection and notification, connectivity retrieval, and even policy-based routing protocols. We implement in P4 a prototype of such a “hardware-accelerated” control plane, and illustrate its benefits in a case study.
Despite such benefits, we acknowledge that offloading tasks to hardware is not a silver bullet. We discuss its tradeoffs and limitations, and outline future research directions towards hardware-software codesign of network control planes.
USENIX Security 2018. Baltimore, MD, USA (August 2018).
Simple path tracing tools such as traceroute allow malicious users to infer network topologies remotely and use that knowledge to craft advanced denial-of-service (DoS) attacks such as Link-Flooding Attacks (LFAs). Yet, despite the risk, most network operators still allow path tracing as it is an essential network debugging tool.
In this paper, we present NetHide, a network topology obfuscation framework that mitigates LFAs while preserving the practicality of path tracing tools. The key idea behind NetHide is to formulate network obfuscation as a multi-objective optimization problem that allows for a flexible tradeoff between security (encoded as hard constraints) and usability (encoded as soft constraints). While solving this problem exactly is hard, we show that NetHide can obfuscate topologies at scale by only considering a subset of the candidate solutions and without reducing obfuscation quality. In practice, NetHide obfuscates the topology by intercepting and modifying path tracing probes directly in the data plane. We show that this process can be done at line-rate, in a stateless fashion, by leveraging the latest generation of programmable network devices.
We fully implemented NetHide and evaluated it on realistic topologies. Our results show that NetHide is able to obfuscate large topologies (> 150 nodes) while preserving near-perfect debugging capabilities. In particular, we show that operators can still precisely trace back > 90% of link failures despite obfuscation.
Timon Gehr, Sasa Misailovic, Petar Tsankov, Laurent Vanbever, Pascal Wiesman, Martin Vechev
PLDI 2018. Philadelphia, Pennsylvania, USA (June 2018).
Network operators often need to ensure that important probabilistic properties are met, such as that the probability of network congestion is below a certain threshold. Ensuring such properties is challenging and requires both a suitable language for probabilistic networks and an automated procedure for answering probabilistic inference queries. We present Bayonet, a novel approach that consists of: (i) a probabilistic network programming language and (ii) a system that performs probabilistic inference on Bayonet programs. The key insight behind Bayonet is to phrase the problem of probabilistic network reasoning as inference in existing probabilistic languages. As a result, Bayonet directly leverages existing probabilistic inference systems and offers a flexible and expressive interface to operators. We present a detailed evaluation of Bayonet on common network scenarios, such as network congestion, reliability of packet delivery, and others. Our results indicate that Bayonet can express such practical scenarios and answer queries for realistic topology sizes (with up to 30 nodes).
USENIX NSDI 2018. Renton, Washington, USA (April 2018).
For an Internet Service Provider (ISP), getting an accurate picture of how its network behaves is challenging. Indeed, given the carried traffic volume and the impossibility to control end-hosts, ISPs often have no other choice but to rely on heavily sampled traffic statistics, which provide them with coarse-grained visibility at a less than ideal time resolution (seconds or minutes). We present Stroboscope, a system that enables fine-grained monitoring of any traffic flow by instructing routers to mirror millisecond-long traffic slices in a programmatic way. Stroboscope takes as input high-level monitoring queries together with a budget and automatically determines: (i) which flows to mirror; (ii) where to place mirroring rules, using fast and provably correct algorithms; and (iii) when to schedule these rules to maximize coverage while meeting the input budget. We implemented Stroboscope, and show that it scales well: it computes schedules for large networks and query sizes in few seconds, and produces a number of mirroring rules well within the limits of current routers. We also show that Stroboscope works on existing routers and is therefore immediately deployable.
USENIX NSDI 2018. Renton, Washington, USA (April 2018).
Network operators often need to adapt the configuration of a network in order to comply with changing routing policies. Evolving existing configurations, however, is a complex task as local changes can have unforeseen global effects. Not surprisingly, this often leads to mistakes that result in network downtimes. We present NetComplete, a system that assists operators in modifying existing network-wide configurations to comply with new routing policies. NetComplete takes as input configurations with “holes” that identify the parameters to be completed and “autocompletes” these with concrete values. The use of a partial configuration addresses two important challenges inherent to existing synthesis solutions: (i) it allows the operators to precisely control how configurations should be changed; and (ii) it allows the synthesizer to leverage the existing configurations to gain performance. To scale, NetComplete relies on powerful techniques such as counter-example guided inductive synthesis (for link-state protocols) and partial evaluation (for path-vector protocols). We implemented NetComplete and showed that it can autocomplete configurations using static routes, OSPF, and BGP. Our implementation also scales to realistic networks and complex routing policies. Among others, it is able to synthesize configurations for networks with up to 200 routers within few minutes.