Towards reliable network control planes

Author: Ahmed Elhassany
Doctoral Thesis

Abstract

Many critical services, such as e-commerce, emergency response, or even remote surgeries, rely on computer networks. This strategic importance makes them a mission-critical infrastructure that must operate reliably, with minimum downtime. Achieving reliable operations is challenging, though, as confirmed by the countless major network downtimes that are frequently making the news. In this dissertation, we develop techniques and tools to enable the reliable network operations of the two main types of networks deployed today: (i) networks running distributed routing protocols; and (ii) networks managed by a logically-centralized controller, also known as Software-Defined Networks (SDN). Each paradigm comes with its own set of operational challenges. In networks running distributed routing protocols, the main problem is configuring each device correctly. Studies have indeed shown that human-induced misconfigurations, not physical failures, explain the majority of network downtimes. In contrast, in SDN, the main problem is ensuring the correctness of the controller software. Therefore, we tackle two complementary problems: (i) how to synthesize configurations for networks running distributed routing protocols automatically?; and (ii) how to catch bugs in SDN networks? To eliminate human-induced misconfigurations, we propose two novel and complementary techniques to synthesize configurations from high-level policies. The first technique aims at generality, while the second one aims at usability and scalability. Specifically, the first technique enables the synthesis of correct configurations for arbitrary protocols from operators’ intents and a formal specification of the protocols’ behavior. While useful, we show that this technique is limited in its scalability and can produce configurations that are far from what a human would write. The second set of techniques rely on autocompletion to address these problems. Here, we allow the network operators to specify parts of the configurations and constraint the values that the synthesizer is permitted to produce. We show that autocompletion enables the synthesizer to scale to large networks and generates interpretable configurations. To catch bugs in SDN controllers, we develop new techniques for detecting and troubleshooting concurrency violations. As an event-based system, an SDN controller is indeed particularly subject to concurrency issues. Thus, we present a sound and complete dynamic concurrency analyzer for SDN controllers. Furthermore, we present techniques to assist the developers in identifying and troubleshooting the source of the reported concurrency violations.

BibTex

@PHDTHESIS{elhassany2020towards,
	copyright = {In Copyright - Non-Commercial Use Permitted},
	year = {2019-09-09},
	type = {Doctoral Thesis},
	author = {Elhassany, Ahmed},
	size = {170 p.},
	abstract = {Many critical services, such as e-commerce, emergency response, or even remote surgeries, rely on computer networks. This strategic importance makes them a mission-critical infrastructure that must operate reliably, with minimum downtime. Achieving reliable operations is challenging, though, as confirmed by the countless major network downtimes that are frequently making the news.In this dissertation, we develop techniques and tools to enable the reliable network operations of the two main types of networks deployed today: (i) networks running distributed routing protocols; and (ii) networks managed by a logically-centralized controller, also known as Software-Defined Networks (SDN). Each paradigm comes with its own set of operational challenges. In networks running distributed routing protocols, the main problem is configuring each device correctly. Studies have indeed shown that human-induced misconfigurations, not physical failures, explain the majority of network downtimes. In contrast, in SDN, the main problem is ensuring the correctness of the controller software. Therefore, we tackle two complementary problems: (i) how to synthesize configurations for networks running distributed routing protocols automatically?; and (ii) how to catch bugs in SDN networks?To eliminate human-induced misconfigurations, we propose two novel and complementary techniques to synthesize configurations from high-level policies. The first technique aims at generality, while the second one aims at usability and scalability. Specifically, the first technique enables the synthesis of correct configurations for arbitrary protocols from operators’ intents and a formal specification of the protocols’ behavior. While useful, we show that this technique is limited in its scalability and can produce configurations that are far from what a human would write. The second set of techniques rely on autocompletion to address these problems. Here, we allow the network operators to specify parts of the configurations and constraint the values that the synthesizer is permitted to produce. We show that autocompletion enables the synthesizer to scale to large networks and generates interpretable configurations.To catch bugs in SDN controllers, we develop new techniques for detecting and troubleshooting concurrency violations. As an event-based system, an SDN controller is indeed particularly subject to concurrency issues. Thus, we present a sound and complete dynamic concurrency analyzer for SDN controllers. Furthermore, we present techniques to assist the developers in identifying and troubleshooting the source of the reported concurrency violations.},
	keywords = {Computer networks; Programming languages; program synthesis; VERIFICATION (SOFTWARE ENGINEERING)},
	language = {en},
	address = {Zurich},
	publisher = {ETH Zurich},
	DOI = {10.3929/ethz-b-000429488},
	title = {Towards reliable network control planes},
	school = {ETH Zurich}
}

Research Collection: 20.500.11850/429488