FAst In-Network GraY Failure Detection for ISPs

Authors: Edgar Costa Molero, Stefano Vissicchio, and Laurent Vanbever
SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference

Abstract

Avoiding packet loss is crucial for ISPs. Unfortunately, malfunctioning hardware at ISPs can cause long-lasting packet drops, also known as gray failures, which are undetectable by existing monitoring tools. In this paper, we describe the design and implementation of FANcY, an ISP-targeted system that detects and localizes gray failures quickly and accurately. FANcY complements previous monitoring approaches, which are mainly tailored for low-delay networks such as data center networks and do not work at ISP scale. We experimentally confirm FANcY’s capability to accurately detect gray failures in seconds, as long as only tiny fractions of traffic experience losses. We also implement FANcY in an Intel Tofino switch, demonstrating how it enables fine-grained fast rerouting.

Research Areas: Data-Driven Networking and Network Programmability

BibTeX

@INPROCEEDINGS{molero2022in-network,
	isbn = {978-1-4503-9420-8},
	doi = {10.1145/3544216.3544242},
	year = {2022},
	month = aug,
	booktitle = {SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference},
	type = {Conference Paper},
	author = {Costa Molero, Edgar and Vissicchio, Stefano and Vanbever, Laurent},
	abstract = {Avoiding packet loss is crucial for ISPs. Unfortunately, malfunctioning hardware at ISPs can cause long-lasting packet drops, also known as gray failures, which are undetectable by existing monitoring tools. In this paper, we describe the design and implementation of FANcY, an ISP-targeted system that detects and localizes gray failures quickly and accurately. FANcY complements previous monitoring approaches, which are mainly tailored for low-delay networks such as data center networks and do not work at ISP scale. We experimentally confirm FANcY's capability to accurately detect gray failures in seconds, as long as only tiny fractions of traffic experience losses. We also implement FANcY in an Intel Tofino switch, demonstrating how it enables fine-grained fast rerouting.},
	keywords = {Failure detection; Measurements; Network Hardware; Programmable data planes},
	language = {en},
	address = {New York, NY},
	publisher = {Association for Computing Machinery},
	title = {FAst In-Network GraY Failure Detection for ISPs},
	pages = {677--692},
	note = {36th ACM SIGCOMM Conference (SIGCOMM 2022); Conference Location: Amsterdam, Netherlands; Conference Date: August 22-26, 2022; Conference lecture on August 25, 2022}
}

Research Collection: 20.500.11850/573971

Slide Sources: https://gitlab.ethz.ch/projects/41288