Generating representative, live network traffic out of millions of code repositories

Authors: Tobias Bühler, Roland Schmid, Sandro Lutz, and Laurent Vanbever
HotNets '22: Proceedings of the 21st ACM Workshop on Hot Topics in Networks

Abstract

In theory, any network operator, developer, or vendor should have access to large amounts of live network traffic for testing their solutions. In practice, though, that is not the case. Network actors instead have to use packet traces or synthetic traffic, which is highly suboptimal: today’s generated traffic is unrealistic. We propose a system for generating live application traffic leveraging massive codebases such as GitHub.

Our key observation is that many repositories have now become “orchestrable” thanks to the rise of container technologies. To showcase the practicality of the approach, we iterate through >293k GitHub repositories and manage to capture >74k traces containing meaningful and diverse network traffic. Based on this first success, we outline the design of a system, Dynamo, which analyzes these traces to select and orchestrate open-source projects to automatically generate live application traffic matching a user’s specification.
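Below is a minimal sketch of the repository-filtering step that the abstract describes. It rests on one stated assumption: a repository counts as "orchestrable" if any file in it uses one of Docker Compose's default file names. That criterion and the helper names `is_orchestrable` and `select_candidates` are hypothetical illustrations, not the paper's actual selection logic.

```python
from pathlib import PurePosixPath

# Default file names Docker Compose looks for. Treating their presence as
# "orchestrable" is an ASSUMPTION for illustration, not the paper's criterion.
COMPOSE_FILE_NAMES = {
    "compose.yaml",
    "compose.yml",
    "docker-compose.yaml",
    "docker-compose.yml",
}


def is_orchestrable(repo_paths: list[str]) -> bool:
    """Hypothetical check: True if any path in the repository is a Compose file."""
    return any(PurePosixPath(p).name in COMPOSE_FILE_NAMES for p in repo_paths)


def select_candidates(repos: dict[str, list[str]]) -> list[str]:
    """Hypothetical filter: sorted names of repositories that pass is_orchestrable."""
    return sorted(name for name, paths in repos.items() if is_orchestrable(paths))


if __name__ == "__main__":
    # Toy file listings; the repository names are made up.
    repos = {
        "alice/webapp": ["docker-compose.yml", "src/app.py"],
        "bob/lib": ["setup.py", "lib/core.py"],
        "carol/stack": ["deploy/compose.yaml", "README.md"],
    }
    print(select_candidates(repos))  # ['alice/webapp', 'carol/stack']
```

A static file check like this can only shortlist candidates. A real pipeline would still need to start the containers and capture their traffic, because the presence of a Compose file says nothing about whether a project produces meaningful network traffic.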

People

Dr. Tobias Bühler, PhD student (2016–2023)
Roland Schmid, PhD student

BibTeX

@INPROCEEDINGS{bühler2022generating,
	isbn = {978-1-4503-9899-2},
	doi = {10.1145/3563766.3564084},
	year = {2022},
	month = nov,
	booktitle = {HotNets '22: Proceedings of the 21st ACM Workshop on Hot Topics in Networks},
	type = {Conference Paper},
	author = {Bühler, Tobias and Schmid, Roland and Lutz, Sandro and Vanbever, Laurent},
	size = {7 p.},
	abstract = {In theory, any network operator, developer, or vendor should have access to large amounts of live network traffic for testing their solutions. In practice, though, that is not the case. Network actors instead have to use packet traces or synthetic traffic, which is highly suboptimal: today's generated traffic is unrealistic. We propose a system for generating live application traffic leveraging massive codebases such as GitHub. Our key observation is that many repositories have now become ``orchestrable'' thanks to the rise of container technologies. To showcase the practicality of the approach, we iterate through >293k GitHub repositories and manage to capture >74k traces containing meaningful and diverse network traffic. Based on this first success, we outline the design of a system, Dynamo, which analyzes these traces to select and orchestrate open-source projects to automatically generate live application traffic matching a user's specification.},
	keywords = {traffic generation; traffic analysis; network virtualization},
	language = {en},
	address = {New York, NY},
	publisher = {Association for Computing Machinery},
	title = {Generating representative, live network traffic out of millions of code repositories},
	Note = {21st ACM Workshop on Hot Topics in Networks (HotNets 2022); Conference Location: Austin, TX, USA; Conference Date: November 14-15, 2022}
}

Research Collection: 20.500.11850/589729

Slide Sources: https://gitlab.ethz.ch/projects/41218