Towards Network Model Generalization using Strategic Data Collection

ACM SIGCOMM Posters and Demos '25: Proceedings of the ACM SIGCOMM 2025 Posters and Demos

Abstract

Essential networking applications, such as video streaming, require accurate network models to estimate current and future network states (e.g., is the network congested?). Due to the complexity of today’s networks and the subsequent difficulty of this modeling task, Machine Learning (ML)-based approaches have emerged as an alternative to first-principle modeling methods. However, proposed ML algorithms suffer from a generalization crisis: they often fail to perform in deployments outside of their training environment. Moreover, simple solutions such as naively training on more data do not guarantee improved generalization performance.

We propose an interpretable approach to improving model generalization by focusing on the quality of a dataset over sample quantity already during data collection. Notably, our approach’s interpretability allows us to reason on which environments to prioritize at the data acquisition stage. To this end, we investigate the impact of dataset metrics such as Round Trip Time (RTT) and throughput on both in-distribution (ID) and out-of-distribution (OOD) model performance. Our results suggest that strategically performing data collection in environments with broader state-space coverage in areas of higher RTT and lower throughput is key to achieving improved model generalization and OOD performance.
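To make the idea of "state-space coverage in areas of higher RTT and lower throughput" concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes that coverage can be approximated as the fraction of occupied cells in a grid over a chosen (RTT, throughput) region. The function name `coverage`, the region bounds, the grid size, and the two example environments are all hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (rtt_ms, throughput_mbps); units are an assumption


def coverage(samples: List[Sample],
             rtt_range: Tuple[float, float],
             tput_range: Tuple[float, float],
             bins: int = 10) -> float:
    """Fraction of grid cells in the given region that hold at least one sample.

    Hypothetical proxy for state-space coverage; the paper may define it differently.
    """
    rtt_lo, rtt_hi = rtt_range
    tp_lo, tp_hi = tput_range
    occupied = set()
    for rtt, tput in samples:
        # Ignore samples outside the region of interest.
        if not (rtt_lo <= rtt < rtt_hi and tp_lo <= tput < tp_hi):
            continue
        # Map the sample to a grid cell, clamping against float rounding at the edge.
        i = min(bins - 1, int((rtt - rtt_lo) / (rtt_hi - rtt_lo) * bins))
        j = min(bins - 1, int((tput - tp_lo) / (tp_hi - tp_lo) * bins))
        occupied.add((i, j))
    return len(occupied) / (bins * bins)


# Made-up candidate environments, for illustration only.
envs: Dict[str, List[Sample]] = {
    "datacenter": [(1.0, 900.0), (2.0, 950.0), (1.5, 800.0)],
    "cellular": [(120.0, 5.0), (250.0, 2.0), (180.0, 8.0), (310.0, 1.0)],
}

# Assumed region of interest: high RTT (100-400 ms), low throughput (0-10 Mbps).
HIGH_RTT = (100.0, 400.0)
LOW_TPUT = (0.0, 10.0)

scores = {name: coverage(s, HIGH_RTT, LOW_TPUT, bins=3) for name, s in envs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, `ranked` lists the candidate environments in the order in which one might prioritize them for data collection. Because the score depends only on two measurable quantities, it stays easy to inspect, which is in the spirit of the interpretability the abstract emphasizes.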

Research Area: Network Analysis and Reasoning

People

Benjamin Hoffman
PhD student
Dr. Alexander Dietmüller
PhD student
2018—2024

BibTeX

@incollection{hoffman2025towards,
 title={Towards Network Model Generalization using Strategic Data Collection},
 author={Hoffman, Benjamin and Dietm{\"u}ller, Alexander and Vanbever, Laurent},
 booktitle={Proceedings of the ACM SIGCOMM 2025 Posters and Demos},
 pages={31--33},
 year={2025}
 }

Research Collection: 20.500.11850/783274