Probabilistic modelling of internet routes
The distributed nature of the internet makes the exchange of BGP messages from a single viewpoint (AS) seemingly stochastic or even random. However, there is hope to capture these complex interactions as messages are triggered by some structural change somewhere on the internet, which we hope to learn.
Modelling external routes is of particular importance in the context of network verification. State-of-the-art control plane verification tools generally either constrain the set of routing inputs [1] or allow (approximately) arbitrary routes but is usually not scalable [2][3]. You will help improve this. This thesis aims to better understand the “laws of motion” of the internet in order to predict changes at a local level. More precisely, we wish to estimate a distribution over all the possible environments our network could be exposed to based on historical data.
In a previous Masters thesis, some initial results were achieved in this direction, showing that some environments were much more likely than others. We wish to take the scope of that work to the next level both in terms of methods used and in performance (precision and training/inference time). Our initial ideas of models include, but are not limited to:
- Bayesian learning (Gaussian Processes, Variational inference techniques, deep learning) on historical data with different regularisation techniques.
- Using domain knowledge to explicit the model and its variables (in our case some representation of the internet) and learn the parameters for those variables given some historical data. Then, we may sample this local synthetic model in order to build a distribution over network environments.
The second method is inspired by network inference techniques [4] but differs in a few points. First, we only care about a local perspective of the external network, as opposed to a correct representation of the whole network. Furthermore, what we really are interested in is not so much the exact network structure but rather the changes that are likely to occur and what influence it has on the distribution of network environments.
You will work in close collaboration with two PhD students, with weekly meetings and ongoing support.
Milestones
- After familiarising yourself with the background information (we will provide references to get you started), you will get to know your tools with initial experiments.
- Getting to know the data by replicating some results.
- Formaly framing the problem. What are you trying to model and with which assumptions.
- Come up with good evaluation metrics, given the objectives of capturing route diversity, generalisation, training and inference complexity, …
- Explore ideas presented above.
- Extend the framework for training and evaluating different models.
- Implement at least one probabilistic machine learning model and one based on modelling and sampling.
- Improve one of the models.
- Depending on your initial experiments, you should explore the most promising avenues more sophisticatedly and perhaps develop your own ideas.
- You should rigorously evaluate your model on a variety of networks.
Requirements
- Some knowledge of probabilistic modelling, ideally in the context of learning.
- Coding abilities are also welcome as implementation and evaluation are key to this thesis.
References
- S. Steffen, T. Gehr, P. Tsankov, L. Vanbever, and M. Vechev, “NetDice: Probabilistic Verification of Network Configurations”, SIGCOMM ‘20.
- R. Beckett, A. Gupta, R. Mahajan, and D. Walker, “Minesweeper: A General Approach to Network Configuration Verification” SIGCOMM ‘17.
- D. Wang, P. Zhang, and A. Gember-Jacobson, “Expresso: Comprehensively Reasoning About External Routes Using Symbolic Simulation,” SIGCOMM ‘24
- W. Mühlbauer, A. Feldmann, O. Maennel, M. Roughan, and S. Uhlig, “Building an AS-Topology Model that Captures Route Diversity”.