Finding likely network congestion events.
Link loads—how much traffic crosses each link—are a key indicator of network performance. Excessive link loads, in particular, increase the likelihood of congestion, packet drops, and inflated delays. Measuring link loads in the past is not sufficient to prevent congestion in the future: a likely event that causes congestion might have never occurred in the past.
To improve congestion, operators must consider link loads under a wide range of likely events. Past systems (such as QARC and YU) find worst-case link loads under a set of realistic network failures. However, in our recent publication Velo, we found that external routing events that affect where traffic enters and leaves the network can have an even larger effect than internal failures. These events are even more likely: BGP routers receive around ten updates per second on average (source).
Velo finds worst-case scenarios that result in the largest possible link loads. It finds both the BGP routes that maximize the load on any link, and the worst-case link where traffic can enter the network. Although Velo is guaranteed to find the worst-case link load, such events are not likely to be observed in the future.
This project extends Velo to identify likely events that cause excessive link loads. You do so by introducing a probabilistic model of routing events: how likely will a network observe a particular event in the future? A first model assumes events with a significant impact are less likely than those with a more minor impact. More sophisticated models learn from past events to predict the probability of any possible future event. Using such a model, can you find the most likely event that causes congestion in the network? How likely is the network to be congested?
Milestones
- Extend Velo to limit the events it considers by restricting their impact. You will mainly focus on how ingress traffic can shift from one point to another, and prove that your implementation actually finds the worst-case link loads under the imposed restrictions.
- Evaluate your implementation to show that it can scale to large ISP networks of hundreds of routers (improving your implementation if necessary).
- [Master thesis only] Extend Velo by adding a probabilistic model of routing events to perform probabilistic verification. Your goal is to design an algorithm to answer the following questions: What is the most likely event that causes congestion? What is the likelihood of network congestion?
Requirements
- Basic knowledge of BGP and routing algebra
- Programming experience, ideally some prior knowledge of Rust.