Network Learning with Large Language Models

In this semester project, we will investigate the capabilities of large language models (LLMs) such as GPT-4 in learning network configurations. Specifically, we aim to gauge their performance within our mini-internet.

Our approach involves applying prompt engineering to these LLMs to examine their potential and limitations in reasoning about complex network interactions. For instance, could we transform an LLM into an automated grader by providing it with configuration files from previous students and then asking it to evaluate if the configuration meets certain requirements? Alternatively, could we morph it into a synthesizer by supplying it with configuration templates and assigning it various configuration tasks?


LLMs have demonstrated impressive abilities across a broad range of natural language processing (NLP) tasks [1]. There’s also an ongoing conversation about their potential for logical reasoning. For instance, one study demonstrated that LLMs are zero-shot reasoners when prompted with the phrase “Let’s think step by step” [2]. According to a technical report by OpenAI, GPT-4 scored 700 out of 800 in SAT Math [3].

In terms of networking knowledge, interactions with models such as ChatGPT suggest that these systems have assimilated related RFCs and configuration files from the internet, even in their earlier versions [4]. However, there’s a gap in the research evaluating their competence in resolving actual network issues, such as network verification and synthesis. Given their demonstrated aptitude in learning software programs, we believe there is significant potential in this area [5].


It’s important to note that directing a trained LLM to execute our instructions is not a straightforward process. LLMs are probabilistic models that produce responses based on input prompts. Crafting an effective prompt is essential for obtaining a good response at a reasonable cost, making prompt engineering a vital IT skill [6] and an emerging research topic [7].

In relation to our network tasks, we anticipate several challenges:

  • Configuration Pre-processing: Even a small configuration file can comprise hundreds of lines. Feeding the entire network configuration into the prompt could easily exceed the maximum prompt length set by the LLM. We’ll likely need to pre-process the configuration without omitting crucial information.
  • Subtask Decomposition: If the LLM provides an incorrect answer, it’s crucial to pinpoint the root cause of its reasoning chain. This will involve investigating its intermediate analysis and iteratively adjusting our prompt accordingly.
  • LLM Limitations: Networks are intricate systems that require both computation (e.g., shortest path routing) and logical reasoning (e.g., interactions among different protocols), making them a challenging test case for LLM boundaries.


Our project will commence with the development of our network application using the GPT-4 API. The project can be broadly split into the following milestones:

  • Investigate prompt engineering techniques and establish a series of mini-internet tasks that GPT-4 API can potentially solve.
  • Construct a preliminary demo that automates interactions with GPT-4 using one fixed configuration.
  • Automate the process for various configuration inputs.
  • Document the strengths and limitations of the GPT-4 model for different subtasks.

If time allows, we may also explore new tasks not limited to the mini-internet.


  • Familiarity with FRR router configuration commands, good performance in the routing project, or equivalent experience.
  • Proficiency in Python or another programming language with a reliable OpenAI library [8].
  • Comfort with scripting for automation.
  • Basic understanding of prompt engineering.
  • Experience with using LLMs (e.g., ChatGPT). Prior experience with developing LLM-based applications is advantageous but not essential.


Yu Chen
PhD student