Bringing pcap processing to the next level
As network speeds increase, packet captures grow larger and become harder to process. Today, pcap files are usually processed with a Python script that iterates over the file packet by packet to extract useful information. This becomes extremely slow for large files, so the usual workaround is to split a capture into smaller batches and analyze them individually. The goal of this thesis is to find better ways to do this.
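To make the baseline concrete, the following is a minimal sketch of such a sequential reader, using only the Python standard library. It assumes the classic pcap file format with microsecond timestamps (not pcapng), and the name `iter_packets` is illustrative rather than taken from the existing scripts.

```python
import struct

GLOBAL_HEADER_LEN = 24  # classic pcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def iter_packets(stream):
    """Yield (timestamp_in_seconds, captured_bytes) for each record in a pcap stream."""
    header = stream.read(GLOBAL_HEADER_LEN)
    if len(header) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")

    # The magic number 0xa1b2c3d4 tells us the byte order of the file.
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian writer
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian writer
    else:
        raise ValueError("unsupported pcap magic number")

    record = struct.Struct(endian + "IIII")
    while True:
        raw = stream.read(RECORD_HEADER_LEN)
        if len(raw) < RECORD_HEADER_LEN:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = record.unpack(raw)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_usec / 1e6, data
```

A script would typically call this as `for ts, frame in iter_packets(open(path, "rb")): ...`. Every packet passes through the interpreter one at a time, which is why the run time grows with the size of the capture.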
One option is to ingest the pcap data into a database that can then be queried. This pays off if the same capture is queried multiple times or, more generally, whenever the total time saved on queries exceeds the one-time cost of ingesting the capture into the database.
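As a rough illustration of the database approach, the sketch below ingests already-parsed packet records into SQLite and answers one aggregate query. The table layout, the column names, and the functions `ingest` and `bytes_per_source` are assumptions made for this example; which database and schema actually fit is part of the thesis work.

```python
import sqlite3


def ingest(conn, records):
    """Insert (ts, src, dst, proto, length) tuples into a 'packets' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packets "
        "(ts REAL, src TEXT, dst TEXT, proto TEXT, length INTEGER)"
    )
    with conn:  # a single transaction keeps bulk inserts fast
        conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?)", records)
    # The index is built once at ingest time and reused by later queries.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_packets_src ON packets (src)")


def bytes_per_source(conn):
    """Return total bytes sent per source address, ordered by address."""
    return conn.execute(
        "SELECT src, SUM(length) FROM packets GROUP BY src ORDER BY src"
    ).fetchall()
```

The ingestion cost is paid once, while each further question about the capture becomes a new query instead of another full pass over the file.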
Another option is to offload the processing of the pcap to a GPU and run it in parallel. For example, one could filter the pcap files on the GPU and run the Python script only on the few packets that match the filter. While there has been previous work on pcap processing on the GPU [1,2], there is no ready-to-use tool available for this kind of processing. Building such a tool would be the goal of this project.
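The two-stage structure described here can be sketched on the CPU as follows. This is not GPU code: it only illustrates that the filter is a per-packet predicate with no dependencies between packets, which is the property that would allow one GPU thread per packet. The predicate assumes Ethernet II frames carrying IPv4 without VLAN tags, and both function names are hypothetical.

```python
def matches_udp_port(frame, port):
    """Return True if the frame is IPv4/UDP with the given source or destination port."""
    # 14-byte Ethernet header plus a minimal 20-byte IPv4 header.
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType 0x0800 = IPv4
        return False
    if frame[23] != 17:  # IPv4 protocol field, 17 = UDP
        return False
    ip_header_len = (frame[14] & 0x0F) * 4  # IHL is given in 32-bit words
    l4 = 14 + ip_header_len
    if len(frame) < l4 + 4:
        return False
    src_port = int.from_bytes(frame[l4:l4 + 2], "big")
    dst_port = int.from_bytes(frame[l4 + 2:l4 + 4], "big")
    return port in (src_port, dst_port)


def filter_then_analyze(frames, port, analyze):
    """Stage 1 builds a match mask; stage 2 runs the expensive analysis on matches only."""
    mask = [matches_udp_port(f, port) for f in frames]  # candidate for GPU offload
    return [analyze(f) for f, keep in zip(frames, mask) if keep]
```

In a GPU version, only stage 1 would move to the device and return the mask (or the indices of matching packets), so the existing Python analysis could stay unchanged.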
Milestones
- Set up and optimize pcap processing with the available Python scripts to obtain a baseline for comparison.
- Ingest pcap files into a suitable database and compare the query time against the Python scripts.
- Implement pcap processing directly on the GPU, wherever appropriate, to reduce the processing time.
Requirements
- Knowledge of databases and query languages
- Experience with C++ and ideally also with CUDA
References
- [1] Nottingham, A., and Irwin, B. “A high-level architecture for efficient packet trace analysis on GPU co-processors.” 2013 Information Security for South Africa.
- [2] Nottingham, A., Richter, J., and Irwin, B. “CaptureFoundry: A GPU accelerated packet capture analysis tool.”