Computational Environment for Modeling and Analysing Network Traffic Behaviour using the Divide and Recombine Framework

Download

PDF

Author

Ashrith Barthur

Tech report number

CERIAS TR 2016-6

Entry type

phdthesis

Abstract

This research has two essential goals. The first is to design and construct a computational environment for studying large and complex datasets in the cybersecurity domain. The second is to analyse the Spamhaus blacklist query dataset, which includes uncovering the properties of blacklisted hosts and understanding the nature of blacklisted hosts over time.

The analytical environment enables deep analysis of very large and complex datasets by exploiting the divide and recombine framework. The capability to analyse data in depth enables one to go beyond summary statistics in research. This deep analysis is carried out at the highest level of granularity without any compromise on the size of the data. The environment is also fully capable of processing the raw data into a data structure suited for analysis.

Spamhaus is an organisation that identifies malicious hosts on the Internet. Information about malicious hosts is stored in a distributed database by Spamhaus and served through DNS protocol query-responses. Spamhaus and other malicious-host-blacklisting organisations have replaced the smaller malicious-host databases that multiple organisations curated independently for their internal needs. Spamhaus services are popular because of their free access, exhaustive information, historical information, simple DNS-based implementation, and reliability. The malicious-host information obtained from these databases is used in the first step of weeding out potentially harmful hosts on the Internet.

During the course of this research work, a detailed packet-level analysis was carried out on the Spamhaus blacklist data. It was observed that the query-responses displayed some peculiar behaviours. These anomalies were studied and modeled, and were found to show definite patterns. These patterns are empirical proof of a systemic or statistical phenomenon.
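The two mechanisms the abstract relies on can be illustrated with a minimal Python sketch. It is not the environment built in the thesis. It assumes the common DNS blacklist convention of looking up an IPv4 address by reversing its octets and appending the blacklist zone, and it shows divide and recombine in its simplest form: divide the records into subsets, apply the same analytic method to each subset, then recombine the per-subset outputs. The zone name, the toy records, and every function name below are illustrative assumptions, not details taken from the thesis.

```python
from collections import defaultdict
from ipaddress import IPv4Address
from statistics import mean


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name a client would look up to check an IPv4 address.

    The default zone is an assumption made for this sketch.
    """
    octets = str(IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def divide(records, key):
    """Divide step: group records into subsets by a key function."""
    subsets = defaultdict(list)
    for record in records:
        subsets[key(record)].append(record)
    return dict(subsets)


def divide_and_recombine(records, key, analyse, recombine):
    """Apply `analyse` to each subset, then `recombine` the outputs."""
    subsets = divide(records, key)
    per_subset = {k: analyse(v) for k, v in subsets.items()}
    return recombine(per_subset)


# Hypothetical toy records: (hour of day, queried address, listed?)
RECORDS = [
    (0, "192.0.2.1", True),
    (0, "192.0.2.2", False),
    (1, "198.51.100.7", True),
    (1, "198.51.100.8", True),
]


def listed_fraction(subset):
    """Analytic method: fraction of queries in a subset that hit a listed host."""
    return mean(1.0 if listed else 0.0 for _, _, listed in subset)


def average_of_subsets(per_subset):
    """Recombination: average the per-subset estimates."""
    return mean(per_subset.values())


result = divide_and_recombine(
    RECORDS,
    key=lambda record: record[0],  # divide by hour of day
    analyse=listed_fraction,
    recombine=average_of_subsets,
)

print(dnsbl_query_name("192.0.2.1"))
print(result)
```

Because each subset is analysed independently, the analysis step could be distributed across machines without changing the recombination step. Averaging is only one possible recombination, chosen here for brevity.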

Date

2016-10-14

Institution

Purdue University

Key alpha

information security, network security, statistics, computer science, DNS, anomalous behaviour

Organization

Purdue University

School

Purdue University

Affiliation

Purdue University, H2O.Ai

Publication Date

2016-10-14

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.

@Phdthesis{barthur2016computational,
	keywords = "information security, network security, statistics, computer science, DNS, anomalous behaviour",
	title = "Computational Environment for Modeling and Analysing Network Traffic Behaviour using the Divide and Recombine Framework",
	author = "Ashrith Barthur",
	year = "2016",
	month = "10",
	institution = "Purdue University",
	organization = "Purdue University",
	school = "Purdue University",
	day = "14",
	abstract = "This research has two essential goals. The first is to design and
construct a computational environment for studying large and complex
datasets in the cybersecurity domain. The second is to analyse the Spamhaus
blacklist query dataset, which includes uncovering the properties of blacklisted hosts
and understanding the nature of blacklisted hosts over time.
The analytical environment enables deep analysis of very large and complex
datasets by exploiting the divide and recombine framework. The capability to
analyse data in depth enables one to go beyond summary statistics in research.
This deep analysis is carried out at the highest level of granularity without any
compromise on the size of the data.
The environment is also fully capable of processing the raw data into a data
structure suited for analysis.
Spamhaus is an organisation that identifies malicious hosts on the Internet.
Information about malicious hosts is stored in a distributed database by
Spamhaus and served through DNS protocol query-responses. Spamhaus and
other malicious-host-blacklisting organisations have replaced the smaller malicious-host
databases that multiple organisations curated independently for their internal needs.
Spamhaus services are popular because of their free access, exhaustive information,
historical information, simple DNS-based implementation, and reliability. The
malicious-host information obtained from these databases is used in the first step
of weeding out potentially harmful hosts on the Internet.
During the course of this research work, a detailed packet-level analysis was
carried out on the Spamhaus blacklist data. It was observed that the
query-responses displayed some peculiar behaviours. These anomalies were studied
and modeled, and were found to show definite patterns. These patterns are
empirical proof of a systemic or statistical phenomenon.",
	affiliation = "Purdue University, H2O.Ai",
}