Alexander N. Mikoyan,
Using CBR Techniques to Detect Plagiarism in Computing
Assignments
Abstract: The problems of case retrieval in CBR and
plagiarism detection have in common a need to detect close but
not exact matches between exemplars. In this paper we describe a
plagiarism detection system that has been inspired by ideas from
CBR research. In particular this system can detect similarities
between programs without performing exhaustive comparisons on all
exemplars. Our analysis of similarity in this well controlled
domain offers some insights into the kinds of profiles that can
be used in similarity assessment in general. We argue that the
choice of a perspicuous profile is crucial to any classification
task and determining the best predictive features may require
significant analysis of the problem domain.
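
As a rough illustration of profile-based retrieval (not the system described in the paper), the sketch below reduces each program to a small keyword-count profile and pairs up only those programs whose profiles fall in the same bucket, so no exhaustive all-against-all comparison is needed. The feature set `KEYWORDS` and the helper names `profile` and `candidate_pairs` are assumptions made for this example.

```python
from collections import Counter, defaultdict
import re

# Hypothetical feature set; a real profile would need domain analysis.
KEYWORDS = ("if", "for", "while", "return")

def profile(source: str) -> tuple:
    """Count occurrences of each chosen keyword among the identifier-like tokens."""
    tokens = Counter(re.findall(r"[A-Za-z_]\w*", source))
    return tuple(tokens[k] for k in KEYWORDS)

def candidate_pairs(programs: dict) -> list:
    """Bucket programs by profile and pair up only programs sharing a bucket."""
    buckets = defaultdict(list)
    for name, source in programs.items():
        buckets[profile(source)].append(name)
    pairs = []
    for names in buckets.values():
        names.sort()
        pairs.extend((a, b) for i, a in enumerate(names) for b in names[i + 1:])
    return sorted(pairs)
```

Because renaming identifiers leaves the keyword counts unchanged, two submissions that differ only in variable names land in the same bucket. A profile this coarse would also group unrelated programs, which is why the choice of features matters.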
Ivan V. Krsul,
Authorship Analysis: Identifying The Author Of A
Program
Abstract: In this paper we show that it is possible to
identify the author of a piece of software by looking at
stylistic characteristics of C source code. We also show that
there exists a set of characteristics within a program that are
helpful in the identification of a programmer, and whose
computation can be automated with a reasonable cost. There are
four areas that benefit directly from the findings we present
herein: the legal community can count on empirical evidence to
support authorship claims, the academic community can count on
evidence that supports authorship claims of students, industry
can count on identifying the author of previously un-identifiable
software modules, and real time intrusion detection systems can
be enhanced to include information regarding the authorship of
all locally compiled programs. We show that it is possible to
identify the author of a piece of software by collecting and
identifying eighty-eight programs for twenty-nine students, staff
and faculty members at Purdue University.
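
To make the idea of automatically computed stylistic characteristics concrete, here is a minimal sketch that is not taken from the paper. It measures three layout features of a C source file and attributes a sample to the known author whose features are closest. The three features, the unscaled Euclidean distance, and the names `style_features` and `nearest_author` are all assumptions chosen for illustration.

```python
import math

def style_features(source: str) -> tuple:
    """Return (mean line length, comment-line fraction, tab-indented fraction)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = len(lines) or 1  # avoid division by zero on empty input
    mean_len = sum(len(ln) for ln in lines) / n
    comment_frac = sum(ln.lstrip().startswith(("/*", "//")) for ln in lines) / n
    tab_frac = sum(ln.startswith("\t") for ln in lines) / n
    return (mean_len, comment_frac, tab_frac)

def nearest_author(sample: str, known: dict) -> str:
    """Pick the author whose reference code has the closest feature vector."""
    target = style_features(sample)
    return min(known, key=lambda a: math.dist(target, style_features(known[a])))
```

In practice the features would need to be normalized so that mean line length does not dominate the distance, and far more than three characteristics would be needed to separate a realistic pool of programmers.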
Ivan Krsul, Eugene H. Spafford,
Authorship Analysis: Identifying The Author of a
Program
Keywords: authorship analysis, authentication
Abstract: Authorship analysis on computer software is a
difficult problem. In this paper we explore the classification of
a programmer's style, and try to find a set of characteristics that
remain constant for a significant portion of the programs that
the programmer might produce. Our goal is to show that it is
possible to identify the author of a program by examining
programming style characteristics. Within a closed environment,
the results of this paper support the conclusion that, for a
specific set of programmers, it is possible to identify the
author of any individual program. Also, based on previous work
and our observations during the experiments described herein we
believe that the probability of finding two programmers who share
exactly those same characteristics should be very small.
Eugene H. Spafford, Stephen A. Weeber,
Software Forensics: Can We Track Code To Its
Authors?
Abstract: Viruses, worms, trojan horses, and crackers all
exist and threaten the security of our computer systems. Often,
we are aware of an intrusion only after it has occurred. On some
occasions, we may have a fragment of code left behind used by an
adversary to gain access or damage the system. A natural question
to ask is "Can we use this remnant of code to positively identify
the culprit?" In this paper, we detail some of the features of
code remnants that might be analyzed and then used to identify
their authors. We further outline some of the difficulties
involved in tracing an intruder by analyzing code.
Built by Mark Crosbie and Ivan Krsul.