Supporting the cybercrime investigation process: Effective discrimination of source code authors based on byte-level information
Date
2007
Publisher
Springer-Verlag Berlin
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Source code authorship analysis is the field that attempts to identify the author of a computer program by treating each program as a linguistically analyzable entity. The identification is usually based on other undisputed program samples from the same author. Such a method can be of major benefit in several cases, such as tracing the source of code left in a system after a cyber attack, resolving authorship disputes, and proving authorship in court. In this paper, we present our approach, which is based on byte-level n-gram profiles and extends a method that has been successfully applied to natural language text authorship attribution. We propose a simplified profile and a new similarity measure that is less complicated than the algorithm used in text authorship attribution and seems more suitable for source code identification, since it is better able to deal with very small training sets. Experiments were performed on two data sets, one with programs written in C++ and the other with programs written in Java. Unlike the traditional language-dependent metrics used by previous studies, our approach can be applied to any programming language at no additional cost. The presented accuracy rates are much better than the best reported results for the same data sets.
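The abstract does not spell out the profile or the similarity measure, so the following is only a minimal sketch of the general idea, under stated assumptions: a profile is taken to be the set of the most frequent byte-level n-grams in an author's code, and similarity is taken to be the number of n-grams two profiles share. The function names (`ngram_profile`, `similarity`, `attribute`) and the defaults `n=3` and `size=1000` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngram_profile(data: bytes, n: int = 3, size: int = 1000) -> set:
    """Return the `size` most frequent byte-level n-grams of `data` as a set.

    Assumption: the profile keeps only which n-grams are frequent,
    not their actual frequencies.
    """
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def similarity(profile_a: set, profile_b: set) -> int:
    """Assumed similarity measure: number of n-grams the profiles share."""
    return len(profile_a & profile_b)


def attribute(unknown: bytes, known: dict, n: int = 3, size: int = 1000) -> str:
    """Return the candidate author whose profile overlaps most with `unknown`.

    `known` maps an author name to the concatenated bytes of that
    author's undisputed program samples.
    """
    target = ngram_profile(unknown, n, size)
    return max(
        known,
        key=lambda author: similarity(target, ngram_profile(known[author], n, size)),
    )


if __name__ == "__main__":
    samples = {
        "alice": b"for (int i = 0; i < n; ++i) { sum += a[i]; }",
        "bob": b"while True:\n    print('hello world')\n",
    }
    disputed = b"for (int j = 0; j < m; ++j) { total += b[j]; }"
    print(attribute(disputed, samples))
```

Because the sketch operates on raw bytes, it needs no tokenizer or parser for the programming language involved, which is the language-independence property the abstract highlights.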
Description
2nd International Conference on E-Business and Telecommunication Networks -- OCT 03-07, 2005-2007 -- Reading, ENGLAND
Keywords
source code authorship analysis, software forensics, security
Source
E-Business and Telecommunication Networks
WoS Q Value
N/A
Scopus Q Value
Volume
3