Supporting the cybercrime investigation process: Effective discrimination of source code authors based on byte-level information

Date

2007

Journal Title

Journal ISSN

Volume Title

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Source code authorship analysis is the field that attempts to identify the author of a computer program by treating each program as a linguistically analyzable entity. Identification is usually based on other undisputed program samples from the same author. Such a method can be of major benefit in several cases, such as tracing the source of code left in a system after a cyber attack, resolving authorship disputes, and proving authorship in court. In this paper, we present an approach based on byte-level n-gram profiles that extends a method successfully applied to natural language text authorship attribution. We propose a simplified profile and a new similarity measure that is less complicated than the algorithm used in text authorship attribution and seems more suitable for source code identification, since it is better able to deal with very small training sets. Experiments were performed on two data sets, one with programs written in C++ and the other with programs written in Java. Unlike the traditional language-dependent metrics used in previous studies, our approach can be applied to any programming language at no additional cost. The reported accuracy rates are much better than the best previously reported results for the same data sets.
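To make the idea of byte-level n-gram profiles concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not define the simplified profile or the similarity measure, so the sketch assumes that a profile is the set of the most frequent byte n-grams in an author's code and that similarity is the number of n-grams two profiles share. The function names and the default values of `n` and `size` are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter


def byte_ngram_profile(data: bytes, n: int = 3, size: int = 1500) -> set[bytes]:
    """Return the `size` most frequent byte n-grams of `data` as a set.

    Assumption: frequencies are used only to select the n-grams and are
    then discarded, which is one way to read "simplified profile".
    """
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def profile_similarity(a: set[bytes], b: set[bytes]) -> int:
    """Assumed similarity measure: the number of n-grams common to both profiles."""
    return len(a & b)


def attribute(unknown: bytes, known: dict[str, bytes],
              n: int = 3, size: int = 1500) -> str:
    """Return the candidate author whose profile shares the most n-grams
    with the profile of the disputed program."""
    target = byte_ngram_profile(unknown, n, size)
    return max(
        known,
        key=lambda author: profile_similarity(
            target, byte_ngram_profile(known[author], n, size)
        ),
    )
```

In this sketch, `known` maps each candidate author to the concatenated bytes of that author's undisputed programs, and `attribute` is called with the raw bytes of the disputed file. Because the functions operate on raw bytes, nothing in them depends on the programming language of the input, which matches the language independence claimed in the abstract.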

Description

2nd International Conference on E-Business and Telecommunication Networks -- OCT 03-07, 2005-2007 -- Reading, ENGLAND

Keywords

source code authorship analysis, software forensics, security

Source

E-Business and Telecommunication Networks

WoS Q Value

N/A

Scopus Q Value

Volume

3

Issue

Citation