Investigation of Luhn's claim on information retrieval

Date

2011

Publisher

Tubitak Scientific & Technical Research Council Turkey

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this study, we show how Luhn's claim about the degree of importance of a word in a document can be related to information retrieval. His basic idea is transformed into z-scores as the weights of terms for the purpose of modeling term frequency (tf) within documents. The Luhn-based models presented in this paper are considered as the TF component of the proposed TF x IDF weighting schemes. Moreover, the final term weighting functions appropriate for the TF x IDF weighting scheme are applied to the TREC-6, -7, and -8 databases. The experimental results show relevance to Luhn's claim, with high mean average precision (MAP) for the terms with frequencies around the mean frequency of terms within a document. On the other hand, a weighting that significantly discriminates the importance between low/high frequencies and medium frequencies degrades the retrieval performance. Therefore, any weighting scheme (TF) that is directly proportional to tf has a probability of high retrieval performance, if it can optimally indicate the difference in importance regarding tf values and also optimally eliminate the terms that have high frequencies.
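The abstract does not give the weighting functions themselves, so the following is only a minimal sketch of the general idea, not the paper's method. It computes the z-score of each term's frequency within a document and multiplies a TF component derived from that z-score by a plain IDF. The Gaussian-shaped function `luhn_like_tf`, the `log(N / df)` form of IDF, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_z_scores(tokens):
    """Map each term to the z-score of its frequency within one document."""
    counts = Counter(tokens)
    freqs = list(counts.values())
    mean = sum(freqs) / len(freqs)
    # Population standard deviation of the term frequencies in this document.
    std = math.sqrt(sum((f - mean) ** 2 for f in freqs) / len(freqs))
    if std == 0:
        # All terms occur equally often, so every z-score is zero.
        return {term: 0.0 for term in counts}
    return {term: (f - mean) / std for term, f in counts.items()}


def luhn_like_tf(z):
    """Hypothetical TF component: largest when tf equals the document mean (z = 0)."""
    return math.exp(-0.5 * z * z)


def idf(term, docs):
    """Plain inverse document frequency, log(N / df); an assumed variant."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf_weights(doc, docs):
    """Combine the z-score-based TF component with IDF for every term in doc."""
    return {term: luhn_like_tf(z) * idf(term, docs)
            for term, z in tf_z_scores(doc).items()}


# Toy collection of tokenized documents (made-up data).
docs = [["a", "a", "a", "b", "b", "c"], ["b", "d"]]
weights = tf_idf_weights(docs[0], docs)
```

In this toy collection, the term `b` has a frequency equal to the document mean, so its TF component is at its maximum, but it appears in every document and its IDF is zero. Note that the abstract reports that weightings which separate medium frequencies too sharply from low and high ones hurt retrieval, so a peaked shape like this one is meant only to show the mechanics.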

Keywords

Luhn, information retrieval, term weighting, indexing

Source

Turkish Journal of Electrical Engineering and Computer Sciences

WoS Q Value

Q4

Scopus Q Value

Q3

Volume

19

Issue

6

Citation