The effect of part-of-speech tagging on IR performance for Turkish

Date

2004

Journal Title

Journal ISSN

Volume Title

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this paper, we experimentally evaluate the effect of Part-of-Speech (POS) tagging on Information Retrieval (IR) performance for Turkish. We used four term-weighting schemes to index the SABANCI-METU Turkish Treebank corpus: "tf", "tf x idf", "Ltu.ltu", and "Okapi". Each weighting scheme is factored over three POS tagging cases, namely "No POS tagging", "POS tag with no history (i.e. 1-gram)", and "POS tag with one step history (i.e. 2-gram)". The Meta-scoring function is used to analyze the effect of these nine factors on IR performance. Results show that the weighting schemes differ significantly from each other, with a p-value of 0.04 (Friedman non-parametric test), but there is not enough evidence in the corpus to reject the null hypothesis that the three weighting schemes, on average, show equal performance over the three cases of POS tagging (p-value of 0.36).
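
To make the setup concrete, the following is a minimal sketch of how "tf x idf" weights could change once POS tags are attached to index terms. The abstract does not say how tags enter the index, so the `word/TAG` term format, the function names `make_terms` and `tf_idf`, the plain `log(N / df)` form of idf, and the two toy documents are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def make_terms(tagged_tokens, use_pos):
    """Turn (word, tag) pairs into index terms.

    Assumption: with POS tagging, a term is the word joined to its tag.
    """
    if use_pos:
        return [f"{word}/{tag}" for word, tag in tagged_tokens]
    return [word for word, _ in tagged_tokens]

def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document
    weights = []
    for terms in docs:
        term_freq = Counter(terms)
        weights.append(
            {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return weights

# Hypothetical toy data: "yüz" is tagged as a noun in one document
# and as a verb in the other.
tagged_docs = [
    [("yüz", "Noun"), ("kitap", "Noun")],
    [("yüz", "Verb"), ("deniz", "Noun")],
]

untagged = tf_idf([make_terms(d, use_pos=False) for d in tagged_docs])
tagged = tf_idf([make_terms(d, use_pos=True) for d in tagged_docs])
```

In this toy case the untagged term "yüz" occurs in every document and so receives zero weight, while the tagged terms "yüz/Noun" and "yüz/Verb" become distinct and each receive a non-zero weight. Whether such differences change retrieval performance is the question the paper tests.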

Description

19th International Symposium on Computer and Information Sciences (ISCIS 2004) -- October 27-29, 2004 -- Kemer, Antalya, Turkey

Keywords

Source

Computer and Information Sciences - ISCIS 2004, Proceedings

WoS Q Value

N/A

Scopus Q Value

Volume

3280

Issue

Citation