Identifying Collocations in Turkish Using Statistical Methods

Metin, Senem Kumova; Karaoglan, Bahar


Date

2016

Authors

Metin, Senem Kumova

Karaoglan, Bahar

Publisher

Ahmet Yesevi Univ

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

A collocation is a combination of words that appear together more often than would be expected by chance and that together form a unit of meaning. Because extracting collocations benefits the automatic processing of Turkish texts, their translation, and the learning of Turkish, it is an important problem in Turkish natural language processing. In this study, several statistical techniques, including occurrence frequency, pointwise mutual information and hypothesis tests, are applied to a Turkey Turkish corpus to identify collocations automatically. We use both stemmed and surface forms of words in order to explore the effect of stemming on collocation extraction. The techniques are evaluated using the F-measure. The chi-square hypothesis test and pointwise mutual information produced better results than the other methods. In addition, we observed that when words are stemmed, the methods that perform well in collocation extraction can be distinguished more clearly from the others.
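The abstract names occurrence frequency, pointwise mutual information (PMI) and the chi-square test as ways to score word pairs, and the F-measure as the evaluation metric. The sketch below shows the standard textbook forms of these measures for adjacent-word bigrams; it is not the authors' implementation. It assumes bigram candidates only, PMI in base 2, and Pearson's chi-square on a 2x2 contingency table. The function names (`bigram_counts`, `frequency`, `pmi`, `chi_square`, `rank_bigrams`, `f_measure`) are hypothetical. Tokens may be surface forms or stems; the stemmer itself is not shown.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical score signature: (count(w1 w2), count(w1 first), count(w2 second), n bigrams)
Score = Callable[[int, int, int, int], float]


def bigram_counts(tokens: Sequence[Hashable]):
    """Count adjacent bigrams and the first/second-position word counts.

    Tokens may be surface forms or stems (stemming is assumed to be done beforehand).
    """
    bigrams = list(zip(tokens, tokens[1:]))
    pair_c = Counter(bigrams)
    first_c = Counter(w1 for w1, _ in bigrams)   # times w1 is the first word of a bigram
    second_c = Counter(w2 for _, w2 in bigrams)  # times w2 is the second word of a bigram
    return pair_c, first_c, second_c, len(bigrams)


def frequency(c12: int, c1: int, c2: int, n: int) -> float:
    """Raw occurrence frequency of the bigram."""
    return float(c12)


def pmi(c12: int, c1: int, c2: int, n: int) -> float:
    """PMI = log2(P(w1 w2) / (P(w1) P(w2))), estimated from counts; needs c12 > 0."""
    return math.log2((c12 * n) / (c1 * c2))


def chi_square(c12: int, c1: int, c2: int, n: int) -> float:
    """Pearson chi-square on the 2x2 table (w1 / not w1) x (w2 / not w2)."""
    o11 = c12                # w1 followed by w2
    o12 = c1 - c12           # w1 followed by another word
    o21 = c2 - c12           # another word followed by w2
    o22 = n - c1 - c2 + c12  # neither w1 first nor w2 second
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if den == 0:  # degenerate table (e.g. w1 starts every bigram)
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / den


def rank_bigrams(tokens: Sequence[Hashable], score: Score, min_freq: int = 1):
    """Score every observed bigram with count >= min_freq, best first."""
    pair_c, first_c, second_c, n = bigram_counts(tokens)
    scored = [
        ((w1, w2), score(c, first_c[w1], second_c[w2], n))
        for (w1, w2), c in pair_c.items()
        if c >= min_freq
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def f_measure(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall of predicted vs. gold collocations."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `rank_bigrams(tokens, chi_square, min_freq=3)` returns candidate bigrams sorted by score, and the top of that list could be compared with a hand-built gold set using `f_measure`. The minimum-frequency cut-off is an illustrative choice: PMI in particular tends to overrate very rare pairs, so some frequency threshold is commonly applied before ranking.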

Keywords

Collocation, Turkey Turkish, natural language processing, corpus

Source

Bilig

WoS Quartile

Q4

Issue

78

Link

https://hdl.handle.net/11454/52656

Collection

WoS Indexed Publications Collection
