Description of Turkish Paraphrase Corpus Structure and Generation Method

Karaoglan, Bahar; Kisla, Tarik; Metin, Senem Kumova

Date

2018

Authors

Karaoglan, Bahar

Kisla, Tarik

Metin, Senem Kumova

Publisher

Springer International Publishing AG

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Because developing a corpus takes a long time and considerable human effort, it is desirable to make the corpus as resourceful as possible: rich in coverage, flexible, multipurpose and expandable. Here we describe the steps we took in developing a Turkish paraphrase corpus, the factors we considered, the problems we faced and how we dealt with them. The corpus currently contains nearly 4000 sentences, with a ratio of 60% paraphrase to 40% non-paraphrase sentence pairs. The sentence pairs are annotated on a five-point scale: paraphrase, encapsulating, encapsulated, non-paraphrase and opposite. The corpus is organized as a database integrated with a Turkish dictionary. The sources used so far are news texts from the Bilcon 2005 corpus, a set of professionally translated sentence pairs from the MSRP corpus, multiple Turkish translations from different languages included in the Tatoeba corpus, and user-generated paraphrases.
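
As an illustration only, the five annotation labels named in the abstract could be modelled as follows. This is a minimal sketch, not the authors' database schema: the names `Label`, `SentencePair`, `source` and `label_shares` are hypothetical, and the record does not say how the corpus is actually stored.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The five annotation labels listed in the abstract."""
    PARAPHRASE = "paraphrase"
    ENCAPSULATING = "encapsulating"
    ENCAPSULATED = "encapsulated"
    NON_PARAPHRASE = "non-paraphrase"
    OPPOSITE = "opposite"


@dataclass(frozen=True)
class SentencePair:
    """One annotated pair; all field names are assumptions for illustration."""
    first: str
    second: str
    label: Label
    source: str  # hypothetical tag, e.g. "news", "translation", "user"


def label_shares(pairs):
    """Return the fraction of pairs carrying each label (empty dict if no pairs)."""
    if not pairs:
        return {}
    counts = Counter(pair.label for pair in pairs)
    return {label: counts[label] / len(pairs) for label in Label}
```

A function like `label_shares` is one way the reported balance between paraphrase and non-paraphrase pairs could be checked as new pairs are added.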

Description

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), April 3-9, 2016, Mevlana University, Konya, Turkey
Kisla, Tarik/0000-0001-9007-7455; Karaoglan, Bahar/0000-0001-9338-7491

Keywords

Turkish, Paraphrase, Corpus generation

Source

Computational Linguistics and Intelligent Text Processing (CICLing 2016), Part I

WoS Q Value

N/A

Volume

9623

Link

https://doi.org/10.1007/978-3-319-75477-2_13
https://hdl.handle.net/11454/70664

Collection

WoS Indexed Publications Collection
