Developing Pretrained Language Models for Turkish Biomedical Domain

Date

2022

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Pretrained language models augmented with in-domain corpora show impressive results in biomedical and clinical NLP tasks in English. However, there is minimal work in low-resource languages. This work introduces the BioBERTurk family, three pretrained Turkish models for the biomedical domain. To evaluate the models, we also introduce a labeled dataset for classifying radiology reports of CT exams. Our first model is initialized from BERTurk and further pretrained on a biomedical corpus. The second model likewise continues pretraining from the general-domain BERTurk model, this time on a corpus of Ph.D. theses in radiology, to test the effect of task-related text. The final model combines the radiology and biomedical corpora with the original BERTurk corpus and pretrains a BERT model from scratch. The F-scores of our models on the radiology report classification task are 92.99, 92.75, and 89.49, respectively. To the best of our knowledge, this is the first study to evaluate the effect of a small in-domain corpus in pretraining from scratch.
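As an illustration of the evaluation setup described above, the sketch below shows how a BERT-style Turkish checkpoint could be fine-tuned for radiology report classification with the Hugging Face Transformers library. It is a minimal sketch, not the authors' code: the checkpoint name (the public BERTurk model `dbmdz/bert-base-turkish-cased`), the two-class label count, the example texts, and the hyperparameters are all assumptions, since the BioBERTurk checkpoints and dataset details are not given in this record.

```python
# Minimal sketch (assumptions noted): fine-tune a Turkish BERT checkpoint
# for radiology report classification. A BioBERTurk checkpoint would be
# substituted for the public BERTurk model if available.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical labeled radiology reports; the real dataset is not public here.
train_data = Dataset.from_dict({
    "text": [
        "Beyin parankiminde akut patoloji saptanmadı.",
        "Sol frontal lobda akut enfarkt ile uyumlu görünüm mevcuttur.",
    ],
    "label": [0, 1],
})

checkpoint = "dbmdz/bert-base-turkish-cased"  # public BERTurk (assumption)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,  # number of report classes is an assumption
)

def tokenize(batch):
    # Truncate/pad reports to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="turkish-radiology-cls",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()
```

In the paper's setting, the same fine-tuning procedure would be repeated with each of the three pretrained models, and the resulting classifiers compared by F-score on the held-out radiology reports.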

Description

10th IEEE International Conference on Healthcare Informatics (IEEE ICHI) -- JUN 11-14, 2022 -- Rochester, MN

Keywords

biomedicine, pretrained language model, transformer, transfer learning, radiology reports

Source

2022 IEEE 10th International Conference on Healthcare Informatics (ICHI 2022)

WoS Q Value

N/A

Scopus Q Value

N/A

Volume

Issue

Citation