IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION

dc.authoridGüven, Zekeriya Anıl/0000-0002-7025-2815
dc.authorscopusid57202999818
dc.authorscopusid22978771800
dc.authorscopusid57193650397
dc.authorwosidGüven, Zekeriya Anıl/AAO-3360-2021
dc.contributor.authorGuven, Zekeriya Anil
dc.contributor.authorDiri, Banu
dc.contributor.authorCakaloglu, Tolgahan
dc.date.accessioned2023-01-12T20:19:13Z
dc.date.available2023-01-12T20:19:13Z
dc.date.issued2022
dc.departmentN/A/Departmenten_US
dc.description.abstractData analysis becomes difficult when the amount of the data increases. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on its shared features without human intervention requires advanced methodologies. There are topic-modeling methods that help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets.en_US
dc.identifier.doi10.7494/csci.2022.23.3.4622
dc.identifier.endpage396en_US
dc.identifier.issn1508-2806
dc.identifier.issn2300-7036
dc.identifier.issue3en_US
dc.identifier.scopus2-s2.0-85141215608en_US
dc.identifier.scopusqualityQ4en_US
dc.identifier.startpage377en_US
dc.identifier.urihttps://doi.org/10.7494/csci.2022.23.3.4622
dc.identifier.urihttps://hdl.handle.net/11454/79062
dc.identifier.volume23en_US
dc.identifier.wosWOS:000869727000004en_US
dc.identifier.wosqualityN/Aen_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.language.isoenen_US
dc.publisherAgh Univ Science & Technology Pressen_US
dc.relation.ispartofComputer Science-Aghen_US
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectTopic Modelingen_US
dc.subjectHeadline Classificationen_US
dc.subjectMachine Learningen_US
dc.subjectText Classificationen_US
dc.subjectLatent Dirichlet Allocationen_US
dc.subjectData Analysisen_US
dc.titleIMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATIONen_US
dc.typeArticleen_US
