IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION
dc.authorid | Güven, Zekeriya Anıl/0000-0002-7025-2815 | |
dc.authorscopusid | 57202999818 | |
dc.authorscopusid | 22978771800 | |
dc.authorscopusid | 57193650397 | |
dc.authorwosid | Güven, Zekeriya Anıl/AAO-3360-2021 | |
dc.contributor.author | Guven, Zekeriya Anil | |
dc.contributor.author | Diri, Banu | |
dc.contributor.author | Cakaloglu, Tolgahan | |
dc.date.accessioned | 2023-01-12T20:19:13Z | |
dc.date.available | 2023-01-12T20:19:13Z | |
dc.date.issued | 2022 | |
dc.department | N/A/Department | en_US |
dc.description.abstract | Data analysis becomes difficult when the amount of the data increases. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on its shared features without human intervention requires advanced methodologies. There are topic-modeling methods that help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets. | en_US |
dc.identifier.doi | 10.7494/csci.2022.23.3.4622 | |
dc.identifier.endpage | 396 | en_US |
dc.identifier.issn | 1508-2806 | |
dc.identifier.issn | 2300-7036 | |
dc.identifier.issue | 3 | en_US |
dc.identifier.scopus | 2-s2.0-85141215608 | en_US |
dc.identifier.scopusquality | Q4 | en_US |
dc.identifier.startpage | 377 | en_US |
dc.identifier.uri | https://doi.org/10.7494/csci.2022.23.3.4622 | |
dc.identifier.uri | https://hdl.handle.net/11454/79062 | |
dc.identifier.volume | 23 | en_US |
dc.identifier.wos | WOS:000869727000004 | en_US |
dc.identifier.wosquality | N/A | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.language.iso | en | en_US |
dc.publisher | Agh Univ Science & Technology Press | en_US |
dc.relation.ispartof | Computer Science-Agh | en_US |
dc.relation.publicationcategory | Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Topic Modeling | en_US |
dc.subject | Headline Classification | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Text Classification | en_US |
dc.subject | Latent Dirichlet Allocation | en_US |
dc.subject | Data Analysis | en_US |
dc.title | IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION | en_US |
dc.type | Article | en_US |