IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION

Guven, Zekeriya Anil; Diri, Banu; Cakaloglu, Tolgahan

dc.authorid	Güven, Zekeriya Anıl/0000-0002-7025-2815
dc.authorscopusid	57202999818
dc.authorscopusid	22978771800
dc.authorscopusid	57193650397
dc.authorwosid	Güven, Zekeriya Anıl/AAO-3360-2021
dc.contributor.author	Guven, Zekeriya Anil
dc.contributor.author	Diri, Banu
dc.contributor.author	Cakaloglu, Tolgahan
dc.date.accessioned	2023-01-12T20:19:13Z
dc.date.available	2023-01-12T20:19:13Z
dc.date.issued	2022
dc.department	N/A/Department	en_US
dc.description.abstract	Data analysis becomes difficult when the amount of the data increases. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on its shared features without human intervention requires advanced methodologies. There are topic-modeling methods that help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets.	en_US
dc.identifier.doi	10.7494/csci.2022.23.3.4622
dc.identifier.endpage	396	en_US
dc.identifier.issn	1508-2806
dc.identifier.issn	2300-7036
dc.identifier.issue	3	en_US
dc.identifier.scopus	2-s2.0-85141215608	en_US
dc.identifier.scopusquality	Q4	en_US
dc.identifier.startpage	377	en_US
dc.identifier.uri	https://doi.org/10.7494/csci.2022.23.3.4622
dc.identifier.uri	https://hdl.handle.net/11454/79062
dc.identifier.volume	23	en_US
dc.identifier.wos	WOS:000869727000004	en_US
dc.identifier.wosquality	N/A	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.language.iso	en	en_US
dc.publisher	Agh Univ Science & Technology Press	en_US
dc.relation.ispartof	Computer Science-Agh	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Topic Modeling	en_US
dc.subject	Headline Classification	en_US
dc.subject	Machine Learning	en_US
dc.subject	Text Classification	en_US
dc.subject	Latent Dirichlet Allocation	en_US
dc.subject	Data Analysis	en_US
dc.title	IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION	en_US
dc.type	Article	en_US
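The abstract describes topic modeling used as a feature extractor: each headline is mapped to a vector of topic proportions, and that vector is what the downstream classifier sees. The record does not describe the implementation, so the following is only a minimal sketch under stated assumptions. It fits classical LDA with collapsed Gibbs sampling (a standard inference method, not necessarily the one the authors used) and returns one K-dimensional topic-proportion vector per tokenized headline. The n-stage LDA variant is not shown. The nearest-centroid classifier is a hypothetical stand-in for the random forest, AdaBoost, naive Bayes, and logistic regression classifiers named in the abstract. All function names and the alpha, beta, and iteration values are illustrative assumptions.

```python
import random


def lda_topic_features(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Classical LDA via collapsed Gibbs sampling (illustrative sketch, hypothetical helper).

    docs: list of token lists (e.g. tokenized headlines).
    Returns one topic-proportion vector (length num_topics) per document.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]           # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]      # n_kw: occurrences of word w assigned to topic k
    topic_total = [0] * K                         # n_k: all tokens assigned to topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(K)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][vocab[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    # Gibbs sweeps: resample each token's topic from its full conditional.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = vocab[w], assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed document-topic proportions: theta_dk = (n_dk + alpha) / (n_d + K * alpha).
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + K * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(K)])
    return features


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier: mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            s[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


if __name__ == "__main__":
    # Toy tokenized headlines (invented for illustration, not from the paper's datasets).
    headlines = [["match", "goal", "league"], ["stocks", "market", "rates"],
                 ["goal", "league", "cup"], ["market", "inflation", "rates"]]
    labels = ["sports", "economy", "sports", "economy"]
    X = lda_topic_features(headlines, num_topics=2, iterations=100, seed=1)
    model = nearest_centroid_fit(X, labels)
    print([nearest_centroid_predict(model, x) for x in X])
```

Separating the two functions mirrors the benchmarking setup the abstract describes: the topic features are computed once, and any classifier can then be trained on them. In practice, a library implementation of LDA and of the four named classifiers would replace these hand-written helpers.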

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
