Application of deep learning technique in next generation sequence experiments

Ozgur, Su; Orman, Mehmet

dc.authorscopusid	57199652078
dc.authorscopusid	58654602800
dc.contributor.author	Ozgur, Su
dc.contributor.author	Orman, Mehmet
dc.date.accessioned	2024-08-25T18:51:57Z
dc.date.available	2024-08-25T18:51:57Z
dc.date.issued	2023
dc.department	Ege Üniversitesi	en_US
dc.description.abstract	In recent years, the cost-effectiveness of biological data processing technology has driven its widespread use, and next-generation sequencing (NGS) has become an integral component of biological research. NGS technologies enable the sequencing of billions of nucleotides in the entire genome, transcriptome, or specific target regions, generating vast data matrices. As a result, there is a growing demand for deep learning (DL) approaches, which employ multilayer artificial neural networks and systems capable of extracting meaningful information from these extensive data structures. In this study, the aim was to obtain optimized parameters and assess the prediction performance of deep learning and machine learning (ML) algorithms for binary classification in real and simulated whole genome data using a cloud-based system. The ART-simulated data and paired-end NGS (whole genome) data of Chr 22, which includes ethnicity information, were evaluated using XGBoost, LightGBM, and DL algorithms. When the learning rate (LR) was set to 0.01 and 0.001, and the epoch values were updated to 500, 1000, and 2000 in the deep learning model for the ART simulated dataset, the median accuracy values of the DL models were as follows: 0.6320, 0.6800, and 0.7340 for an LR of 0.01; and 0.6920, 0.7220, and 0.8020 for an LR of 0.001, respectively. In comparison, the median accuracy values of the XGBoost and LightGBM models were 0.6990 and 0.6250, respectively. When the same process was repeated for Chr 22, the median accuracy values of the DL models were 0.5290, 0.5420, and 0.5820 for an LR of 0.01; and 0.5510, 0.5830, and 0.6040 for an LR of 0.001, respectively. Additionally, the median accuracy values of the XGBoost and LightGBM models were 0.5760 and 0.5250, respectively. While the best classification estimates were obtained at 2000 epochs and an LR of 0.001 for both real and simulated data, the XGBoost algorithm outperformed the DL model when the DL model was trained for 500 epochs with an LR of 0.01. Under class imbalance, the DL algorithm yielded recall and precision values that were both high and similar to each other. In conclusion, this study serves as a timely resource for genomic scientists, providing guidance on why, when, and how to effectively utilize deep learning/machine learning methods for the analysis of human genomic data.	en_US
dc.description.sponsorship	Ege University Office of Scientific Research Projects (BAP) [TDK-2020-21725]	en_US
dc.description.sponsorship	This study was supported by Ege University Office of Scientific Research Projects (BAP) (Project ID: TDK-2020-21725).	en_US
dc.identifier.doi	10.1186/s40537-023-00838-w
dc.identifier.issn	2196-1115
dc.identifier.issue	1	en_US
dc.identifier.scopus	2-s2.0-85174451880	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.uri	https://doi.org/10.1186/s40537-023-00838-w
dc.identifier.uri	https://hdl.handle.net/11454/102782
dc.identifier.volume	10	en_US
dc.identifier.wos	WOS:001088044100001	en_US
dc.identifier.wosquality	Q1	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.language.iso	en	en_US
dc.publisher	Springernature	en_US
dc.relation.ispartof	Journal of Big Data	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.snmz	20240825_G	en_US
dc.subject	Next generation sequencing	en_US
dc.subject	Deep learning	en_US
dc.subject	Machine learning	en_US
dc.subject	Variant calling format	en_US
dc.subject	Cloud computing	en_US
dc.subject	Convolutional Neural-Networks	en_US
dc.subject	Prediction	en_US
dc.subject	Identification	en_US
dc.subject	Algorithms	en_US
dc.subject	Alignment	en_US
dc.title	Application of deep learning technique in next generation sequence experiments	en_US
dc.type	Article	en_US
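
The comparison described in the abstract can be sketched as a small lookup over the reported learning-rate and epoch grid. This is only an illustrative tabulation of the median accuracies quoted in the abstract, not the authors' code; the names REPORTED, BASELINES, best_dl_config, and dl_configs_beating are hypothetical and introduced here for illustration.

```python
from itertools import product

# Hyperparameter grid named in the abstract.
LEARNING_RATES = (0.01, 0.001)
EPOCHS = (500, 1000, 2000)

# Median accuracies quoted in the abstract, keyed by (learning rate, epochs).
# product() yields (0.01, 500), (0.01, 1000), (0.01, 2000), (0.001, 500), ...
REPORTED = {
    "ART simulated": dict(zip(
        product(LEARNING_RATES, EPOCHS),
        (0.6320, 0.6800, 0.7340, 0.6920, 0.7220, 0.8020),
    )),
    "Chr 22": dict(zip(
        product(LEARNING_RATES, EPOCHS),
        (0.5290, 0.5420, 0.5820, 0.5510, 0.5830, 0.6040),
    )),
}

# Median accuracies of the tree-based baselines, also from the abstract.
BASELINES = {
    "ART simulated": {"XGBoost": 0.6990, "LightGBM": 0.6250},
    "Chr 22": {"XGBoost": 0.5760, "LightGBM": 0.5250},
}


def best_dl_config(dataset):
    """Return ((learning_rate, epochs), accuracy) with the highest median accuracy."""
    grid = REPORTED[dataset]
    config = max(grid, key=grid.get)
    return config, grid[config]


def dl_configs_beating(dataset, baseline):
    """Return the sorted DL configurations whose accuracy exceeds the baseline's."""
    threshold = BASELINES[dataset][baseline]
    return sorted(cfg for cfg, acc in REPORTED[dataset].items() if acc > threshold)


if __name__ == "__main__":
    for name in REPORTED:
        print(name, best_dl_config(name), dl_configs_beating(name, "XGBoost"))
```

On both datasets the lookup selects a learning rate of 0.001 with 2000 epochs, matching the abstract's statement, and shows that the 500-epoch configurations fall below the XGBoost median.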

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection