Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education

dc.authoridLee, Mark/0000-0002-1264-3260
dc.authoridAriaeinejad, Ali/0000-0003-3494-0765
dc.authoridJurado-Nunez, Alma/0000-0001-5643-4287
dc.authoridChan, Teresa/0000-0001-6104-462X
dc.authoridYilmaz, Yusuf/0000-0003-4378-4418
dc.authorscopusid57202001975
dc.authorscopusid57742510100
dc.authorscopusid57219120832
dc.authorscopusid57742418700
dc.authorscopusid14056775500
dc.authorscopusid36542149100
dc.authorwosidChan, Teresa/T-6676-2017
dc.authorwosidYilmaz, Yusuf/C-5948-2015
dc.contributor.authorYilmaz, Yusuf
dc.contributor.authorNunez, Alma Jurado
dc.contributor.authorAriaeinejad, Ali
dc.contributor.authorLee, Mark
dc.contributor.authorSherbino, Jonathan
dc.contributor.authorChan, Teresa M.
dc.date.accessioned2023-01-12T20:11:46Z
dc.date.available2023-01-12T20:11:46Z
dc.date.issued2022
dc.departmentN/A/Departmenten_US
dc.description.abstractBackground: Residents receive a numeric performance rating (eg, 1-7 scoring scale) along with a narrative (ie, qualitative) feedback based on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBA can be overwhelming to process and fairly adjudicate as part of a global decision about learner competence. Current approaches with qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. Objective: This study explores natural language processing (NLP) and machine learning (ML) applications for identifying trainees at risk using a large WBA narrative comment data set associated with numerical ratings. Methods: NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents based on their performance on a task) derived from WBAs completed by faculty members from multiple hospitals associated with a single, large, residency program at McMaster University, Canada. Narrative comments were vectorized to quantitative ratings using the bag-of-n-grams technique with 3 input types: unigram, bigrams, and trigrams. Supervised ML models using linear regression were trained with the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the category of at risk or not at risk. Sensitivity, specificity, and accuracy metrics are reported. Results: The database comprised 7199 unique direct observation assessments, containing both narrative comments and a rating between 3 and 7 in imbalanced distribution (scores 3-5: 726 ratings; and scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. 
When comparing the 3 different input types for diagnosing if a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), our accuracy for trigrams was 87%, bigrams 86%, and unigrams 82%. We also found that all 3 input types had better prediction accuracy when using a bimodal cut (eg, lower or higher) compared with predicting performance along the full 7-point rating scale (50%-52%). Conclusions: The ML models can accurately identify underperforming residents via narrative comments provided for WBAs. The words generated in WBAs can be a worthy data set to augment human decisions for educators tasked with processing large volumes of narrative assessments.en_US
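The pipeline outlined in the abstract (n-gram vectorization of comments, a bimodal cut on the rating, then sensitivity, specificity, and accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenization, and the choice to count each n-gram size separately are assumptions made for illustration, and model training is omitted. The cutoff of 6 follows the abstract's split of low (1-5) versus high (6-7) ratings.

```python
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased comment.

    Whitespace tokenization is an assumption; the abstract does not
    describe the tokenizer that was used.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n):
    """Count how often each n-gram of size n occurs in one comment."""
    return Counter(ngrams(text, n))


def binarize(rating, cutoff=6):
    """Map a 1-7 rating to a bimodal label.

    1 = low rating (below the cutoff, treated here as "at risk"),
    0 = high rating (6 or 7).
    """
    return 1 if rating < cutoff else 0


def confusion_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }
```

Because the ratings are imbalanced (far more 6-7 scores than 3-5 scores), accuracy alone can look strong even when few low-rated residents are detected, which is why sensitivity and specificity are worth reading alongside it.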
dc.description.sponsorship2020 Canadian Association of Emergency Physicians (CAEP) Emergency Medicine Advancement Fund; Scientific and Technological Research Council of Turkey (Turkiye Bilimsel ve Teknolojik Arastirma Kurumu, TUBITAK) Postdoctoral Fellowship granten_US
dc.description.sponsorshipThis study was supported by the 2020 Canadian Association of Emergency Physicians (CAEP) Emergency Medicine Advancement Fund. YY is the recipient of the Scientific and Technological Research Council of Turkey (Turkiye Bilimsel ve Teknolojik Arastirma Kurumu, TUBITAK) Postdoctoral Fellowship grant.en_US
dc.identifier.doi10.2196/30537
dc.identifier.issn2369-3762
dc.identifier.issue2en_US
dc.identifier.pmid35622398en_US
dc.identifier.scopus2-s2.0-85132025607en_US
dc.identifier.scopusqualityQ1en_US
dc.identifier.urihttps://doi.org/10.2196/30537
dc.identifier.urihttps://hdl.handle.net/11454/78188
dc.identifier.volume8en_US
dc.identifier.wosWOS:000848716700003en_US
dc.identifier.wosqualityN/Aen_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.indekslendigikaynakPubMeden_US
dc.language.isoenen_US
dc.publisherJmir Publications, Incen_US
dc.relation.ispartofJmir Medical Educationen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectnatural language processingen_US
dc.subjectmachine learning algorithmsen_US
dc.subjectcompetency-based medical educationen_US
dc.subjectassessmenten_US
dc.subjectmedical educationen_US
dc.subjectmedical residentsen_US
dc.subjectmachine learningen_US
dc.subjectwork performanceen_US
dc.subjectprediction modelsen_US
dc.subjectModular Assessment Programen_US
dc.subjectPerformanceen_US
dc.subjectResidentsen_US
dc.subjectCommitteesen_US
dc.subjectReliabilityen_US
dc.subjectCognitionen_US
dc.subjectScoresen_US
dc.subjectMcmapen_US
dc.titleHarnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Educationen_US
dc.typeArticleen_US
