Determining the reliability of Artificial Intelligence programs in medical faculty board exams

Artificial intelligence in medical school

Authors

  • Arif Keskin, Giresun University
  • Tayfun Aygün

DOI:

https://doi.org/10.12669/pjms.41.12.12855

Keywords:

Artificial Intelligence, Education, ChatGPT, Gemini, Copilot, Anatomy.

Abstract

Objective: In recent years, artificial intelligence (AI) applications have become widespread in many fields, including medical education. This study aims to examine the reliability of widely used generative AI programs, such as Microsoft Copilot, Google Gemini and OpenAI ChatGPT, by comparing the accuracy of their answers with that of first- and second-year medical students on anatomy questions from mid-term board, final and make-up exams.

Methodology: Of a total of 286 anatomy questions from the 2023-2024 academic year, the 222 with analysis reports were included in the study. The questions were divided into four difficulty groups (very difficult, difficult, medium, easy) based on students' correct answer rates. The same questions were then posed to the three AI applications. The data were analyzed using SPSS version 27.
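The grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not report the cut-off values used, so the thresholds in DIFFICULTY_CUTOFFS, the function names and the sample data are all hypothetical placeholders chosen only to show the idea of binning questions by students' correct answer rate and then computing an AI application's accuracy within each bin.

```python
from collections import defaultdict

# Hypothetical cut-offs on the students' correct-answer rate (0.0 to 1.0).
# The abstract does not state the thresholds used; these are placeholders.
DIFFICULTY_CUTOFFS = [
    (0.25, "very difficult"),
    (0.50, "difficult"),
    (0.75, "medium"),
]


def difficulty_level(correct_rate: float) -> str:
    """Map a students' correct-answer rate to one of four difficulty groups."""
    if not 0.0 <= correct_rate <= 1.0:
        raise ValueError("correct_rate must be between 0 and 1")
    for upper_bound, label in DIFFICULTY_CUTOFFS:
        if correct_rate < upper_bound:
            return label
    return "easy"


def accuracy_by_difficulty(questions):
    """Return the share of correct AI answers within each difficulty group.

    `questions` is an iterable of (student_correct_rate, ai_was_correct) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for correct_rate, ai_was_correct in questions:
        level = difficulty_level(correct_rate)
        totals[level] += 1
        correct[level] += int(ai_was_correct)
    return {level: correct[level] / totals[level] for level in totals}


# Made-up illustrative data, not taken from the study.
sample = [(0.10, False), (0.20, True), (0.40, True), (0.90, True)]
print(accuracy_by_difficulty(sample))
```

With real data, each pair would hold one exam question's student correct answer rate from its analysis report and whether a given AI application answered that question correctly.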

Results: Copilot, ChatGPT and Gemini achieved significantly higher accuracy than students, at 97.7%, 94.4% and 86.5%, respectively. However, Gemini and ChatGPT performed similarly to students on very difficult questions in particular. Gemini did not perform as well on questions requiring basic knowledge (first year) as on questions requiring clinical interpretation (second year).

Conclusion: Although the AI applications were more accurate than students, systems that fail to achieve 100% accuracy are not suitable for unrestricted and unsupervised use in critical basic medical sciences such as anatomy. Because AI models lack clinical reasoning and human experience, they should be used only as supplementary educational tools and integrated in a controlled manner to enhance student success.

Published

2025-11-26

How to Cite

Keskin, A., & Aygün, T. (2025). Determining the reliability of Artificial Intelligence programs in medical faculty board exams: Artificial intelligence in medical school. Pakistan Journal of Medical Sciences, 41(12), 3354–3358. https://doi.org/10.12669/pjms.41.12.12855

Section

Original Articles