Subjective quality of multiple choice questions used in undergraduate courses in orthopedics and other specialties

Background and Objective: Multiple Choice Questions (MCQs) can sample broad domains of knowledge efficiently and reliably. MCQs of the lower cognitive order C1 (Cognitive Level 1 = recall of knowledge) do not fulfill this purpose, while those of the higher cognitive orders C2 (Cognitive Level 2 = interpret) and C3 (Cognitive Level 3 = analyze) are better at assessing the problem-solving capabilities of the student. Every good educational activity must be supported by a quality examination to complete the objectives of a curriculum. The objective of this study was to evaluate MCQs presently being used in the internal examinations of medical colleges in Lahore. Methods: Papers consisting of MCQs from Orthopedics and other specialties were collected in June 2019 from different medical colleges of Lahore and reviewed by a senior medical teacher, without blinding and without knowledge of the scores the students had previously been awarded. The question statement, clinical scenario, options, and other mistakes were assessed for each item against predetermined criteria. The cognitive level of each item was determined by whether it asked for a recall, identify, or analyze response. The results were tabulated and compared in two groups, i.e. Miscellaneous and Orthopedics. Results: Most of the items (total = 589) in both groups were of the C1 cognitive level, though Orthopedics (n = 229) was slightly better (χ² = 49.882, p < 0.001, statistically significant). The Miscellaneous group (n = 360) was better at constructing clinical scenarios (χ² = 29.952, p < 0.001, statistically significant) and at writing a question statement without confusion. Options were well written in both groups. A good percentage of items needed correction for mistakes in spelling and grammar and for alignment with the undergraduate level. Conclusions: The cognitive level of the MCQs as an assessment tool is quite low in both groups; clinical scenario construction in particular can be improved. Mistakes in spelling and grammar and conceptual mediocrity are common in both groups.


INTRODUCTION
Most medical colleges teach both undergraduate and postgraduate students. Objective, multiple-choice-item-based papers are an integral part of summative and formative assessments the world over.1 Multiple choice questions are used regularly in various forms, e.g. "true/false" or "single best answer," with the intention of assessing knowledge. MCQs can sample broad domains of knowledge efficiently and reliably.1,2 MCQs have traditionally been blamed for poor validity, while advocates say that they are more reliable. Critics say MCQs promote factual recall and appreciation of isolated facts, but if carefully made, single-best-answer MCQs can also test higher-order thinking skills.1,2 Students are quick to learn from assessment methods and adapt their learning techniques to pass the next examination, which becomes more obvious when the curriculum and assessment are misaligned. Item writing flaws (IWFs) are frequently encountered in review before the actual use of an item in a real examination (pre hoc); a further review is done after its use (post hoc). Most experts hold that item writing flaws and inadequacies of construct occur because of deviation from the accepted guidelines for making MCQs. Inappropriately worded items, intellectually unchallenging scenarios, clinical problems of low cognitive level, and items soliciting recall will affect the performance of the students. A poorly written scenario may elicit recall rather than the intended analysis or interpretation. Christian et al. cite a study reporting that more than 90% of MCQs in an internal examination were of low cognitive level and that 46.2% of these MCQs contained item writing flaws; notably, the lower the cognitive level, the more frequent the item writing flaws.2-4
Baig et al., in their 2014 evaluation of basic-science examination items in Pakistan, reported that most SEQs (83.33%) and MCQs (60%) were at the C1 (recall) level, and that 69 item writing flaws (46%) were found in 150 MCQs.4 Studies by Naeem et al. (2012) at Aga Khan University (AKU) and by Baig et al. (2016) agree that any improvement in item quality is tied to faculty development.1,4,5 Another study reported a 17% improvement in the quality of MCQs after a short training session on MCQ construction.3-5 Each teaching activity is planned to modify the cognitive abilities of learners so that they can analyze clinical problems, solve them, think critically, and interpret findings. They can only do this successfully if the assessment does not solicit mere recall of factual knowledge.1,6 Educationists insist that assessment methods should be made known to students beforehand, as this has an important bearing on their learning practices and preparation for examinations.7 All examinations should be followed by a later review so that learning can be further improved.
When a teaching activity fails, the failure culminates in the final product: physicians with inadequate competencies and compromised patient care.7,8 The scenario presented before the stem should provoke an analytic response leading to problem solving. MCQs of the lower order C1 do not fulfill this purpose; C2 and C3 items are better at assessing the problem-solving capabilities of the student. A good educational activity must be supported by an equally purposeful, high-quality examination to complete the objectives of a curriculum. This necessitates regular evaluation of examination material, which became the reason for the present study.

METHODS
Papers consisting of MCQs from Orthopedics and other specialties (the Miscellaneous group: medicine, surgery, ENT, and urology) were collected in June 2019 from different medical colleges of Lahore and reviewed by a senior medical teacher, without blinding and without knowledge of the scores the students had previously been awarded. All papers had been used at least once in the internal examinations of the final-year class. No student results were recalled from college records, and each item was analyzed by the same senior medical teacher. Each MCQ item contained a stem and five options. A correct response to an item was awarded one mark, while an incorrect response carried no deduction. The problem stated in each item was assessed to determine the response it would elicit: an item answerable by recalling book knowledge was taken as C1. A clinical problem leading to identification of a problem or requiring further investigation through additional modalities was labeled C2. C3 was assigned to items where the question required a management response. Scenarios in which the diagnosis was very straightforward, like a textbook picture (where reading the data would lead to a single classical conclusion, e.g. pain in the right iliac fossa with suggestive findings pointing to acute appendicitis), were also taken as C1. Each question was reviewed for clarity of content and intent, i.e. the question or clinical problem should point to one option more than the others; those which seemed less focused or vague were segregated into groups. Each item was analyzed for spelling, grammatical, and structural deficiencies, such as the absence of a question statement; items beginning with an Arabic numeral instead of words were noted separately. Each question was also assessed for suitability for undergraduate examinations.
Items thought to be above the level expected of an undergraduate were marked as postgraduate questions. The scenario given in each question was scored as follows: Focused = 0; Unfocused (could lead to more than one similar option) = 1; Vague description = 2; Logical clues = 3; Data not in sequence/unnecessary information = 4. Options were evaluated for being close to the true answer or for being confusing; the use of "none of the above" or "all of the above" was taken as a major fault. Options were scored as: No fault = 0; Irrelevant = 1; Implausible = 2; Except/All/None of the above = 3; Unfocused = 4.
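The coding rubric above (cognitive levels plus scenario- and option-fault scores) can be sketched as a small tallying script. This is an illustrative sketch only: the dictionary field names and the sample items are assumptions for demonstration, not the study's actual data.

```python
from collections import Counter

# Rubric codes taken from the Methods section.
COGNITIVE_LEVELS = {"C1": "recall", "C2": "identify/investigate", "C3": "management"}
SCENARIO_FAULTS = {0: "Focused", 1: "Unfocused", 2: "Vague description",
                   3: "Logical clues", 4: "Data not in sequence/unnecessary info"}
OPTION_FAULTS = {0: "No fault", 1: "Irrelevant", 2: "Implausible",
                 3: "Except/All/None of the above", 4: "Unfocused"}

def tally(items):
    """Count reviewed items per cognitive level and per scenario-fault code."""
    levels = Counter(it["level"] for it in items)
    faults = Counter(it["scenario_fault"] for it in items)
    return levels, faults

# Hypothetical reviewed items (field values follow the rubric codes above).
items = [
    {"level": "C1", "scenario_fault": 0, "option_fault": 0},
    {"level": "C2", "scenario_fault": 1, "option_fault": 3},
    {"level": "C1", "scenario_fault": 2, "option_fault": 0},
]
levels, faults = tally(items)
print(levels["C1"])  # 2 of the 3 sample items are recall-level
```

Keeping the rubric as explicit code tables makes the per-group counts (as reported in the Results) reproducible from the raw item codings.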

RESULTS
In all, 360 items in the Miscellaneous group and 229 in Orthopedics were included. All items were systematically evaluated against the predetermined criteria. The cognitive level of the items was found to be C1 in 187/360 (51.94%) of the Miscellaneous group, 52/229 (22.7%) of Orthopedics, and 239/589 (40.57%) of the combined group. The options form the most important part of an MCQ, where distractors are added to provide a challenge. Most items in both groups had no problem with their options, e.g. 310/360 (86.11%) in the Miscellaneous group.
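The between-group comparison of cognitive levels can be checked with a standard chi-square test of independence. The sketch below uses only the C1 counts reported in this section (187/360 vs. 52/229), collapsed into a 2×2 table; the paper's reported χ² = 49.882 may derive from a finer breakdown of levels, so the value computed here is illustrative rather than a reproduction of the published figure.

```python
# 2x2 contingency table: rows = group, columns = (C1, above C1)
table = [[187, 360 - 187],   # Miscellaneous: 187 C1 items out of 360
         [52, 229 - 52]]     # Orthopedics:    52 C1 items out of 229

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

chi2 = chi_square(table)
# With df = 1, a statistic this large far exceeds 10.83 (the critical
# value at p = 0.001), consistent with the highly significant p reported.
print(round(chi2, 2))  # ~49.62 for this 2x2 collapse
```

A library routine such as `scipy.stats.chi2_contingency` would give the same statistic (with `correction=False`) plus an exact p-value; the pure-stdlib version is shown to keep the arithmetic visible.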

DISCUSSION
Every teaching program depends highly upon the alignment of the assessment with the objectives of the curriculum. No learning outcome can be achieved unless the assessment and evaluation carried out during the program are scientific and proactive (i.e. follow a pre-laid blueprint). In an MCQ item, the correct option should be defensibly correct and the distractors defensibly incorrect. Multiple true/false MCQs have the disadvantage, compared with single-best-choice MCQs, of being more complicated to score and more difficult to construct.9 All paper-setting faculty members should be trained to follow the blueprint laid down earlier.5 Only then can we improve the cognitive level of the MCQs and reduce the IWFs. Single-best-choice MCQs are preferred because they are easy to answer, convenient for teachers to administer, and versatile in use, but they are difficult to make well.9 Our study found that the Miscellaneous group had more MCQs at the C1 level (51.94%) than Orthopedics (22.7%) (Table-I), which may point to greater experience of the Orthopedics faculty in preparing examination material. Without active monitoring of the teaching and assessment methodology, students will shift their strategy to rote learning rather than developing an analytical approach and a deeper grasp of the subject.7 Faculty specialization has also led to a poorly balanced curriculum, as each medical unit is now occupied by super-specialists, such as endocrinologists or gastroenterologists, who have less and less experience of teaching general medicine.8 Modern undergraduate curricula tend to include only basic information about subspecialties, so the item-writing examiner has to be very careful not to cross over to the postgraduate level and place the examination beyond the scope of the students, e.g.
students are given only a very basic insight into bone tumors, and it becomes very difficult for a student who is dragged into the details and differential diagnosis of bone malignancies. Table-II shows a similar picture: on overall assessment of the question statement, 97.22% of the Miscellaneous group's questions were adequately focused, against 85.2% of the Orthopedics items, with the rest vague or confusing. The Orthopedics department has to arrange fewer tests than medicine and surgery, where the curriculum is larger. When the items were analyzed for spelling mistakes and grammatical shortcomings, both groups showed faults, coded as E = spelling + punctuation, EL = E plus grammatical mistakes, N = insufficient data/needs review, B = badly phrased question, V = very bad question, R = review needed, W = wrong option, etc., as shown in Table-III. These criteria have been suggested by us and are being used for the first time; they will need to stand the test of time through periodic review. When the options in the items were assessed, they were found to be of good quality in both groups (Table-