(Re)Writing strong Multiple-Choice Questions (MCQs)
Sarah Jacobs, M.Ed., OHSU Teaching and Learning Center
Objectives
At the end of this session, participants will be able to:
Understand the anatomy of an effective MCQ
Critique sample MCQs
Apply reliability and validity measurements to rewriting MCQs
Which of the following is most important to you?
A. Revising existing MCQs
B. Writing new MCQs
C. General knowledge about test statistics and question-writing
D. All of the above
E. Something else
Anatomy of an MCQ
[Diagram labeling the parts of an MCQ; the only surviving label text is "or alternatives".]
Reliability and Validity
Reliability: Does a test consistently measure student knowledge over time? Does the test produce stable, consistent results?
Validity: Does a test measure the learning outcome it purports to measure?
The center of the target is the concept or learning outcome you want to measure.
The first image shows a test whose results are unstable and vary widely when it is given multiple times. It also does not measure the desired concept or outcome.
The second image, on average, measures the concept or learning outcome, but again the results vary widely, so the test is unreliable.
The third image shows very consistent measurement, with very similar results each time, but the test does not measure the correct concept or outcome.
The fourth image shows a test that measures the desired concept and consistently measures whether students understood that concept.
Image from: https://commons.wikimedia.org/wiki/File:Reliability_and_validity.svg
Reliability and Validity: How Are They Measured?
Reliability is measured by:
Overall exam: KR-20 (Kuder-Richardson Formula 20) score
Individual questions: point biserial
Validity is more complicated to measure.
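Both reliability statistics can be computed from a matrix of scored responses. The sketch below is a minimal Python illustration, assuming items scored 0/1, the population variance of total scores, and a point biserial taken against the uncorrected total score (some item-analysis software removes the item from the total first, which gives slightly lower values). The function names and the five-student data set are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def kr20(scores):
    """KR-20 for a 0/1 score matrix: rows are students, columns are items."""
    k = len(scores[0])                      # number of items
    totals = [sum(row) for row in scores]   # each student's total score
    var_total = pvariance(totals)           # population variance of the totals
    # Sum of p*q over items, where p = proportion correct and q = 1 - p
    sum_pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in scores)
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def point_biserial(scores, item):
    """Point biserial between one 0/1 item and the (uncorrected) total score."""
    totals = [sum(row) for row in scores]
    right = [t for row, t in zip(scores, totals) if row[item] == 1]
    wrong = [t for row, t in zip(scores, totals) if row[item] == 0]
    if not right or not wrong:
        raise ValueError("item was answered correctly by everyone or by no one")
    p = len(right) / len(scores)
    # (mean total of correct group - mean total of incorrect group) / SD * sqrt(p*q)
    return (mean(right) - mean(wrong)) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical responses from five students to a four-item quiz
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(sample), 2))               # 0.8
print(round(point_biserial(sample, 0), 2))  # 0.71
```

The point biserial here is simply the Pearson correlation between the item's 0/1 scores and the total scores, written in the form that compares the mean totals of students who got the item right and wrong.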
Correcting Item-Writing Flaws (Ali & Ruit, 2015)
Correcting item-writing flaws, as described in an NBME item-writing booklet, increased the number of functioning distractors and the point biserial (item reliability).
Functioning distractor: selected by more than 5% of examinees.
Point biserial: a measure of item reliability.
Note: knowing which distractors are functioning requires that the question has been administered at least once, so that item statistics (psychometrics) are available.
Why do we want functioning distractors? They help discriminate between a student who has achieved the learning outcome and a student who is simply good at taking tests.
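Once an item has been administered, checking the >5% rule is a matter of counting how often each distractor was chosen. A minimal sketch, assuming the responses are available as a list of selected option letters; the function name and the 40-student data set are made up for illustration.

```python
from collections import Counter


def distractor_report(choices, options, key, threshold=0.05):
    """Selection frequency of each distractor and whether it counts as functioning.

    choices   -- the option letter each student selected
    options   -- every option offered on the item, e.g. "ABCDE"
    key       -- the correct option
    threshold -- a distractor functions if chosen by MORE than this share (>5%)
    """
    counts = Counter(choices)
    n = len(choices)
    report = {}
    for option in options:
        if option == key:
            continue  # only distractors are evaluated
        freq = counts[option] / n  # Counter returns 0 for options nobody chose
        report[option] = (freq, freq > threshold)
    return report


# Hypothetical item: 40 students, key "B"
choices = ["B"] * 26 + ["A"] * 8 + ["C"] * 4 + ["D"] * 2
for option, (freq, functioning) in distractor_report(choices, "ABCDE", "B").items():
    print(option, f"{freq:.0%}", "functioning" if functioning else "non-functioning")
```

With this made-up data, A (20%) and C (10%) are functioning distractors, while D (exactly 5%) and E (0%) are not, because the rule requires more than 5%.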
Item Analysis: What to Review?
Flag a question for review when:
Difficulty (p, the proportion of students answering correctly) < 0.70
Discrimination < 0.25
A significant number of students chose the same incorrect answer
Point biserial < 0.2
Students raised queries about it
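The numeric review criteria above can be applied mechanically to each item. The sketch below assumes a common upper-lower definition of the discrimination index (proportion correct among the top 27% of students by total score minus the proportion correct among the bottom 27%); the slide does not say which definition its item-analysis report uses, so check your own software. The criterion about many students choosing the same wrong answer has no stated cutoff and is left to judgment. Function names and data are hypothetical.

```python
def difficulty(item_scores):
    """Difficulty index p: proportion of students who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination(item_scores, totals, group_fraction=0.27):
    """Upper-lower discrimination index (an assumed definition).

    Students are ranked by total score; the index is the proportion correct in the
    top group minus the proportion correct in the bottom group.
    """
    n = len(totals)
    g = max(1, round(n * group_fraction))               # size of each comparison group
    ranked = sorted(range(n), key=lambda i: totals[i])  # lowest total first; ties keep input order
    lower, upper = ranked[:g], ranked[-g:]
    return (sum(item_scores[i] for i in upper) - sum(item_scores[i] for i in lower)) / g


def review_flags(p, d, r_pb, students_querying=0):
    """Reasons to review an item, using the cutoffs listed on the slide."""
    flags = []
    if p < 0.70:
        flags.append("difficulty p < 0.70")
    if d < 0.25:
        flags.append("discrimination < 0.25")
    if r_pb < 0.2:
        flags.append("point biserial < 0.2")
    if students_querying > 0:
        flags.append("student queries")
    return flags


item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]     # hypothetical 0/1 scores on one item
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # the same students' total exam scores
p = difficulty(item)
d = discrimination(item, totals)
# r_pb=0.31 stands in for a point biserial computed elsewhere (made-up value)
print(p, round(d, 2), review_flags(p, d, r_pb=0.31))
```

Here the item discriminates well (about 0.67) but is flagged because only 60% of students answered it correctly.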
What would you flag for review?
Technical Item Flaws and Testwiseness (Case & Swanson, 2001)
Students can answer questions based on test-taking skills alone.
Testwiseness: Grammatical Cues
A 60-year-old man is brought to the emergency department by the police, who found him lying unconscious on the sidewalk. After ascertaining that the airway is open, the first step in management should be intravenous administration of
A. examination of cerebrospinal fluid
B. glucose with vitamin B1 (thiamine)
C. CT scan of the head
D. phenytoin
E. diazepam
One or more distractors don't follow grammatically from the stem. A and C do not match the stem grammatically (neither can be "administered intravenously"), so they can be ruled out.
Testwiseness: Logical Cues
Crime is
A. equally distributed among the social classes
B. overrepresented among the poor
C. overrepresented among the middle class and rich
D. primarily an indication of psychosexual maladjustment
E. reaching a plateau of tolerability for the nation
A subset of the options is collectively exhaustive. A, B, and C are homogeneous and together cover all possibilities, so testwise students can easily rule out D and E; those distractors are nonfunctioning.
Testwiseness: Absolute Terms
In patients with advanced dementia, Alzheimer's type, the memory defect
A. can be treated adequately with phosphatidylcholine (lecithin)
B. could be a sequela of early parkinsonism
C. is never seen in patients with neurofibrillary tangles at autopsy
D. is never severe
E. possibly involves the cholinergic system
Terms such as "always" or "never" are used in the options. C and D are absolute, so testwise students can easily rule them out. Focus the stem and keep the distractors short.
Testwiseness: Long Correct Answer
Secondary gain is
A. synonymous with malingering
B. a frequent problem in obsessive-compulsive disorder
C. a complication of a variety of illnesses and tends to prolong many of them
D. never seen in organic brain damage
The correct answer is longer, more specific, or more complete than the other options. C is the correct answer.
Testwiseness: Word repeats
A 58-year-old man with a history of heavy alcohol use and previous psychiatric hospitalization is confused and agitated. He speaks of experiencing the world as unreal. This symptom is called
A. depersonalization
B. derailment
C. derealization
D. focal memory deficit
E. signal anxiety
A word or phrase is included in both the stem and the correct answer: "unreal" in the question stem, de"real"ization in the correct option.
Testwiseness: Convergence Strategy
The correct answer has the most elements in common with the other options.
Local anesthetics are most effective in the
A. anionic form, acting from inside the nerve membrane
B. cationic form, acting from inside the nerve membrane
C. cationic form, acting from outside the nerve membrane
D. uncharged form, acting from inside the nerve membrane
E. uncharged form, acting from outside the nerve membrane
"Anionic" appears only once, so A can be eliminated. "Outside" appears less often than "inside," so C and E can be eliminated. Three of the five options involve a charge (cation or anion), so D can be eliminated, leaving B.
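Because the convergence strategy is mechanical, an item writer can check a draft option set for it. A minimal sketch, assuming each option can be broken into discrete elements (here a hypothetical encoding: form, whether it carries a charge, and site of action): each option is scored by summing, for each of its elements, how many options share that element. This additive score is an illustrative simplification of the elimination steps above, not a published procedure.

```python
from collections import Counter

# Hypothetical encoding of the local-anesthetic item: (form, carries a charge?, site of action)
OPTIONS = {
    "A": ("anionic", True, "inside"),
    "B": ("cationic", True, "inside"),
    "C": ("cationic", True, "outside"),
    "D": ("uncharged", False, "inside"),
    "E": ("uncharged", False, "outside"),
}


def convergence_scores(options):
    """Score each option by how often its elements appear across all options."""
    # Count (position, value) pairs so each field is tallied separately
    counts = Counter(
        (pos, value) for elements in options.values() for pos, value in enumerate(elements)
    )
    return {
        label: sum(counts[(pos, value)] for pos, value in enumerate(elements))
        for label, elements in options.items()
    }


scores = convergence_scores(OPTIONS)
best = max(scores, key=scores.get)
print(scores, best)
```

With this encoding B scores highest (8), matching the elimination above. If the key of a draft item scores highest, a testwise student could converge on it without knowing the content.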
Activity: Name that strategy!
What strategy can be used to answer this question?
During the comprehensive periodontal evaluation of a new patient you note multiple sites in all four quadrants that probe 6-8 mm, have moderate interproximal bone loss, subgingival calculus, and demonstrate bleeding on probing. A review of the patient's health history reveals the patient has type 2 diabetes mellitus. The patient is taking oral medications for diabetes and high cholesterol. Based on the above case scenario, please construct a proper periodontal treatment plan by selecting the best treatment option for the problem.
Subgingival calculus:
A. Medical consultation
B. Oral hygiene instructions
C. Scaling and root planing
D. Reevaluation
E. Referral to a periodontist (correct answer)
Word repeat between stem and correct answer: "periodontal" in the question stem, "periodontist" in the correct answer.
Technical Item Flaws and Irrelevant Difficulty (Case & Swanson, 2001)
These flaws make a question difficult for reasons irrelevant to the focus of the assessment.
Irrelevant Difficulty: Options long, complicated or double
Peer review committees in HMOs may move to take action against a physician's credentials to care for participants of the HMO. There is an associated requirement to assure that the physician receives due process in the course of these activities. Due process must include which of the following?
A. Notice, an impartial forum, counsel, a chance to hear and confront evidence against him/her.
B. Proper notice, a tribunal empowered to make the decision, a chance to confront witnesses against him/her, and a chance to present evidence in defense.
C. Reasonable and timely notice, impartial panel empowered to make a decision, a chance to hear evidence against himself/herself and to confront witnesses, and the ability to present evidence in defense.
The stem contains extraneous reading, and the options are long and/or complicated. This shifts what is measured from content knowledge to reading speed.
Irrelevant Difficulty: Numeric data not stated consistently
Following a second episode of infection, what is the likelihood that a woman is infertile?
A. Less than 20%
B. 20 to 30%
C. Greater than 50%
D. 90%
E. 75%
Numeric options should be listed in a single format: ranges or single values, not both. Additionally, option C includes both D and E. Because a correct D or E would also make C correct, testwise students can rule out D and E.
Irrelevant Difficulty: Frequency Terms are Vague
Severe obesity in early adolescence
A. usually responds dramatically to dietary regimens
B. often is related to endocrine disorders
C. has a 75% chance of clearing spontaneously
D. shows a poor prognosis
E. usually responds to pharmacotherapy and intensive psychotherapy
Frequency terms used in the options, such as "rarely" or "usually," are vague. Research shows frequency terms are not consistently defined, even by experts. What does "usually" mean? What does "often" mean?
Irrelevant Difficulty: Language not parallel, options in nonlogical order
In a vaccine trial, year-old boys were given a vaccine against a certain disease and then monitored for five years for occurrence of the disease. Of this group, 85% never contracted the disease. Which of the following statements concerning these results is correct?
A. No conclusion can be drawn, since no follow-up was made of nonvaccinated children
B. The number of cases (ie, 30 cases over five years) is too small for statistically meaningful conclusions
C. No conclusions can be drawn because the trial involved only boys
D. Vaccine efficacy (%) is calculated as 85-15/100
The options are long, and the nonparallel language makes it difficult to determine which is most correct. See the following slide for a way to rewrite this question for simplicity and clarity.
Original and Suggested Rewrite
The original item appears on the previous slide. The suggested rewrite keeps the scenario but turns the stem into a focused question with short, parallel options:
In a vaccine trial, year-old boys were given a vaccine against a certain disease and then monitored for five years for occurrence of the disease. Of this group, 85% never contracted the disease. For which of the following reasons can no conclusion be drawn from these data?
A. No follow-up was made of nonvaccinated children
B. The number of cases was too small
C. The trial involved only boys
D. [Insert new option]
Irrelevant Difficulty: None of the above
The diagnosis of a large ovarian cyst is most strongly suggested by
A. an anterior dullness, lateral tympany
B. a decreased peristalsis
C. a fluid wave
D. a shifting dullness
E. none of the above
"None of the above" can be problematic when the options are not absolutely true or false. Knowledgeable students can think of an option that is more correct than the one you have written, and can then argue for (or be confused by) "none of the above."
Irrelevant Difficulty: Stems are tricky or unnecessarily complicated
Arrange the parents of the following children with Down's syndrome in order of highest to lowest risk of recurrence. Assume that the maternal age in all cases is 22 years and that a subsequent pregnancy occurs within 5 years. The karyotypes of the daughters are:
I: 46, XX, -14, +T (14q21q) pat
II: 46, XX, -14, +T (14q21q) de novo
III: 46, XX, -14, +T (14q21q) mat
IV: 46, XX, -21, +T (14q21q) pat
V: 47, XX, -21, +T (21q21q) (parents not karyotyped)
A. III, IV, I, V, II
B. IV, III, V, I, II
C. III, I, IV, V, II
D. IV, III, I, V, II
E. III, IV, I, II, V
Writing MCQs: What to Avoid
Testwiseness: logical cues; absolute terms; long correct answer; word repeats; convergence strategy.
Irrelevant difficulty: long, complicated options and/or stems; inconsistent data and/or language; vague terms; options in a nonlogical order; "none of the above" used as an option; the answer to an item "hinged" to the answer of a related item.
Activity: What is wrong with this question?
Activity: What is wrong with this question?
Who received a Nobel Prize for discovering the structure of DNA?
A. Francis Crick
B. James Watson
C. Rosalind Franklin
D. A and B
E. B and C
F. A and C
Unnecessarily complicated answer choices.
Activity: What is wrong with this question?
How many chromosomes are found in a typical human cell?
A. 12
B. 18
C. 32
D. 46
E. 54
Answers overlap: a cell that has 46 chromosomes also has 32, 18, and 12 chromosomes, though these aren't the complete count.
Activity: Critique Questions
Using the activity handout:
Get into groups of 2-4.
You will be assigned a question number.
Review the statistics.
Review the question for item-writing flaws.
Suggest a rewrite.
Share results!
How to Write an MCQ
Test main concepts, not trivia. Use your objectives!
Pose a clear question.
Apply the "cover the options" rule.
Avoid item-writing flaws.
Keep answers homogeneous.
Move away from basic recall and toward higher-order thinking.
Now that we have talked about what not to do, what SHOULD we do? The "cover the options" rule: a student should be able to answer the question without seeing the possible answers.
Moving Away from Basic Recall to Higher-Order Questions: An Example
Basic science recall: What area is supplied with blood by the posterior inferior cerebellar artery?
Basic science application of knowledge: A 62-year-old man develops left-sided limb ataxia, Horner's syndrome, nystagmus, and loss of appreciation of facial pain and temperature sensations. What artery is most likely to be occluded?
Questions?
References
Ali, S. H., & Ruit, K. G. (2015). The impact of item flaws, testing at low cognitive level, and low distractor functioning on multiple-choice question quality. Perspectives on Medical Education, 4(5), doi: /s x
Brame, C. (2013). Writing good multiple choice test questions. Retrieved August 22, 2016, from https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/
Case, S. M., & Swanson, D. B. (2001). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.
Phelan, C., & Wren, J. (2005). Exploring reliability in academic assessment. Retrieved August 23, 2016, from https://www.uni.edu/chfasoa/reliabilityandvalidity.htm