Improving the Quality of Testing in the Fire Service

Author: Corey Fields

1 Improving the Quality of Testing in the Fire Service
Contact: Dr. Ben Hirst, Performance Training Systems, Inc., 395 Tequesta Dr., Suite B-2, Tequesta, FL 33469

2 Criterion-Referenced Testing
A criterion-referenced test is an assessment method for translating test scores into a statement about the behavior to be expected of a person with that score. Additionally, a candidate's score can be compared against a specified body of subject matter. A criterion-referenced test is also known as a standards-based assessment. Source: USLegal.com

3 Mastery Learning
Mastery learning (or, as it was initially called, "learning for mastery") is an instructional strategy and educational philosophy first formally proposed by Benjamin Bloom in 1968. Mastery learning maintains that students must achieve a level of mastery (e.g., 90% on a knowledge test) of prerequisite knowledge before moving forward to learn subsequent information. If a student does not achieve mastery on the test, the student is given additional support in learning and reviewing the information and is then tested again. This cycle continues until the learner accomplishes mastery and may then move on to the next stage. Source: writings of Benjamin Bloom and Dr. W. James Popham

4 GAP Analysis
A technique that businesses use to determine what steps need to be taken to move from the current state to a desired future state. Also called need-gap analysis, needs analysis, and needs assessment. A gap analysis consists of (1) listing characteristic factors (such as attributes, competencies, and performance levels) of the present situation ("what is"), (2) listing the factors needed to achieve future objectives ("what should be"), and (3) highlighting the gaps that exist and need to be filled. A gap analysis forces a company to reflect on who it is and who it wants to be in the future. Source:

5 Importance of Currency and Validity
Document the currency and validity process. Assure that certification or licensing agencies meet:
the EEOC Uniform Guidelines on Employee Selection Procedures
the American Psychological Association's Standards for Educational and Psychological Testing
good practices in the testing industry

6 Importance of Currency and Validity (continued)
Currency means the knowledge and skills/abilities being tested are up to date. Validity means a test item measures what it was intended to measure. Currency and validity are achieved by:
reference to the most recent NFPA standards
reference to the most recent training materials
job-incumbent agreement that the knowledge and skills/abilities are currently required on the job

7 What Is Test-Item Validity?
Validity is the ability of a test item to measure what it was intended to measure.

8 Types of Validity
Face Validity
Technical Content Validity
Job/Construct Validity
Predictive Validity

9 Face Validity - SME Approach
Subject matter experts (SMEs) are trained in test-item writing.
Subject matter experts write the test items.
SME work is reviewed by job incumbents in the subject area.

10 Technical Content Validity
Use current resources.
Document the title, publisher, edition, and page number(s).
Obtain and record permission to use any copyrighted materials.

11 Training Materials
Current editions
Publicly available
Appropriate level for the target population

12 Manufacturer's Information
Equipment manuals
Technical specifications
Engineering drawings

13 Job/Construct Validity
Use a team of job incumbents to:
check technical accuracy
review reference materials
reach consensus that the knowledge/ability being measured is required on the job

14 Managing the Testing Process
Good testing practices involve more than test-item currency and validity:
selecting and training test proctors for written, oral, and performance tests
establishing the effectiveness and integrity of test proctoring and administration

15 Managing the Testing Process (continued)
maintaining test security and control
establishing the role of the test proctor/examiner for written, oral, and performance tests

16 Scoring the Test - Musts for Reporting Integrity
Double-score each test to ensure there are no scoring errors.
Apply the Standard Error of Measurement (SEM) to your cut score.
Protect scores and reports in keeping with Freedom of Information Act requirements.

17 True Score Range, or Considering the Standard Error of Measurement (SEM)
Cut score = 70%
Possible highest score = 73%
Possible lowest score = 67%
The true score range is 67% to 73%: the cut score plus and minus the SEM.

18 True Score Range
The true score range is the cut score plus and minus the SEM. In this case, the SEM is 3%, so a 70% cut score yields a range of 67% to 73%.
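As a quick illustration of the arithmetic above, here is a minimal Python sketch (not part of the original presentation; the 70% cut score and 3% SEM are the slides' example values, and the function name is made up):

```python
# True score range: the cut score plus and minus the Standard Error
# of Measurement (SEM), per the example on the slides.
def true_score_range(cut_score, sem):
    """Return the (lowest, highest) plausible true scores around a cut score."""
    return (cut_score - sem, cut_score + sem)

low, high = true_score_range(70.0, 3.0)
print(f"True score range: {low}% to {high}%")  # 67.0% to 73.0%
```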

19 Improving Test Items/Tests Using Data
Speak with facts about test-item quality. Conduct a test-item/test analysis of:
test-item difficulty
test-item discrimination

20 Test-Item Difficulty
A computed value ranging from 0.0 to 1.0 for each test item, often called the "P" factor: the proportion of examinees that get the test item right.
.10 means 10% answered correctly (difficult)
.95 means 95% answered correctly (easy)
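The P-factor computation described above can be sketched in a few lines of Python (an illustrative helper, not from the presentation; the function name and sample responses are assumptions):

```python
def item_difficulty(responses, key):
    """P factor: the proportion of examinees who answered the item correctly."""
    return sum(1 for r in responses if r == key) / len(responses)

# 95 of 100 examinees chose the keyed answer "d" -> P = .95 (an easy item)
responses = ["d"] * 95 + ["a"] * 5
print(item_difficulty(responses, "d"))  # 0.95
```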

21 Test-Item Difficulty
Computed Value (P Factor) | Difficulty Level
High | Low (easy item)
Moderate | Moderate
Low | High (difficult item)

22 Test-Item Discrimination
A computed value ranging from -1.0 to +1.0 for each test item:
the responses of the top-scoring 25% of examinees are identified
the responses of the bottom-scoring 25% are identified
the middle-scoring 50% are ignored
Formula: D = (H − L) / N, where H and L are the numbers of correct responses in the high and low groups and N is the size of one group.
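A Python sketch of the D = (H − L) / N calculation described above (illustrative only; the function name and example data are assumptions, and the slides' 25% high/low grouping is used as the default):

```python
def item_discrimination(examinees, group_fraction=0.25):
    """D = (H - L) / N for one test item.

    `examinees` is a list of (total_test_score, answered_item_correctly)
    pairs. H and L are the counts of correct answers in the top- and
    bottom-scoring groups; N is the size of one group. The middle-scoring
    examinees are ignored, per the slides.
    """
    ranked = sorted(examinees, key=lambda e: e[0], reverse=True)
    n = max(1, int(len(ranked) * group_fraction))
    high = sum(1 for _, correct in ranked[:n] if correct)
    low = sum(1 for _, correct in ranked[-n:] if correct)
    return (high - low) / n

# 100 examinees: 20 of the top 25 answered the item correctly, but only
# 5 of the bottom 25 did -> D = (20 - 5) / 25 = 0.6 (positive discrimination).
data = [(100 - i, i < 20) for i in range(25)]   # top-scoring group
data += [(50, False) for _ in range(50)]        # middle 50, ignored
data += [(i, i < 5) for i in range(25)]         # bottom-scoring group
print(item_discrimination(data))  # 0.6
```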

23 Range of Item Discrimination
Negative, none (zero), positive
What condition(s) will yield 0 discrimination?
What condition(s) will yield -1.0 discrimination?
What condition(s) will yield +1.0 discrimination?

24 Analyzing Marking Patterns
Test-item marking patterns (choices a-d, N = 100; * marks the keyed answer):
a. 15 (15%)
b. 27 (27%)
c. 0 (0%)
*d. 58 (58%)
What is the difficulty?
Discrimination situation: 15 high scorers correctly answered the item; 10 low scorers missed it. Is the discrimination (+) or (-)?
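For the first question, the difficulty follows directly from the marking pattern; a quick Python check (not from the presentation) of the slide's numbers:

```python
# Marking pattern from the slide: N = 100, keyed answer "d" chosen by 58.
counts = {"a": 15, "b": 27, "c": 0, "d": 58}
p = counts["d"] / sum(counts.values())
print(p)  # 0.58 -> a moderately easy item
```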

25 Test Reliability
The extent to which a test consistently measures what it is intended to measure. A computed value with a range of 0 to 1.0; the higher the value, the more reliable the test.
Test reliability = .43 (low reliability)
Test reliability = .88 (high reliability)
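The presentation does not name a specific reliability formula; one common choice for right/wrong-scored items is Kuder-Richardson 20 (KR-20), sketched here as an illustration (the function name and data layout are assumptions):

```python
def kr20(item_matrix):
    """KR-20 reliability estimate for dichotomously (0/1) scored items.

    item_matrix[person][item] is 1 if that person answered that item
    correctly, else 0. Higher return values mean a more reliable test.
    """
    n = len(item_matrix)        # number of examinees
    k = len(item_matrix[0])     # number of items
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    # Sum of p*q (proportion correct times proportion incorrect) per item
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance)
```

A perfectly consistent response pattern (every examinee either all right or all wrong) yields a reliability of 1.0; real tests fall below that, as in the slide's .43 and .88 examples.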

26 Real Life
Fortunately, we now have incredible tools in our computers! The right programs can:
randomly generate tests
automate the scoring/reporting process
compute item difficulty and discrimination
calculate test reliability
print out test-item marking patterns
provide summary reports that make improving testing programs much easier

27 Beware of Whistles, Bells, and Wishful Thinking!
Make sure the features of the test bank are easy to use and actually make your testing program better. Nothing is worse than elaborate features built on invalid test items.

28 Four MUSTS to Look For
Are the test items current and valid? (face, technical content, and job content validity)
Are the generated tests reliable? (reports of reliability, standard error of measurement, and other statistics are available)
Reliability, validity, currency

29 Four MUSTS (continued)
Is there training and technical support? (organized and competently delivered training and support)
Do the test items/tests meet national standards and good practices in the testing industry? (American Psychological Association recommended criteria; the EEOC's Uniform Guidelines on Employee Selection Procedures)

30 Get Help from Professionals!
Testing technology is complex! Get help by automating everything possible. Find a competent professional to advise you as you build or improve the quality of your testing and instructional programs.

31 What Does All This Mean?
Meeting recognized standards for testing can be systematic if you:
know what to do
know how to do it or get it done
keep a constant eye on improving test items and tests by computing item analyses, studying marking patterns, and computing reliability to provide a benchmark

32 What Does This Have to Do with Certification of Firefighters?
Testing data analyses can improve instruction by:
identifying weaknesses in the curriculum
improving lesson planning
eliminating out-of-date or trivial content
improving instructor professionalism
rewarding outstanding instructors and instructional institutions

33 What Is Next in Your Quest to Improve Testing?
The answer to this question is up to you! Actions should include a GAP analysis:
assessing where you are in striving toward a systematic testing program
determining what is missing
filling the gaps with automation and/or professional assistance

34 Thank you for the opportunity to discuss these important issues and for your attention and participation.