05 – Test Development
PSYAS | 2024 - 2025 | NOT FOR SALE

OUTLINE
1. Defining Test Development
2. Test Conceptualization
3. Test Construction
4. Test Tryout
5. Item Analysis
6. Test Revision

DEFINING TEST DEVELOPMENT
Test Development – an umbrella term for all that goes into the process of creating a test
● Occurs in 5 stages:
○ Test Conceptualization
○ Test Construction
○ Test Tryout
○ Item Analysis
○ Test Revision
● Some tests are conceived of and constructed but never tried out, item-analyzed, or revised

TEST CONCEPTUALIZATION
DEFINING TEST CONCEPTUALIZATION
Test Conceptualization – the process of conceptualizing the construct, the items to be included, and the design of the test
● Step 1: Describe the purpose of and rationale for the test
● Step 2: Describe the target population for the test
● Step 3: Clearly define the key variable of interest
● Step 4: Create item specifications
● Step 5: Choose an item format
● Step 6: Specify administration and scoring procedures

Preliminary Questions
● What is the test designed to measure?
● What is the objective of the test?
● Is there a need for this test?
● Who will use this test?
● Who will take this test?
● What content will the test cover?
● How will the test be administered?
● What is the ideal format of the test?
● Should more than one form of the test be developed?
● What special training will be required of test users for administering or interpreting the test?
● What types of responses will be required of test takers?
● Who benefits from the administration of this test?
● Is there any potential for harm as the result of an administration of this test?
● How will meaning be attributed to scores on this test?
Target Population – relevant characteristics to consider
● Age
● Educational Status
● Language (which language, and level of language ability)
● Literacy Level
● Disabilities

Variable of Interest – defined by referencing
● Theory
● Empirical Literature
● Cultural Definitions
● DSM-5 or other diagnostic criteria

Item Specifications
● List the major content areas to be included in the test
○ Can include the number of items per area
● For clinical tests, the DSM-5 (or other guidelines for diagnosis) can provide guidelines for item specifications
● Otherwise, develop item specifications from theory or from definitions of the variable

TYPES OF INSTRUMENTS
Self-Report – participants are asked to report their own attitudes, beliefs, knowledge, feelings, and behavior
● A self-report can be either a questionnaire or an interview
● Typically assesses attitudes, beliefs, values, interests, knowledge, feelings, and overt behavior

Questionnaire
● Pros: Easy to administer and score; can be administered to larger numbers of test takers
● Cons: The data are typically not as in-depth as an interview; requires a high level of literacy

Interview
● Pros: Gathers rich, in-depth information; the interviewer can ask the test taker to clarify certain responses; can be used with any test taker (e.g., one who cannot read)
● Cons: Can be time-consuming compared to questionnaires; typically subject to issues of inter-rater reliability

1 | @studywithky

Interview – a one-on-one conversation between an interviewer and an interviewee
● Open-Ended Questions: do not expect a specific, narrow answer
○ Ex. How do you feel about your experience with us?
● Closed-Ended Questions: offer respondents a fixed set of choices to select from
○ Ex. On a scale of 1 to 10, how would you rate your experience?
● Structured Interview: relies on asking questions in a set order to collect data on a topic
● Semi-Structured Interview: a few questions are predetermined, whereas the other questions are not planned
● Unstructured Interview: none of the questions are predetermined
● Focus Group: questions are presented to a group instead of one individual

TYPES OF QUESTIONS
Open-Ended
● Pros: Provides in-depth information; provides more detailed qualitative data; allows for unlimited responses
● Cons: Time-consuming; difficult to compare and analyze; lower response rates; may include irrelevant information
Closed-Ended
● Pros: Easy and quick to answer; scoring is objective and simple to analyze; responses are consistent
● Cons: Lacks nuance; not all answers are covered; may be misinterpreted by the respondent; may lead the respondent to think differently
Structured
● Pros: Reduced bias; increased credibility, reliability, and validity; simple, cost-effective, and efficient
● Cons: Little opportunity to build rapport; questions cannot be altered or removed without damaging the quality of the interview; limited scope

Pilot Testing – preliminary research surrounding the creation of a prototype of the test
● Also called Pilot Work, Pilot Study, or Pilot Research
● Attempts to determine how best to measure a targeted construct
● Entails literature reviews and experimentation, as well as the creation, revision, and deletion of preliminary items

TEST CONSTRUCTION
DEFINING TEST CONSTRUCTION
Test Construction – involves writing test items, revising them, formatting the test, and setting scoring rules

COMPARATIVE SCALES OF MEASUREMENT
Scaling – the process of setting rules for assigning numbers in measurement
● Age-Based: age is of
critical interest
● Grade-Based: grade is of critical interest
● Stanine: all raw scores on the test are to be transformed into scores that range from 1 to 9
● Unidimensional: only one dimension is presumed to underlie the ratings
● Multidimensional: more than one dimension is presumed to underlie the ratings

Paired Comparison – produces ordinal data by presenting pairs of stimuli that respondents are asked to compare
● The respondent is presented with two objects at a time and asked to select one object according to some criterion
● The data obtained are ordinal in nature

Example (a ✔ indicates that the column brand was preferred over the row brand in that pairing):

BRAND                  Coke   Pepsi   Sprite   Limca
Coke                    –      –       –        –
Pepsi                   ✔      –       ✔        –
Sprite                  ✔      –       –        –
Limca                   ✔      ✔       ✔        –
# of times preferred    3      1       2        0
Rank Order              1      3       2        4
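The preference tally and rank order above can be computed directly from the pairwise choices. A minimal Python sketch, where the `winners` mapping is an assumed encoding of the example's choices (each pair of brands is presented once, and the value is the preferred brand):

```python
from itertools import combinations

# Assumed pairwise outcomes, mirroring the Coke/Pepsi/Sprite/Limca example.
brands = ["Coke", "Pepsi", "Sprite", "Limca"]
winners = {
    ("Coke", "Pepsi"): "Coke",
    ("Coke", "Sprite"): "Coke",
    ("Coke", "Limca"): "Coke",
    ("Pepsi", "Sprite"): "Sprite",
    ("Pepsi", "Limca"): "Pepsi",
    ("Sprite", "Limca"): "Sprite",
}

# Count how many times each brand was preferred across all pairings.
counts = {b: 0 for b in brands}
for pair in combinations(brands, 2):
    counts[winners[pair]] += 1

# Sorting by preference count yields the ordinal rank order (ties not handled).
ranked = sorted(brands, key=lambda b: counts[b], reverse=True)
rank_order = {b: i + 1 for i, b in enumerate(ranked)}

print(counts)      # {'Coke': 3, 'Pepsi': 1, 'Sprite': 2, 'Limca': 0}
print(rank_order)  # {'Coke': 1, 'Sprite': 2, 'Pepsi': 3, 'Limca': 4}
```

Because only the order of the counts is used, the resulting ranks are ordinal: they say Coke is preferred to Sprite, but not by how much.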
OTHER SCALING METHODS
Rating Scale – a grouping of words, statements, or symbols on which judgments of the strength of a particular trait are indicated by the testtaker
● Can be used to record judgments of oneself, others, experiences, or objects
● Can take several forms

Likert Scale – each item presents the testtaker with five alternative responses (sometimes seven)
● Usually on an agree–disagree or approve–disapprove continuum
● Used to scale attitudes
● Usually reliable

Thurstone Scale – involves the collection of a variety of different statements about a phenomenon, which are ranked by an expert panel in order to develop the questionnaire
● A unidimensional scale that is used to track a respondent's behavior, attitude, or feeling toward a subject
● The first formal technique to measure an attitude

Categorical Scaling – stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
● Relies on sorting

Guttman Scale – yields ordinal-level measures
● Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured
● All respondents who agree with the stronger statements of the attitude will also agree with the milder statements
● Scalogram Analysis: an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker's responses
● The objective for the developer of a measure of attitudes is to obtain an arrangement of items wherein endorsement of one item automatically connotes endorsement of less extreme positions

ITEM FORMATS
Item Pool – the reservoir or well from which items will or will not be drawn for the final version of the test
● A comprehensive sampling provides a basis for the content validity of the final version of the test
● The test developer may write a large number of items from personal experience or academic acquaintance with the subject matter, or with the help of experts

Item Format – the form, plan, structure, arrangement, and layout of individual test items
● Selected-Response Format: requires test takers to select a response from a set of alternative responses
○ Multiple-Choice Format
○ Matching Item
○ Binary Choice
● Constructed-Response Format: requires test takers to supply or create the correct answer, not merely select it
○ Completion Item
○ Short-Answer Item
○ Essay Item

TYPES OF ITEM FORMATS
Multiple-Choice
● Has 3 elements: a stem (the question), a correct option, and incorrect alternatives (distractors or foils)
● Should have one correct answer, grammatically parallel alternatives of similar length, and alternatives that fit grammatically with the stem
● Avoid ridiculous distractors, excessively long items, "all of the above", and "none of the above"
● With four alternatives, the probability of guessing the correct answer is 25%
● Easy to score but prone to guessing

Matching Item
● The test taker is presented with two columns: premises and responses
● Should be fairly short and to the point, and only one premise should match each response

Binary Choice
● A true–false item usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact
● Should contain a single idea and not be subject to debate
● Probability of obtaining the correct