04 PsyAs – Psychometric Properties
● All factors associated with the process of measuring some variable, other than the variable being measured
● Ex. a ruler may be accurate in some areas but not all
● Error: refers to the component of the observed test score that does not have to do with the testtaker's ability
● Random Error: source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
  ○ Affects precision
  ○ Also called "noise"
  ○ Ex. physical events that occur during a test
● Systematic Error: source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
  ○ Affects accuracy
  ○ Either consistently inflates scores or consistently deflates scores
  ○ Ex. a miscalibrated scale
  ○ Could result in a Type I or Type II Error
  ○ The likelihood of both errors can be minimized by increasing the sample size

Test Construction
● Item Sampling: refers to variation among items within a test as well as to variation among items between tests
  ○ Also called content sampling
  ○ Extent to which a testtaker's score is affected by the content sampled on a test and by the way the content is sampled
  ○ Ex. the way in which the item is constructed

Test Administration
● Testtakers' reactions to influences that operate during test administration are the source of one kind of error variance
● Test Environment: room temperature, level of lighting, and amount of ventilation and noise
● Testtaker Variables: emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication
● Examiner-Related Variables: examiner's physical appearance and demeanor

Test Scoring and Interpretation
● Technical glitches in computer scoring may contaminate data
● Element of subjectivity in scoring

TYPES OF RELIABILITY

Test-Retest Reliability – evaluates the stability of a measure by correlating pairs of scores from the same people on 2 different administrations (a computation sketch for this and the alternate-forms coefficient follows the Parallel/Alternate Forms list below)
● Source of error is time sampling
● Ideal interval between administrations is 2–4 weeks
● Appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time (ex. a personality trait)
● ↑ interval between tests = ↓ correlation / reliability
● Coefficient of Stability: estimate of test-retest reliability when the interval between testings is > 6 months
● Most appropriate for gauging the reliability of tests that employ outcome measures such as reaction time or perceptual judgments

Parallel/Alternate Forms Reliability – evaluates the correlation between 2 different forms of a test
● Coefficient of Equivalence: estimate of alternate-forms or parallel-forms reliability
● Parallel Forms Reliability: for each form of the test, the means and the variances of observed test scores are equal
  ○ Means of scores obtained on parallel forms correlate equally with the true score
● Alternate Forms Reliability: different versions of a test that have been constructed so as to be parallel
  ○ Typically designed to be equivalent with respect to variables such as content and level of difficulty
  ○ Can be time-consuming and expensive to develop
  ○ Ex. Army Alpha and Army Beta
  ○ Immediate Form: the two forms are administered at the same time
  ○ Delayed Form: there is an interval between the two administrations
● Both forms must have balanced difficulty and high internal consistency
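Both the coefficient of stability and the coefficient of equivalence are simply Pearson correlations between two sets of scores from the same testtakers. The sketch below is a minimal illustration of that idea, assuming numpy/scipy are available; the score arrays and sample size are hypothetical and not taken from these notes.

```python
# Minimal sketch: test-retest and alternate-forms reliability as Pearson r
# between two score sets from the same testtakers (hypothetical data).
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for 8 testtakers on two administrations of the same test
time_1 = np.array([12, 15, 9, 20, 18, 14, 11, 17])
time_2 = np.array([13, 14, 10, 19, 17, 15, 10, 18])

# Test-retest reliability (a coefficient of stability if the interval > 6 months)
r_test_retest, _ = pearsonr(time_1, time_2)
print(f"Test-retest reliability: {r_test_retest:.2f}")

# Hypothetical scores of the same testtakers on Form A and Form B of a test
form_a = np.array([12, 15, 9, 20, 18, 14, 11, 17])
form_b = np.array([11, 16, 10, 19, 18, 13, 12, 16])

# Alternate-forms reliability (coefficient of equivalence)
r_alternate, _ = pearsonr(form_a, form_b)
print(f"Alternate-forms reliability: {r_alternate:.2f}")
```

In both cases a higher r indicates less error variance due to time sampling (test-retest) or item sampling across forms (alternate forms).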
Split-Half Reliability – method of internal consistency that correlates two pairs of scores obtained from equivalent halves of a single test administered once
● Appropriate when evaluating psychological variables that are more state-like than trait-like
● Steps
  ○ Divide the test into equivalent halves
    ■ Top-Bottom (least reliable)
    ■ Odd-Even
  ○ Calculate a Pearson r between scores on the two halves of the test
  ○ Adjust the half-test reliability using the Spearman–Brown formula
● Odd-Even Reliability: assigning odd-numbered items to one half of the test and even-numbered items to the other half

Measurement Tools for Internal Consistency
● Spearman–Brown Formula: used to estimate internal consistency reliability from a correlation between two halves of a test
  ○ Can also be used to estimate the effect of shortening the test on the test's reliability
  ○ ↑ length = ↑ reliability
  ○ Could also be used to determine the number of items needed to attain a desired level of reliability
● Coefficient Alpha
  ○ Also called Cronbach's alpha
  ○ Measures non-dichotomous items
  ○ May range in value from 0 to 1 only
  ○ Helps answer questions about how similar sets of data are
  ○ Accurately measures internal consistency when multiple loadings are equal (ex. Likert scale)
● Kuder–Richardson Formula
  ○ KR-20: dichotomous items with varying levels of difficulty
  ○ KR-21: dichotomous items with a uniform level of difficulty
● Average Proportional Distance: measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
  ○ Not connected to the number of items on a measure

★ Spearman–Brown Formula (general form, for a revised test length): r_SB = (n × r_xy) / [1 + (n − 1) × r_xy]
  ○ r_xy = Pearson r in the original-length test
  ○ n = number of items in the revised version divided by the number of items in the original version
★ Spearman–Brown Formula (whole test from two halves): r_SB = (2 × r_hh) / (1 + r_hh)
  ○ r_hh = Pearson r of scores on the two half-tests
  ○ n becomes 2 in this equation
★ Coefficient Alpha: r_α = [k / (k − 1)] × [1 − (Σσ²_i / σ²)]
  ○ r_α = coefficient alpha
  ○ k = number of items
  ○ σ²_i = variance of one item
  ○ σ² = variance of the total test scores
  ○ (a worked sketch of these computations appears at the end of this section)

CRONBACH'S ALPHA RANGES
α ≥ 0.9 → Excellent Consistency
0.9 > α ≥ 0.8 → Good Consistency
0.8 > α ≥ 0.7 → Acceptable Consistency
0.7 > α ≥ 0.6 → Questionable Consistency
0.6 > α ≥ 0.5 → Poor Consistency
α < 0.5 → Unacceptable

Interrater Reliability – degree of agreement or consistency between two or more scorers with regard to a particular measure
● Also called scorer reliability, judge reliability, observer reliability, and inter-scorer reliability
● Often used when coding nonverbal behavior
● Coefficient of Inter-Scorer Reliability: degree of consistency among scorers in the scoring of a test
● Kappa Statistics: used for nominal data (see the interrater sketch at the end of this section)
  ○ Cohen's Kappa: used to measure the level of agreement between two raters or judges only
  ○ Fleiss' Kappa: used to determine the level of agreement among two or more raters
● Kendall's W: used for rankings / ordinal data

COHEN'S KAPPA RANGES
1.0 → Perfect Agreement
0.81 – 0.99 → Near Perfect Agreement
0.61 – 0.80 → Substantial Agreement
0.41 – 0.60 → Moderate Agreement
0.21 – 0.40 → Fair Agreement
0.10 – 0.20 → Slight Agreement
0.0 → No Agreement

Nature of the Test
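Tying together the internal-consistency formulas listed above (split-half r, the Spearman–Brown whole-test adjustment, and coefficient alpha), here is a minimal numpy sketch. The 6-item, 6-person item-score matrix is hypothetical illustration data, and the odd-even split is only one of the possible ways to halve a test.

```python
# Minimal sketch: split-half reliability with Spearman-Brown adjustment,
# plus coefficient alpha, on a hypothetical item-score matrix.
import numpy as np

# Rows = testtakers, columns = items (hypothetical 6-item Likert-type data)
scores = np.array([
    [4, 5, 4, 3, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 2, 3],
    [4, 4, 5, 4, 5, 4],
    [1, 2, 1, 2, 1, 2],
])

# --- Split-half (odd-even) reliability ---
odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6
r_hh = np.corrcoef(odd_half, even_half)[0, 1]  # Pearson r between the halves

# Spearman-Brown whole-test adjustment: r_SB = 2*r_hh / (1 + r_hh)
r_sb = (2 * r_hh) / (1 + r_hh)

# --- Coefficient alpha: (k / (k - 1)) * (1 - sum of item variances / total variance) ---
k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Half-test r: {r_hh:.2f}, Spearman-Brown adjusted: {r_sb:.2f}, alpha: {alpha:.2f}")
```

The alpha value can then be read against the Cronbach's alpha ranges tabled above; with dichotomous (0/1) items the same alpha computation reduces to KR-20.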
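For the Interrater Reliability material above, the sketch below estimates Cohen's kappa for two raters assigning nominal codes. The rating labels are hypothetical, and the use of scikit-learn's cohen_kappa_score is an assumption for illustration (the notes do not prescribe a particular tool).

```python
# Minimal sketch: Cohen's kappa for two raters coding nominal behavior categories
# (hypothetical ratings; scikit-learn used purely for illustration).
from sklearn.metrics import cohen_kappa_score

rater_1 = ["anxious", "calm", "anxious", "calm", "calm", "anxious", "calm", "calm"]
rater_2 = ["anxious", "calm", "calm",    "calm", "calm", "anxious", "calm", "anxious"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # interpret against the Cohen's kappa ranges above
```

For more than two raters, Fleiss' kappa would be the analogous statistic, and Kendall's W would apply if the raters produced rankings rather than nominal codes.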