
Errors
are the inaccuracy in our measurements: the gap between the actual (true) value and the observed value. The fewer the errors, the more reliable the test.
Basis of test score theory
(We are never able to measure reality with perfect accuracy.)
X = T + E
where X is the observed score, T is the true score, and E is the error (inaccuracy)
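The X = T + E model can be illustrated with a small simulation (the numbers below are made up): one examinee's true score stays fixed, random error is added on each administration, and the errors average out toward zero.

```python
import random

random.seed(0)

# Simulate X = T + E for one examinee measured many times: the true
# score T is fixed, each observation adds random error E, and over many
# administrations the errors cancel out.
T = 100                              # the (unknowable) true score
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [T + e for e in errors]   # X = T + E for each administration

mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))       # close to the true score of 100
```

In practice we only ever see X, never T; the rest of these notes is about estimating how large E tends to be.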
Errors (Random and Systematic Error)
Types of Errors
Systematic errors occur due to instrumentation, for example, using the wrong scale. If you know what went wrong, you can correct for it. Random errors, however, are out of your control: you don't know exactly what went wrong or where.
Standard error of measurement
Tells us, on average, how much an observed score varies from the true score.
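A standard formula (not spelled out in these notes) is SEM = SD × √(1 − reliability), where SD is the test's standard deviation. A minimal sketch, with made-up numbers:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability): the average spread of observed
    scores around the true score, in the test's own score units."""
    return sd * math.sqrt(1 - reliability)

# e.g. an IQ-style scale with SD = 15 and reliability .89 (made-up values)
sem = standard_error_of_measurement(15, 0.89)
print(round(sem, 2))  # 4.97
```

Note the limiting cases: a perfectly reliable test (reliability = 1) has SEM = 0, and a completely unreliable one (reliability = 0) has SEM equal to the full SD.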
Domain Sampling Model
Your measure vs. the actual thing:
It's like wanting to see what is behind a door (what you want to measure); the items on your test are holes you drill in the door (your measures). The more holes you drill, the more clearly you can see what's behind it.
Reliability Coefficient
Is the ratio between the variance of true scores and the variance of observed scores.
Reliability = Variance of true scores / variance of observed scores
e.g. a reliability of .40 means that 40% of the variance in observed scores reflects true scores; the remaining 60% is measurement error.
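The ratio can be checked with a simulation (all numbers made up): give each examinee a true score, add independent error to get observed scores, and compare var(T)/var(X) to the value the error model predicts.

```python
import random

random.seed(1)

# Reliability = var(T) / var(X).  Simulate 1,000 examinees: each has a
# true score T, and the observed score adds independent error E.
true_scores = [random.gauss(50, 10) for _ in range(1000)]
observed = [t + random.gauss(0, 8) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

reliability = variance(true_scores) / variance(observed)
# With true SD = 10 and error SD = 8 the expected value is
# 10**2 / (10**2 + 8**2) = 100/164, roughly .61
print(round(reliability, 2))
```

In real testing we never see the true scores, so this ratio cannot be computed directly; the methods below are all ways of estimating it from observed data alone.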
Source of errors
Errors (inaccuracy) can come from the situation (e.g., time of testing), from the person being tested, or from items that do not representatively sample the domain.
Methods for estimating (and counteracting) unreliability:
Test-Retest Method
Administer the same test twice to check for consistency, then correlate the two sets of scores. Usually used for stable traits (IQ, etc.).
Carryover effects
Have to watch out for carryover effects, where the first test influences the second; the practice effect is one example. Many things can happen between the two administrations: if they are too close together, examinees may remember what they did (practice effect); if they are too far apart, history, maturation, etc. can intervene.
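The test-retest estimate is just the Pearson correlation between the two administrations. A self-contained sketch with made-up scores for seven examinees:

```python
# Test-retest reliability: the same people take the same test twice,
# and the reliability estimate is the Pearson correlation between the
# two sets of scores.  Scores below are invented for illustration.
time1 = [98, 110, 122, 101, 135, 90, 115]
time2 = [101, 108, 125, 99, 131, 94, 118]

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r_test_retest = pearson(time1, time2)
print(round(r_test_retest, 2))  # 0.98: scores rank people very consistently
```

A high value here says the test orders people the same way on both occasions; it says nothing about whether carryover inflated everyone's second score.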
Parallel Forms Method
Compare two equivalent forms of the test that measure the same attribute. The two forms use different items; however, the rules used to select items of a particular difficulty remain the same.
You can give the two forms (counterbalanced) to the same group on the same day.
Split-Half Method
Use an odd-even system to split the test, score the two halves, and correlate them. Because splitting cuts the test's length in half, and a test with more items is more reliable, the half-test correlation underestimates the full test's reliability; use the Spearman-Brown formula to correct for this.
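The odd-even split and the Spearman-Brown correction for a full-length test, r_full = 2·r_half / (1 + r_half), can be sketched as follows (the item matrix is made up):

```python
# Split-half reliability: correlate odd-item and even-item half scores,
# then apply the Spearman-Brown correction for a full-length test:
#   r_full = 2 * r_half / (1 + r_half)
# Rows = examinees, columns = items (invented 0/1 responses).
scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6
r_half = pearson(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)              # Spearman-Brown correction
print(round(r_full, 2))
```

The corrected value is always at least as large as the half-test correlation, which is the whole point: it undoes the penalty for halving the test.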
Kuder-Richardson Formula 20
Is used to measure internal-consistency reliability for dichotomous (e.g., right/wrong) items.
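The formula is KR-20 = k/(k−1) · (1 − Σpq / var(total)), where p is the proportion passing each item and q = 1 − p. A sketch with invented 0/1 data:

```python
def kr20(scores):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items:
       KR-20 = k/(k-1) * (1 - sum(p*q) / var(total)),
    where p is the proportion passing each item and q = 1 - p."""
    n = len(scores)          # examinees
    k = len(scores[0])       # items
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - pq / var_total)

# Made-up 0/1 item responses (rows = examinees, columns = items)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.8
```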
Coefficient Alpha (Cronbach's Alpha)
An extension of the KR-20 formula that can handle non-dichotomous items (e.g., Likert scales).
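Alpha replaces the Σpq term of KR-20 with the sum of per-item variances, so it works for any item scale; with 0/1 items it reduces to KR-20. A sketch with made-up 1-5 Likert responses:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha:
       alpha = k/(k-1) * (1 - sum(item variances) / var(totals)).
    Works for any item scale, not just 0/1 items."""
    n = len(scores)          # respondents
    k = len(scores[0])       # items

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var([row[j] for row in scores]) for j in range(k))
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented 1-5 Likert responses (rows = respondents, columns = items)
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(scores), 2))  # 0.94
```

High alpha here means respondents who score high on one item tend to score high on the others, i.e., the items hang together.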
Correlation Coefficient
The correlation coefficient ranges from −1 to +1 (reliability coefficients in practice fall between 0 and 1). If there is no relationship between the predicted values and the actual values, the coefficient is 0 or very low (the predicted values are no better than random numbers). As the strength of the relationship between predicted and actual values increases, so does the coefficient; a perfect fit gives a coefficient of 1.0. Thus, the higher the correlation coefficient, the better.
Internal consistency
The correlations among different items on the same test (or subscale): whether each item measures the same thing.
How good is good enough?
Depends on the purpose of the test.
What to do with low reliability?
- Increase the number of items ("drill more holes in the door"); use the Spearman-Brown prophecy formula to calculate how many more items to add to reach a target reliability.
- Factor and item analysis (discriminability analysis): remove useless items that reduce reliability.
- Correction for attenuation: removes the weakening effect of measurement error from a correlation coefficient; that is, it estimates what the correlation between two measures would be if both were perfectly reliable.
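Two of the fixes above can be written as formulas. The Spearman-Brown prophecy formula, r_new = n·r / (1 + (n−1)·r), can be solved for n to find the required lengthening factor, and the attenuation correction is r_xy / √(r_xx·r_yy). A sketch with made-up numbers:

```python
# Spearman-Brown prophecy: lengthening a test by a factor n gives
#   r_new = n*r / (1 + (n-1)*r);
# solving for n tells you how much longer the test must get to reach
# a target reliability.
def lengthening_factor(r_current: float, r_target: float) -> float:
    return (r_target * (1 - r_current)) / (r_current * (1 - r_target))

# Correction for attenuation: a correlation between two measures is
# dragged down by their measurement error; the error-free correlation
# is estimated as r_xy / sqrt(r_xx * r_yy).
def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    return r_xy / (r_xx * r_yy) ** 0.5

# A 20-item test with reliability .70 would need to be about 2.63x
# longer (~53 items) to reach .86 (illustrative numbers):
n = lengthening_factor(0.70, 0.86)
print(round(n, 2), round(20 * n))

# An observed correlation of .40 between two tests with reliabilities
# .80 and .70 corresponds to an error-free correlation of about .53:
print(round(correct_for_attenuation(0.40, 0.80, 0.70), 2))
```

Note the sanity check built into the prophecy formula: if the target reliability equals the current one, the lengthening factor is exactly 1 (no new items needed).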