Chapter 4 Reliability


Errors are inaccuracies in our measurements: the gap between the actual (true) value and the observed value. The fewer the errors, the more reliable the test.

Basis of test score theory

(We are never able to measure reality with complete accuracy.)

X = T + E
where X is the observed score, T is the true score, and E is the error (inaccuracy).
Errors can be random or systematic.

Types of Errors

Systematic errors arise from the instrumentation, for example using the wrong scale. If you know what went wrong, you can correct for it. Random errors, by contrast, are outside your control: you do not know exactly what went wrong or where.

Standard error of measurement

Tells us, on average, how much an observed score varies from the true score.
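
The notes do not give a formula here. A common one in classical test theory is SEM = SD x sqrt(1 - r), where SD is the standard deviation of the observed scores and r is the reliability coefficient. The sketch below assumes that formula; the function name and the example numbers are made up for illustration.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """SEM = SD * sqrt(1 - r), the usual classical test theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

# Hypothetical test: observed-score SD of 15 and reliability of .84
sem = standard_error_of_measurement(15, 0.84)
print(round(sem, 2))  # 6.0
```

A perfectly reliable test (r = 1) gives an SEM of 0; the lower the reliability, the closer the SEM gets to the SD of the observed scores.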

Domain Sampling Model

Measure : Actual

Imagine you want to see what is behind a door (what you want to measure). The items on your test are holes you drill in the door (your measures): the more holes you drill, the more clearly you can see what is behind it.

Reliability Coefficient

The ratio of the variance of true scores to the variance of observed scores.

Reliability = variance of true scores / variance of observed scores
e.g. a reliability of .40 means that only 40% of the variation in observed scores reflects true differences; the other 60% is due to error (chance).
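
A small worked example of the ratio, using X = T + E. The numbers are invented, and the errors are deliberately chosen to be uncorrelated with the true scores so that the variances add up cleanly. In practice true scores cannot be observed, which is why reliability has to be estimated by other methods.

```python
from statistics import pvariance

# Made-up true scores (T) and errors (E); E is uncorrelated with T by construction
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]

# Observed score: X = T + E
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)    # 25
var_observed = pvariance(observed)   # 26 (= 25 true + 1 error)
reliability = var_true / var_observed

print(observed)               # [9, 11, 19, 21]
print(round(reliability, 3))  # 0.962
```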

Source of errors

Errors (inaccuracies) can come from the situation (e.g. the time of testing), from the person, or from items that do not represent the domain well.
Ways to counteract them include:

Test-Retest Method

Administer the same test twice to the same people to check for consistency, then correlate the two sets of scores. Usually used for stable traits (IQ etc.).
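
A minimal sketch of the "correlate the two sets of scores" step, assuming the Pearson correlation is the one used. The scores are hypothetical, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two occasions
first_administration = [98, 105, 110, 120, 132]
second_administration = [100, 103, 112, 118, 135]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

A value close to 1 means people kept roughly the same rank order across the two administrations.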

Carryover effects

Watch out for carryover effects, where the first administration influences the second. One example is the practice effect. Many things can happen between the two administrations: if they are too close together, people may remember what they did (practice effect); if they are too far apart, history, maturation etc. may come into play.

Parallel Forms Method

Compare two equivalent forms of the test that measure the same attribute. The two forms use different items, but the rules used to select items of a particular difficulty remain the same.

The two forms can be given (counterbalanced) to the same group on the same day.

Split-Half Method

Split the test in two using an odd-even system, score each half, and correlate the two halves. The more items a test has, the more reliable it is, so the correlation between two half-length tests underestimates the reliability of the full test. Use the Spearman-Brown formula to correct for the shortened length.
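
The procedure can be sketched as follows, assuming the usual two-half form of the Spearman-Brown formula, corrected r = 2r / (1 + r). The response data are invented, and the helper names are chosen for this example only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

# Hypothetical item scores (1 = correct, 0 = wrong): one row per person, six items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# Odd-even split: items 1, 3, 5 versus items 2, 4, 6
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

r_half = pearson_r(odd_totals, even_totals)
r_full = spearman_brown(r_half)
print(round(r_half, 3), round(r_full, 3))

print(round(spearman_brown(0.5), 3))  # 0.667
```

The corrected value is always higher than a positive half-test correlation, which reflects the point that a longer test is more reliable.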

Kuder-Richardson Formula 20

Used to measure internal consistency reliability for items with dichotomous choices (e.g. right/wrong).
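
The notes do not spell the formula out. The standard form is KR20 = (k / (k - 1)) x (1 - sum of p x q / variance of total scores), where k is the number of items, p is the proportion passing each item, and q = 1 - p. The sketch below assumes that form and uses the population variance for the total scores; the data are invented.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for 0/1 item scores; one row per person, one column per item."""
    n_people = len(responses)
    k = len(responses[0])
    # For each item: p = proportion answering correctly, q = 1 - p
    sum_pq = 0.0
    for item in zip(*responses):
        p = sum(item) / n_people
        sum_pq += p * (1 - p)
    # Variance of each person's total score (population variance assumed)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical right/wrong answers: five people, six items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(round(kr20(responses), 3))  # 0.75
```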

Coefficient Alpha (Cronbach's Alpha)

An extension of the KR-20 formula that can handle non-dichotomous choices.
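
A minimal sketch, assuming the standard form alpha = (k / (k - 1)) x (1 - sum of item variances / variance of total scores). The rating data (a 1 to 5 scale) are invented for illustration.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha; one row per person, one column per item."""
    k = len(responses[0])
    # Variance of each item across people
    item_vars = [pvariance(item) for item in zip(*responses)]
    # Variance of each person's total score
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings on a 1-5 scale: five people, three items
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

print(round(cronbach_alpha(ratings), 3))  # 0.963
```

On 0/1 data the variance of an item equals p x q, so this formula gives the same value as KR-20.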

Correlation Coefficient

The correlation coefficient is a number between -1 and 1; when used as a reliability estimate it usually falls between 0 and 1. If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or very low (the predicted values are no better than random numbers). As the strength of the relationship between the predicted and actual values increases, so does the correlation coefficient. A perfect fit gives a coefficient of 1.0. Thus, for reliability, the higher the correlation coefficient the better.

Internal consistency

The correlations between different items on the same test (or subscale); these indicate whether the items measure the same thing.

How good is good enough?

Depends on the purpose of the test.

What to do with low reliability?

  1. Increase the number of items - "drill more holes in the door". Use the Spearman-Brown prophecy formula to calculate how many more items to add to reach the desired reliability.
  2. Factor and item analysis (discriminability analysis) - remove useless items that reduce reliability.
  3. Correction for attenuation - "rid a correlation coefficient from the weakening effect of measurement error" (from Wikipedia). In other words, estimate what the correlation between two measures would be if neither contained measurement error.
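
Options 1 and 3 each come down to a short formula. The sketch below assumes the standard textbook forms: the Spearman-Brown prophecy formula N = r_desired x (1 - r_observed) / (r_observed x (1 - r_desired)), where N is how many times longer the test must become, and the correction for attenuation r_xy / sqrt(r_xx x r_yy). Function names and example numbers are made up for illustration.

```python
import math

def lengthening_factor(r_observed, r_desired):
    """Spearman-Brown prophecy: how many times longer the test must be."""
    return (r_desired * (1 - r_observed)) / (r_observed * (1 - r_desired))

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between X and Y if both were free of measurement error.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures.
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical 20-item test with reliability .60; we want reliability .80
factor = lengthening_factor(0.60, 0.80)
items_needed = math.ceil(20 * factor)
print(round(factor, 2))  # 2.67
print(items_needed)      # 54, i.e. 34 more items

# Hypothetical observed correlation of .30 between two measures
# whose reliabilities are .64 and .81
print(round(correct_for_attenuation(0.30, 0.64, 0.81), 3))  # 0.417
```

The prophecy formula assumes the added items are of the same quality as the existing ones, so the result is an estimate rather than a guarantee.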