Reliability of a Test

Reliability is the characteristic of a test that describes the consistency with which it yields the same result in measuring whatever it does measure (Swain et al., 2000).

Taiwo (1995) defines reliability as the consistency of measurement, that is, how consistent test scores are from one measurement to another. For example, students use a stopwatch to time 15 vibrations of a pendulum and take the reading two or three times. If the readings agree, they proceed with the experiment, because the stopwatch is giving reliable readings.

Nature of reliability

Reliability refers to the consistency of the results obtained with a test, not to the test itself: it is the results produced by a tool or test that are reliable, not the tool or test. It also refers to a particular interpretation of test scores. For example, test scores that are consistent over a period of time may not be consistent from one form of the test to another equivalent form. Reliability is a statistical concept. To determine consistency, a test is administered once or more than once, and the consistency is then expressed in terms of relative shifts in scores, such as changes in pupils' relative standing within the group. Reliability is a necessary but not a sufficient condition for validity (Linn & Gronlund, 2000).

Functions of reliability

The reliability coefficient provides the most revealing statistical index of test quality that is ordinarily available. Estimates of the reliability of tests provide essential information for judging their technical quality and for motivating efforts to improve them. Reliability estimation determines how much of the variability in test scores is due to measurement error and how much is due to variability in true scores (Swain et al., 2000).
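
In the notation of classical test theory, this split of score variability can be written as shown below. The symbols are an assumption added for illustration, since the source does not introduce them: \sigma_X^2 is the variance of observed scores, \sigma_T^2 the variance of true scores, \sigma_E^2 the variance due to measurement error, and r_{XX'} the reliability coefficient.

```latex
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

On this reading, a reliability coefficient of 0.80 would mean that about 80% of the variance in observed scores is attributed to true-score differences and about 20% to measurement error.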

Methods of determining the reliability

Test-Retest Reliability

The same test is administered twice to the same group to assess the consistency of test scores over a period of time. The correlation between the two sets of scores obtained from the test and the retest is then found using the Pearson product-moment correlation coefficient, r. Test-retest reliability is best used for attributes that are stable over time, for example, intelligence. Generally, reliability will be higher when little time has passed between the two administrations (Kubiszyne & Borich, 2003).
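
As a minimal sketch of the test-retest calculation, the Python snippet below computes the Pearson product-moment correlation between two sets of scores. The function name `pearson_r` and the score lists are hypothetical, made up purely for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five pupils on the first and second administration.
first_scores = [12, 15, 9, 18, 14]
second_scores = [13, 14, 10, 17, 15]

test_retest_r = pearson_r(first_scores, second_scores)
print(f"Test-retest reliability estimate: {test_retest_r:.2f}")
```

A value close to 1 would suggest that pupils kept roughly the same relative standing across the two administrations.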

Equivalent/Parallel-Forms method

In the parallel-forms method, reliability is estimated by comparing two different tests that were constructed to have the same content, difficulty, format, and length. The two tests are administered to the same group within a short interval of time, and the two sets of scores are then correlated. This correlation provides an index of equivalence. For example, in intermediate or secondary board examinations, two question papers for a particular subject are constructed and named paper A and paper B, and sometimes a paper C is also prepared; these papers are equivalent forms of the test (Linn & Gronlund, 2000).

Internal Consistency method

This method determines the consistency of test results across items on the same test. Test items that measure the same construct are compared with each other to determine the test’s internal consistency. If two questions are similar and designed to measure the same thing, the test taker should answer both in the same way, which would indicate that the test has internal consistency (Swain et al., 2000). Three methods, the split-half method, the Kuder-Richardson formula 21 method, and inter-rater reliability, are described below.

Split-half method

Linn and Gronlund (2000) explain that the split-half method of determining internal consistency employs a single administration of a test with an even number of items to a sample of pupils. The test is divided into two equivalent halves, with the even-numbered items (2, 4, 6, …) in one half and the odd-numbered items (1, 3, 5, …) in the other. The scores on the two halves are correlated, and the Spearman-Brown formula is then applied to that correlation to estimate the reliability of the full test. The formula is given below.

$$r_2 = \frac{2 r_1}{1 + r_1}$$

where $r_2$ is the reliability coefficient of the full test and $r_1$ is the correlation coefficient between the two half tests.
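
The split-half procedure can be sketched in Python as follows. This is an illustrative sketch only: the function names and the 0/1 response matrix are hypothetical, and the items are assumed to be scored 1 for correct and 0 for incorrect.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to a full-test reliability estimate."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one row per pupil and one score per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# Hypothetical responses of five pupils to a six-item test (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test correlation: {r_half:.2f}, full-test reliability: {r_full:.2f}")
```

For any positive half-test correlation below 1, the stepped-up estimate is larger than the correlation itself, because each half is only half as long as the full test.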

Kuder-Richardson formula 21 method

Linn and Gronlund (2003) state that the Kuder-Richardson formula 21 is another method of determining reliability from a single administration of a test. It is known to provide a conservative estimate of the split-half type of reliability. The procedure is based on the consistency of an individual’s performance from item to item and on the standard deviation of the test scores, so the reliability coefficient obtained denotes the internal consistency of the test. Internal consistency here means the degree to which the items of a test measure a common attribute of the testee.
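
The source does not write the formula out. A commonly cited form of KR-21, for a test of k items scored 0 or 1, with mean total score M and variance of total scores s², is r = (k / (k − 1)) × (1 − M(k − M) / (k s²)). The sketch below applies that form. The function name `kr21`, the use of the population variance, and the scores are assumptions made for illustration.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """KR-21 estimate from pupils' total scores on a test of n_items 0/1 items."""
    if n_items < 2:
        raise ValueError("KR-21 needs a test with at least two items")
    m = mean(total_scores)
    # Assumption: population variance of the total scores is used for s^2.
    variance = pvariance(total_scores)
    if variance == 0:
        raise ValueError("all pupils have the same score; the estimate is undefined")
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * variance))


# Hypothetical total scores of five pupils on a six-item test.
scores = [5, 3, 5, 1, 6]
kr21_estimate = kr21(scores, n_items=6)
print(f"KR-21 reliability estimate: {kr21_estimate:.2f}")
```

Only the number of items, the mean, and the variance of the total scores are needed, which is why the method works from a single administration without item-by-item data.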

Inter-rater Reliability

In this method, two or more independent judges score the test. The scores are then compared to determine the consistency of the raters’ estimates. One way to test inter-rater reliability is to have each rater score each test. For example, each rater might score items on a scale from 1 to 10. The correlation between the two sets of ratings is then found to determine the level of inter-rater reliability. Another way is to have the raters decide which category each observation falls into and then calculate the percentage of agreement between them. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate (Swain et al., 2000).
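
The percentage-of-agreement calculation can be sketched as follows. The function name `percent_agreement` and the category labels are hypothetical; the ratings are chosen so that the two raters agree on 8 of 10 observations, as in the example above.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * matches / len(ratings_a)


# Hypothetical category judgements by two raters for ten observations.
rater_a = ["A", "B", "A", "C", "B", "A", "C", "B", "A", "A"]
rater_b = ["A", "B", "A", "C", "B", "A", "C", "B", "C", "B"]

agreement = percent_agreement(rater_a, rater_b)
print(f"Inter-rater agreement: {agreement:.0f}%")
```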

Factors affecting reliability

Factors related to the test itself that affect its reliability are the length of the test, the content of the test, the characteristics of the test items, and the spread of scores. If the test is short, so that little time is needed to take it, its reliability tends to be lower. If the content of the test is not representative of the whole content to be tested, the reliability of the test will be reduced. The wider the spread of test scores, the higher the reliability estimate tends to be. Factors related to the testee that affect the reliability of a test are the heterogeneity of the group, the test-wiseness of the students, and the motivation of the students. The time limit of the test and the opportunity for cheating given to the students are factors related to the testing procedure that affect the reliability of the test (Linn & Gronlund, 2003).

References

Kubiszyne, T., & Borich, G. (2003). Educational testing and measurement: Classroom application and practice (7th ed.). New York: John Wiley & Sons.

Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Delhi: Pearson Education.

Rehman, A. (2007). Development and validation of objective test items analysis in the subject physics for class IX in Rawalpindi city. Retrieved May 12, 2009, from International Islamic University, Department of Education Web site: http://eprints.hec.gov.pk/2518/1/2455.htm.

Swain, S. K., Pradhan, C., & Khotoi, S. P. K. (2000). Educational measurement: Statistics and guidance. Ludhiana: Kalyani.

Taiwo, A. A. (1995). Fundamentals of classroom testing. New Delhi: Vikas Publishing House.
