Supporting academic facilitators


Saturday, June 3, 2017

Reliability of a test

Reliability is the characteristic of a test that describes the consistency with which it yields the same result in measuring whatever it measures (Swain et al., 2000).
Taiwo (1995) defines reliability as the consistency of measurement, that is, how consistent test scores are from one measurement to another. For example, students use a stopwatch to time 15 vibrations of a pendulum and take the reading two or three times. If two of the three readings agree, they proceed with that value. This means that the stopwatch provides reliable readings.
Nature of reliability
Reliability refers to the consistency of the results obtained with a test, not to the test itself. In other words, it is the results produced by a tool or test that are reliable, rather than the tool or test. Reliability also refers to a particular interpretation of test scores. For example, test scores that are reliable over a period of time may not be reliable from one form of the test to another equivalent form. Reliability is a statistical concept. To determine consistency, a test is administered once or more than once, and the consistency is then measured in terms of relative shifts in the scores. Reliability is a necessary but not a sufficient condition for validity (Linn & Gronlund, 2000).
Functions of reliability
The reliability coefficient provides the most revealing statistical index of quality that is ordinarily available. Estimates of a test's reliability provide essential information for judging its technical quality and for motivating efforts to improve it. Reliability estimation determines how much of the variability in test scores is due to measurement error and how much is due to variability in true scores (Swain et al., 2000).
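The split between true-score and error variability can be written out as a short worked equation. This is a sketch under the usual classical test theory assumptions, which the source does not spell out: the symbols X (observed score), T (true score) and E (measurement error) are introduced here for illustration only, and the error is assumed to be uncorrelated with the true score.

$$X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}$$

For example, if the observed-score variance is 25 and the error variance is 5, the reliability coefficient would be 1 - 5/25 = 0.80, meaning that 80% of the variability in scores is attributed to true differences among the test takers.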
Methods of determining reliability
Test-Retest Reliability
The same test is administered twice to the same group to assess the consistency of test scores over a period of time. The correlation between the two sets of scores obtained from the test and the retest is then found using the Pearson product-moment correlation coefficient, r. Test-retest reliability is best used for attributes that are stable over time, such as intelligence. Generally, reliability will be higher when little time has passed between the two administrations (Kubiszyn & Borich, 2003).
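The correlation step can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function name `pearson_r` and the two score lists are hypothetical, invented only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products divided by the root of the product of sums of squares.
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores of five pupils on the first and second administration.
first_scores = [12, 15, 9, 18, 14]
second_scores = [13, 14, 10, 17, 15]

test_retest_r = pearson_r(first_scores, second_scores)
print(round(test_retest_r, 2))
```

A value close to 1 would suggest that pupils kept roughly the same relative standing on both occasions. The sketch does not guard against a group in which every pupil has the same score, where the correlation is undefined.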
Equivalent/Parallel-Forms method
In the parallel-forms method, reliability is estimated by comparing two different tests that were constructed to have the same content, difficulty, format and length. The two tests are administered to the same group within a short interval of time, and the scores on the two tests are then correlated. This correlation provides an index of equivalence. For example, in intermediate or secondary board examinations, two question papers for a particular subject are constructed and named paper A and paper B, and sometimes a paper C is also prepared; these are equivalent forms of the test (Linn & Gronlund, 2000).
Internal Consistency method
This method determines the consistency of results across the items of a single test. Items that measure the same construct are compared with each other to determine the test’s internal consistency. If two questions are similar and designed to measure the same thing, the test taker should answer both in the same way, which would indicate that the test has internal consistency (Swain et al., 2000). Three methods are described below: the split-half method, the Kuder-Richardson formula 21 method and inter-rater reliability.
Split-half method
Linn and Gronlund (2000) explain that the split-half method of determining internal consistency uses a single administration of a test with an even number of items to a sample of pupils. The test is divided into two equivalent halves, with the even-numbered items (2, 4, 6, ...) in one half and the odd-numbered items (1, 3, 5, ...) in the other. The scores on the two halves are correlated, and the Spearman-Brown formula is then applied to this correlation to estimate the reliability of the full test. The formula is given below.
$$r_2 = \frac{2 r_1}{1 + r_1}$$
where $r_2$ = reliability coefficient of the full test
$r_1$ = correlation coefficient between the half tests
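The odd-even split and the Spearman-Brown step can be sketched as follows. The item responses are hypothetical (five pupils, six items, scored 1 for correct and 0 for incorrect), and the function names are illustrative assumptions, not part of any standard library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one row per pupil, with one 0/1 entry per item."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical responses: 5 pupils x 6 items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

The stepped-up value is larger than the half-test correlation because a full-length test is expected to be more reliable than either half taken alone.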
Kuder-Richardson formula 21 method

Linn and Gronlund (2000) state that this is another method of determining reliability using a single administration of a test. It is known to provide a conservative estimate of the split-half type of reliability. The procedure is based on the consistency of an individual’s performance from item to item and on the standard deviation of the test, so that the reliability coefficient obtained denotes the internal consistency of the test. Internal consistency here means the degree to which the items of a test measure a common attribute of the testee.
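The text above does not state the formula itself. The sketch below uses the commonly given form of KR-21, r = (k / (k - 1)) × (1 - M(k - M) / (k s²)), where k is the number of items, M the mean total score and s² the variance of the total scores. It assumes that items are scored 1 or 0 and are of roughly equal difficulty, and it uses the population variance, which is a choice made here for illustration. The function name and the scores are hypothetical.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 estimate from pupils' total scores.

    Assumes dichotomously scored items of similar difficulty.
    """
    if n_items < 2:
        raise ValueError("the test needs at least two items")
    m = mean(total_scores)
    variance = pvariance(total_scores)  # population variance (an assumption)
    if variance == 0:
        raise ValueError("all pupils have the same score; KR-21 is undefined")
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * variance))


# Hypothetical total scores of five pupils on a six-item test.
totals = [5, 3, 1, 6, 2]

kr21_estimate = kr21(totals, n_items=6)
print(round(kr21_estimate, 2))
```

Only the number of items, the mean and the variance are needed, which is why the formula is convenient for classroom tests where item-level data are not recorded.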
Inter-rater Reliability
In this method, two or more independent judges score the test. The scores are then compared to determine the consistency of the raters’ estimates. One way to test inter-rater reliability is to have each rater score each test. For example, each rater might score items on a scale from 1 to 10. The correlation between the two sets of ratings is then found to determine the level of inter-rater reliability. Another way is to have the raters decide which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate (Swain et al., 2000).
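The percentage-of-agreement calculation can be sketched as below. The category labels and the function name `percent_agreement` are hypothetical and chosen only to reproduce the 8-out-of-10 example.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical categories assigned by two raters to ten observations.
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task",
           "on-task", "on-task", "off-task", "on-task", "on-task"]
rater_b = ["on-task", "off-task", "off-task", "on-task", "off-task",
           "on-task", "on-task", "off-task", "off-task", "on-task"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)  # 8 matches out of 10 observations -> 80.0
```

Simple percentage agreement does not correct for agreement that would occur by chance, so it is best read as a rough index.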
Factors affecting reliability

Factors related to the test that affect its reliability are the length of the test, the content of the test, the characteristics of the test items and the spread of scores. A short test, with few items, tends to be less reliable. If the content of the test is not representative of the whole content to be tested, the reliability of the test will be reduced. The wider the spread of test scores, the higher the reliability estimate tends to be. Factors related to the testee that affect reliability are the heterogeneity of the group, the test-wiseness of the students and the motivation of the students. The time limit of the test and the opportunity for cheating given to the students are factors related to the testing procedure that affect the reliability of the test (Linn & Gronlund, 2000).
References
Kubiszyn, T., & Borich, G. (2003). Educational testing and measurement: Classroom application and practice (7th ed.). New York: John Wiley & Sons.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Delhi: Pearson Education.
Rehman, A. (2007). Development and validation of objective test items analysis in the subject physics for class IX in Rawalpindi city. Retrieved May 12, 2009 from International Islamic University, Department of Education Web site:
Swain, S. K., Pradhan, C., & Khotoi, S. P. K. (2000). Educational measurement: Statistics and guidance. Ludhiana: Kalyani.
Taiwo, A. A. (1995). Fundamentals of classroom testing. New Delhi: Vikas Publishing House.