Reliability is the consistency with which a test yields the same result
in measuring whatever it does measure (Swain et al., 2000).
Taiwo (1995) defines reliability as the consistency of measurement, that is,
how consistent test scores are from one measurement to another. For example,
students use a stopwatch to time 15 vibrations of a pendulum. They take the
reading two or three times. If the readings agree, they proceed with the
experiment, because the agreement shows that the stopwatch provides reliable
readings.
Nature of reliability
Reliability refers to the consistency of the results
obtained with a test, not to the test itself. In other words, it is the results
obtained with a tool or test that are reliable, rather than the tool or test
as such. An estimate of reliability also applies to a particular
interpretation of test scores. For example, test scores that are consistent over
a period of time may not be consistent from one test to another equivalent test.
Reliability is a statistical concept. To determine consistency, a test is
administered once or more than once, and the consistency is then expressed in
terms of relative shifts in the scores. Reliability is a necessary but not a
sufficient condition for validity (Linn & Gronlund, 2000).
Functions of reliability
The reliability coefficient provides the most revealing
statistical index of quality that is ordinarily available. Estimates of the
reliability of tests provide essential information for judging the technical
quality and motivating efforts to improve the tests. Reliability estimation
determines how much of the variability in test scores is due to measurement
error and how much is due to variability in true scores (Swain et al., 2000).
Methods of determining reliability
Test-Retest Reliability
The same test is administered twice to the same group to
assess the consistency of test scores over a period of time. The correlation
between the two sets of scores obtained from the test and the retest is then
found using the Pearson product-moment coefficient “r”. Test-retest
reliability is best used for attributes that are stable over time, for example, intelligence.
Generally, reliability will be higher when little time has passed between the two administrations (Kubiszyn & Borich, 2003).
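As a rough illustration of the test-retest computation, the sketch below correlates two sets of scores with the Pearson product-moment formula. The function name `pearson_r` and the scores of the five pupils are hypothetical and chosen only for demonstration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sum_products = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return sum_products / sqrt(ss_x * ss_y)


# Hypothetical scores of the same five pupils on the test and the retest.
first_administration = [12, 15, 9, 18, 14]
second_administration = [13, 14, 10, 17, 15]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value close to 1 means the pupils kept nearly the same relative positions on both occasions.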
Equivalent/Parallel-Forms method
In the parallel-forms method of determining reliability,
the reliability is estimated by comparing two different tests that were
constructed with the same content, difficulty, format, and length. The two
tests are administered to the same group within a short interval of time, and
the scores on the two tests are then correlated. This correlation provides an
index of equivalence. For example, in intermediate or secondary board
examinations, two question papers for a particular subject are constructed and
named paper A and paper B, and sometimes a paper C is also prepared; these
papers are equivalent forms of the test (Linn & Gronlund, 2000).
Internal Consistency method
This method determines the consistency of test results across items on the
same test. Test items that measure the same construct are compared with each
other to determine the test’s internal consistency. When two questions are
similar and designed to measure the same thing, the test taker should answer
both in the same way, which would indicate that the test has internal
consistency (Swain et al., 2000). Three methods, namely the split-half method,
the Kuder-Richardson formula 21, and inter-rater reliability, are described
below.
Split-half method
Linn and Gronlund (2000) explain that the split-half
method of determining internal consistency uses a single administration of a
test with an even number of items to a sample of pupils. The test is divided
into two equivalent halves, and the correlation between the scores on the two
halves is found. Usually the even-numbered items (2, 4, 6, …) form one half and
the odd-numbered items (1, 3, 5, …) form the other. Because this correlation
describes a test of only half the length, the Spearman-Brown formula is then
applied to estimate the reliability of the full test. The formula is given below.

$$ r_2 = \frac{2 r_1}{1 + r_1} $$

where $r_2$ = reliability coefficient of the full test
$r_1$ = correlation coefficient between the half tests
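As a rough illustration of the split-half procedure, the sketch below splits a small item-score table into odd and even items, correlates the two half scores, and steps the result up to full length with the Spearman-Brown formula, r2 = 2·r1 / (1 + r1). The function names and the six-pupil, six-item data set are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one row per pupil with one score per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# Hypothetical data: 6 pupils x 6 items, 1 = correct, 0 = wrong.
item_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

r_half, r_full = split_half_reliability(item_scores)
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```

With a positive half-test correlation the stepped-up estimate is always larger than the half-test value, which reflects the fact that a longer test tends to be more reliable.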
Kuder-Richardson formula 21 method
Linn and Gronlund (2000) state that the Kuder-Richardson formula 21 is another method of determining reliability from a single administration of a test. It is known to provide a conservative estimate of the split-half type of reliability. The procedure is based on the consistency of an individual’s performance from item to item and on the standard deviation of the test scores, so that the reliability coefficient obtained denotes the internal consistency of the test. Internal consistency here means the degree to which the items of a test measure a common attribute of the testee.
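The passage above does not write the formula out. In its usual textbook form, KR-21 = (k / (k − 1)) · (1 − M(k − M) / (k·s²)), where k is the number of items, M the mean total score, and s² the variance of the total scores. The sketch below applies that form under two assumptions: items are scored right or wrong, and the population variance is used. The function name `kr21` and the five scores are hypothetical.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 from total scores on an n_items test.

    Assumes every item is scored 0 or 1 and that the items are of
    roughly equal difficulty; uses the population variance of the totals.
    """
    if n_items < 2:
        raise ValueError("the test needs at least two items")
    m = mean(total_scores)
    variance = pvariance(total_scores)
    if variance == 0:
        raise ValueError("KR-21 is undefined when all total scores are equal")
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * variance))


# Hypothetical total scores of five pupils on a 20-item test.
total_scores = [8, 10, 12, 14, 16]
kr21_estimate = kr21(total_scores, n_items=20)
print(f"KR-21 = {kr21_estimate:.2f}")
```

Only the number of items, the mean, and the variance are needed, so the estimate can be computed without item-level data.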
Inter-rater Reliability
In this method, two or more independent judges score
the test. The scores are then compared to determine the consistency of the
raters’ estimates. One way to test inter-rater reliability is to have each
rater score each test. For example, each rater might score items on a scale
from 1 to 10. The correlation between the two sets of ratings is then found to
determine the level of inter-rater reliability. Another means of testing
inter-rater reliability is to have raters determine which category each
observation falls into and then calculate the percentage of agreement between
the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater
reliability rate (Swain et al., 2000).
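The percentage-of-agreement calculation can be sketched as follows. The function name `percent_agreement`, the category labels, and the ten observations are hypothetical and arranged so that the two raters agree on 8 of 10 cases.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of observations")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * agreements / len(rater_a)


# Hypothetical categories assigned by two raters to the same 10 observations.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass", "pass", "fail"]

agreement = percent_agreement(rater_a, rater_b)
print(f"agreement = {agreement:.0f}%")
```

Simple percentage agreement does not correct for agreement that would occur by chance, so it is best read as an upper-end indication of consistency between raters.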
Factors affecting reliability
Factors related to the test
that affect its reliability are the length of the test, the content of the
test, the characteristics of the test items, and the spread of scores. If the
test is short, its reliability will tend to be lower. If
the content of the test is not representative of the whole content to be
tested, the reliability of the test will be reduced. The narrower the spread of
the test scores, the lower the reliability estimate tends to be. Factors related to the testee that affect the reliability of a test are the
heterogeneity of the group, the test-wiseness of the students, and the motivation
of the students. The time limit of the test and the opportunities for cheating given to the
students are factors related to the testing procedure that affect the reliability of the test (Linn & Gronlund,
2000).
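The effect of test length can be illustrated with the general Spearman-Brown formula, r_k = k·r / (1 + (k − 1)·r), where r is the reliability of the current test and k is the factor by which its length is changed. This is a minimal sketch that assumes the added or removed items are equivalent to the existing ones; the function name and the starting reliability of 0.60 are hypothetical.

```python
def projected_reliability(r_current, length_factor):
    """General Spearman-Brown projection for a test made length_factor times as long."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * r_current / (1 + (length_factor - 1) * r_current)


# Hypothetical test with reliability 0.60, shortened and lengthened.
r_current = 0.60
for factor in (0.5, 1, 2, 3):
    projected = projected_reliability(r_current, factor)
    print(f"length x{factor}: projected reliability = {projected:.2f}")
```

Under these assumptions, halving the test lowers the projected reliability and doubling or tripling it raises the projection, with smaller gains for each further increase in length.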
References:
Kubiszyn, T., & Borich, G. (2003). Educational testing and measurement: Classroom application and practice (7th ed.). New York: John Wiley & Sons.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Delhi: Pearson Education.
Rehman, A. (2007). Development and validation of objective test items analysis in the subject physics for class IX in Rawalpindi city. Retrieved May 12, 2009, from International Islamic University, Department of Education Web site: http://eprints.hec.gov.pk/2518/1/2455.htm.
Swain, S. K., Pradhan, C., & Khotoi, S. P. K. (2000). Educational measurement: Statistics and guidance. Ludhiana: Kalyani.
Taiwo, A. A. (1995). Fundamentals of classroom testing. New Delhi: Vikas Publishing House.