Reliability and Validity


A test must also be reliable. Reliability is the “self-correlation of the test.” It shows the extent to which the results obtained are consistent when the test is administered once, or more than once, on the same sample with a reasonable gap. Consistency in the results obtained in a single administration is the index of internal consistency of the test, and consistency in the results obtained upon testing and retesting is the index of temporal consistency. Reliability thus includes both internal consistency and temporal consistency. A test must be reliable to be called sound, because reliability indicates the extent to which the scores obtained on the test are free from internal defects of standardization that are likely to produce errors of measurement.


Validity is another prerequisite for a test to be sound. Validity indicates the extent to which the test measures what it intends to measure when compared with some outside, independent criterion. In other words, it is the correlation of the test with an outside criterion. The criterion should be independent of the test and should be regarded as the best index of the trait or ability being measured. Generally, the validity of a test depends upon its reliability, because a test that yields inconsistent results (poor reliability) is ordinarily not expected to correlate with an outside independent criterion.
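The dependence of validity on reliability can be illustrated with a small simulation. This is only a sketch under stated assumptions: the scores are randomly generated, not real data, and the names `ability`, `criterion`, `reliable_test` and `unreliable_test`, as well as all the noise levels, are hypothetical choices made for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

random.seed(1)  # fixed seed so the sketch is repeatable
N = 2000        # hypothetical number of examinees

# Hypothetical underlying trait for each examinee.
ability = [random.gauss(0, 1) for _ in range(N)]

# Outside independent criterion: reflects the trait with a little noise.
criterion = [a + random.gauss(0, 0.5) for a in ability]

# Two simulated tests of the same trait that differ only in how much
# random measurement error they contain.
reliable_test = [a + random.gauss(0, 0.3) for a in ability]
unreliable_test = [a + random.gauss(0, 2.0) for a in ability]

validity_reliable = pearson_r(reliable_test, criterion)
validity_unreliable = pearson_r(unreliable_test, criterion)

print("validity of the consistent test:  ", round(validity_reliable, 2))
print("validity of the inconsistent test:", round(validity_unreliable, 2))
```

In this simulation the noisier test correlates noticeably less with the criterion, even though both tests are built from the same underlying trait.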

Types of Reliability

  • Internal reliability
  • External reliability

Internal Reliability

Internal reliability assesses the consistency of results across items within a test.

External Reliability

External reliability refers to the extent to which a measure varies from one use to another.

Errors in Reliability:

At times scores are not consistent because other factors also affect reliability, e.g.

  • Noise
  • Health
  • Time

Some error is always present in reliability; a margin of about 5% is conventionally treated as acceptable.


  • Random error
  • Systematic error

Random error

Random error exists in every measurement and is often a major source of uncertainty. These errors have no particular assignable cause and can never be totally eliminated or corrected. They are caused by the many uncontrollable variables that are an inevitable part of every analysis made by human beings. Most of these variables are impossible to identify, and even those that can be identified cannot be measured because they are so small.

Systematic error

Systematic error is caused by instruments, machines, and measuring tools. It is not due to individuals. Systematic error is manageable because, once identified, it can be corrected.
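The difference between the two kinds of error can be sketched with simulated measurements. The values below (`TRUE_SCORE`, `BIAS`, `NOISE_SD`) are hypothetical and chosen only for illustration; the model "observed score = true score + constant bias + random noise" is a simplifying assumption, not a claim about any particular instrument.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 50.0  # hypothetical true value of the trait
BIAS = 3.0         # hypothetical constant offset of the instrument (systematic error)
NOISE_SD = 2.0     # hypothetical spread of the random error

def measure():
    """Return one observed score: true score + systematic bias + random noise."""
    return TRUE_SCORE + BIAS + random.gauss(0.0, NOISE_SD)

observed = [measure() for _ in range(10_000)]
mean_observed = statistics.mean(observed)

# Averaging many readings cancels most of the random error,
# but the systematic bias remains in the mean.
estimated_bias = mean_observed - TRUE_SCORE

# Once the bias is known (for example from calibration), it can be subtracted.
corrected = [score - BIAS for score in observed]
mean_corrected = statistics.mean(corrected)

print("mean of raw readings:      ", round(mean_observed, 2))
print("estimated systematic error:", round(estimated_bias, 2))
print("mean after correction:     ", round(mean_corrected, 2))
```

Individual readings still scatter around the mean after the correction, which reflects the point that random error can be reduced by averaging but not removed from any single measurement.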


The following methods are used to check reliability:

  • Test-retest
  • Alternate form
  • Split-half method


The test-retest method is the oldest and most commonly used method of testing reliability. It assesses the external consistency of a test and measures the stability of the test over time. Examples of appropriate tests include questionnaires and psychometric tests.

A typical assessment would involve giving participants the same test on two separate occasions. Everything, from start to finish, should be the same on both occasions. The results of the first test are then correlated with the results of the second. If the same or similar results are obtained, external reliability is established.
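The correlation step in a test-retest assessment can be sketched as follows. The two score lists are made-up numbers for six hypothetical participants, and the Pearson coefficient is used here as one common choice of correlation; the source does not specify which coefficient to use.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of the same six participants on two occasions.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(first_administration, second_administration)
print("test-retest correlation:", round(test_retest_r, 2))
```

For these made-up scores the coefficient comes out at roughly 0.96, that is, participants keep nearly the same rank order across the two occasions. The same calculation applies to alternate forms if the second list holds scores from the equivalent form instead of a repeat of the same test.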

The timing of the retest is important. If the interval is too brief, participants may recall information from the first test, which could bias the results. Alternatively, if the interval is too long, the participants may have changed in some important way, which could also bias the results.

The utility and worth of a psychological test decrease with time, so the test should be revised and updated. When tests are not revised, systematic error may arise.


In the alternate form method, two equivalent forms of the test are administered to the same group of examinees. An individual is given one form of the test and, after a period of time, a different version of the same test. The scores on the two forms of the test are then correlated to yield a coefficient of equivalence.

  • Positive point

With alternate forms there is no need to wait for time to pass between administrations.

  • Negative point

Constructing two tests of an equivalent level is a demanding and risky task.


The split-half method assesses the internal consistency of a test. It measures the extent to which all parts of the test contribute equally to what is being measured. The test is split into odd-numbered and even-numbered items. The reason is that test items are usually arranged in order of increasing difficulty: if items 1 to 10 go into one half and items 11 to 20 into the other, all the easy items fall into the first half and all the difficult items into the second.

When we split the test, each half should share the same format or theme, e.g. multiple-choice questions with multiple-choice questions, or fill-in-the-blank items with fill-in-the-blank items.
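The odd-even split can be sketched in a few lines. The response matrix below is invented for illustration (six hypothetical examinees, eight items scored 1 for correct and 0 for incorrect). The final step, the Spearman-Brown correction, is not mentioned in the text above; it is a standard adjustment that is often applied because each half is only half as long as the full test, and it is included here as an assumption about common practice.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical item responses: one row per examinee, one column per item,
# with items ordered from easiest (item 1) to hardest (item 8).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# Items 1, 3, 5, 7 sit at indices 0, 2, 4, 6; items 2, 4, 6, 8 at 1, 3, 5, 7.
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

half_r = pearson_r(odd_half, even_half)

# Spearman-Brown correction: estimates the reliability of the full-length
# test from the correlation between its two halves.
full_test_r = 2 * half_r / (1 + half_r)

print("odd-half totals: ", odd_half)
print("even-half totals:", even_half)
print("half-test correlation:  ", round(half_r, 2))
print("Spearman-Brown estimate:", round(full_test_r, 2))
```

Because the slices `row[0::2]` and `row[1::2]` interleave the items, each half receives a mix of easy and difficult items, which is the point of splitting by odd and even numbers rather than by first half and second half.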


Validity refers to the extent to which a test measures what it claims to measure.

A reliable test is not necessarily valid, but a valid test must be reliable.


  • External validity
  • Internal validity


External validity is the extent to which the results of a research study can be generalized to different situations, different groups of people, different settings, different conditions, etc.


Internal validity is the extent to which a study is free from flaws, so that any differences in a measurement are due to the independent variable.


  • Face validity
  • Construct validity
  • Criterion-related validity

Face validity is determined by a review of the items and not through the use of statistical analysis. Face validity is not investigated through formal procedures. Instead anyone who looks over the test, including examinees, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure. While it is clearly of some value to have the test appear to be valid, face validity alone is insufficient for establishing that the test is measuring what it claims to measure.


Construct validity implies using the construct (concept, idea, notion) correctly. It seeks agreement between a theoretical concept and a specific measuring device or procedure.

For example, a test of intelligence nowadays must include measures of multiple intelligences, rather than just measures of logical-mathematical and linguistic ability.


Criterion-related validity requires that the criteria be clearly defined by the teacher in advance. The criteria should take other teachers' criteria into account so that they are standardized, and the test needs to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure that has already been demonstrated to be valid.
