Selecting the most suitable candidate for a job is an essential function of the HR department, and effective selection depends to a large degree on the basic testing concepts of validity and reliability.
Reliability: Reliability is a test’s first major requirement and refers to its consistency. A test is said to be reliable only when it yields consistent results for the same person taking an identical test on two different occasions.
Validity (legal acceptance): Validity is the degree to which a test measures what it claims to measure. In other words, validity tells us whether the test is measuring what we think it is supposed to be measuring.
“Reliability means that the selection methods, tests and ensuing results are consistent and do not vary with time, place or different subjects.” Or, as Cowling puts it: “Reliability is a measure of the consistency with which a predictor continues to predict performance with the same degree of success.” This means, for instance, that two interviews held at different times and places, with different interviewers and questions but under otherwise identical conditions and with the same applicants, should produce the same result: the best candidate should still be the best, and the interviewees who failed should still fail. It is also possible to keep the conditions, the applicants and the structure fixed while changing the other parameters of the assessment; by comparing the results, information about reliability can be gained. In practice, however, such tests are difficult to conduct because of several constraints. It is considered nearly impossible to guarantee equal conditions for each sequence, or to provide sets of questions with different formulations but similar content. Furthermore, the applicants have to be willing to take part a second time. These are only a few of the many problems that arise when testing reliability; yet measuring validity is no easier.
The two concepts are connected: reliability is a prerequisite for validity; that is, it is necessary, but not sufficient, to ensure validity. This is easy to understand because “if a test is so unreliable that it produces two different estimates of a person’s present behaviour, how can we believe that it gives a good estimate of future behaviour?” “In the selection context, validity refers to the extent to which performance on the selection device/test is associated with performance on the job,” or, again, Cowling: “Validity purports to measure how far a correct prediction of success in employment has been made. Validation then consists of analysing the extent of the match between predicted performance and eventual performance.” Validity describes, for instance, the case in which an employee selected in an interview eventually proves to be the best choice for this particular job out of all the applicants who took part. There are many constraints on verifying validity (even assuming reliability could be established), but two suffice here: one would have to test the same candidates again and again, and it is not certain that they would be willing to participate. Consequently, it is hard to establish whether interviews are reliable and valid recruitment methods, although the two concepts set standards that are useful for building confidence in this selection method.
Interviews vary in many ways, but three main forms can be distinguished: the individual interview, a one-to-one conversation between the candidate and a single interviewer; the sequential interview, which takes the form of a series of individual interviews; and the panel interview, in which several people conduct the questioning (Cowling defines this as the applicant facing three or more interviewers). We can therefore imagine different situations with varying numbers and positions of interviewers: the departmental manager in small firms, a personnel officer with technical assistance, panels of senior executives sitting together, large committees in some of the public services, and variations and combinations of all these possibilities.
Validity can be of the following types:
- Content validity: The content of the test items correlates highly with the content of the job. In other words, if the content chosen for a data-entry test is a representative sample of what the person needs to know for the job, then the test is probably content valid.
- Predictive validity: The degree to which an employee’s test score correlates with his or her future performance on the job.
- Concurrent validity: The degree to which a test score correlates with current job performance (i.e. those who do well on the test also do well on the job).
- Construct validity: The extent to which the test measures the underlying psychological quality (construct) it claims to measure, and the degree to which that construct relates to performance on the job.
Types of Reliability Estimates
Test-retest reliability indicates the repeatability of test scores with the passage of time. This estimate also reflects the stability of the characteristic or construct being measured by the test.
Some constructs are more stable than others. For example, an individual’s reading ability is more stable over a particular period of time than that individual’s anxiety level. Therefore, you would expect a higher test-retest reliability coefficient on a reading test than you would on a test that measures anxiety. For constructs that are expected to vary over time, an acceptable test-retest reliability coefficient may be lower.
Alternate or parallel form reliability indicates how consistent test scores are likely to be if a person takes two or more forms of a test.
A high parallel form reliability coefficient indicates that the different forms of the test are very similar which means that it makes virtually no difference which version of the test a person takes. On the other hand, a low parallel form reliability coefficient suggests that the different forms are probably not comparable; they may be measuring different things and therefore cannot be used interchangeably.
Inter-rater reliability indicates how consistent test scores are likely to be if the test is scored by two or more raters.
On some tests, raters evaluate responses to questions and determine the score. Differences in judgments among raters are likely to produce variations in test scores. A high inter-rater reliability coefficient indicates that the judgment process is stable and the resulting scores are reliable.
Inter-rater reliability coefficients are typically lower than other types of reliability estimates. However, higher inter-rater reliability can be achieved if raters are appropriately trained.
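One common statistic for categorical ratings is Cohen's kappa, which measures agreement between two raters corrected for the agreement expected by chance. A minimal self-contained sketch, with hypothetical hire/reject judgments invented for illustration:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of cases where the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgments from two interviewers on ten candidates.
rater_1 = ["hire", "hire", "reject", "hire", "reject",
           "hire", "reject", "reject", "hire", "hire"]
rater_2 = ["hire", "hire", "reject", "reject", "reject",
           "hire", "reject", "hire", "hire", "hire"]

print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")
```

Here the raters agree on 8 of 10 candidates, but because both say "hire" most of the time, some of that agreement is expected by chance; kappa discounts it, which is why it typically runs lower than raw percent agreement.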
Internal consistency reliability indicates the extent to which items on a test measure the same thing.
A high internal consistency reliability coefficient for a test indicates that the items on the test are very similar to each other in content (homogeneous). It is important to note that the length of a test can affect internal consistency reliability. For example, a very lengthy test can spuriously inflate the reliability coefficient.
Tests that measure multiple characteristics are usually divided into distinct components. Manuals for such tests typically report a separate internal consistency reliability coefficient for each component in addition to one for the whole test.
Test manuals and reviews report several kinds of internal consistency reliability estimates. Each type of estimate is appropriate under certain circumstances. The test manual should explain why a particular estimate is reported.
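The most widely reported internal consistency estimate is Cronbach's alpha, which compares the variance of individual items to the variance of total scores. A minimal sketch with hypothetical item responses invented for illustration:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-item score lists (one inner list per item).

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores)
    item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-person totals
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical responses: five test items, scored 1-5, from six test takers.
# Each inner list holds one item's scores across all six people.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 1, 5],
    [4, 4, 3, 4, 2, 4],
    [5, 5, 3, 5, 2, 5],
]

print(f"alpha = {cronbach_alpha(items):.2f}")
```

Because the six people answer all five items in a very similar pattern, the items are homogeneous and alpha comes out high; note, as cautioned above, that adding more items of the same kind would push alpha higher still.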