8.02.2012

THE HISTORY OF PSYCHOLOGICAL TESTING


The history of psychological testing is a fascinating story and has abundant relevance to present-day practices. After all, contemporary tests did not spring from a vacuum; they evolved slowly from a host of precursors introduced over the last one hundred years. Accordingly, Chapter 1 features a review of the historical roots of present-day psychological tests. In Topic 1A, The Origins of Psychological Testing, we focus largely on the efforts of European psychologists to measure intelligence during the late nineteenth century and pre–World War I era. These early intelligence tests and their successors often exerted powerful effects on the examinees who took them, so the first topic also incorporates a brief digression documenting the pervasive importance of psychological test results.
Topic 1B, Early Testing in the United States, catalogues the profusion of tests developed by American psychologists in the first half of the twentieth century.
Psychological testing in its modern form originated little more than one hundred years ago in laboratory studies of sensory discrimination, motor skills, and reaction time. The British genius Francis Galton (1822–1911) invented the first battery of tests, a peculiar assortment of sensory and motor measures, which we review in the following. The American psychologist James McKeen Cattell (1860–1944) studied with Galton and then, in 1890, proclaimed the modern testing agenda in his classic paper entitled “Mental Tests and Measurements.” He was tentative and modest when describing the purposes and applications of his instruments:
Psychology cannot attain the certainty and exactness of the physical sciences, unless it rests on a foundation of experiment and measurement. A step in this direction could be made by applying a series of mental tests and measurements to a large number of individuals. The results would be of considerable scientific value in discovering the constancy of mental processes, their interdependence, and their variation under different circumstances. Individuals, besides, would find their tests interesting, and, perhaps, useful in regard to training, mode of life or indication of disease. The scientific and practical value of such tests would be much increased should a uniform system be adopted, so that determinations made at different times and places could be compared and combined. (Cattell, 1890)
Cattell’s conjecture that “perhaps” tests would be useful in “training, mode of life or indication of disease” must certainly rank as one of the prophetic understatements of all time. Anyone reared in the Western world knows that psychological testing has emerged from its timid beginnings to become a big business and a cultural institution that permeates modern society. To cite just one example, consider the number of standardized achievement and ability tests administered in the school systems of the United States. Although it is difficult to obtain exact data on the extent of such testing, an estimate of 200 million per year is probably not extreme (Medina & Neill, 1990). Of course, the total number of tests administered yearly also includes millions of personality tests and untold numbers of the thousands of other kinds of tests now in existence (Conoley & Kramer, 1989, 1992; Mitchell, 1985; Sweetland & Keyser, 1987). There is no doubt that testing is pervasive. But does it make a difference?
THE IMPORTANCE OF TESTING
Tests are used in almost every nation on earth for counseling, selection, and placement. Testing occurs in settings as diverse as schools, civil service, industry, medical clinics, and counseling centers. Most persons have taken dozens of tests and thought nothing of it. Yet, by the time the typical individual reaches retirement age, it is likely that psychological test results will help shape his or her destiny. The deflection of the life course by psychological test results might be subtle, such as when a prospective mathematician qualifies for an accelerated calculus course based on tenth-grade achievement scores. More commonly, psychological test results alter individual destiny in profound ways. Whether a person is admitted to one college and not another, offered one job but refused a second, diagnosed as depressed or not: all such determinations rest, at least in part, on the meaning of test results as interpreted by persons in authority. Put simply, psychological test results change lives. For this reason it is prudent, indeed almost mandatory, that students of psychology learn about the contemporary uses and occasional abuses of testing. In Case Exhibit 1.1, the life-altering aftermath of psychological testing is illustrated by means of several true case history examples.
The importance of testing is also evident from historical review. Students of psychology generally regard historical issues as dull, dry, and pedantic, and sometimes these prejudices are well deserved. After all, many textbooks fail to explain the relevance of historical matters and provide only vague sketches of early developments in mental testing. As a result, students of psychology often conclude incorrectly that historical issues are boring and irrelevant. In reality, the history of psychological testing is a captivating story that has substantial relevance to present-day practices. Historical developments are pertinent to contemporary testing for the following reasons:
1. A review of the origins of psychological testing helps explain current practices that might otherwise seem arbitrary or even peculiar. For example, why do many current intelligence tests incorporate a seemingly nonintellective capacity, namely, short-term memory for digits? The answer is, in part, historical inertia: intelligence tests have always included a measure of digit span.
2. The strengths and limitations of testing also stand out better when tests are viewed in historical context. The reader will discover, for example, that modern intelligence tests are exceptionally good at predicting school failure—precisely because this was the original and sole purpose of the first such instrument developed in Paris, France, at the turn of the twentieth century.
3. Finally, the history of psychological testing contains some sad and regrettable episodes that help remind us not to be overly zealous in our modern-day applications of testing. For example, based on the misguided and prejudicial application of intelligence test results, several prominent psychologists helped ensure passage of the Immigration Restriction Act of 1924.
In later chapters, we examine the principles of psychological testing, investigate applications in specific fields (e.g., personality, intelligence, neuropsychology), and reflect on the social and legal consequences of testing. However, the reader will find these topics more comprehensible when viewed in historical context. So, for now, we begin at the beginning by reviewing rudimentary forms of testing that existed over four thousand years ago in imperial China.
THE CONSEQUENCES OF TEST RESULTS
The importance of psychological testing is best illustrated by example. Consider these brief vignettes:
• A shy, withdrawn 7-year-old girl is administered an IQ test by a school psychologist. Her score is phenomenally higher than the teacher expected. The student is admitted to a gifted and talented program where she blossoms into a self-confident and gregarious scholar.
• Three children in a family living near a lead smelter are exposed to the toxic effects of lead dust and suffer neurological damage. Based in part on psychological test results that demonstrate impaired intelligence and shortened attention span in the children, the family receives an $8 million settlement from the company that owns the smelter.
• A candidate for a position as police officer is administered a personality inventory as part of the selection process. The test indicates that the candidate tends to act before thinking and resists supervision from authority figures. Even though he has excellent training and impresses the interviewers, the candidate does not receive a job offer.
• A student, unsure of what career to pursue, takes a vocational interest inventory. The test indicates that she would like the work of a pharmacist. She signs up for a prepharmacy curriculum but finds the classes to be both difficult and boring. After three years, she abandons pharmacy for a major in dance, frustrated that she still faces three more years of college to earn a degree.
• An applicant to graduate school in clinical psychology takes the Minnesota Multiphasic Personality Inventory (MMPI). His recommendations and grade point average are superlative, yet he must clear the final hurdle posed by the MMPI. His results are reasonably normal but slightly defensive; by a narrow vote, the admissions committee extends him an invitation. Ironically, this is the only graduate school to admit him; nineteen others turn him down. He accepts the invitation and becomes enchanted with the study of psychological assessment. Many years later, he writes this book.
RELIABILITY
Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but it can be estimated in a number of different ways.
Test-Retest Reliability
To gauge test-retest reliability, the same test is administered to the same people at two different points in time. This kind of reliability assesses the consistency of a test across time, and it assumes that the quality or construct being measured does not change between the two administrations. Test-retest reliability is therefore best used for things that are stable over time, such as intelligence. Generally, reliability will be higher when little time has passed between tests.
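As a rough illustration, test-retest reliability is commonly summarized as the correlation between the scores from the two administrations. The sketch below assumes a Pearson correlation is an acceptable summary; the function name `pearson_r` and the scores are hypothetical and chosen only for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five people who took the same test twice.
first_administration = [12, 18, 25, 30, 22]
second_administration = [14, 17, 27, 29, 21]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A value close to 1 suggests that people kept roughly the same standing from the first administration to the second; a value near 0 suggests the scores are not stable over time.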
Inter-rater Reliability
This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates. One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two sets of ratings to determine the level of inter-rater reliability. Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate.
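The percentage-of-agreement approach can be sketched in a few lines. The function name `percent_agreement` and the category labels are hypothetical; the data are constructed so that the raters agree on 8 of 10 observations, matching the example above.

```python
def percent_agreement(rater_a, rater_b):
    """Share of observations that two raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must categorize the same non-empty set of observations")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Hypothetical categories assigned to the same 10 observations by two raters.
rater_a = ["calm", "anxious", "calm", "calm", "anxious",
           "calm", "anxious", "anxious", "calm", "calm"]
rater_b = ["calm", "anxious", "calm", "anxious", "anxious",
           "calm", "anxious", "calm", "calm", "calm"]

agreement = percent_agreement(rater_a, rater_b)
print(f"inter-rater agreement: {agreement:.0%}")
```

Note that simple percentage agreement does not correct for agreement that would occur by chance, so it tends to look more favorable when there are only a few categories.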
Parallel-Forms Reliability
Parallel-forms reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time; the more closely the scores on the two forms agree, the higher the parallel-forms reliability.
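The random division of an item pool into two forms might look like the following sketch. The function name `split_into_parallel_forms`, the item labels, and the fixed seed are all assumptions made for illustration; a real item pool would hold the actual questions.

```python
import random


def split_into_parallel_forms(item_pool, seed=None):
    """Randomly divide an item pool into two forms of (nearly) equal length."""
    rng = random.Random(seed)      # a seed makes the split reproducible
    shuffled = list(item_pool)     # copy so the original pool is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 20 items that all measure the same quality.
item_pool = [f"item_{n:02d}" for n in range(1, 21)]

form_a, form_b = split_into_parallel_forms(item_pool, seed=42)
print("Form A:", sorted(form_a))
print("Form B:", sorted(form_b))
```

Every item ends up on exactly one form, so the two forms share content coverage without sharing any individual question.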
Internal Consistency Reliability
This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test's internal consistency. When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same, which would indicate that the test has internal consistency.
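The text above does not name a particular statistic, but one widely used index of internal consistency is Cronbach's alpha, which compares the variance of the individual items with the variance of the total scores. The sketch below assumes that statistic; the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per test taker)."""
    k = len(responses[0])  # number of items
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every test taker needs scores on the same 2 or more items")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(pvariance(column) for column in item_columns)
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings (1 to 5) from five test takers on four similar items.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When test takers answer similar items in similar ways, the total-score variance is large relative to the item variances and alpha approaches 1.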
VALIDITY
Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.
Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. There are three types of validity:
Content Validity:
When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics.
In some instances where a test measures a trait that is difficult to define, an expert judge may rate each item’s relevance. Because each judge is basing their rating on opinion, two independent judges rate the test separately. Items that are rated as strongly relevant by both judges will be included in the final test.
Criterion-related Validity:
A test is said to have criterion-related validity when it has demonstrated its effectiveness in predicting a criterion or indicators of a construct. There are two different types of criterion validity:
  • Concurrent Validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion. For example, on a test that measures levels of depression, the test would be said to have concurrent validity if it measured the current levels of depression experienced by the test taker.
  • Predictive Validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.
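One simple way to look at predictive validity is to check how often a cutoff on the test score correctly anticipates a later outcome. The sketch below is only an illustration under that assumption; the function name `hit_rate`, the cutoff of 65, and all of the data are hypothetical.

```python
def hit_rate(test_scores, later_outcomes, cutoff):
    """Proportion of people for whom 'score >= cutoff' matched their later success."""
    if len(test_scores) != len(later_outcomes) or not test_scores:
        raise ValueError("need one later outcome for every test score")
    predictions = [score >= cutoff for score in test_scores]
    hits = sum(1 for predicted, actual in zip(predictions, later_outcomes)
               if predicted == actual)
    return hits / len(test_scores)


# Hypothetical aptitude scores, and whether each person later succeeded in the course.
aptitude_scores = [55, 72, 64, 81, 47, 90, 68, 59]
succeeded_later = [False, True, False, True, False, True, True, True]

rate = hit_rate(aptitude_scores, succeeded_later, cutoff=65)
print(f"correct predictions: {rate:.1%}")
```

The same calculation applies to concurrent validity if the criterion is collected at the same time as the test scores rather than afterward.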
Construct Validity:
A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. Intelligence tests are one example of measurement instruments that should have construct validity.
