Applied History of Psychology/Models of Assessment
A Brief History of Psychological Testing 
Although the widespread use of psychological testing is largely a phenomenon of the 20th century, it has been noted that rudimentary forms of testing date back to at least 2200 B.C., when the Chinese emperor had his officials examined every third year to determine their fitness for office (Gregory, 1992). Such testing was modified and refined over the centuries until written exams were introduced in the Han dynasty. The Chinese examination system took its final form about 1370, when proficiency in the Confucian Classics was emphasized. The examinations were grueling and rigorous (e.g., candidates spent a day and a night in a small isolated booth composing essays on assigned topics and writing a poem). Those who passed the hierarchical examinations became mandarins or became eligible for public office (Gregory, 1992). However, the similarities between the ancient Chinese traditions and current testing practices are superficial.
Psychological testing owes as much to early psychiatry as it does to the laboratories of experimental psychology. The examination of the mentally ill around the middle of the nineteenth century resulted in the development of numerous early tests. For instance, in 1885, German physician Hubert von Grashey developed the antecedent of the memory drum as a means of testing brain-injured patients. In 1889, German psychiatrist Conrad Rieger developed a battery to assess defects resulting from brain injury, which included assessment of long-term memory, visual recognition, and short-term memory (Gregory, 1992). These early tests lacked standardization and were relegated to oblivion (Gregory, 1992). Nonetheless, they were influential in determining the course of psychological testing.
Most historians trace the beginnings of psychological testing to the experimental investigation of individual differences that flourished in Germany and Great Britain in the late 1800s. Early experimentalists like Wilhelm Wundt, Francis Galton, and James Cattell laid the foundation for testing in the twentieth century (Gregory, 1992). They departed from the wholly subjective and introspective methods and began to test human abilities in laboratories. For instance, Galton used several of the psychophysical procedures practiced by Wundt and others in Europe and adapted them to a series of simple and quick sensorimotor measures. To further study individual differences, Galton set up a laboratory in London at the International Health Exhibition in 1884, which was later transferred to a London Museum (Gregory, 1992). The tests and measures used involved both the physical and behavioral domains. Galton has often been regarded as the father of mental testing by historians (Gregory, 1992). Even though his simplistic attempts to gauge intellect with measures of reaction time and sensory discrimination proved fruitless, he provided a tremendous impetus to the testing movement by demonstrating that objective tests could be devised and that meaningful scores could be obtained through standardized procedures (Gregory, 1992).
James McKeen Cattell studied the new experimental psychology with both Wundt and Galton before settling at Columbia University. Cattell continued studying reaction times to measure individual differences (Gregory, 1992). Cattell also introduced the term “mental test” in his famous paper entitled “Mental Tests and Measurements”. This paper described ten mental tests, which were physiological and sensory measures, reflecting his Galtonian heritage (Gregory, 1992). Clark Wissler, one of Cattell’s doctoral graduates, conducted a study to test whether the mental test results could predict academic performance. His results showed virtually no tendency for the mental test scores to correlate with academic achievement.
With the publication of Wissler’s results, experimental psychologists largely abandoned the use of reaction time and sensory discrimination as measures of intelligence. However, the void created by the abandonment of the Galtonian tradition did not last long. In Europe, Alfred Binet (see below for biographical information) introduced his scale of intelligence in 1905 and shortly thereafter H.H. Goddard imported it to the United States. Binet developed his tests in the early 1900s to help identify children in the Paris school system who were unlikely to profit from ordinary instruction. Binet’s measures of intelligence focused on higher psychological processes rather than on elementary sensory processes such as reaction time. Binet developed his 1905 scales in collaboration with Theodore Simon. The character of the 1905 scales owed much to a prior test developed by Dr. Blin (1902) and his pupil M. Damaye, who attempted to improve the diagnosis of mental retardation by using a battery of assessments (Gregory, 1992). Binet criticized their scales for being too subjective and for having items reflecting formal education; however, he was much impressed with the idea of using a battery of tests, a feature which he adopted in his 1905 scales (Gregory, 1992).
Timeline of Early Milestones in the History of Testing 
- 2200 B.C.: Chinese emperor examined his officials every third year to determine their fitness for office.
- 1862 A.D.: Wilhelm Wundt uses a calibrated pendulum to measure the “speed of thought”.
- 1869: Scientific study of individual differences begins with the publication of Francis Galton’s Classification of Men According to Their Natural Gifts.
- 1879: Wundt establishes the first psychological laboratory in Leipzig, Germany.
- 1884: Galton administers the first test battery to thousands of citizens at the International Health Exhibition.
- 1888: J.M. Cattell opens a testing laboratory at the University of Pennsylvania.
- 1890: Cattell uses the term "mental test" in announcing the agenda for his Galtonian test battery.
- 1901: Clark Wissler discovers that Cattellian “brass instruments” tests have no correlation with college grades.
- 1904: Charles Spearman describes his two-factor theory of mental abilities. First major textbook on education measurement, E. L. Thorndike’s Introduction to the Theory of Mental and Social Measurement, is published.
- 1905: Binet and Simon invent the first modern intelligence scale. Carl Jung uses word-association test for analysis of mental complexes.
- 1914: Stern introduces the intelligence quotient (IQ): the mental age divided by chronological age.
- 1916: Lewis Terman revises the Binet-Simon scales, publishes the Stanford-Binet. Revisions appear in 1937, 1960, and 1986.
- 1917: Army Alpha and Army Beta, the first group intelligence tests, are constructed and administered to U.S. Army recruits. Robert Woodworth develops the Personal Data Sheet, the first personality test.
- 1920: Rorschach Inkblot test is published.
- 1921: Psychological Corporation – the first major test publisher – is founded by Cattell, Thorndike, and Woodworth.
- 1927: First edition of Strong Vocational Interest Blank for Men is published.
- 1938: First Mental Measurements Yearbook is published.
- 1939: Wechsler-Bellevue Intelligence Scale is published. Revisions are published in 1955, 1981, and 1997.
- 1942: Minnesota Multiphasic Personality Inventory is published.
- 1949: Wechsler Intelligence Scale for Children is published. Revisions are published in 1974 and 1991.
Heredity, Historiometry and Eugenics 
Galton to Ceci: Concerning The Reductionist View of Intelligence 
It is with Galton (1892) that the reductionist conception of intelligence lies. His hypothesis was that eminence in the arts, sciences, letters, and law had its roots in the hereditary transmission of microlevel sensory and perceptual processes. He thought that differences in general intelligence were manifested in individual differences in the speed and accuracy of these sensory and perceptual processes.
Even though much of Galton’s own data failed to support his hypothesis, his idea slowly gained momentum and appeared to be borne out by research in the 20th Century. The late 1970s and early 1980s saw the development of improved batteries of microlevel tasks. These were superior to those of the earlier researchers because they had better psychometric properties and were explicitly focused on theoretically important constructs that were cognitive in nature rather than on constructs that were unrelated to intellectual ability (e.g., simple motor speed).
From this late 20th century work the link between microlevel measures and macrolevel abilities seemed clear (see, for example, Eysenck, 1982, and Jensen, 1982), suggesting that, essentially, individuals inherit a central nervous system (CNS) of determinate efficiency. This efficiency enables the individual to more or less effectively glean information from the environment. Thus, individual differences at this microlevel were thought to lead to individual differences on macrolevel measures, such as IQ test performance, school performance, and vocational outcomes.
But others have rejected this reductionist view first propounded by Galton. As one, relatively recent example, the arguments put forward by Ceci (1990) based on his own experimental research are briefly considered here. The first of these is that microlevel measures are not impervious to environmental differentiation. Ceci used an encoding task to demonstrate this. Subjects were presented with a number for a brief duration, which was then followed by an unfilled inter-stimulus interval. A pattern mask was then imposed where the number had appeared. While all subjects recognized the number, with a mask individual differences emerged in the time taken to detect the number. According to Ceci, although all subjects were likely to be equally familiar with the number, they differed in the degree of elaboration with which they represented the number in their memory. By elaboration he meant, for example, that 49 can be represented as simply an odd number, or more elaborately by its factors, its roots, and other associations. Ceci found that the greater the elaboration, the faster the recognition, suggesting that a microlevel task like encoding is not merely a straightforward measure of CNS efficiency; clearly individual differences in knowledge-bases are involved.
A second argument against Galtonian reductionism advanced by Ceci (1990) was based on the finding that inter-correlation patterns between microlevel tasks could not be satisfactorily accounted for by reference to a single resource pool (that is, CNS efficiency). Ceci found that correlations were actually higher between conceptually dissimilar microlevel tasks (e.g., encoding shapes and encoding auditory words) than they were between similar microlevel tasks (e.g., auditory encoding of pitch and auditory encoding of duration). Such findings undermined the assertion that microlevel measures directly reflected some fundamental physiological resource; if they did, Ceci argued, one would have expected a higher correlation between similar microlevel tasks.
Ceci (1990) also argued that the existing genetic evidence was ambiguous regarding what precisely is being transmitted. Ceci observed that inherited dispositions, which are not themselves considered evidence of intellectual capacity, could affect the cognitive abilities of an individual. The example he gives is temperament. Ceci’s point was that the path from genetics to IQ is not one that necessarily has much to do with biology but is instead the manner in which we navigate our social environment; this point is, of course, relevant when one reviews Galton’s own studies of eminence and accomplishment with his rather select samples.
Building on this argument, Ceci (1990) goes on to critique the research concerning real world attainment. In particular, he addresses Terman’s work (1925; Terman & Oden, 1959) on the predictive validity of IQ. The general interpretation of Terman’s data on the outcomes of high IQ children was that IQ was a predictor of real world success. Ceci focused on earnings in his own re-analysis of Terman’s data, and reported that, in fact, there was no relationship between IQ and earnings across the whole IQ range once social variables were adequately controlled for.
In essence, Ceci’s contribution to the debate that Galton began was that the causal pathways from performance on microlevel tasks to macrolevel indices of achievement are moderated by aspects of the individual’s ecology rather than being determined directly by a basic, innate intelligence as Galtonian thought maintained.
Wilhelm Wundt (1832-1920) 
Wundt began measuring mental processes in 1862, years before he founded the first psychological laboratory in 1879, when he experimented with his thought meter (Gregory, 2007). This instrument was a calibrated pendulum with needles protruding from each side. The pendulum would swing back and forth, striking bells with the needles. The observer’s task was to take note of the position of the pendulum when the bells sounded. Wundt thought that the difference between the observed pendulum position and the actual position would provide a means of determining the observer’s speed of thought, an attribute he believed differed from one person to another. The use of empirical analysis to explain individual differences was the most significant contribution Wundt made to modern psychological testing (Gregory, 2007).
Sir Francis Galton (1822-1911) 
Galton originally trained in medicine in London, Cambridge and Birmingham until he inherited a considerable fortune at the age of twenty-two. He then abandoned his medical studies and spent several years traveling. During two years in southwest Africa he made important contributions to geography. He was the first to publish weather maps and to describe the anticyclone as a weather system. Besides his important influence in eugenics and heredity, as reflected by his works Hereditary Genius (1869) and Inquiries into Human Faculty and Its Development (1883), Galton also studied a wide variety of subjects. For example, he developed a method of composite photography for summarizing portraits and conducted research that eventually led to the use of fingerprints as a means of identification. Galton also pioneered the study of resemblance in physical and mental characteristics in successive generations, and used twins to investigate the relative effects of nurture and nature. He also recognized the need for a method to describe the relationship between two variables, hence developing the product-moment formula for linear correlation, which is considered his most outstanding contribution to the area of test theory (DuBois, 1970).
Galton’s fascination with quantification and individual differences led him to invent methods for measuring human traits. When his cousin, Charles Darwin, proposed his theory of natural selection, in which nature selects the traits that are most successful, Galton followed by suggesting that human traits could be measured and ranked for the purpose of breeding superior people. His interest in promoting human betterment led him to found the eugenics movement (Myers, 1998).
- “I have no patience with the hypothesis occasionally expressed and often implied, especially in tales written to teach children to be good, that babies are born pretty much alike, and that the sole agencies in creating differences between boy and boy, and man and man, are steady application and moral effort. It is in the most unqualified manner that I object to pretensions of natural equality” (Galton, 1892 in Myers, 1998).
Over the next few years, Galton attempted to measure innate mental capacity in an effort to quantify human superiority. In his book, Hereditary Genius (1869), he experimented with the idea of measuring one’s head size to assess intelligence and, in later years he developed a number of diverse measures of what he believed were the “biological underpinnings of genius” (Myers, 1998, p. 334).
Galton borrowed the psychophysical procedures practiced by Wundt and adapted them into a series of simple sensorimotor measures. Because of his efforts in devising practicable measures of individual differences, Galton is often regarded as the "father of mental testing" (Goodenough, 1949, as cited in Gregory, 2007). In 1884, he set up a psychometric laboratory in London at the International Health Exhibition, where for a small fee a person could have a series of measurements taken and recorded, including height, weight, head length, head breadth and arm span (DuBois, 1970; Gregory, 2007). Although Galton’s simplistic attempts to assess intellect with measures of reaction time and sensory discrimination proved futile, he pioneered in the development of objective tests to investigate psychological problems to obtain meaningful scores through standardized procedures (DuBois, 1970; Gregory, 2007).
James McKeen Cattell (1860-1944) 
Cattell was the American psychologist who introduced the Galton tradition in testing to the US and was responsible for many early developments in mental measurements (DuBois, 1970; Gregory, 2007). From 1880 to 1882, he studied under Wundt in his psychological laboratory, where he did a series of Reaction Time (RT) studies. He noticed that he and another colleague had small but consistent differences in RT and proposed to Wundt that such individual differences ought to be studied systematically. However, he received no support from Wundt to continue research in this area.
Cattell also worked for Galton in his Anthropometric Laboratory in 1888, where he received enthusiastic support for his research on individual differences. After that, he lectured and collected psychological test data at Cambridge and in the U.S. Cattell was also the first person to receive the title “Professor of Psychology” in the U.S. (DuBois, 1970). Cattell (1890; as cited in DuBois, 1970) invented the term "mental test" in his famous paper entitled “Mental Tests and Measurements”, which described his research program and detailed ten mental tests he proposed for use with the general public. These tests were adapted from Galton’s test battery and included items such as strength of hand squeeze, degree of pressure needed to cause pain, time for naming colors, and weight differentiation.
In 1891, Cattell accepted a position at Columbia University, where he founded the Psychological Laboratory, and soon he inaugurated a battery of physical and mental tests which were given to approximately 50 freshmen in the college each year. During his professorship, he supervised many students who turned out to be very influential in psychology, such as E.L. Thorndike (author of the famous Introduction to the Theory of Mental and Social Measurement), R.S. Woodworth (creator of the first personality test), E.K. Strong (creator of the Strong Vocational Interest Blank) and Clark Wissler. Wissler (1901; as cited in Gregory, 2007) was himself a great influence on the early history of psychological testing, as he demonstrated that mental test scores did not correlate with academic achievement, which eventually led to the abandonment of RT and sensory discrimination as measures of intelligence (Gregory, 2007).
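Wissler's check and Galton's product-moment idea both come down to one computation: the linear correlation between two sets of scores. The following is a minimal sketch of that coefficient in Python. The function name and the example numbers are hypothetical and chosen only for illustration; they are not Wissler's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up scores (NOT historical data): a hypothetical sensory measure
# and hypothetical course grades that share no linear relationship.
hypothetical_test_scores = [1, 2, 3, 4]
hypothetical_grades = [1, -1, -1, 1]
print(pearson_r(hypothetical_test_scores, hypothetical_grades))
```

A coefficient near zero, as in this contrived case, is the kind of result that would indicate a test has little value for predicting the criterion; values near +1 or -1 would indicate a strong linear relationship.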
Intelligence Testing 
Alfred Binet (1857-1911) 
Alfred Binet was born in Nice on July 11, 1857. He was a very bright child and his mother decided to send him off to Paris to study at the age of 12. Although he was most well known for inventing the first modern intelligence test, the Metric Intelligence Scale, he had also studied in many areas such as perception, hallucinations, language and anatomy.
Binet began his career in medicine and later switched to psychology. At the Salpetriere Hospital, Binet met the neurologist J. M. Charcot (1825-1893); Charcot later became his mentor. Binet and his colleague Charles Fere published four studies that were thought to demonstrate how the polarity of a magnet could induce complete mood changes in a single hypnotized subject, but the results were met with harsh criticism. Later, Binet published a recantation of the findings and realized the importance of using scientific experimental procedures.
Having learned from this episode, Binet made a significant contribution to educational psychology through his use of scientific methods, namely experiment and observation: "a collaboration between theory and experimentation" (Binet & Simon, 1908, p. 1). The study of mental fatigue, which asked whether the workload imposed on children in school was too heavy and exhausting, was an illustration of an experimental investigation by Binet and his colleagues (1898). The psychological effects of mental fatigue were explored in two settings: the laboratory and the classroom. Binet considered it important to begin experiments in the laboratory because methodological problems could be solved and the important research questions could be established in this setting. Then, experiments would be carried out in real-life settings so that more efficient, effective and elaborate plans could be devised and hypotheses tested. Binet asserted that all experimental research should follow four steps: hypothesis, collection of facts, interpretation of data, and replication. His insistence on using scientifically sound methodology prompted him to develop new statistical tools and to make use of control groups, providing a model experimental approach for later psychologists to follow. However, it is worth noting that many of Binet's observations, which formed the basis of much of his theoretical work on cognitive development, were of his own children. Like Piaget after him, Binet gained considerable insight into the developmental processes that interested him through this qualitative aspect of his research.
In 1904, Binet was appointed by the Minister of Public Education in France to a commission charged with devising a measure that could identify retarded school children. With the help of his doctoral student, Theodore Simon, Binet created the Metric Intelligence Scale. In this respect, Binet pioneered the notion of a general intelligence when the prevailing view was of quite distinct mental functions. The Scale was actually comprised of 30 separate tests (some developed by Binet, others based on existing cognitive tests), assessing a wide range of abilities from psychomotor coordination to complex mental reasoning. Setting the trend for later intelligence test developers, Binet's tests increased in difficulty and provided a means to determine normal performance between children according to their chronological age as well as whether or not they were normal or abnormal. Children identified with learning impairments, based on their performance of the scale, were selected and put into special education class. Of note, however, was Binet's insistence that before these special education classes were made available to all children with abnormality, their impact must be assessed and verified experimentally.
Binet’s conception of abnormality was quite different from the widespread views of the time (which essentially held the abnormal child to be one whose development had slowed down or stopped). He believed that abnormality was actually a different developmental pattern, in which the abnormal child shares some aspects with normal children while other aspects differ. Most importantly, this conceptualization of abnormality meant that it could be addressed or overcome by special education designed to improve a child's cognitive functioning through specific instruction and practice. He also emphasized that the scale should only be used as an indication of the child’s cognitive level at the time of administration, suggesting that this level could change over time (and as a consequence of remediation and practice). He clearly warned against construing the child's performance on his test as a fixed measure of the child's intelligence.
To identify children eligible for special education, Binet also considered the schoolteacher’s impression of "possibly retarded" pupils based on their performance on studies. In effect, Binet was seeking to establish concurrent validity for his test by corroborating the results of its administration with the impressions of the child's school teacher. To Binet's credit, he insisted that these impressions of teachers were not known to his test administrators prior to testing. This served to control for any possible confirmation bias. Binet made it clear that his scale was just one assessment tool to identify retarded children and the observation of the child’s usual performance was also critical in making decisions about identification and placement.
Although Binet’s aim was to detect and assist, and clearly not to segregate, retarded or abnormal children, his test was put to use for precisely the purpose he had so carefully and explicitly counselled against when Lewis Terman introduced it in the US in 1916. Terman stressed the importance of hereditary factors in explaining IQ performance. Contrary to Binet, he believed that the reason for placing children with low IQ scores in special classes was that there was no hope of them being able to benefit from normal schooling. Fortunately Terman’s views are certainly not representative of contemporary policy and practice in the field of School Psychology in North America and Europe. Indeed, it is interesting to note in reading Binet’s work on assessment procedures that many of his concerns and recommendations of a century ago are explicitly emphasized in the authoritative texts used today concerning the testing of children (Sattler, 2001; Sattler & Hoge, 2006).
Binet’s views on learning also made an impact in the field of psychology. He realised the importance of studying individual differences both between children of different ages and between children of the same age in order to identify the strengths and weaknesses of different learners. As such, he emphasized the need to assess a broad range of skills to identify children’s potential and so plan and provide optimal education corresponding to their abilities. Binet believed that any child could learn provided they experienced optimal conditions for this learning. One crucial condition was that the level of difficulty of the material or concepts taught had to be carefully adjusted for the child so that its comprehension and mastery did not lie completely beyond the child's capacity. This idea bears a striking similarity to Vygotsky's "zone of proximal development". Also, Binet believed that learning came with practice and that students should not only be lectured but also given opportunities to practice their knowledge. Moreover, Binet contended that a multitude of factors must be taken into account when studying the complexities of human functioning. For example, based on his belief that cognitive functioning cannot be separated from emotional functioning, he examined the relation between eating habits and intellectual achievement.
Henry H. Goddard (1866-1957) 
In 1906, Henry H. Goddard was hired by the Vineland Training School in New Jersey to conduct research on the classification and education of “feeble-minded” children. He soon realized that a diagnostic instrument would be necessary and was amazed to learn about the 1908 Binet-Simon scale. He quickly set about translating the scale, making minor changes so that it would be applicable to American children (Gregory, 2007).
Goddard was a strong advocate of eugenics, engaging in the use of intelligence testing to demonstrate that a large number of immigrants entering the US were mentally retarded (Gregory, 2007). He also tested many normal children with his translation of the Binet-Simon scales, and supported the viewpoint that children identified as mentally deficient should be segregated so that they would be prevented from “contaminating society” (1911; as cited in Gregory, 2007). He also gained a reputation as one of the leading experts on the use of intelligence tests to identify persons with impaired intellect.
Lewis M. Terman (1877-1956) - Stanford-Binet Intelligence Scale 
Lewis Terman was born the eleventh of fourteen children on an Indiana farm in 1877 (Chapman, 1988). He quickly became an avid reader, excelled in school, and began to prepare for a career in teaching at the age of fifteen. He found an interest in psychology from reading famous works such as Darwin’s On the Origin of Species and William James’s Principles of Psychology (Chapman, 1988). His interest in mental testing originated in his years at Clark University, where he did his Ph.D. In his doctoral dissertation, entitled “Genius and Stupidity”, he administered a variety of tests, including measures of inventiveness and imagination, logical processes, mathematical ability and language, to seven “bright” and seven “stupid” boys selected as extreme cases from approximately 500 children (DuBois, 1970; Chapman, 1988). He concluded that the nature of intelligence could best be explained through the use of mental tests by which an individual’s performance could be quantified and compared to the normal performance of the population at large (Chapman, 1988).
Terman set aside his research on the measurement of intelligence until his appointment to the Department of Education at Stanford University in 1910, where he began a revision of Binet’s intelligence scale for use in America (DuBois, 1970; Chapman, 1988). Using the 1911 Binet-Simon scale as the source, Terman produced the Stanford-Binet in 1916, which became the standard of intelligence testing for decades (DuBois, 1970; Gregory, 2007) and is the work for which he is best known.
The new scale was based on comprehensive and systematic research (DuBois, 1970; Chapman, 1988). The Binet material and forty additional tests were prepared for tryout with 905 normal children between the ages of 5 and 14, all within two months of a birthday. Also, results of testing some 1400 other cases, including 200 defective and superior children and 400 adults, were considered in making the revision. A considerable amount of time was spent to train the examiners and all records were scored by Terman himself to ensure uniformity.
The new Stanford-Binet consisted of 90 items and was suitable for those with mental retardation, children, and both normal and superior adults (DuBois, 1970; Gregory, 2007). It had clear and well-organized instructions for administration and scoring. One of the major additions that Terman made to the test was the introduction of the concept of the mental quotient (originally proposed by Louis William Stern), whereby an individual's mental age is divided by their chronological age, to express test results. Terman renamed this ratio the intelligence quotient (IQ), a term that continues to be widely used today (Sattler, 2001).
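The ratio described above can be illustrated with a short calculation. This is a minimal sketch, not the Stanford-Binet scoring procedure itself: the function name, the use of months as the unit, and the example child are assumptions made for illustration. Multiplying the quotient by 100 follows the usual convention for removing the decimal point.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age, scaled by 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months

# Hypothetical example: an 8-year-old (96 months) who performs at the
# level typical of a 10-year-old (120 months).
print(ratio_iq(mental_age_months=120, chronological_age_months=96))  # 125.0
```

A child whose mental age equals their chronological age scores 100 on this scale, which is why 100 came to denote average performance.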
David Wechsler (1896 – 1981) – Wechsler-Bellevue Intelligence Scale 
Major improvements in the measurement of intelligence were made by David Wechsler, who published the Wechsler-Bellevue Intelligence Scale in 1939 (DuBois, 1970). The scale was composed of subscales so that a given type of task or item was administered only once to the subject. The IQ became a standard score with a mean of 100 at each age level and a standard deviation chosen so that 50 percent of the IQs were between 90 and 110. The instrument also yielded a verbal IQ, a performance IQ and a total IQ (DuBois, 1970; Gregory, 2007). Modifications were made so that the scale was more suitable for adults than earlier scales had been, and the revised instrument became known as the Wechsler Adult Intelligence Scale (DuBois, 1970). A version for children, the Wechsler Intelligence Scale for Children, was developed in 1949 (Gregory, 2007).
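The standard-score definition above implies a particular standard deviation, which can be worked out if one assumes the scores are normally distributed. The sketch below is illustrative only: the normality assumption, the function name `deviation_iq`, and the example raw-score figures are not taken from the source, and Wechsler's actual scoring tables are not reproduced here.

```python
from statistics import NormalDist

# z-score of the 75th percentile: the middle 50% of a normal
# distribution lies within this many standard deviations of the mean.
z_75 = NormalDist().inv_cdf(0.75)      # roughly 0.674

# If half of all IQs fall between 90 and 110, then 10 IQ points
# correspond to z_75 standard deviations.
implied_sd = 10 / z_75                 # roughly 14.8, conventionally 15

def deviation_iq(raw_score, age_group_mean, age_group_sd, iq_sd=15.0):
    """Convert a raw score to a standard score with mean 100 in its age group."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100.0 + iq_sd * z

print(round(implied_sd, 1))
# Hypothetical raw score one standard deviation above its age-group mean.
print(deviation_iq(raw_score=60, age_group_mean=50, age_group_sd=10))
```

Unlike the ratio IQ, a score defined this way keeps the same meaning at every age, because it depends only on where the person stands relative to others in the same age group.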
Early Group Tests 
With the success of the Binet scale, measurement of mental abilities by a device that could be administered simultaneously to large numbers of subjects was a logical next step (DuBois, 1970). Among the first to develop group tests was Pyle, who in 1913 published schoolchildren age norms for a battery consisting of measures such as memory span, digit-symbol substitution, and oral word association (Gregory, 2007), which was intended to be used diagnostically (DuBois, 1970).
In 1917, Pintner modified Pyle’s approach so as to measure general intelligence (DuBois, 1970). Using the five tests from Pyle that seemed to have the highest correlations with general intelligence, he added a timed cancellation test in which the child crossed out the letter a wherever it appeared in a body of text (DuBois, 1970; Gregory, 2007).
The pace of developments in group testing picked up dramatically as the US entered World War I in 1917 (Gregory, 2007). Robert M. Yerkes was the president of the American Psychological Association at that time, and he immediately took energetic steps to discover and implement ways in which psychology could be of service in the national effort (DuBois, 1970). He assembled the committee on the examination of recruits in May 1917, and it was decided that a group intelligence test should be administered to all recruits for the purposes of classification and assignment (DuBois, 1970; Gregory, 2007). Criteria for the new group test included adaptability for group use, correlation with valid intelligence measures, measurement of a wide range of ability, objectivity of scoring, and economy of time (DuBois, 1970).
Two group tests emerged from this effort: the Army Alpha and the Army Beta (Gregory, 2007). The Alpha consisted of eight verbally loaded tests for average and high-functioning recruits. The eight tests were (1) following oral directions, (2) arithmetical reasoning, (3) practical judgment, (4) synonym-antonym pairs, (5) disarranged sentences, (6) number series completion, (7) analogies, and (8) information.
The Beta was a nonverbal group test designed for use with illiterates and recruits whose first language was not English. It consisted of various visual-perceptual and motor tests such as tracing a path through mazes and visualizing the correct number of blocks depicted in a three-dimensional drawing (Gregory, 2007).
The Army testing program was the first large-scale use of intelligence tests. Approximately 1.75 million recruits were examined, of whom at least 1.25 million were tested with one of the five forms of the Army Alpha (DuBois, 1970). The program was well received by the military establishment, which used the results in making important personnel decisions.
Personality Testing 
Some Introductory Comments on General Approaches to Personality Assessment 
Although there is marked variation in the ways lay people and even psychologists define it, personality can be taken to refer to the characteristic patterns of thinking, feeling, and behaving that reflect a person’s individual style and influence the way the person interacts with his or her environment. A number of different theories concerning personality have been expounded and closely associated with these are contrasting approaches to personality assessment. Of particular note given their longstanding influence on thinking and assessment practices in the field are trait theory and psychoanalytic theory and their associated assessment procedures.
Trait theorists hold that there are distinct categories or types of personality. An individual is categorized according to their position on two or more continuous dimensions, which are usually measured using rating scales that can be summed and combined in a pre-specified manner to generate a constellation of higher-order personality traits. Trait theorists have sought to derive an optimal set of these traits so that the underlying constructs make theoretical sense and receive empirical support while also sufficiently accounting for the diversity that characterizes human personality. Questionnaire-format personality inventories have been developed on which respondents rate their thoughts, feelings, or reactions to the scenarios described. Construction of these tests and selection of their items have been based on theory (the rational method) or on statistical procedures such as factor analysis (the criterion or empirical method).
Significant contributions to the trait theory of personality during the 20th century include the work of McDougall (1932), Thurstone (1934), Gordon Allport (1937, 1961), Raymond Cattell (1943, 1947), Hans Eysenck (1970), and John Digman (1963, 1972). Currently, the five-factor model of personality (comprising Neuroticism [maladjustment], Extraversion, Openness to Experience, Agreeableness, and Conscientiousness) enjoys wide acceptance in the field and a firm empirical basis (Costa & McCrae, 1985, 1989, 1992). Examples of contemporary inventories representing this approach include the NEO Personality Inventory (NEO-PI-R: Costa & McCrae, 1992) and, in clinical contexts, the Minnesota Multiphasic Personality Inventory (MMPI-2: Butcher & Megargee, 1989) (see below).
The psychoanalytic view of personality implicates unconscious motivations as the underlying cause of individuals’ behaviors across situations (as well as of their dreams and slips of the tongue). Freud (1933, 1991) viewed personality as being composed of the id, the ego, and the superego, which he argued were often in conflict because of the differing principles according to which each operates. The Pleasure Principle, always driving toward instant gratification of biological impulses, controls the id. In contrast, the ego operates on the Reality Principle, tempering the impulses of the id by delaying gratification until socially acceptable means of securing it are possible. The superego represents the individual’s conscience, imposing moral standards against which the individual plans and judges his thoughts, feelings, and behavior. In this theory it is the ego that balances the influences and tension between the other two components in the well-integrated personality.
In contrast to the structured and standardized assessment procedures developed and used by trait theorists, psychoanalytically oriented personality assessors use projective tests. In these tests the stimuli are deliberately ambiguous and the individual is thought to reveal unconscious motives and desires by projecting his or her personality onto the stimuli. Examples of these projective tests include the Rorschach Test (see below) and the Thematic Apperception Test. Administration of these tests involves considerable judgment and interpretation on the part of the assessor.
Robert S. Woodworth (1869-1962) – Personal Data Sheet 
Although Galton had devised an assessment method to investigate imagery, it was not until WWI that R.S. Woodworth applied the technique to develop an instrument for screening Army recruits for susceptibility to emotional instability (DuBois, 1970; Gregory, 2007). The result was the Personal Data Sheet, published in 1919 and generally regarded as the first personality test.
The Personal Data Sheet consisted of 116 questions that required the subject to answer Yes or No. The questions involved fairly serious symptomatology. Items that were found to differentiate between normal and abnormal subjects included the following:
- Do you feel sad or low-spirited most of the time?
- Are you ever bothered with feeling that people are reading your thoughts?
In 1919, Woodworth reported that of the 100 symptoms inquired about, the average college student reported about 10 and the typical hysteric reported over 40 (DuBois, 1970).
Hermann Rorschach (1884 – 1922) – Rorschach Inkblot Test 
Hermann Rorschach was born in Zurich on November 8, 1884. He spent his youth in Schaffhausen and studied medicine mostly in Zurich. He worked as a resident physician in the asylums of several Swiss towns, and for seven months in 1914 he worked at a sanatorium in Moscow. Rorschach was the associate director of an asylum when he died prematurely on April 2, 1922, at the age of 37. Ten months prior to his death, in June 1921, he published Psychodiagnostics, the monograph on his famous Inkblot Test, which became a milestone in the history of projective testing (Ellenberger, 1993; Gregory, 2007). The Rorschach Inkblot Test consisted of 10 inkblots, each formed by dribbling ink on a piece of paper and folding the paper in half, producing relatively symmetrical designs (Gregory, 2007). Five of the inkblots are black or shades of gray, while five contain color. Because Rorschach was more interested in exploring a subject’s mode of perception than the content of the associations evoked, he focused on how the subject responded: their reaction time, whether the form was perceived in parts or in its entirety, and how form, movement, and color influenced the subject’s interpretation of the blot (DuBois, 1970).
Remarkably, Rorschach’s nickname as a secondary school student was “Klex”, a word meaning “inkblot”, which anticipates the test for which he is renowned (Ellenberger, 1993). Klecksography was a popular game among Swiss children that consisted of making inkblots on a piece of paper and folding it to produce the form of an object, such as a bird or a butterfly.
One incident that stimulated Rorschach’s interest in studying the human unconscious was a dream that he had when he was a medical student (Ellenberger, 1993). He dreamt that his own brain was being cut into slices exactly as he had seen it done during an autopsy, and he felt these slices falling forward, one after the other, across his forehead. Two questions immediately arose in his mind: How can someone experience in a dream perceptions that are physiologically impossible? And how could a succession of optic images be translated into and re-experienced as a succession of kinesthetic images? These questions proved to be the guiding force of Rorschach’s (1964) Psychodiagnostics, in which he concluded: “The apparatus with which the individual is endowed for assimilating experiences is a much broader, more extensive instrument than that which he uses in daily life. A person has a number of registers which enable him to experience, but he uses only a few in the ordinary run of living.”
A major influence on Rorschach’s Inkblot Test was Jung’s Word Association Test, the first experimental method applied to dynamic psychiatry (Ellenberger, 1993). Using this test, Jung detected mental complexes and isolated a special test syndrome for every disorder. He classified subjects into introverts and extroverts by distinguishing their semantic and verbal associations. This formal classification of answers is considered Jung’s most original contribution, and it greatly influenced the detail and framework of Rorschach’s test (Ellenberger, 1993). For example, in Psychodiagnostics, Rorschach (1964) presented observations of 405 subjects and classified their responses by types such as normal, feebleminded, epileptic, and schizophrenic, a structure similar to Jung’s classification. Although he used it in an entirely different fashion, Rorschach also borrowed the word introversion from Jung (Rorschach, 1964).
Another impetus in the development of the Inkblot Test was Rorschach’s encounter with S. Hens’s dissertation, “Testing the Imagination of School Children, Adults, and Mental Patients by Means of Formless Blots”, published in 1917. Hens employed eight inkblots to evaluate the content of the interpretations of a thousand children, a hundred normal adults, and a hundred psychotics (Rorschach, 1964; Ellenberger, 1993). At the end of the study, Hens offered a few suggestions for future research. He noted that some subjects tended to interpret the whole picture while others interpreted only parts, which prompted him to speculate whether this pattern was meaningful. He also noticed that all eight of his cards were black and white, which led him to consider whether colored cards would elicit different responses. Finally, Hens wondered about the possibility of using the inkblot test to diagnose psychoses. All of these questions were taken up by Rorschach (1964) in Psychodiagnostics, with a focus on investigating patterns in the perceptual process.
Myers-Briggs, Keirsey's Temperaments, and True Colours 
The Myers-Briggs personality test is influenced by the work of Carl Jung and is made up of 16 distinct personality types. These personality types are derived from four main variables:
- Introvert/Extrovert (I/E)
- Sensor/Intuitive (S/N)
- Feeling/Thinking (F/T)
- Judger/Perceiver (J/P)
Each variable pairs two opposite tendencies, and the assumption is that most people tend towards one more than the other. For example, a person may find that they lean more towards an extroverted personality type than an introverted one. The 16 personality types are derived from these four variables and are expressed in four letters (e.g., INFJ or ENFP). Most often, a person who is administered this test will find that they can see a little of themselves in more than one personality type. For this reason, the administrator of the test will usually comment that it is a test of one's most frequent preference or dominant tendency as opposed to a rigid diagnosis.
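The count of 16 follows from choosing one letter from each of the four pairs (2 × 2 × 2 × 2). A minimal sketch of that enumeration, using the letter pairs listed above (the variable names are illustrative only):

```python
from itertools import product

# One tuple per variable, holding the two opposite tendencies.
PAIRS = [("I", "E"), ("S", "N"), ("F", "T"), ("J", "P")]

# Choosing one letter from each pair yields every four-letter type code.
type_codes = ["".join(choice) for choice in product(*PAIRS)]

print(len(type_codes))       # 16
print("INFJ" in type_codes)  # True
print("ENFP" in type_codes)  # True
```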
David Keirsey, a renowned psychologist born in Oklahoma in 1921, modified the test by organising the 16 personality types into four main temperaments. These temperaments he called Guardians, Artisans, Idealists, and Rationals.
The True Colours Test, another personality test, is also related to the Myers-Briggs and Keirsey's temperaments. Don Lowry founded True Colours in 1978 after becoming interested in the work of Keirsey and studying Katharine Briggs, Isabel Myers, and Carl Jung. He took Keirsey's four main personality types and created a test for children and adults that would be easy, fun, and convenient to use. Each colour of the test is used to characterize a certain type of person, and there are also comparisons of compatibility and non-compatibility across colours. An individual taking this test can then rank, through a series of tests, the order of these colours in their personality (from most prevalent to least). This test has been used in classrooms, corporate offices, apprenticeship programs, and even in career development. It is intended to help a person understand their strengths, tendencies, and attitudes, and the perception others may have of them.
Starke R. Hathaway (1903-1984) and J. C. McKinley (1891-1950) - Minnesota Multiphasic Personality Inventory (MMPI) 
Using Woodworth’s procedure of writing items that seemed to have clinical significance and establishing validity by contrasting the responses of normal and abnormal subjects, S. R. Hathaway and J. C. McKinley published the MMPI in 1943 (DuBois, 1970; Gregory, 2007). They also followed the model of the Strong Vocational Interest Blank in that a large item pool was created with the idea that only a relatively small subset would be included in any one key or scale (DuBois, 1970). The MMPI also introduced the use of validity scales to detect fake-bad, fake-good, and random response patterns (DuBois, 1970; Gregory, 2007).
The MMPI consisted of 566 true-false items designed to diagnose psychiatric symptoms (Gregory, 2007). These items were selected from more than 1000 items covering health conditions, habits, personal and social attitudes, and psychiatric symptoms, and were administered both to normal subjects and to individuals exhibiting a defined pathological condition (DuBois, 1970). Items showing the greatest differentiation were selected for the scale, which was then cross-validated on new groups of cases (DuBois, 1970; Gregory, 2007).
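The item-selection step described above can be sketched in a few lines. This is an illustration of the general idea of contrasting two groups, not a reconstruction of Hathaway and McKinley’s actual procedure: the item names, endorsement rates, and the 0.20 cut-off are all hypothetical, and the cross-validation step on new cases is not shown.

```python
# Hypothetical proportions answering "True" per item; not real MMPI data.
endorsement = {
    "item_a": {"normal": 0.10, "clinical": 0.62},
    "item_b": {"normal": 0.45, "clinical": 0.48},
    "item_c": {"normal": 0.30, "clinical": 0.05},
    "item_d": {"normal": 0.20, "clinical": 0.25},
}


def select_items(rates, min_difference=0.20):
    """Keep items whose endorsement rates differ most between the two groups."""
    selected = {}
    for item, groups in rates.items():
        diff = groups["clinical"] - groups["normal"]
        if abs(diff) >= min_difference:
            # Key the item in the direction the clinical group answers more often.
            selected[item] = "True" if diff > 0 else "False"
    return selected


def score(responses, key):
    """Count responses that match the keyed direction of the selected items."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


scale_key = select_items(endorsement)
print(scale_key)  # {'item_a': 'True', 'item_c': 'False'}

# A hypothetical respondent who answers "True" to every item.
print(score({"item_a": "True", "item_c": "True"}, scale_key))  # 1
```

Note that an item is retained purely because the groups answer it differently; its content does not need to look clinically relevant on its face.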
Personality Testing with Criminals in the 20th Century: The Contributions of Eysenck and Hare 
That criminals are individuals who are in some way different from other members of society has been a pervasive view for centuries and remains so today among many. The contribution of psychology has included the formulation of personality theories explaining criminality as the product of a certain personality type. As one area in the psychology of individual differences research, efforts to measure the personality of criminals can be likened to Galton’s and Terman’s efforts to discriminate between the eminent or intelligent and the weak-minded, with the social engineering implications clear in both areas of testing. Within the broader framework of a multidisciplinary approach, personality theories represent a middle range explanation, that is, below macro-theories of social structure and organisation and above micro-theories that are based on research into the biological make-up of individuals.
The sociologist Durkheim (1895, 1938) posited that crime was in fact a normal social phenomenon and not a pathological element within society. In contrast, Eysenck (1964, 1970, 1977) advanced the theory that criminals were psychologically different from others, and his work can be seen to have ushered in a new era of psychological inquiry into the question. Using the concept of personality, Eysenck sought to identify a subset of ‘aberrant’ people, criminals, different from the larger ‘normal’ whole, the general population. While this conception of crime as pathology, in essence the Positive School of Criminology, was not new, Eysenck’s introduction of psychometric tests to illustrate between-group differences was.
Adapting his general theory of personality, in which individuals could be categorized according to behavioral tendencies, Eysenck hypothesised that criminals would have a distinct personality type, measurable using his inventories (Maudsley Personality Inventory, MPI; Eysenck Personality Inventory, EPI; see Eysenck, 1960, and Eysenck & Eysenck, 1964). Basically, individuals could be located on two spectrums, extroversion/introversion and neuroticism, which are independent and represent overall personality dimensions. Extroversion is characterised by impulsiveness, sociability, and generally outgoing behavior, while introversion is typically demonstrated by shyness, control, withdrawal, and inversion. According to Eysenck, the criminal would be high in extroversion and high in neuroticism, and prone to criminal behavior because of this personality type. Initial data with offender populations were generally consistent with this (Eysenck, 1964). Other evidence consistent with Eysenck’s theory came from twin and adoption studies (Eysenck, 1977; Mednick, Gabrielli, & Hutchings, 1984).
However, Eysenck’s work on personality testing with criminals has been subject to a number of criticisms. One concern was that his measures were self-report questionnaires; the degree of transparency in the material left them susceptible to response biases. While the incorporation of the L (Lie) scale in 1964 was an attempt to counter this, other researchers found that response profiles could be manipulated without elevating scores on this scale (see Farrington, Biron, & LeBlanc, 1982). Another problem with Eysenck’s personality measures was the finding that the dimensions correlated with one another to varying degrees (Eysenck & Eysenck, 1970; Eysenck & Eysenck, 1976). Clearly personality dimensions should be homogeneous and independent of each other if they are to be taken to be fundamental constructs.
Of course, Eysenck’s personality inventories represented only one contribution to personality research among many. Tennenbaum (1977) undertook a substantial review of the literature and found that 52 different tests had been used to investigate personality in criminals, with little evidence of discriminative validity in general. As such, it seemed necessary to begin to look for a more homogeneous subset within the criminal population in order to better elucidate core personality traits with discriminative and predictive validity. So Eysenck and Eysenck (1978) attempted to operationalize the concept of psychoticism: “… the psychotic is ‘mad’ in the sense that his cognitive processes are deranged… while the neurotic is usually in full possession of his mental faculties, but not in control of his emotions” (p. 57). Effectively, Eysenck was modifying his personality theory by postulating a three-dimensional personality breakdown, consisting of E, N, and P dimensions, which predicted that psychopaths and criminals should have high scores on the E, N, and P scales. Hare (1982) found that P scores correlated with six of the 22 items on his Psychopathy Checklist (Hare, 1980). However, he opined, “… high scores on the P scale may be more a reflection of criminal and antisocial tendencies and behavior than of the inferred psychological constructs (e.g. lack of empathy, guilt, remorse, concern for others etc.) that are essential for the diagnosis of psychopathy.” Similarly, the diagnostic criteria for Antisocial Personality Disorder in the DSM (see most recently American Psychiatric Association, 2000) have also been widely criticised for merely providing a description of the habitual criminal.
In contrast to the earlier attempts of Eysenck and others to measure criminal personality, a considerable body of empirical evidence has emerged over the past 25 years in support of the psychopathic personality as operationalized by Hare’s Psychopathy Checklist (1980, 1991, 2003), which Hare developed following the clinical observations and seminal theorizing of the psychiatrist Hervey Cleckley (1941). Psychopathy is a complex personality disorder that has been defined by a constellation of interpersonal, affective, and behavioral characteristics. The revised checklist, the Psychopathy Checklist-Revised (PCL-R), is a structured clinical assessment instrument developed to assess a range of emotional and interpersonal traits as well as socially deviant behaviors. It is scored by a trained assessor on the basis of an individual interview and review of file materials. Its reliability and validity are well established (Hare, 2003). Of particular note, research shows that psychopathic traits, as assessed by the PCL-R, are associated with negative indices of treatment involvement among general correctional samples of offenders (e.g., Hare, Clark, Grann, & Thornton, 2000; Hobson, Shine, & Roberts, 2000; Ogloff, Wong, & Greenwood, 1990). Strong associations have also been reported between PCL-R scores and both general and violent recidivism among correctional and psychiatric populations (e.g., Hemphill, Hare, & Wong, 1998; Salekin, Rogers, & Sewell, 1996).
Although the etiology of the psychopathic personality remains poorly understood, a growing body of evidence (see Patrick, 2006) supports Hare’s (1993) postulation that “genetic factors contribute to the biological bases of brain function and to basic personality structure, which in turn influence the way the individual responds to, and interacts with, life experiences and the social environment” (p. 173). Certainly, the theoretical and conceptual impact of the construct of the psychopathic personality has been considerable. As a personality assessment tool, the PCL-R has been highly influential in both research and applied settings. It stands as one of the key advances in the 20th century in personality research at the juncture with clinical and forensic psychology, comparing favourably with the endeavors of earlier thinkers and scientists from Eysenck all the way back through notables such as Cesare Lombroso (1835-1909) and Franz Joseph Gall (1758-1828) to Aristotle.
Interest Inventories 
Edward K. Strong (1884-1963) – The Strong Vocational Interest Blank 
Edward K. Strong, a psychologist whose professional endeavor was to measure vocational interests, devoted 36 years following its first publication in 1927 to developing empirical keys for the instrument known as the Strong Vocational Interest Blank (SVIB) (DuBois, 1970; Gregory, 2007). To do so, he used large and well-selected groups of respondents, studied the test’s reliability and validity, examined variation in vocational interest with time, developed manuals and devices to aid in the interpretation of results, and made various improvements in the device and its scoring methods (DuBois, 1970). Subjects taking the test could be scored on separate keys for several dozen occupations, providing a series of scores of great value in vocational guidance (Gregory, 2007). A companion blank for women was introduced in 1933 (DuBois, 1970). The modern version, the Strong Interest Inventory, is still widely used by guidance counselors (Gregory, 2007).