Contemporary Educational Psychology/Chapter 11: Standardized and Other Formal Assessments
Understanding standardized testing is very important for beginning teachers as K-12 teaching is increasingly influenced by the administration and results of standardized tests. Teachers also need to be able to help parents and students understand test results. Consider the following scenarios.
- Vanessa, a newly licensed Physical Education teacher, is applying for a job at a middle school. During the job interview the principal asks how she would incorporate key 6th grade math skills into her PE and Health classes as the 6th grade students in the previous year did not attain Adequate Yearly Progress in mathematics.
- Danielle, a 1st year Science teacher in Ohio, is asked by Mr. Volderwell, a recent immigrant from Turkey and the parent of a 10th grade son, Marius, to help him understand test results. When Marius first arrived at school he took the Test of Cognitive Skills and scored at the 85th percentile, whereas on the State Science Graduation test he took later in the school year he was classified as “proficient.”
- James, a 3rd year elementary school teacher, attends a class in gifted education over summer as standardized tests from the previous year indicated that while overall his class did well in reading the top 20% of his students did not learn as much as expected.
- Miguel, a 1st grade student, takes two tests in fall and the results indicate that his grade equivalent scores are 3.3 for reading and 3.0 for math. Miguel’s parents want him immediately promoted into the 2nd grade, arguing that the test results indicate that he can already read and do math at the 3rd grade level. Greg, a 1st grade teacher, explains to Miguel’s parents that a grade equivalent score of 3.3 does not mean Miguel can do 3rd grade work.
Understanding standardized testing is difficult, as there are numerous terms and concepts to master and recent changes in accountability under the No Child Left Behind Act (NCLB) have increased the complexity of the concepts and issues. In this chapter we focus on the information that beginning teachers need to know and begin with some basic concepts.
Standardized tests are created by a team, usually test experts from a commercial testing company who consult classroom teachers and university faculty, and are administered in standardized ways. Students not only respond to the same questions but also receive the same directions and have the same time limits. Explicit scoring criteria are used. Standardized tests are designed to be taken by many students within a state, province, or nation, and sometimes across nations. Teachers help administer some standardized tests, and test manuals are provided that contain explicit details about administration and scoring. For example, teachers may have to remove all the posters and charts from the classroom walls, read directions out loud to students using a script, and respond to student questions in a specific manner.
Criterion-referenced standardized tests measure student performance against a specific standard or criterion. For example, newly hired firefighters in the Commonwealth of Massachusetts have to meet physical fitness standards by successfully completing a standardized physical fitness test that includes stair climbing, using a ladder, advancing a hose, and simulating a rescue through a doorway (Human Resources Division, n.d.). Criterion-referenced tests currently used in US schools are often tied to State content standards and provide information about what students can and cannot do. For example, one of the content standards for 4th grade reading in Kentucky is “Students will identify and describe the characteristics of fiction, nonfiction, poetry or plays” (Combined Curriculum Document Reading 4.1, 2006), and so a report on an individual student would indicate whether the child can accomplish this skill. The report may state the number or percentage of items that were successfully completed (e.g., 15 out of 20, i.e., 75%) or include descriptions such as basic, proficient, or advanced, which are based on decisions made about the percent of mastery necessary to be classified into these categories.
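The arithmetic behind such a report can be sketched in a few lines of Python. This is only an illustration: the function name and the cut scores (70% for proficient, 90% for advanced) are hypothetical, since each testing program sets its own cut scores through a formal standard-setting process.

```python
# Hypothetical cut scores, highest first: (minimum percent, label).
# Real tests define their own categories and cut points.
HYPOTHETICAL_CUTS = [(90.0, "advanced"), (70.0, "proficient"), (0.0, "basic")]


def mastery_report(items_correct, items_total, cuts=HYPOTHETICAL_CUTS):
    """Return (percent correct, performance label) for one student."""
    if items_total <= 0 or not 0 <= items_correct <= items_total:
        raise ValueError("item counts are out of range")
    percent = 100.0 * items_correct / items_total
    # Walk the cut scores from highest to lowest and stop at the first
    # one the student meets.
    for minimum, label in cuts:
        if percent >= minimum:
            return percent, label
    raise ValueError("cut scores must include a lowest category")


percent, label = mastery_report(15, 20)
print(f"{percent:.0f}% of items correct: {label}")
```

With these assumed cut scores, the student who completes 15 of 20 items (75%) would be labeled proficient; a different set of cut scores could place the same student in a different category.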
Norm-referenced standardized tests report students’ performance relative to others. For example, if a student scores at the 72nd percentile in reading, it means she outperforms 72 percent of the students who were included in the test’s norm group. A norm group is a representative sample of students who completed the standardized test while it was being developed. For State tests the norm group is drawn from the state, whereas for national tests the sample is drawn from the nation. Information about the norm groups is provided in a technical test manual that is not typically supplied to teachers but should be available from the person in charge of testing in the school district.
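As a rough illustration of how a percentile relates to a norm group, the sketch below computes the percentage of norm-group scores that fall below a given score. The norm group here is invented (100 students with raw scores 1 through 100), and the "percent strictly below" rule is only one common convention; published tests may handle tied scores differently.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores strictly below the given score."""
    if not norm_scores:
        raise ValueError("the norm group must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm group: 100 students with raw scores 1, 2, ..., 100.
norm_group = list(range(1, 101))

# A raw score of 73 is higher than 72 of the 100 norm-group scores.
print(percentile_rank(73, norm_group))
```

Note that the result depends entirely on who is in the norm group: the same raw score would yield a different percentile against a stronger or weaker sample.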
Reports from criterion-referenced and norm-referenced tests provide different information. Imagine a national mathematics test designed to test basic skills in 2nd grade. If this test is norm-referenced and Alisha receives a report indicating that she scored at the 85th percentile, this indicates that she scored better than 85% of the students in the norm group who took the test previously. If this test is criterion-referenced, Alisha’s report may state that she mastered 65% of the problems designed for her grade level. The relative percentage reported from the norm-referenced test provides information about Alisha’s performance compared to other students, whereas the criterion-referenced test attempts to describe what Alisha or any student can or cannot do with respect to whatever the test is designed to measure. When planning instruction, classroom teachers need to know what students can and cannot do, so criterion-referenced tests are typically more useful (Popham, 2005). Current standards-based accountability systems and NCLB rely predominantly on criterion-referenced tests to assess attainment of content-based standards. Consequently the use of standardized norm-referenced tests in schools has diminished and is largely limited to diagnosis and placement of children with specific cognitive disabilities or exceptional abilities (Haertel & Herman, 2005).
Some recent standardized tests incorporate both criterion-referenced and norm-referenced elements into the same test (Linn & Miller, 2005). That is, the test results not only provide information on mastery of a content standard but also the percentage of students who attained that level of mastery.
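A minimal sketch of the second kind of information, the percentage of students at each mastery level, is shown below. The class of 20 students and their levels are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter


def level_shares(levels):
    """Map each mastery level to the percentage of students who attained it."""
    if not levels:
        raise ValueError("at least one student is required")
    counts = Counter(levels)
    return {level: 100.0 * n / len(levels) for level, n in counts.items()}


# Hypothetical results for 20 students on one content standard.
results = ["basic"] * 5 + ["proficient"] * 12 + ["advanced"] * 3

shares = level_shares(results)
print(shares["proficient"])  # share of the group classified as proficient
```

A report for an individual student could then pair the student's own level with the share of the comparison group who reached that level.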
Standardized tests can sometimes be high stakes, meaning that performance on the test has important consequences of some sort. The consequences can be for students, e.g., passing a high school graduation test is required in order to obtain a diploma, or passing PRAXIS II is a prerequisite for a teaching license. The consequences can also be for schools, e.g., under NCLB an increasing percentage of students in every school must reach proficiency in math and reading each year. Consequences for schools that fail to achieve these gains include reduced funding and restructuring of the school building. Under NCLB, the consequences are designed to be for the schools, not individual students, and the test results may not accurately reflect what students know because the students may not try hard when the tests have low stakes for them (Wise & DeMars, 2005).
Uses of standardized tests
Standardized tests are used for a variety of reasons and the same test is sometimes used for multiple purposes. The uses include:
Assessing Students’ Progress in a Wider Context
Well-designed teacher assessments provide crucial information about each student’s achievement in the classroom. However, teachers vary in the types of assessment they use, so teacher assessments do not usually provide information on how students’ achievement compares to externally established criteria. Consider two 8th grade students, Brian and Joshua, who received A’s in their middle school math classes. However, on the standardized norm-referenced math test Brian scored at the 50th percentile whereas Joshua scored at the 90th percentile. This information is important to Brian and Joshua, their parents, and the school personnel. Likewise, two 3rd grade students could both receive C’s on their report card in reading, but one may pass 25% and the other 65% of the items on the criterion-referenced state test.
There are many reasons that students’ performance on teacher assessments and standardized assessments may differ. Students may perform lower on the standardized assessment because their teachers have easy grading criteria, or there is poor alignment between the content they were taught and that on the standardized test, or they are unfamiliar with the type of items on the standardized tests, or they have test anxiety, or they were sick on the day of the test. Students may perform higher on the standardized test than on classroom assessments because their teachers have hard grading criteria, or the student does not work consistently in class (e.g., doesn’t turn in homework) but will focus on a standardized test, or the student is adept at the multiple choice items on the standardized tests but not at the variety of constructed response and performance items the teacher uses. We should always be very cautious about drawing inferences from one kind of assessment.
In some states, standardized achievement tests are required for home schooled students in order to provide parents and state officials information about the students’ achievement in a wider context. For example, in New York home schooled students must take an approved standardized test every other year in grades 4-8 and every year in grades 9-12. These tests must be administered in a standardized manner and the results filed with the Superintendent of the local school district. If a student does not take the tests or scores below the 33rd percentile the home schooling program may be placed on probation (New York State Education Department, 2005).
Diagnosing Students’ Strengths and Weaknesses
Standardized tests, along with interviews, classroom observations, medical examinations, and school records, are used to help diagnose students’ strengths and weaknesses. Often the standardized tests used for this purpose are administered individually to determine if the child has a disability. For example, if a kindergarten child is having trouble with oral communication, a standardized language development test could be administered to determine if there are difficulties with understanding the meaning of words or sentence structures, noticing sound differences in similar words, or articulating words correctly. It would also be important to determine if the child was a recent immigrant or had a hearing impairment or mental retardation. The diagnosis of learning disabilities typically involves the administration of at least two types of standardized tests: an aptitude test to assess general cognitive functioning and an achievement test to assess knowledge of specific content areas (Pierangelo & Giuliani, 2006). We discuss the difference between aptitude and achievement tests later in this chapter.
Selecting Students for Specific Programs
Standardized tests are often used to select students for specific programs. For example, the SAT (Scholastic Assessment Test) and ACT (American College Test) are norm-referenced tests used to help determine if high school students are admitted to selective colleges. Norm-referenced standardized tests are also used, among other criteria, to determine if students are eligible for special education or gifted and talented programs. Criterion-referenced tests are used to determine which students are eligible for promotion to the next grade or graduation from high school. Schools that place students in ability groups, including high school college preparation, academic, or vocational programs, may also use norm-referenced or criterion-referenced standardized tests. When standardized tests are used as an essential criterion for placement they are obviously high stakes for students.
Assisting Teachers’ Planning
Norm-referenced and criterion-referenced standardized tests, among other sources of information about students, can help teachers make decisions about their instruction. For example, if a social studies teacher learns that most of the students did very well on a norm-referenced reading test administered early in the school year, he may adapt his instruction and use additional primary sources. A reading teacher, after reviewing poor end-of-the-year criterion-referenced standardized reading test results, may decide that next year she will modify the techniques she uses. A biology teacher may decide that she needs to spend more time on genetics because her students scored poorly on that section of the standardized criterion-referenced science test. These are examples of assessment for learning, which involves data-based decision making. It can be difficult for beginning teachers to learn to use standardized test information appropriately, understanding that test scores are important information but also remembering that there are multiple reasons for students’ performance on a test.
Promoting Accountability
Standardized test results are increasingly used to hold teachers and administrators accountable for students’ learning. Prior to 2002, many States required public dissemination of students’ progress, but under NCLB school districts in all states are required to send report cards to parents and the public that include results of standardized tests for each school. Providing information about students’ standardized test results is not new, as newspapers began printing summaries of students’ test results within school districts in the 1970s and 1980s (Popham, 2006). However, public accountability of schools and teachers has been increasing in the US and many other countries, and this increased accountability affects the public perception and work of all teachers, including those teaching in subjects or grade levels not being tested.
For example, Erin, a middle school social studies teacher, said, “As a teacher in a "non-testing" subject area, I spend substantial instructional time supporting the standardized testing requirements. For example, our school has instituted "word of the day," which encourages teachers to use, define, and incorporate terminology often used in the tests (e.g., "compare", "oxymoron" etc.). I use the terms in my class as often as possible and incorporate them into written assignments. I also often use test questions of similar formats to the standardized tests in my own subject assessments (e.g., multiple choice questions with double negatives, short answer and extended response questions) as I believe that practice in the test question formats will help students be more successful in those subjects that are being assessed.”
Accountability and standardized testing are two components of Standards Based Reform in Education, which was initiated in the USA in the 1980s. The two other components are academic content standards, which are described later in this chapter, and teacher quality, which was discussed in Chapter 1.
Types of Standardized Tests
Achievement Tests: Summarizing the Past
K-12 achievement tests are designed to assess what students have learned in a specific content area. These tests include those specifically designed by States to assess mastery of state academic content standards (see more details below) as well as general tests such as the California Achievement Tests, the Comprehensive Tests of Basic Skills, Iowa Tests of Basic Skills, Metropolitan Achievement Tests, and the Stanford Achievement Tests. These general tests are designed to be used across the nation and so will not be as closely aligned with State content standards as specifically designed tests. Some States and Canadian Provinces use specifically designed tests to assess attainment of content standards and also a general achievement test to provide normative information.
Standardized achievement tests are designed to be used for students in Kindergarten through high school. For young children questions are presented orally, students may respond by pointing to pictures, and the subtests are often untimed. For example, on the Iowa Tests of Basic Skills designed for students as young as kindergarten, the vocabulary test assesses listening vocabulary. The teacher reads a word and may also read a sentence containing the word. Students are then asked to choose one of three pictorial response options.
Achievement tests are used as one criterion for obtaining a license in a variety of professions including nursing, physical therapy, social work, accounting, and law. Their use in teacher education is recent and is part of the increased accountability of public education. Most States require that teacher education students take achievement tests in order to obtain a teaching license. For those seeking middle school and high school licensure these tests are in the content area of the major or minor (e.g., mathematics, social studies); for those seeking licenses in early childhood and elementary education the tests focus on knowledge needed to teach students of specific grade levels. The most commonly used tests, the PRAXIS II series, developed by Educational Testing Service, include three types of tests:
- Subject Assessments that test general and subject-specific teaching skills and knowledge. They include both multiple-choice and constructed-response test items.
- Principles of Learning and Teaching (PLT) Tests assess general pedagogical knowledge at four grade levels: Early Childhood, K-6, 5-9, and 7-12. These tests are based on case studies and include constructed-response and multiple-choice items. Much of the content in this textbook is relevant to, and organized to parallel, the PLT tests.
- Teaching Foundations Tests assess pedagogy in five areas: multi-subject (elementary), English Language Arts, Mathematics, Science, and Social Science. These tests include constructed-response and multiple-choice items.
Which tests teacher education students must take and the scores needed in order to pass each test vary and are determined by each US state.
Diagnostic Tests: Profiling Skills and Abilities
Some standardized tests are designed to diagnose strengths and weaknesses in skills, typically reading or mathematics skills. For example, an elementary school child may have difficulty in reading and one or more diagnostic tests would provide detailed information about three components (Joshi, 2003):
- word recognition, which includes phonological awareness (pronunciation), decoding, and spelling;
- comprehension which includes vocabulary as well as reading and listening comprehension, and
Diagnostic tests are often administered individually by school psychologists, following standardized procedures. The examiner typically records not only the results on each question but also observations of the child’s behavior such as distractibility or frustration. The results from the diagnostic standardized tests are used in conjunction with classroom observations, school and medical records, as well as interviews with teachers, parents and students to produce a profile of the student’s skills and abilities, and where appropriate diagnose a learning disability.
Aptitude Tests: Predicting the Future
Aptitude tests, like achievement tests, measure what students have learned, but rather than focusing on specific subject matter learned in school (e.g., Math, Science, English or Social Studies) the test items focus on verbal, quantitative, and problem solving abilities that are learned in school or in the general culture (Linn & Miller, 2005). These tests are typically shorter than achievement tests and can be useful in predicting general school achievement. If the purpose of using a test is to predict success in a specific subject (e.g., Language Arts), the best predictor is past achievement in Language Arts, and so scores on a Language Arts achievement test would be useful. However, when the predictions are more general (e.g., success in college) aptitude tests are often used. According to the test developers, both the ACT and SAT Reasoning tests, used to predict success in college, assess general educational development and reasoning, analysis and problem solving, and also include questions on mathematics, reading and writing. The SAT Subject tests, which focus on mastery of specific subjects like English, history, mathematics, science, and language, are used by some colleges as entrance criteria and are more appropriately classified as achievement tests than aptitude tests even though they are used to predict the future.
Tests designed to assess general learning ability have traditionally been called intelligence tests, but are now often called learning ability tests, cognitive ability tests, scholastic aptitude tests, or school ability tests. The shift in terminology reflects the extensive controversy over the meaning of the term intelligence and that its traditional use was associated with inherited capacity. The more current terms emphasize that tests measure developed ability in learning not innate capacity. The Cognitive Abilities Test, for example, assesses K-12 students' abilities to reason with words, quantitative concepts, and nonverbal (spatial) pictures. The Woodcock Johnson III, for another example, contains cognitive abilities tests as well as achievement tests for ages 2 to 90 years.
High Stakes Testing by States
While many States had standardized testing programs prior to 2000, the number of state-wide tests has grown enormously since then because NCLB required that all states test students in reading and mathematics annually in grades 3-8 and at least once in high school by 2005-6. Twenty-three states expanded their testing...
Standards Based Assessment
- Academic Content Standards
- Alignment of Standards, Testing and Classroom Curriculum
- Sampling Content
Adequate Yearly Progress
- Sub Groups
Growth or Value Added models
One concern with how AYP is calculated is that it is based on an absolute level of student performance at one point in time and does not measure how much students improve during each year...The US Department of Education in 2006 allowed some states to include growth measures in their calculations of AYP...
- Differing State Standards
- Implications for Beginning Teachers
- Testing in Canadian Provinces
- Other International Testing
Understanding Test Results
In order to understand test results from standardized tests it is important to be familiar with a variety of terms and concepts that are fundamental to “measurement theory,” the academic study of measurement and assessment. Two major areas in measurement theory, reliability and validity, were discussed in the previous chapter; in this chapter we focus on concepts and terms associated with test scores...
- The Basics
- Frequency distributions
- Measures of Central Tendency and Variability
- Normal Distribution
- Kinds of Test Scores
- Standard Scores
- Grade Equivalent Scores
Issues with Standardized Tests
Many people have very strong views about the role of standardized tests in education. Some believe they provide an unbiased way to determine an individual’s cognitive skills as well as the quality of a school or district. Others believe that scores from standardized tests are capricious, do not represent what students know, and are misleading...
- Are Standardized Tests biased?
- Do Teachers teach to the Tests?
- Do Students and Educators try to cheat?
Summary and Conclusions
Standardized tests are a fact of life for classroom teachers. It is important therefore to understand what they can and cannot do. Broadly speaking, the tests either assess achievement, diagnose learning problems, or predict future academic performance. For better or for worse, test results are often also used for "high stakes" purposes: assessing whether students, teachers, and/or whole schools are accomplishing what they are supposed to accomplish academically. Because of their nature and limitations, standardized tests are prone to misuse, whether by reinforcing social biases, tempting teachers to teach to the test, tempting students to cheat when taking them, or even tempting teachers to cheat in reporting scores.
- Human Resources Division (n.d.). Firefighter Commonwealth of Massachusetts Physical Abilities Test (PAT) Accessed November 19, 2006 from http://www.mass.gov/?pageID=hrdtopic&L=2&L0=Home&L1=Civil+Service&sid=Ehrd
- Popham, W. J. (2005). Classroom Assessment: What teachers need to know. Boston, MA: Pearson.
- Haertel, E. & Herman, J. (2005). A historical perspective on validity arguments for accountability testing. In J. L. Herman & E. H. Haertel (Eds.), Uses and misuses of data for educational accountability and improvement. 104th Yearbook of the National Society for the Study of Education. Malden, MA: Blackwell.
- Linn, R. L., & Miller, M. D. (2005). Measurement and Assessment in Teaching (9th ed.). Upper Saddle River, NJ: Pearson.
- Wise, S. L. & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1-17.
- New York State Education Department (2005). Home Instruction in New York State. Accessed on November 19, 2006 from 
- Pierangelo, R. & Giuliani, G. (2006). Special education assessment. Boston, MA: Allyn & Bacon.
- Popham, W. J. (2006). Educator cheating on No Child Left Behind Tests. Education Week, 25(32), 32-33.
- Joshi, R. M. (2003). Misconceptions about the assessment and diagnosis of reading disability. Reading Psychology, 24, 247-266.