Tuesday 25 March 2014

System of Assessments

In developing a system of assessments, the developer must be committed to ensuring that its measurement reflects the expectations of content, rigor, and performance that make up the Common Core State Standards. To that end, item specifications are designed to demonstrate alignment through methodologies reflective of Evidence-Centered Design theory.

That alignment begins with an understanding of the goals of aligning assessments and standards. According to Norman Webb (2002), “alignment of expectations for student learning and assessments for measuring students’ attainment of these expectations is an essential attribute for an effective standards-based education system.” DeMauro (2004) states, “Alignment activities…should be the guiding principle of test design, and item alignment studies should be sources of validity documentation, as should any studies of test content.” Clearly, there is a close connection between validity and alignment: validity addresses the appropriateness of inferences drawn from test results, while alignment has to do with “how well all policy elements [e.g., expectations and assessments] guide instruction and, ultimately, student learning” (Webb, 1997). This connection is realized by aligning both instruction and assessments to the same content standards, thereby assuring that students have had the opportunity to learn the tested material. Indeed, ESEA now requires that state accountability assessments be aligned with state content standards.

Webb (1997) identifies several categories of criteria for judging alignment. The one most relevant to the activity of developing items is content focus: specifically, how well the tests and items/tasks address the expectations embodied in the content specifications and the Common Core State Standards. Test content alignment is at the core of content validity and consequential validity (Martone and Sireci, 2009). Because of the high stakes associated with testing, more attention than ever before has been given to test alignment. The emphasis on test content in alignment and validity studies is understandable. After all, a test is a small sampling of items from a much larger universe of possible items covering, at least in state assessments, a very broad domain. Thus, for inferences from test results to be justifiable, that sample of items has to be a good one: a good representation of the broad domain, providing strong evidence to support claims based on the test results. Alignment therefore concerns both the structure of content and the alignments within pairs of elements in that structure.
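As a purely illustrative companion to the content-focus criterion, the Python sketch below checks two Webb-style alignment signals over a hypothetical item pool: whether each standard is measured by a minimum number of items, and whether enough of a standard's items sit at or above its intended depth-of-knowledge level. The item records, standard codes, and numeric thresholds are assumptions for this example, not values taken from any official alignment tool.

from collections import defaultdict

# Hypothetical item pool: (item_id, standard_code, depth-of-knowledge level).
ITEMS = [
    ("item-01", "3.NBT.A.2", 1),
    ("item-02", "3.NBT.A.2", 2),
    ("item-03", "3.NBT.A.2", 2),
    ("item-04", "3.OA.D.8", 3),
]

# Intended DOK level for each standard (illustrative values).
STANDARD_DOK = {"3.NBT.A.2": 2, "3.OA.D.8": 3}

def categorical_concurrence(items, min_items=6):
    """Flag standards measured by fewer than `min_items` items."""
    counts = defaultdict(int)
    for _item_id, standard, _dok in items:
        counts[standard] += 1
    return {standard: n >= min_items for standard, n in counts.items()}

def dok_consistency(items, standard_dok, min_fraction=0.5):
    """Flag standards where too few items reach the intended DOK level."""
    levels = defaultdict(list)
    for _item_id, standard, dok in items:
        levels[standard].append(dok)
    return {
        standard: sum(d >= standard_dok[standard] for d in ds) / len(ds) >= min_fraction
        for standard, ds in levels.items()
    }

print(categorical_concurrence(ITEMS))        # both standards fail the item-count check
print(dok_consistency(ITEMS, STANDARD_DOK))  # both standards pass the DOK check

A small pool like this one fails categorical concurrence while passing DOK consistency, which is exactly the kind of gap an alignment review is meant to surface before claims are made about domain coverage.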


The concept of Universal Design focuses on “the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design” (CUD, 1997). When applied to the development of assessment items and tasks, the concept of Universal Design aims to create items and tasks that accurately measure the targeted knowledge, skills, and abilities for all students. However, the concept of Universal Design recognizes that a single solution rarely, if ever, functions well for all users. For this reason, Universal Design also embraces the concept of allowing users to select from multiple alternatives. As Rose and Meyer emphasize, “Universal Design does not imply ‘one size fits all’ but rather acknowledges the need for alternatives to suit many different people’s needs…the essence of Universal Design is flexibility and the inclusion of alternatives to adapt to the myriad variations in learner needs, styles, and preferences” (Rose & Meyer, p. 4).

When developing assessment items and tasks, the spirit of Universal Design is captured by first applying the general guidelines to design items and tasks that work well for a broad range of students, and then applying the accompanying guidelines to develop adaptations that extend the ability of an item or task to also accurately measure students with specialized access needs. When applied to assessment items and tasks, Universal Design has two important implications. First, Universal Design requires item writers to consider the full range of students who are expected to be measured by an item or task and to design the item to function appropriately for the widest range of these students without adaptation. The item specifications and guidelines provide several considerations that can expand the range of students for whom an item or task functions well. As an example, using vocabulary that is commonly used in school rather than vocabulary that is associated with specialized activities that may not be familiar to all students (e.g., sport-specific terminology such as “ski binding” or “putter,” or hobby-specific vocabulary such as “yarn over” or “rabbet joint”) can improve the accuracy with which an item or task is able to stimulate the targeted knowledge, skill, and ability of students who are unfamiliar with such specialized vocabulary. Similarly, limiting the use of visual materials such as figures, graphs, and maps to those cases in which they are absolutely required by an item can improve an item or task’s functioning for students with visual needs and for students who have challenges processing multiple pieces of information.

Second, Universal Design requires item writers to create items that support adaptations designed to meet the needs of specific subgroups of students. As an example, minimizing the complexity of visual materials so that they can be described verbally or represented as a tactile image supports the adaptation of that content for students with visual needs.
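To make the vocabulary consideration concrete, here is a minimal review-time sketch that flags specialized terms in draft item text. The term list, function name, and sample sentence are assumptions invented for the illustration, not an official review lexicon.

# Illustrative list of specialized, activity-specific terms drawn from
# the examples above; a real review lexicon would be far larger.
SPECIALIZED_TERMS = {"ski binding", "putter", "yarn over", "rabbet joint"}

def flag_specialized_vocabulary(item_text, specialized=SPECIALIZED_TERMS):
    """Return specialized terms found in draft item text so a writer
    can replace them with vocabulary commonly used in school."""
    lowered = item_text.lower()
    return sorted(term for term in specialized if term in lowered)

# Hypothetical draft stem for a two-digit addition item.
draft = "Before the race, Marta checks her ski binding and counts 12 + 7 flags."
print(flag_specialized_vocabulary(draft))  # -> ['ski binding']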


Valid assessment of student knowledge, skills, and abilities requires a two-way communication between an assessment item and a student that involves three critical steps. The first step in this communication process focuses on presenting information to a student in order to activate or stimulate the knowledge, skill, or ability that is the target of assessment. Second, the student is provided an opportunity to interact with content that is presented by an item as s/he applies the targeted knowledge, skill, or ability. Third, the student provides evidence about their knowledge, skill, or ability through their response to the assessment item or task. It is through this three-step process that an assessment item or task attempts to access the targeted knowledge, skills, or abilities that operate within the student.

Access by Design is an approach to developing items and tasks that aims to improve the accuracy with which assessment items and tasks measure targeted knowledge, skills, and abilities by maximizing the range of students for whom an item accurately stimulates the assessment target, allows the student to interact with content as they apply their knowledge, skills, and abilities, and enables students to produce responses that accurately reflect the outcome of their thinking. Maximizing the range of students for whom items and tasks provide valid measures of the target of assessment involves a three-step process. The first step, which is a core component of Evidence-Centered Design, is to clearly define the knowledge, skills, and/or abilities that are the target of assessment. The term “assessment target” is used here to refer to the knowledge, skills, and/or abilities that are the target of assessment. When defining an assessment target, it is critical to clearly articulate the knowledge, skill, or ability that is intended to be measured. As part of this process, it is important to consider what knowledge, skill, and ability the student must bring to the item in order to succeed, and what knowledge, skills, or abilities are not intended to be measured. As an example, a mathematics item that asks a student to perform addition with two digits in the context of a real-world problem might require the student to bring to the item knowledge of addition, knowledge of the number system, and an ability to relate real-world situations to appropriate mathematical operations. However, this item might not intend to measure a student’s ability to read print-based text. Clearly defining assessment targets and carefully considering what is and is not intended to be measured is an essential first step in maximizing the validity of assessment.

The second step focuses on applying principles of Universal Design to the design and authoring of the content that forms each assessment item and task. The third step involves providing extensions to assessment content in order to better meet specific accessibility needs. One example of an extension is specifying how text-based content is to be presented in braille form. Key to providing extensions, however, is careful consideration of whether accessibility supports provided through an extension infringe on the knowledge, skills, and/or ability that is the target of assessment. When this occurs, students may be better able to access the item, but the item no longer provides a valid measure of the assessment target.
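The distinction between measured and unmeasured knowledge, and the caution about extensions infringing on assessment targets, can be sketched as a simple data model. The class and field names below are assumptions for illustration only; they mirror the two-digit addition and braille examples from the text rather than any published Evidence-Centered Design schema.

from dataclasses import dataclass

@dataclass
class AssessmentTarget:
    """Knowledge/skills an item intends (and does not intend) to measure."""
    description: str
    measured_skills: set
    unmeasured_skills: set

@dataclass
class Extension:
    """An accessibility extension, e.g., a rule for braille presentation."""
    name: str
    skills_bypassed: set  # skills the support removes from the interaction

def extension_preserves_target(target, extension):
    """An extension is acceptable only if it bypasses no measured skill;
    otherwise the adapted item no longer measures the assessment target."""
    return not (extension.skills_bypassed & target.measured_skills)

two_digit_addition = AssessmentTarget(
    description="Perform two-digit addition in a real-world context",
    measured_skills={"addition", "number system", "relating context to operations"},
    unmeasured_skills={"reading print-based text"},
)
braille = Extension("braille presentation",
                    skills_bypassed={"reading print-based text"})
# Hypothetical support that also reads away targeted number-system knowledge.
over_support = Extension("full worked-example hint",
                         skills_bypassed={"relating context to operations"})

print(extension_preserves_target(two_digit_addition, braille))       # True
print(extension_preserves_target(two_digit_addition, over_support))  # False

The second call illustrates the failure mode described above: a support that overlaps the measured skills makes the item more accessible but invalidates it as a measure of the assessment target.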
Together, the application of principles of Universal Design and the use of extensions designed to meet specific access needs are the foundation of Access by Design. While the goal of applying principles of Universal Design is to develop items that function well for all students, the Access by Design model recognizes that extensions to item content may be necessary to maximize the range of students for which an item or task accurately measures the targeted knowledge, skills, and abilities.


Performance tasks were described in the Race to the Top application as follows: [Performance tasks]…will provide a measure of the student’s ability to integrate knowledge and skills across multiple [content] standards — a key component of college and career readiness. Performance [tasks] will be used to better measure capacities such as depth of understanding, research skills, and complex analysis, which cannot be adequately assessed with [selected response] or constructed response items. (p. 42). The essential characteristics of performance tasks have been identified by specifying that a performance task must:
• Integrate knowledge and skills across multiple content standards or English language arts strands/mathematics domains;
• Measure capacities such as depth of understanding, research skills, and/or complex analysis with relevant evidence;
• Require student-initiated planning, management of information and ideas, and/or interaction with other materials;
• Require production of more extended responses (e.g., oral presentations, exhibitions, product development), in addition to more extended written responses that might be revised and edited;
• Reflect a real-world task and/or scenario-based problem;
• Lend itself to multiple approaches;
• Represent content that is relevant and meaningful to students;
• Allow for demonstration of important knowledge and skills, including those that address 21st century skills such as critically analyzing and synthesizing media texts;
• Focus on big ideas over facts;
• Allow for multiple points of view and interpretations;
• Require scoring that focuses on the essence of the task;
• Reflect one or more of the Standards for Mathematical Practice or the Reading and Writing (or Speaking and Listening) processes; and
• Seem feasible for the school/classroom environment.

In short, performance tasks should satisfy the following criteria (a sketch of this checklist as a simple data structure follows the list):

• Integrate knowledge and skills across multiple claims and targets;
• Measure capacities such as depth of understanding, research skills, and/or complex analysis with relevant evidence;
• Require student-initiated planning, management of information/data and ideas, and/or interaction with other materials;
• Reflect a real-world task and/or scenario-based problem;
• Allow for multiple approaches;
• Represent content that is relevant and meaningful to students;
• Allow for demonstration of important knowledge and skills, including those that address 21st century skills such as critically analyzing and synthesizing information presented in a variety of formats, media, etc.;
• Require scoring that focuses on the essence of the Claim(s) and Targets for which the task was written (scoring rules are described in detail in the Performance Task section of the content-specific item specifications documentation); and
• Be feasible for the school/classroom environment.
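As a closing illustration, the condensed criteria above can be treated as a vetting checklist during task review. The criterion strings, ratings, and function below are assumptions invented for this sketch, not part of any official review protocol.

# Hypothetical vetting checklist built from the condensed criteria above.
PERFORMANCE_TASK_CRITERIA = [
    "integrates multiple claims and targets",
    "measures depth of understanding, research skills, or complex analysis",
    "requires student-initiated planning or management of information",
    "reflects a real-world task or scenario-based problem",
    "allows multiple approaches",
    "is relevant and meaningful to students",
    "demonstrates important (including 21st century) skills",
    "scoring focuses on the targeted Claim(s) and Targets",
    "is feasible for the school/classroom environment",
]

def unmet_criteria(ratings):
    """Return the criteria a draft task fails, so reviewers can revise it."""
    return [c for c in PERFORMANCE_TASK_CRITERIA if not ratings.get(c, False)]

# A reviewer's ratings for a hypothetical draft task.
draft_ratings = {c: True for c in PERFORMANCE_TASK_CRITERIA}
draft_ratings["allows multiple approaches"] = False
print(unmet_criteria(draft_ratings))  # -> ['allows multiple approaches']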