Thursday, September 22, 2016

Creating Summative Assessment: The Table of Specification

WHAT IS A TABLE OF SPECIFICATIONS? 
A TOS, sometimes called a test blueprint, is a matrix in which the rows list the specific topics or competencies and the columns list the cognitive levels of Bloom's Taxonomy. It is a table that helps teachers align objectives, instruction, and assessment (e.g., Notar, Zuelke, Wilson, & Yunker, 2004). This strategy can be used with a variety of assessment methods but is most commonly associated with constructing traditional summative tests. The TOS can help teachers map the amount of class time spent on each objective against the cognitive level at which each objective was taught, thereby helping teachers identify the types of items they need to include on their tests. There are different versions of these tables or blueprints (e.g., Linn & Gronlund, 2000; Mehrens & Lehman, 1973; Notar et al., 2004), and the one presented here is the one we have found most useful in our own teaching. This tool can be modified to best meet your needs in developing classroom tests.
WHAT IS THE PURPOSE OF A TABLE OF SPECIFICATIONS? 
To understand how best to modify a TOS to meet your needs, it is important to understand the goal of this strategy: improving the validity of the evaluations a teacher makes based on a given assessment.
Validity is the degree to which the evaluations or judgments we make as teachers about our students can be trusted based on the quality of the evidence we gathered (Wolming & Wilkstrom, 2010). It is important to understand that validity is not a property of the test itself, but of the inferences we make based on the information gathered from a test. When we consider whether or not the grades we assign to students are accurate, we are questioning the validity of our judgments. When we ask these questions, we can look to the kinds of evidence endorsed by researchers and theorists in educational measurement to support the claims we make about our students (AERA, APA, & NCME, 1999). For classroom assessments, two sources of validity evidence are essential: evidence based on test content and evidence based on response process (AERA, APA, & NCME, 1999).
Evidence based on test content underscores the degree to which a test measures what it is designed to measure (Wolming & Wilkstrom, 2010). This means that your classroom tests must be aligned to the content (subject matter) taught in order for any of your judgments about student understanding and learning to be meaningful. Essentially, with test-content evidence we are interested in knowing whether the measured (tested/assessed) objectives reflect what you claim to have measured.
Response process evidence is the second source of validity evidence essential to classroom teachers. It concerns the alignment between the kinds of thinking required of students during instruction and during assessment (testing) activities.
Sometimes the tests teachers administer have evidence for test content but not for response process. That is, while the content is aligned with instruction, the test does not address the content at the same depth or level of meaning experienced in class. When students feel that they are being tricked or that the test is overly specific (nit-picky), there is probably an issue with response process at play. As test constructors we need to concern ourselves with evidence of response process. One way to do this is to consider whether the same kind of thinking is used during class activities and summative assessments. If a class activity focused on memorization, then the final test should also focus on memorization and not on a more advanced thinking activity.
Table 1 provides two possible test items to assess understanding of the digestive process. In Table 1, Item 1 assesses whether or not students can identify an organ involved in digestion. Item 2 assesses whether or not students can apply the concepts learned about digestion to the scenario described. Thus, these two items require different levels of thinking about the same content (i.e., recognizing/identifying vs. applying/evaluating). Evidence of response process ensures that classroom tests assess the level of thinking that was required of students during their instructional experiences.

Table 1: Examples of items assessing different cognitive levels 
Item 1: A digestive organ that holds food while it is being mixed with enzymes that continue the process of breaking down food into a usable form is _________.
       a. Small intestine
       b. Stomach
       c. Esophagus
       d. Large intestine
Item 2: Drex is eating his snacks during recess time. What do you think will happen to the food as it enters the stomach? The food ______
       a. is stored for 12 hours then it is excreted through the anus
       b. moves to the large intestine where nutrients are absorbed
       c. is chemically digested by strong acids and powerful enzymes
       d. becomes solid then it is broken into smaller pieces

LEVELS OF THINKING
There are six levels of thinking as identified by Bloom in the 1950s, and these levels were revised by a group of researchers in 2001 (Anderson et al., 2001). Thinking that emphasizes recall, memorization, identification, and comprehension is typically considered lower level. Higher levels of thinking include processes that require learners to apply, analyze, evaluate, and create.
When considering test items, people frequently confuse the type of item (e.g., multiple choice, true-false, essay) with the type of thinking needed to respond to it. All item formats can be used to assess thinking at both high and low levels depending on the context of the question. For example, an essay question might ask students to "Describe four causes of colon cancer." On the surface this looks like a higher-level question, and it could be. However, if students were taught "The four causes of colon cancer are…" verbatim from a text, then this item is really just a low-level recall task. Thus, the thinking level of each item needs to be considered in conjunction with the learning experience involved. For teachers to make valid judgments about their students' thinking and understanding, the thinking level of the items needs to match the thinking level of instruction. The Table of Specifications provides a strategy for teachers to improve the validity of the judgments they make about their students from test responses by providing content and response process evidence.
EVIDENCE FOR TEST CONTENT
One approach to gathering evidence of test content for your classroom tests is to consider the amount of actual class time spent on each objective. Things that were discussed longer or in greater detail should appear in greater proportion on your test. This approach is particularly important for subject areas that teach a range of topics across a range of cognitive levels. In a given unit of study there should be a direct relation between the amount of class time spent on the objective and the portion of the final assessment testing that objective. If you only spent 10% of the instructional time on an objective, then the objective should only count for 10% of the assessment. A TOS provides a framework for making these decisions.
A review of Table 2 reveals a five-column TOS (labeled A-E). The information in column A is taken directly from the teacher's lesson plans and curriculum guides. Using a TOS helps teachers be accountable for the content they teach and the time they allocate to each objective (Notar et al., 2004). The numbers in column B represent the number of test items to be constructed for each objective. Columns C, D, and E correspond to the different cognitive levels of Bloom's taxonomy. Column C contains easy-level objectives (remembering and understanding), column D contains average-level objectives (applying), and column E contains difficult-level objectives, which are higher than application. The percentage allotted to each level is arbitrary, although the split shown here is commonly recommended in basic education. To determine the number of items for each objective (column B), first decide how many items your test should have, then multiply that total by the percentage allotted to each cognitive level. For instance, if you decide to make a 20-item test, then 20 x 60% = 12, so the easy level contains 12 items distributed equally among objectives 1, 2, and 3. The distribution of items across objectives can be arbitrary or based on the time spent on each objective/topic. In this case the teacher distributed the items based on his or her understanding of the learners.
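The level-allocation arithmetic above (e.g., 20 x 60% = 12 easy items) can be sketched in a few lines of Python; the function name and the dictionary layout are ours, chosen for illustration, not part of the TOS method itself:

```python
# Allocate test items across cognitive levels given a total item count
# and the percentage weights from the TOS (here the 60/30/10 split).

def allocate_items(total_items, level_weights):
    """Return the number of items per cognitive level.

    level_weights maps a level name to its fraction of the test;
    the fractions are assumed to sum to 1.0.
    """
    return {level: round(total_items * weight)
            for level, weight in level_weights.items()}

weights = {"easy": 0.60, "average": 0.30, "difficult": 0.10}
allocation = allocate_items(20, weights)
print(allocation)  # {'easy': 12, 'average': 6, 'difficult': 2}
```

With a 20-item test this reproduces the column totals of Table 2: 12 easy, 6 average, and 2 difficult items.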
Test placement is the location of the test items within the test. Hence, for objective 1, the four items allocated to it are found in numbers 1-4 of the test. Here the teacher used a serial, or sequential, arrangement of test items. Test placement can also be arranged randomly; this is a professional decision of the teacher. For instance, four items could be randomly distributed to locations 1, 3, 5, and 7. The percentage for each objective is determined by dividing the number of items allocated to that objective by the total number of items and multiplying by 100%. For instance, (4/20) x 100% = 20%.
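For sequential placement, the item ranges and per-objective percentages follow directly from the item counts; here is a minimal sketch (the helper name is ours):

```python
# Assign sequential test-placement ranges to objectives and compute each
# objective's share of the whole test, as in Table 2.

def sequential_placement(items_per_objective, total_items):
    """Return a ((start, end), percent) pair for each objective."""
    placements = []
    start = 1
    for n in items_per_objective:
        end = start + n - 1
        percent = n / total_items * 100   # e.g., (4/20) x 100% = 20%
        placements.append(((start, end), percent))
        start = end + 1
    return placements

# Item counts for objectives 1-6 from Table 2
for (lo, hi), pct in sequential_placement([4, 4, 4, 2, 4, 2], 20):
    print(f"items {lo}-{hi}: {pct:.0f}%")
```

Run on Table 2's item counts, this yields the placements 1-4, 5-8, 9-12, 13-14, 15-18, and 19-20 with their matching percentages.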

Table 2: A Sample Table of Specifications for Fourth Grade Summative Test in Science: Digestive System

| Instructional Objectives (A) | No. of Test Items (B) | Easy, 60% (C): remembering, understanding | Average, 30% (D): applying | Difficult, 10% (E): analyzing, evaluating, creating |
| --- | --- | --- | --- | --- |
| 1. Identify the major parts of the digestive system | 4 | Items 1-4 (20%) |  |  |
| 2. Describe the functions of the parts of the digestive system | 4 | Items 5-8 (20%) |  |  |
| 3. Discuss the importance of food digestion | 4 | Items 9-12 (20%) |  |  |
| 4. Trace the path of food in the digestive system and the changes the food undergoes | 2 |  | Items 13-14 (10%) |  |
| 5. Practice desirable health habits to keep the digestive system healthy | 4 |  | Items 15-18 (20%) |  |
| 6. Suggest solutions to prevent the common ailments of the digestive system | 2 |  |  | Items 19-20 (10%) |
| Total | 20 | 12 items (60%) | 6 items (30%) | 2 items (10%) |

(Each cognitive-level cell shows the test placement of the items and their percentage of the whole test.)
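As a consistency check, the TOS in Table 2 can be encoded as data and its totals verified programmatically; the field layout below is ours, chosen for illustration:

```python
# Encode Table 2's rows and verify that the item counts are internally
# consistent: the overall total and the share of each cognitive level.

tos = [
    # (objective, number of items, cognitive level)
    ("Identify the major parts of the digestive system", 4, "easy"),
    ("Describe the functions of the parts", 4, "easy"),
    ("Discuss the importance of food digestion", 4, "easy"),
    ("Trace the path of food and the changes it undergoes", 2, "average"),
    ("Practice desirable health habits", 4, "average"),
    ("Suggest solutions to prevent common ailments", 2, "difficult"),
]

total = sum(items for _, items, _ in tos)
by_level = {}
for _, items, level in tos:
    by_level[level] = by_level.get(level, 0) + items

print(total)  # 20
print({lvl: n / total * 100 for lvl, n in by_level.items()})
# easy 60%, average 30%, difficult 10%
```

A check like this catches the most common TOS bookkeeping errors: item counts that do not sum to the planned test length, or level percentages that drift from the intended split.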

HOW MANY ITEMS SHOULD BE ON YOUR SUMMATIVE TEST?
In the total of column B of Table 2, you should note that for this test the teacher has decided to use 20 items. The number of items to include on any given test is a professional decision made by the teacher based on the number of objectives in the unit, his or her understanding of the students, the class time allocated for testing, and the importance of the assessment. Shorter assessments can be valid, provided that the assessment yields ample evidence on which the teacher can base inferences from students' scores.
Typically, because longer tests can include a more representative sample of the instructional objectives and of student performance, they generally allow for more valid inferences. However, this is only true when the test items are of good quality. Furthermore, students are more likely to become fatigued on longer tests and to perform less well as they move through them. Therefore, we believe the ideal test is one that students can complete in the time allotted, with enough time to brainstorm any writing portions and to check their answers before turning in the completed assessment.
THE TOS IS A TOOL FOR EVERY TEACHER
The cornerstone of classroom assessment practice is the validity of the judgments made about students' learning and knowledge (Wolming & Wilkstrom, 2010). A TOS is one tool that teachers can use to support their professional judgment when creating or selecting tests for use with their students. The TOS can be used in conjunction with lesson and unit planning to help teachers make clear the connections between planning, instruction, and assessment.
