Wednesday, September 28, 2016

Appropriateness of Assessment Methods

Overview

Learning targets are essential to assessment. When learning targets are clearly stated, assessment can be precise and accurate, and you can easily determine the appropriate assessment method. Assessment methods can be categorized according to the nature and characteristics of each method. McMillan (2007) identified four major categories: selected-response, constructed-response, teacher observation, and student self-assessment.

Categories of Assessment Methods

  1. Selected-Response Format. In this format, students select from a given set of options to answer a question or a problem. Selected-response items are objective and efficient: they can be graded easily, so the teacher can assess and score a great deal of content quickly. This format includes objective tests and checklists. Objective tests are appropriate for assessing various levels of the hierarchy of educational objectives; they can be multiple-choice, true-false, or matching-type tests. A checklist is a list of items for consideration, which can take the form of questions or actions to be carried out. It can speed up the collection of information through tick-boxes and rating scales, and it can act as a memory aid to ensure that all relevant issues have been considered. Checklists need to be carefully designed so that, when completed, the results are reliable and accurate. (A minimal scoring sketch appears after this list.)
  2. Constructed-Response Format. This format is more useful for targeting higher levels of cognition. It is also subjective because it demands that students create or produce their own answers in response to a question, problem, or task. Items of this type may fall under any of the following categories: brief-constructed response items, performance tasks, essay items/assessments, or oral questioning.
    • Brief-constructed response items require only short responses from students. Examples include sentence completion, where students fill in a blank at the end of a statement; short answers to open-ended questions; labeling a diagram; or solving a math problem while showing the solution.
    • Performance tasks/assessments require students to perform a task rather than select from a given set of options. Unlike brief-constructed response items, students have to come up with a more extensive and elaborate answer or response. Performance tasks are called authentic or alternative assessments because students are required to demonstrate what they can do through activities, problems, and exercises; as such, they can be more valid indicators of students' knowledge and skills than other assessment methods. A scoring rubric containing the performance criteria is needed when grading performance tasks. In addition, performance tasks provide opportunities for students to apply their knowledge and skills in real-world contexts. They can be product-based or skills-oriented: students either create or produce evidence of their learning, or do something and exhibit their skills. Examples of products include book reports, maps, projects, poems, portfolios, audio-visual materials, charts, diagrams, worksheets, reflection papers, journals, and other creative endeavors. These are frequently rated using product rating scales or performance test checklists, tools that state specific criteria and allow teachers and students to gather information and make judgments about what students know and can do in relation to the outcomes. Product rating scales offer systematic ways of collecting data about specific behaviors, knowledge, and skills, while a performance test checklist determines whether or not an individual behaves in a certain way when asked to complete a task. For instance, a performance test checklist may consist of a list of behaviors that make up a certain type of performance, such as using a microscope or typing a letter.
    • Essay tests/assessments involve answering a question or proposition in written form. They are powerful in the sense that they allow students to express themselves and demonstrate their reasoning, and they can assess students' grasp of higher-level cognitive skills, particularly application and the levels beyond it. However, when an essay question is not sufficiently precise and its parameters are not properly defined, students write irrelevant and unnecessary things just to fill the space, and both teachers and students experience difficulty and frustration.
    • Oral questioning, also known as the Socratic method, is a form of cooperative argumentative dialogue between individuals, based on asking and answering questions to stimulate critical thinking and to draw out ideas and underlying presumptions (Copeland, 2005). It is a collaborative and open-minded discussion as opposed to a debate. In oral questioning, several factors need to be considered, such as the student's state of mind and feelings: anxiety and nervousness in making oral presentations can mask a student's true ability. Oral questioning is appropriate when the objectives are to assess students' stock knowledge and to determine their ability to communicate ideas in coherent verbal sentences.
  3. Teacher Observations and Student Self-Reports/Assessments. These are useful supplementary assessment methods when used in conjunction with oral questioning and performance tests. Teacher observations record the frequency of student behaviors, activities, and remarks in anecdotal records. Anecdotal records allow educators to record qualitative information, such as details of a child's specific behavior, the conversation between two children, or how students respond to oral questions and behave during individual and collaborative activities. These details can help educators plan activities, experiences, and interventions, and this assessment method can also be used to assess the effectiveness of teaching strategies and academic interventions. Student self-reports, on the other hand, reflect one of the standards of quality assessment identified by Chappuis, Chappuis & Stiggins (2009). Self-assessment is a process in which students are given the chance to reflect on and rate their own work and to judge how well they have performed in relation to a set of assessment criteria. Students track and evaluate their own progress or performance using a self-rating checklist that lists several characteristics or activities presented to the subject of a study. Teachers often employ this method when they want to diagnose or appraise students' performance from the students' own point of view.
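As a concrete illustration of why selected-response formats are quick and objective to grade, here is a minimal sketch in Python; the items, answer key, responses, and checklist entries are all invented for illustration.

```python
# Minimal sketch (hypothetical data): selected-response items are scored
# by comparing each response to a fixed answer key.

answer_key = {1: "b", 2: "c", 3: "a", 4: "d"}   # item number -> correct option
responses = {1: "b", 2: "c", 3: "c", 4: "d"}    # item number -> student's choice

score = sum(1 for item, key in answer_key.items() if responses.get(item) == key)
print(f"Objective test: {score}/{len(answer_key)} correct")

# A tick-box checklist is graded the same way: each item is simply
# marked observed/not observed, which speeds up data collection.
checklist = {
    "Brings required materials": True,
    "Follows safety procedures": True,
    "Cleans workstation afterwards": False,
}
observed = sum(checklist.values())
print(f"Checklist: {observed}/{len(checklist)} items observed")
```

Because the answer key fully determines the score, any two graders (or a machine) will arrive at the same result, which is what makes this format objective and efficient.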

Finally, the various categories of assessment methods described above can be used depending on the clarity of your learning objective(s). For instance, what assessment method would you use for the objective "To identify the parts of the digestive system"? Would an essay test be appropriate? The objective is low-level and a simple competency; it involves only one specific skill, "identifying," so an essay is not an appropriate assessment method for it. A multiple-choice or matching-type test is a more appropriate assessment method.

Matching Learning Targets with Assessment Methods

In an outcome-based approach, teaching methods and resources that are used to support learning, as well as assessment tasks and rubrics, are explicitly linked to the program and course learning outcomes. Biggs and Tang (2007) call this constructive alignment. It provides the "how-to" by verifying that the teaching-learning activities (TLAs) and the assessment tasks (ATs) activate the same verbs as the intended learning outcomes (ILOs). The performance verbs in ILOs are indicators of the methods of assessment suitable for measuring and evaluating student learning. In addition, McMillan (2007) prepared a scorecard as a guide on how well a particular assessment method measures each level of learning (see Table 1).

Table 1. Learning Targets and Assessment Methods (McMillan, 2007)
Note: Higher numbers indicate better matches (e.g. 5 = excellent; 1 = poor)

Measuring Knowledge & Simple Understanding

Knowledge appears to be the simplest and lowest level of the cognitive taxonomies (Bloom, 1956; Anderson & Krathwohl, 2004), but it is further classified according to the type of thought process involved in learning. The revision of Bloom's taxonomy (Anderson & Krathwohl, 2004) recognizes that remembering can be viewed not only as being able to recall but also as being necessary for learning interrelationships among basic elements and for learning methods, strategies, and procedures. McMillan (2007) refers to this as simple understanding, which requires both comprehension of "concepts, ideas, and generalizations," known as declarative knowledge, and application of learned skills and procedures in new situations, referred to as procedural knowledge (see Table 2).

Source: De Guzman & Adamos, 2015

Measuring Deep Understanding

Beyond knowledge and simple understanding comes deep understanding, which requires more complex thinking processes. McMillan (2007) uses a knowledge/understanding continuum to illustrate the relative degree of understanding, from knowledge to simple understanding to deep understanding (see Table 3).

Source: De Guzman & Adamos, 2015
Table 4 describes the relationship between learning outcomes and test types. Note that test types can be flexible and versatile enough to test different levels of outcomes; they need not be limited to a single cognitive level. The arrows suggest that supply-type and selection-type items can be used for lower-level as well as higher-level outcomes.
Source: De Guzman & Adamos, 2015

References:

  1. Anderson, L. & Krathwohl, D. (2004). A taxonomy for learning, teaching, and assessing (A revision of Bloom's taxonomy of educational objectives). New York: D. McKay.
  2. Chappuis, S., Chappuis, J., & Stiggins, R. (2009). The quest for quality. Educational Leadership, 67(3), 14-19.
  3. De Guzman, E. & Adamos, J. (2015). Assessment of Learning 1. Quezon City: Adriana Publishing Co., Inc.
  4. McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction (4th ed.). USA: Pearson Education, Inc.
  5. Santos, R. D. (2007). Assessment of Learning 1. Quezon City: Lorimar.

Thursday, September 22, 2016

Creating Summative Assessment: The Table of Specification

WHAT IS A TABLE OF SPECIFICATIONS? 
A TOS, sometimes called a test blueprint, is a matrix whose rows consist of the specific topics or competencies and whose columns are objectives cast in terms of Bloom's taxonomy. It is a table that helps teachers align objectives, instruction, and assessment (e.g., Notar, Zuelke, Wilson, & Yunker, 2004). This strategy can be used for a variety of assessment methods but is most commonly associated with constructing traditional summative tests. The TOS helps teachers map the amount of class time spent on each objective against the cognitive level at which each objective was taught, thereby helping them identify the types of items they need to include on their tests. There are different versions of these tables or blueprints (e.g., Linn & Gronlund, 2000; Mehrens & Lehman, 1973; Notar et al., 2004); the one presented here is the one we have found most useful in our own teaching. This tool can be modified to best meet your needs in developing classroom tests.
WHAT IS THE PURPOSE OF A TABLE OF SPECIFICATIONS? 
To understand how best to modify a TOS to meet your needs, it is important to understand the goal of this strategy: improving the validity of a teacher's evaluations based on a given assessment.
Validity is the degree to which the evaluations or judgments we make as teachers about our students can be trusted, based on the quality of the evidence we gathered (Wolming & Wikström, 2010). It is important to understand that validity is not a property of the test constructed, but of the inferences we make based on the information gathered from a test. When we consider whether or not the grades we assign to students are accurate, we are questioning the validity of our judgment. When we ask these questions, we can look to the kinds of evidence endorsed by researchers and theorists in educational measurement to support the claims we make about our students (AERA, APA, & NCME, 1999). For classroom assessments, two sources of validity evidence are essential: evidence based on test content and evidence based on response process (AERA, APA, & NCME, 1999).
Evidence based on test content underscores the degree to which a test measures what it is designed to measure (Wolming & Wikström, 2010). This means that your classroom tests must be aligned to the content (subject matter) taught in order for any of your judgments about student understanding and learning to be meaningful. Essentially, with test-content evidence we are interested in knowing whether the measured (tested/assessed) objectives reflect what you claim to have measured.
Response process evidence is the second source of validity evidence that is essential to classroom teachers. Response process evidence is concerned with the alignment of the kinds of thinking required of students during instruction and during assessment (testing) activities.
Sometimes the tests teachers administer have evidence for test content but not response process. That is, while the content is aligned with instruction, the test does not address the content at the same depth or level of meaning that was experienced in class. When students feel that they are being tricked, or that the test is overly specific (nit-picky), there is probably an issue related to response process at play. As test constructors we need to concern ourselves with evidence of response process. One way to do this is to consider whether the same kind of thinking is used during class activities and summative assessments. If the class activity focused on memorization, then the final test should also focus on memorization and not on a more advanced thinking activity.
Table 1 provides two possible test items to assess the understanding of the digestion process. In Table 1, Item 1 assesses whether or not students can identify the organ in the digestion process. Item 2 assesses whether or not students can apply the concepts learned in the digestion process described in the scenario. Thus, these two items require different levels of thinking and understanding of the same content (i.e., recognizing/identifying vs. evaluating/applying). Evidence of response process ensures that classroom tests assess the level of thinking that was required for students during their instructional experiences.

Table 1: Examples of items assessing different cognitive levels 
Item 1: A digestive organ that holds food while it is being mixed with enzymes that continue the process of breaking down food into a usable form is _________.
       a. Small intestine
       b. Stomach
       c. Esophagus
       d. Large intestine
Item 2: Drex is eating his snacks during recess time. What do you think will happen to the food as it enters the stomach? The food ______
       a. is stored for 12 hours then it is excreted through the anus
       b. moves to the large intestine where nutrients are absorbed
       c. is chemically digested by strong acids and powerful enzymes
       d. becomes solid then it is broken into smaller pieces

LEVELS OF THINKING
There are six levels of thinking as identified by Bloom in the 1950s, and these levels were revised by a group of researchers in 2001 (Anderson et al., 2001). Thinking that emphasizes recall, memorization, identification, and comprehension is typically considered to be at a lower level. Higher levels of thinking include processes that require learners to apply, analyze, evaluate, and synthesize.
When considering test items, people frequently confuse the type of item (e.g., multiple choice, true-false, essay) with the type of thinking that is needed to respond to it. All item formats can be used to assess thinking at both high and low levels, depending on the context of the question. For example, an essay question might ask students to "Describe four causes of colon cancer." On the surface this looks like a higher-level question, and it could be. However, if students were taught the four causes of colon cancer verbatim from a text, then this item is really just a low-level recall task. Thus, the thinking level of each item needs to be considered in conjunction with the learning experience involved. For teachers to make valid judgments about their students' thinking and understanding, the thinking level of the items needs to match the thinking level of instruction. The Table of Specifications provides a strategy for improving the validity of the judgments teachers make about their students from test responses by providing both content and response-process evidence.
EVIDENCE FOR TEST CONTENT
One approach to gathering evidence of test content for your classroom tests is to consider the amount of actual class time spent on each objective. Things that were discussed longer or in greater detail should appear in greater proportion on your test. This approach is particularly important for subject areas that teach a range of topics across a range of cognitive levels. In a given unit of study there should be a direct relation between the amount of class time spent on an objective and the portion of the final assessment testing that objective. If you spent only 10% of the instructional time on an objective, then that objective should count for only 10% of the assessment. A TOS provides a framework for making these decisions; a small sketch of the proportionality rule follows.
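The snippet below is a minimal sketch of this proportionality rule in Python; the topics, minutes, and test length are invented for illustration, and simple rounding is just one of several reasonable schemes.

```python
# Minimal sketch (hypothetical data): allocate test items to objectives
# in proportion to the instructional time spent on each one.

time_spent = {                              # objective -> minutes of class time
    "Parts of the digestive system": 60,
    "Functions of each part": 60,
    "Path of food through the system": 30,
}
total_items = 20
total_time = sum(time_spent.values())

allocation = {objective: round(total_items * minutes / total_time)
              for objective, minutes in time_spent.items()}

for objective, n in allocation.items():
    print(f"{objective}: {n} items ({100 * n / total_items:.0f}% of the test)")
```

With these numbers, an objective that received 30 of 150 minutes (20% of class time) gets 4 of the 20 items (20% of the test).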
A review of Table 2 reveals a five-column TOS (labeled A-E). The information in Column A is taken directly from the teacher's lesson plans and curriculum guides; using a TOS helps teachers be accountable for the content they teach and the time they allocate to each objective (Notar et al., 2004). The numbers in Column B represent the number of test items to be constructed for each objective. Columns C, D, and E are the different cognitive levels based on Bloom's taxonomy: Column C contains the easy-level objectives (remembering and understanding), Column D the average-level objectives (applying), and Column E the difficult-level objectives, which are higher than application. The percentage allotted to each level is arbitrary, although the split shown here is the one recommended in basic education. To determine the number of items for each objective (Column B), first decide how many items your test should have, then multiply that number by the percentage allotted to each cognitive level. For instance, if you decide to make a 20-item test, then 20 x 60% = 12, so the easy level may contain 12 items, equally distributed among objectives 1 to 3. The distribution of items across objectives can be arbitrary or based on the time spent on each objective or topic; in this case the teacher distributed the items based on his/her understanding of the learners.
Test placement is the location of your test items within the test. Hence, for objective 1, the 4 test items allocated to it appear as numbers 1-4 in the test. Here the teacher uses a serial or sequential arrangement of test items, but test placement can also be random, depending on the professional decision of the teacher; for instance, you might distribute those 4 items across locations 1, 3, 5, and 7. The percentage is determined by dividing the number of items allocated to each objective by the total number of items and multiplying by 100%. For instance, (4/20) x 100% = 20%.
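The arithmetic above is small enough to script. The following sketch mirrors Table 2's 20-item test and its 60/30/10 split; the numbers come from the table, while the code itself is only an illustration.

```python
# Minimal sketch of the TOS arithmetic described above, using the
# 20-item test and the 60/30/10 split shown in Table 2.

total_items = 20
level_weights = {"Easy": 0.60, "Average": 0.30, "Difficult": 0.10}

# Items per cognitive level, e.g. 20 x 60% = 12 easy items.
items_per_level = {level: round(total_items * weight)
                   for level, weight in level_weights.items()}
print(items_per_level)  # {'Easy': 12, 'Average': 6, 'Difficult': 2}

# Percentage of the test contributed by one objective,
# e.g. an objective with 4 of the 20 items: (4/20) x 100% = 20%.
items_for_objective = 4
print(f"{items_for_objective / total_items * 100:.0f}% of the test")  # 20%
```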

Table 2: A Sample Table of Specifications for a Fourth Grade Summative Test in Science: Digestive System

| A. Instructional Objectives | B. Number of Test Items | C. Easy (60%): remembering, understanding | D. Average (30%): applying | E. Difficult (10%): analyzing, evaluating, creating |
|---|---|---|---|---|
| 1. Identify the major parts of the digestive system | 4 | Items 1-4 (20%) | | |
| 2. Describe the functions of the parts of the digestive system | 4 | Items 5-8 (20%) | | |
| 3. Discuss the importance of food digestion | 4 | Items 9-12 (20%) | | |
| 4. Trace the path of food in the digestive system and the changes the food undergoes | 2 | | Items 13-14 (10%) | |
| 5. Practice desirable health habits to keep the digestive system healthy | 4 | | Items 15-18 (20%) | |
| 6. Suggest solutions to prevent the common ailments of the digestive system | 2 | | | Items 19-20 (10%) |
| Total | 20 | 12 items (60%) | 6 items (30%) | 2 items (10%) |

Each cognitive-level column shows the test placement (the item numbers assigned to the objective) and the percentage of the whole test.

HOW MANY ITEMS SHOULD BE ON YOUR SUMMATIVE TEST?
The total of Column B in Table 2 shows that for this test the teacher has decided to use 20 items. The number of items to include on any given test is a professional decision made by the teacher based on the number of objectives in the unit, his/her understanding of the students, the class time allocated for testing, and the importance of the assessment. Shorter assessments can be valid, provided that the assessment includes ample evidence on which the teacher can base inferences about students' scores.
Typically, because longer tests can include a more representative sample of the instructional objectives and of student performance, they generally allow for more valid inferences. However, this is only true when the test items are of good quality. Furthermore, students are more likely to become fatigued on longer tests and to perform less well as they move through the test. Therefore, we believe that the ideal test is one that students can complete in the time allotted, with enough time to brainstorm any writing portions and to check their answers before turning in the completed assessment.
THE TOS IS A TOOL FOR EVERY TEACHER
The cornerstone of classroom assessment practice is the validity of the judgments made about students' learning and knowledge (Wolming & Wikström, 2010). A TOS is one tool that teachers can use to support their professional judgment when creating or selecting tests for use with their students. The TOS can be used in conjunction with lesson and unit planning to help teachers make clear the connections between planning, instruction, and assessment.