After taking a final exam, a few of my classmates and I were discussing the fact that we had been allowed to bring in one sheet of paper with whatever we wanted on it (i.e. a “cheat sheet”). Several of them argued that this defeats the purpose of tests. I disagree. A correct appraisal of a test requires understanding the variables it wishes to observe, the relationship it implicitly assumes between those variables and the obstacles it poses, and how well its estimates of the test-takers’ skills fit reality.
The purpose of tests is to reduce the information asymmetry between educator and educated. In a world with perfect information, a teacher would be able to directly observe how well the student understands the material taught in class and then decide whether that is enough. Since this is not possible in the real world, tests are the best tools we’ve come up with to ascertain a pupil’s understanding.
The problem is that understanding is in itself difficult to observe. What we get back from a test is really the combined result of several factors (i.e. the output of a function as opposed to its inputs): performance under pressure, memory, computational ability, and actual understanding of the concepts. In this sense the instructor’s problem is very similar to that of the scientist, who observes a complex phenomenon and would like to isolate the effect of each variable to understand the structure of the whole.
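To make the "function versus inputs" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the linear form of the score, the weights, and the two hypothetical students are all invented, and real tests surely combine these factors in messier ways. It only shows how two very different skill profiles can collapse into the same observed score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Each trait is on an arbitrary 0..1 scale (an assumption of this sketch).
    pressure: float       # performance under pressure
    memory: float
    computation: float
    understanding: float


# Hypothetical weights: how much each factor contributes to the final score.
WEIGHTS = {
    "pressure": 0.2,
    "memory": 0.3,
    "computation": 0.2,
    "understanding": 0.3,
}


def observed_score(student, weights=WEIGHTS):
    """Return the single number the instructor sees (assumed linear model)."""
    return sum(w * getattr(student, name) for name, w in weights.items())


# Two invented students: one memorizes everything, one understands everything.
memorizer = Student(pressure=0.5, memory=1.0, computation=0.5, understanding=0.0)
thinker = Student(pressure=0.5, memory=0.0, computation=0.5, understanding=1.0)

# Both scores come out at about 0.5, so the score alone cannot tell them apart.
print(observed_score(memorizer), observed_score(thinker))
```

Under these made-up weights the instructor sees the same number for both students, which is exactly the difficulty: the output alone does not reveal which input produced it.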
Much can be said about the distinction between memory and understanding. I will go no further than suggesting that memory is the compilation of data, and understanding is the connection of those data to form a mental model of some event or process. Of course, there is no reason why understanding should be the only attribute that we want to observe.
The different types of exams in the field reflect different hypotheses on the nature of understanding, or attempts to isolate particular variables by controlling for the rest. A take-home test, for example, significantly diminishes the constraints of time, memory, and computational ability – if designed well, it could be one of the more effective ways of testing understanding. Tests where calculators are allowed remove most computational difficulties. As a last example, the “cheat-sheet” test removes much of the memory constraint, leaving the computation, time, and understanding elements combined.
Naturally, tests must be graded on a fixed scale, and isolating skills (by removing the corresponding obstacles) implies that whatever remains to be tested should be given added weight in the overall score. This adds to the complexity of test design, since it requires an estimation of what the average test-taker’s strengths will be in order to re-scale the exam.
If, for example, an instructor suspects that all his students perform extremely well under pressure, a test without a time limit need not be made much harder than a timed one, since the time constraint makes little difference to them. On the other hand, if memorizing facts is suspected to be a large (and unwanted) part of a test’s difficulty, an open-book test should be made harder than its regular counterpart in proportion to the share that memory holds in the skill set needed to succeed on the original test.
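One possible reading of this re-scaling, sketched in Python: if a removed obstacle accounted for some share of the original difficulty, the remaining skills are re-weighted so that they again add up to the whole score. The weights below are invented, and the helper `reweight` is hypothetical; this is just one simple way to express the proportionality, not a claim about how any instructor actually calibrates an exam.

```python
def reweight(weights, removed):
    """Drop one factor and rescale the others so they sum to 1 again."""
    remaining = {name: w for name, w in weights.items() if name != removed}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no weighted factors remain to be tested")
    return {name: w / total for name, w in remaining.items()}


# Hypothetical closed-book test: memory is a quarter of the difficulty.
closed_book = {"memory": 0.25, "computation": 0.25, "understanding": 0.5}

# Open-book version: memory is removed, the rest is given added weight.
open_book = reweight(closed_book, "memory")

# Factor by which every remaining skill is scaled up: 1 / (1 - memory share).
scale = 1 / (1 - closed_book["memory"])

print(open_book)  # computation is about one third, understanding about two thirds
print(scale)      # about 1.33
```

In this toy case, removing a factor worth a quarter of the test means each remaining skill counts for a third more than before. Had memory been worth almost nothing, the scale factor would stay close to 1 and the open-book test would need hardly any adjustment.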
It is clear that, as a tool of observation, a test needs all of these elements in order to be effective: a defined target variable, control of the remaining factors, and a scale calibrated to match. These considerations matter even more when examinations are viewed in the larger context of education. Tests are often set as added incentives for students to work diligently. A poor test design not only distorts the information yielded, but also distorts the incentives of the test-takers. If an instructor’s test is believed to value memory over intuitive understanding, for example, students will resort to mnemonics instead of building the desired skill set.
A poorly designed test is one that, given a specific target variable, fails to control for other relevant factors or to calibrate accordingly. Its failure affects both the test-takers and others. Those who take the test will either have performed worse than they should have or have reacted to the test’s shortcomings by at least partly sacrificing the correct goals for the wrong ones. Everyone else interested in the test’s outcome will receive inaccurate signals from it, and their decisions will suffer accordingly.