Scientific and technical exhibits exist in a profusion of types, styles, sizes,
and designs. However, their underlying purpose is generally the same: to
impart knowledge about various technical subjects, and/or to change the
attitude of the viewer in a favorable direction toward science, its practitioners, and its institutions. Such exhibits are considered by many to be a
uniquely appropriate and effective means of narrowing the gap between
the sophisticated world of modern science and technology and the “everyday” world of that abstraction we call the general public. Museums, schools,
private industry, and the Federal Government are deeply committed to
the use of scientific and technical exhibits to carry their informational and
attitudinal messages to the public. The extent of this commitment can be
judged by the number of such exhibits in existence or planned, their prominence at major fairs, exhibitions and museums, and the funds expended on
their design, fabrication and housing.
[Support for this work was provided by the United States Atomic Energy Commission under Purchase Order No. NY-65-366.]

APPROACHES TO STUDYING EXHIBIT EFFECTIVENESS

Considering exhibits as a form of visual communication puts them in the
same methodological and research context as educational television, slides,
movies, and other forms of pictorially-based educational media. Thus, one
might expect that research strategies typically used to measure effectiveness
for these other media would also be used in studying exhibit effectiveness.
However, this is not generally the case. Since exhibits depend on voluntary
audiences in most situations, they must be concerned with an element not
usually shared by these other media: attracting power. Unless people come
to view an exhibit, any possible “teaching” attempt is wasted. By the same
token, of course, an exhibit that successfully attracts an audience but fails
to reach its educational and attitudinal objectives represents a waste of
effort and funds (both of which can be sizable for a modern exhibit). Because attracting power is such an obviously necessary ingredient for the
success of an exhibit, studies of exhibit effectiveness have tended to concentrate on their popularity, and at the same time have tended to neglect
efforts to measure their actual “teaching power.” The underlying rationale
of many effectiveness studies could thus be reduced to a kind of syllogism:
If exhibit A attracts more people than exhibit B, then exhibit A is better
than exhibit B. This kind of head-count research is paralleled by a complementary emphasis by exhibit designers on the use of display effects to
attract attention. One could argue, however, that the excessive use of such
attention-getting techniques may actually detract from the achievement
of the exhibit’s educational and attitudinal objectives. Given “x” dollars to
spend on an exhibit, the temptation to use them for the “sizzle” rather than
the "steak" may be difficult to resist. However, this situation is not likely
to improve as long as those working in the field are content to equate
popularity with effectiveness.
This is not to say that there have been no attempts to assess success in
terms of knowledge gained and attitudes changed. Work has been done on
measurement and several interesting projects are underway that should
shed considerable light on exhibits as change agents. However, the results
of studies completed to date have been generally disappointing. The measured change in behavior resulting from exposure to a given exhibit has been
either quite small, nonexistent, or, in a few instances, in the wrong direction.
These kinds of results may be partly due to unsolved methodological problems, particularly in defining objectives, and in designing appropriate measuring instruments that are sensitive to changes. These problems, of course,
still plague all of our educational media.
APPROACH USED IN THIS STUDY
The disappointing results found in many studies of exhibits may also be
due to the fact that principles of good design and proper utilization are
known by specialists in the field but often ignored by them. Such principles
are found in abundance in the exhibit literature in the form of prescriptive
or normative statements as to what constitutes a “good,” or “effective”
exhibit and what, conversely, constitutes a “bad,” or “ineffective” exhibit.
While knowledge based solely on expertise and experience generally lacks
the precision of scientifically-based knowledge, it may still be valuable and
worthwhile. It would be unwise to reject off hand the experience of those
who have worked with exhibits for many, many years. If such knowledge
and experience could be condensed and shown to be related to the proven
effectiveness of exhibits, it would serve as a very useful guide. Poor and
ineffective practices would be less likely to become incorporated into exhibits, exhibit effectiveness would be enhanced, and exhibit utilization
would be based on firmer ground.
In short, exhibits may be poorer than they ought to be simply because
designers are not utilizing those principles known by the leaders in the
field, or, the experts have not adequately translated their prescriptions into
terms that designers can implement. If either of these statements is true,
then the situation could be improved by making these principles more
available or more understandable, or both. This would be essentially a
translation and dissemination problem, and would involve a more intimate
dialogue between conceptualist and designer.
Two kinds of “truths” could be determined for the principles of good
design and practice in the exhibit literature. The first has to do with their
validity. That is, is there a relationship between the use of the principles
and the measured success of the exhibit? This is a difficult question to
answer, requiring appropriate measuring instruments and large-scale field
studies. In short, this approach takes us back to complex effectiveness studies. However one could save much time and effort by first studying the
reliability of such statements. For example, if the literature states that an
effective exhibit must be “well-lighted,’’ the reliability of this statement
could be determined by asking a number of persons knowledgeable about
the exhibit field to rate an exhibit on the degree to which it does or does
not conform to this principle. If the informed persons agree with each
other, then one would be encouraged to further investigate the question of
validity: do people who view poorly lighted exhibits learn less, and/or have
their attitudes changed less, than people who view well-lighted exhibits?
If, on the other hand, “A” says an exhibit is well-lighted and “B” says it is
poorly lighted, then the prescription about lighting has little substantive
meaning, at least for these two individuals. If such a result were found to
hold generally for all or most of the statements found in the literature, there
would be good reason to question the usefulness of such statements. This
finding would also help to explain why exhibits tend to be an art form
rather than a technology. It would suggest that the reason for ineffectiveness is not that designers aren’t incorporating those features known to be
effective, but that the leaders in the field are not clear in their own thinking
as to what constitutes good exhibit design.
In summary, the study reported on here was designed to determine only
the extent to which the statements made in the published literature regarding the quality of scientific and technical exhibits are meaningful and unambiguous. This was done by constructing a rating scale, the items of
which were drawn from the exhibit literature. By having persons qualified
in the exhibit field use the scale and then comparing their ratings, the
reliability of the statements could be measured. Only if such statements
were found to be reliable would a study of their validity be considered
profitable.
Literature Survey and Item Assembly
A review of the exhibit literature was conducted in an effort to locate
those sources most likely to contain prescriptive or normative statements.
A total of forty-seven references were thus identified. Each potential source
was carefully read. Whenever the author made a statement that involved
exhibit effectiveness, it was recorded. Statements that were specific to a
particular exhibit were included only if a general principle was either
explicitly or implicitly associated with the statement. Thus, the item: “The
red lettering on the agricultural exhibit did not show up against the pink
background, thus making the labels difficult to read" would be recorded,
but the general principle would also be noted: “Lettering should contrast
with the background.” Most authors did, in fact, write in general terms
since their remarks were meant to apply to more than one exhibit. The
complexity and variety of exhibits would seem to preclude the possibility
of anyone saying “The letters of all exhibits must be white on a black background.” The items in the rating scale would also have to avoid this level
of specificity if the scale were to have applicability to all scientific and
technical exhibits. Over 350 different statements were thus recorded from
the forty-seven references.
Next, related detailed statements were grouped into fifteen logical categories. These categories became the general headings under which various
numbers of specific items of the draft scale were placed. The fifteen categories are shown in Table 1. The items falling under a specific category
were then reviewed to see to what extent they could be combined. In
general, the aim was to provide adequate coverage of all the different
characteristics noted in the literature. The more than 350 statements
recorded were reduced in this manner to seventy-four specific questionnaire items. Every effort was made to avoid distorting or changing the
meaning of an item. Even though some items seemed vague or even unintelligible, they were retained essentially as they appeared in the original
quotation. Thus the initial scale was as nearly as possible an empirically
developed instrument.
TABLE 1
Basic Exhibit Categories

1. Attractiveness of Exhibit
2. Ease of Comprehension
3. Unity Within the Exhibit
4. Ability to Attract Attention
5. Ability to Hold Visitor Attention
6. Appropriateness of Exhibit Presentation
7. Accuracy of Information Presented
8. Location and Crowd Flow
9. Visitor Characteristics
10. Focus of Attention
11. Textual Material (labels, headings, etc.)
12. Relation of Exhibit to Surrounding Area and Other Exhibits
13. Design of Exhibit (size; physical layout; use of color; use of light; use of contrast)
14. Exhibit Items (quantity; attractiveness; size)
15. Communication Techniques (sound; motion; demonstrations; charts; films; models; auxiliary teaching techniques; audience participation)
To illustrate the way in which scale items were generated, two items are
shown below along with the individual references which supported these
items.

Scale Item 1: "How would you rate this exhibit on the appropriate use of light?"
Supporting Statements from the Literature:
Wright, G., 1958. "Is lighting adequate, or could it be improved?"
Gardner, J. & Caroline Heller, 1960. “It is impossible to exaggerate the
importance of lighting in exhibitions since it is lighting after all that
largely determines what we see and how we feel about what we see.”
Goins, A. E. & G. B. Griffenhagen, 1957 and 1958. “Several factors operate simultaneously to determine the popularity of an exhibit. . . . Some
of the factors which are inherent in the exhibition are . . . light (e.g.,
illumination, movement), . . .”
Carmel, J. H., 1962. “As with any other part of exhibition design, light
when correctly employed in an exhibition should enhance, emphasize,
create atmosphere and otherwise help tell the story; it should never
dominate, dazzle or distract.”
New York Museum of Science and Industry, 1940.
(1) “The functions of an exhibition and the more important means for
carrying them out are: To draw attention: [by means of] color,
light, motion, sound. . . .”
(2) “From our experience here at the Museum we have found that the
three-part formula of sound architectural design, proper use of
illumination, and good color effects, is the first essential of good
exhibition practice.”
[Note that this item also contains material relating to other categories. Such items were noted on separate cards and filed under
each category, i.e., light, color and design.]
Borhegyi, S. F., 1963. “. . . light can be used to heighten the dramatic
effect of visual images.”
[Note: There were many additional references to lighting, but they all
were concerned with more specific areas of lighting and thus generated
more specific items in the initial rating scale.]
Scale Item 2: Not all subject matter lends itself to an exhibit presentation.
How suitable is this subject matter for exhibit presentation?
Supporting Statements from the Literature:
Carmel, J. H., 1962.
(1) “. . . material ill-suited to temporary exhibition use . . . includes
anything hazardous, objects requiring lengthy labels for comprehension, or anything which is too complex or obscure to be comprehended in a reasonable length of time by a standing visitor.”
(2) “For permanent exhibitions, it is probably sound policy to avoid
exhibitions on any subject that can be explained as well or better
by an article, book or pamphlet with well selected illustrations.”
Inverarity, R. B., 1961. "An exhibition should not attempt to do what
can be better done in some other medium."
Gardner, J. & Caroline Heller, 1960.
(1) "Exhibition has its limitation . . . complicated stories and arguments
should be left to leaflet or guides."
(2) "It must be accepted that there are some subjects that will never
make good exhibitions."
Dale, E., 1946. "Is the material worth the time, expense, and effort involved?"
Weiss, R. S. & S. Boutourline, Jr., 1963. "The most attractive exhibits
had the characteristic that they could only be seen in the museum. Visitors
would have been justified in feeling it unnecessary to come to the
museum for information which might be found in a book."
Weiss, R. S. & S. Boutourline, Jr., 1963. "A show which fair-goers believe
could have been seen in a book, in their local library, in their local
museum, or even at an industrial show is likely to create some resentment."
Hull, T. G. & T. Jones, 1961. "Not all subjects lend themselves to presentation by exhibits."
Draft Scale
The draft form of the scale contained fifty-five items, five of which had
two or more subparts for a total of seventy-four individual questions. The
rater was asked to judge each of the items by circling one of six verbal
“tags.” An item from the scale, along with the six rating categories, is
shown below:
How well do the various elements of the exhibit combine or relate to
one another to produce a coherent unity?*

Excellent    Very Good    High Average    Low Average    Fair    Poor

[* This item is an example of one which seemed ambiguous and potentially unreliable, but since it was noted several times in the literature, it was included in the scale for initial tryout. After all, those knowledgeable in the exhibit field may have a very precise meaning for "coherent unity."]
The scale used freely distributed ratings rather than forced ratings. It
was felt that the use of unforced ratings was a realistic approach to evaluation since an exhibit need not have a given number of excellent qualities
and an equal number of poor ones.
In order to make it possible to check the internal reliability of the ratings,
the initial scale was divided into two parts. Part I contained items dealing
only with each broad category (as seen in Table 1), and Part II contained
the more specific items falling under each of these categories. The latter
were not identified to the rater as to the broad category to which they
pertained. This format of the scale made it possible to compare the rating
of a broad category (such as the “lighting” item referred to earlier) with
the various specific items which would fall under the general lighting category (such as one dealing only with "glare and reflection").
An item asking for an overall judgment of the exhibit was also included
as part of the scale. In this way, it could be determined to what extent the
raters agreed with each other on a total evaluation of the entire exhibit. The
format of this item was the same as that shown above for the sample item.
While both Parts I and II of the scale required the rater to circle the
appropriate word which best reflected his judgment for each item, Part II
further requested the rater to indicate in a few words why he rated each
specific item the way he did. Two raters may both agree that the “attractiveness of the display materials” is Fair, but one may have rated it that way
because "they were poorly selected" while another may have rated it Fair
because “they were all bunched together.” Thus, the written comments
would make it possible to better evaluate the extent of agreement as indicated by the ratings.
The draft scale was tried out at the American Museum of Atomic Energy
at Oak Ridge, Tennessee. This museum is operated by the Information and
Exhibits Division, Oak Ridge Institute of Nuclear Studies, Inc. The various
exhibits and models show nuclear reactors and atomic power plants, describe the production of raw materials and radioisotopes, and emphasize
the peaceful applications of atomic energy in industry, agriculture, and
medicine. Most of the exhibits are designed and fabricated by the staff of
the Information and Exhibits Division. Seven displays in the museum were
selected for rating. They covered a variety of exhibit techniques, design
features, size, complexity and subject matter. Members of the museum
staff, including those concerned with management, tours, and exhibit design and construction, used the scale. A total of thirty-three scales were
completed by twenty-five separate raters at the museum. Of the seven
exhibits covered by the tryout, one was rated by six raters, three by five
raters, and three by four raters. Eight of the twenty-five individual raters
rated two exhibits.
A form was prepared that asked the raters to make written comments on
the scale itself. In addition, a member of the project staff discussed the
scale with each rater after he had completed the rating of at least one
exhibit.
ANALYSIS
The initial step in the data analysis was to transform the six values into
a numerical scale (Excellent = 6, Very Good = 5, High Average = 4,
Low Average = 3, Fair = 2, Poor = 1). In assigning integral weights, the
assumption was made that equal intervals existed between categories.
Two sorts of questions might be asked in connection with scale reliability. One involves self-agreement, that is, the consistency with which a given
rater evaluated similar elements in a given exhibit. The second has to do
with interrater reliability, the extent to which different raters agreed with
each other in their evaluation of the same element. This latter question will
be dealt with first.
Interrater Reliability
A primary interest in the analysis of these data was in determining the
agreement or lack of agreement among the raters on individual items and
over the entire scale, i.e., interrater reliability. Such information could be
derived only by an exhibit-by-exhibit analysis, since there is no logical
reason for assuming that the rating of a particular feature in one exhibit has
any relation to the rating of that feature in another exhibit.
As an estimate of variability in individual item ratings, the standard
error of the mean was computed for each item in the scale for each of the
seven exhibits. This analysis would indicate the range in which the true
mean will be found. The larger the standard error of the mean, the wider
the range and the greater the variability; therefore, the lower the reliability.
A logical (and liberal) criterion for the mean was established by defining
the actual mean rating for an item as being within plus or minus one rating
category of the obtained mean. In other words, if the obtained mean for
all ratings of a given item was three, or Low Average, it was assumed that
the actual mean was between High Average and Fair. Any item where
the standard error of the mean failed to meet this criterion (at the .05 level)
was considered to be statistically unreliable for that exhibit. The data for
each item on the scale were obtained for each exhibit rated.
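To make this criterion concrete, here is a minimal sketch in Python (not taken from the original study) of the check just described: the verbal tags are mapped to the numerical scale introduced above, and an item passes if the confidence interval around its mean rating stays within one rating category of the obtained mean. The study does not say which test statistic was applied at the .05 level, so the t-based interval below is an assumption.

```python
import numpy as np
from scipy import stats

# Verbal tags mapped to the numerical scale used in the analysis.
SCALE = {"Excellent": 6, "Very Good": 5, "High Average": 4,
         "Low Average": 3, "Fair": 2, "Poor": 1}

def meets_criterion(ratings, half_width=1.0, alpha=0.05):
    """Return True if the confidence interval for the item's mean rating
    stays within plus or minus one rating category of the obtained mean.
    A t-based interval is assumed; the article states only the .05 level."""
    x = np.array([SCALE[r] for r in ratings], dtype=float)
    sem = x.std(ddof=1) / np.sqrt(len(x))            # standard error of the mean
    t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
    return t_crit * sem <= half_width

# Hypothetical ratings of one item on one exhibit by five raters.
print(meets_criterion(["Excellent", "Very Good", "High Average",
                       "Low Average", "Fair"]))
```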
The items failing to meet this criterion, by exhibit and across exhibits, are
shown in Tables 2 and 3. It can be seen in Table 2, for example, that for
Exhibit 6, forty-eight of the seventy-four items on the scale, or sixty-five
percent, do not meet the established criterion of reliability. For four of the
seven exhibits, more than half the items fail to meet the criterion. From
Table 3, one notes that two of the seventy-four items fail to meet the established criterion for all seven exhibits and only one item shows acceptable
reliability for all seven exhibits.
TABLE 2
Items Failing to Meet Reliability Criterion on Each Exhibit

             Items Failing to Meet
             Criterion (Total = 74)     Number of
             Number      Percent        Raters
Exhibit 1      44           59            5
Exhibit 2      31           42            5
Exhibit 3      38           51            5
Exhibit 4      26           35            4
Exhibit 5      42           57            4
Exhibit 6      48           65            4
Exhibit 7      36           49            6
TABLE 3
Items Failing to Meet Reliability Criterion Across Exhibits

Number of        Number of Items Failing to
Exhibits         Meet Criterion (Total = 74)
    7                        2
    6                        4
    5                       11
    4                       25
    3                       16
    2                        9
    1                        6
    0                        1
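Tables 2 and 3 are two tabulations of the same per-item results. The sketch below (illustrative Python with placeholder data, not the study's own computation) shows how a matrix of item-by-exhibit pass/fail outcomes rolls up into both tables.

```python
import numpy as np

# fails[i, j] is True if item i failed the reliability criterion on exhibit j.
rng = np.random.default_rng(0)
fails = rng.random((74, 7)) < 0.5          # placeholder data for illustration

# Table 2: number and percent of items failing on each exhibit.
per_exhibit = fails.sum(axis=0)
for j, n in enumerate(per_exhibit, start=1):
    print(f"Exhibit {j}: {n} items ({100 * n / 74:.0f}%)")

# Table 3: distribution of items by the number of exhibits on which they failed.
per_item = fails.sum(axis=1)               # 0..7 exhibits per item
counts = np.bincount(per_item, minlength=8)
for k in range(7, -1, -1):
    print(f"failed on {k} exhibits: {counts[k]} items")
```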
The standard error of the mean was also computed for the two overall
judgment items on the scale. This provides comparison with the individual
items within the scale. On only one exhibit does the standard error of
the mean for the overall rating fail to meet the established criterion. This
occurs on both the pre- and postscale ratings. It should be pointed out that
this result was found on the exhibit where the greatest number of individual items failed to meet the established criterion (Exhibit 6, Table
2). Thus, while raters tend to disagree on the quality, success, or suitability of the individual items that go to make up the exhibit, they tend to
agree (at least within plus or minus one category) that a given exhibit is
Fair, Very Good, etc.
One is not encouraged to attach much significance to the reliability of
the general ratings since they seem to be based on large areas of disagreement. The extent of this disagreement can perhaps be better understood
by examples of the actual item ratings. To the item, “How would you
rate the overall design of the exhibit?” the following results were obtained
from six raters: 1 Excellent, 1 Very Good, 1 High Average, 1 Low Average,
2 Fair. And on an exhibit evaluated by four raters, the question, “How
would you rate the actual wording of the main title of the exhibit?” brought
the following results: 1 Excellent, 1 Very Good, 1 Low Average, 1 Poor. In
looking at the “Why” answers for this item, the areas of disagreement are
revealed: “What else would be more clear?” “It speaks principally to this
topic." "A title which would stir the curiosity of the audience must be used."
“Title not complete. Should be . . .”
Some of the items that had varied ratings had quite similar comments,
such as the following item evaluated by four raters: “The size of an exhibit is influenced by a variety of factors, some having to do with its
objective(s), its subject matter, amount of material displayed, the surrounding objects, etc. How would you rate the appropriateness of the
size of this exhibit?” The responses were: 1 Very Good, 2 Low Average,
and 1 Poor. To the question “Why” we found, “Possibly taking up too
much space for its purpose and also with respect to other exhibits around
it . . .,” “Seems a little large for material used,” “It could be smaller and
still do the same job,” and, “Too large to present such few concepts.”
No statistic was used to account for this anomaly of divergent ratings
and convergent comments, since it is difficult to weigh the statements.
While the raters all seemed to agree that the exhibit was too large, it is
not clear how serious each rater considered this deficiency except by his
own rating. The only logical conclusion one could draw from this situation
is that although experts may agree on the nature of a particular deficiency,
they may differ widely on the importance attached to that deficiency.
Another way of looking at the reliability of the scale is to measure
agreement among judges over the entire scale. That is, knowing how rater
A scaled the individual items, how accurately can one predict how rater
B will scale those items, using, of course, the same exhibit. To answer
this, all possible interjudge correlations (Pearson product-moment correlation coefficients) were computed for each exhibit. A total of sixty-three
correlations were computed. The distribution of these correlation coefficients is shown in Table 4. The individual coefficients range from -.17 to .58.
There are three negative correlation coefficients. The median correlation
coefficient is .24. From these results, it is evident again that there are large
areas of disagreement. With few exceptions, knowing how one rated individual aspects of an exhibit would tell relatively little about how another
would rate the same aspects in that exhibit.
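As an illustration of how these sixty-three coefficients arise, the following sketch computes every pairwise Pearson correlation between raters' item ratings for a single exhibit. The data layout (one row of item ratings per rater) and the values are assumed for illustration, not taken from the study.

```python
import numpy as np
from itertools import combinations

def interjudge_correlations(ratings_by_rater):
    """ratings_by_rater: 2-D array, one row per rater, one column per item,
    all for the same exhibit. Returns the Pearson r for every rater pair."""
    rs = []
    for a, b in combinations(range(len(ratings_by_rater)), 2):
        r = np.corrcoef(ratings_by_rater[a], ratings_by_rater[b])[0, 1]
        rs.append(r)
    return rs

# Illustrative data: five raters, ten items (the actual scale had seventy-four).
rng = np.random.default_rng(1)
exhibit = rng.integers(1, 7, size=(5, 10))
rs = interjudge_correlations(exhibit)
print(len(rs), "pairs; median r =", round(float(np.median(rs)), 2))
```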
TABLE 4
Distribution of Interjudge Correlations on All Exhibits
(Total Number of Correlations = 63)
On the basis of these interrater measures, it may be concluded that the
terms commonly used in exhibit literature for describing the effectiveness
of an exhibit are not adequate or, at the very least, are not sufficiently
reliable. There may be agreement that lighting, color, labels, etc., are important elements in exhibits, but those knowledgeable in the field seem
not to agree as to the quality of these elements as they exist in a particular
exhibit.
Internal Scale Reliability
The draft scale was divided into two parts to permit a check on its
internal consistency, or more accurately, the internal consistency of individual judges in rating specific categories. For most categories, a general item appeared in Part I of the scale and the more specific items falling
under that category appeared in Part II. A test for internal consistency
would measure the degree to which individuals gave the specific items
in a category (e.g., glare and reflection) the same rating as the corresponding general item (e.g., lighting).
This analysis was performed on only six of the fifteen categories. The
six categories selected were: design, lighting, color, title, labelling, and
general text. Because consistency in rating the items should be independent of the exhibit and because combining the exhibits would provide a
greater spread of scores, a single correlation coefficient (Pearson product-moment) was computed for each variable over all seven exhibits. Since
each category contained more than one related item, each subject’s average rating for all the specific items in the category was correlated with his
corresponding general item rating. The resulting six correlation coefficients
are shown in Table 5. It is interesting to note that only two of these fall
below the highest interjudge correlation coefficient (.58). Thus, as would
be expected, there tends to be greater consistency within judges in rating
general and specific items in individual categories than there is between
judges over the entire scale. However, these correlations still illustrate that
lack of agreement is evident in both interrater and intrarater measures.
TABLE 5
Internal Consistency Correlation Coefficients
Category r
Design .65
Color .85
Light .70
Title .40
Label .64
Text .52
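A sketch of the intrarater computation described above, with placeholder data: each rater's average over a category's specific items is correlated, pooling all raters and exhibits, with his rating of the corresponding general item. The array names and values are illustrative only.

```python
import numpy as np

def internal_consistency(general, specific):
    """general: one general-item rating per (rater, exhibit) pair.
    specific: matrix of the specific-item ratings for the same category,
    one row per (rater, exhibit) pair. Returns a single Pearson r."""
    means = np.asarray(specific, dtype=float).mean(axis=1)
    return float(np.corrcoef(np.asarray(general, dtype=float), means)[0, 1])

# Illustrative "lighting" data: 33 completed scales, 3 specific lighting items.
rng = np.random.default_rng(2)
general = rng.integers(1, 7, size=33)
specific = np.clip(general[:, None] + rng.integers(-1, 2, size=(33, 3)), 1, 6)
print(round(internal_consistency(general, specific), 2))
```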
Rater-Evaluation Sheet
Each rater using the initial version of the scale was asked to fill out an
evaluation sheet noting suggestions and criticisms. The comments thus
collected indicated the following major drawbacks to the initial rating
scale: it was too long (average time to complete was one hour, ten
minutes), it was overly redundant, and the discrimination required on the
six-point scale was too fine. The first two criticisms were not unexpected
since redundancy was purposely built into the scale to permit intrarater
measures to be taken. The latter objection was not anticipated, although
it appeared to be well taken. Many reported that at times they had to
resort to guess-work in choosing between two adjacent ratings. Many of
them recommended a four-point scale as being more realistic.
The raters consistently and enthusiastically noted that the scale forced
them to “look” at an exhibit in a more analytic fashion, and that for this
reason deficiencies came to their attention that had previously gone unnoticed. This is particularly interesting because most raters had been exposed to the exhibits for extended periods of time, and in many cases had
actually conducted tours which used the rated exhibits.
SCALE REVISION
One objective in revising the rating scale was to shorten it. An effort
was also made to reduce any apparent ambiguities in the format of the
scale or in wording individual items.
The remaining revision involved eliminating overlapping items and adding a few items about certain exhibit features that had not been adequately
covered. As noted earlier, the intrusion of uninformed opinion was avoided
in preparing the initial form of the scale. By the time the scale had been
completed and tested, the authors felt better informed and did add several
items. Since there had been some confusion as to how to rate an item when
a feature was not present in a particular exhibit (motion, for example), the
format was changed to make it appropriate to answer only if that feature
were present. At the end of the scale the rater is given the opportunity to
check those features which the exhibit does not include but which, in his
opinion, should have been included. Comments from the rater’s evaluation
sheets, the reliability data, and the data on internal consistency were all instrumental in making these revisions.
The revised scale is substantially shorter than the version used in the
tryout (thirty-five items yielding forty-eight separate questions), and is
no longer divided into two separate parts. Because of the objections by
the raters to the six-point scale, the scale in its revised form uses only four
categories: Excellent, Good, Fair, Poor. Estimated time to complete the
revised scale is forty-five minutes. A sample of items from the revised version of the rating scale is found at the end of this article.
DISCUSSION
A note of caution is in order in interpreting the results of this investigation. There was a rather wide diversity of qualifications and duties among
the raters used in the tryout. Perhaps it could be said that the raters
represented different categories of exhibit expertise. Some of them were
responsible for designing and fabricating exhibits, some for using and
interpreting exhibits for the public, and some for the dissemination of
technical information in the atomic energy field. Each of these individuals
may be expected to view a particular exhibit from his own personal and
professional bias, based on his background, associations with the exhibits, etc. It may be argued that this diversity in raters would result in
less consistent ratings than would be the case had the raters all been designers, or managers, or curators. On the other hand, one could argue that
fundamental agreement ought to exist within the field on so important a
question as exhibit effectiveness, regardless of the particular interests of
any one group of specialists. In any case, due to the relatively small number of raters who used the scale in the present study, no effort was made
to divide them into subgroups for analysis purposes. If the scale (as revised) is given wider use, it would be possible to separate raters by the
categories noted above (and perhaps others as well) to see to what extent
they differed in their judgment of exhibit characteristics, and to see if any
patterns emerge that relate specific occupation to ratings.*

[* In this connection, a more detailed report of this study is available from the Clearinghouse for Federal Scientific and Technical Information, National Bureau of Standards, U.S. Department of Commerce, Springfield, Virginia 22151 (TID-22703, "An Evaluation of Existing Criteria for Judging the Quality of Science Exhibits"). This report contains the revised rating scale and the annotated bibliography used in developing the scale.]
It remains true that the data analysis indicates the general inadequacy
and unreliability of published criteria as over-all guides to determining
exhibit effectiveness, and suggests that there would be little to gain from
an effort to test the validity of such criteria. Nevertheless, the scale seemed
to make a worthwhile contribution to the more analytic inspection of
exhibits. Each of the raters knew how he would improve the exhibit he
rated. However, since raters tended to disagree on what the deficiencies
were, these improvements might not lead to an increase in actual effectiveness as measured by some external measure, such as knowledge gained,
attitudes changed, or people attracted. In short, the scale may at best lead
to better informed opinions as to what constitutes an effective exhibit.
At worst, it may mislead those who use it to believe that the categories on
the scale are known to be related to actual effectiveness (they may or may
not be), and that the rater knows what the relationship is. It is therefore
the opinion of the author that the scale should be made available to interested parties while making them aware of its limitations and its lack
of demonstrated reliability. Additional use may lead to improvements and
refinements in the scale which would increase its usefulness and perhaps
even its reliability.
One observation stands out very clearly as a result of this small-scale
study, and that is the need for more clearly stated objectives for exhibits.
This deficiency very likely contributed to the low reliability of the scale.
Raters often reflected this need in their written comments and in their
discussions after using the scale. They realized (many of them for the first
time) that they had no baseline against which to judge the various elements. The question which should be asked is, “Specifically what do you
want whom to do, know, or feel after seeing the exhibit that they could
not do, know, or feel before seeing the exhibit?” Answering such a question in adequate detail would cut through much of the ambiguity, and
even mystique, that surrounds the exhibit field. Otherwise, it is not possible
to design reliable and valid measuring instruments that would determine
exhibit effectiveness, since it is not clear what should be measured. How
can a rater judge the adequacy of a label if it is not known exactly what
the label is supposed to communicate (teach) and the characteristics of
the intended audience (age, background, education, etc.)? If those who write about exhibit effectiveness have difficulty in com-
municating with others in the field (as the data here seem to indicate)
then it is not surprising that attracting power is often equated with the
success of an exhibit. A designer may not know what “coherent unity”
means, and he may not have very specific objectives to use as a basis for
his design, but he does know that he can attract people with clever and
dramatic effects. And the success of such efforts can be easily and accurately measured.
It is probably true that prescriptions for effective exhibit design will
never be reduced to a set of specifications that can be looked up in a
handbook. It is equally true, though, that until those responsible for the
preparation of scientific and technical exhibits become more analytic and
more concerned with objectives and evaluation, little real advance in the
field will be made. No sensible person would want to take the art out of
exhibit design; no sensible person should resist injecting more systematic
knowledge into exhibit design. Ad hoc solutions to these problems are not
only a bad risk; they are becoming increasingly expensive.
The techniques for improving this situation do exist. They have been
successfully applied to other media of communication and information such
as films, educational television, and programmed instructional materials.
In each of these areas, improved statements of intended objectives and
evaluation instruments based on these objectives are of primary importance. Better statements of objectives and improved evaluation instruments
can be prepared in the exhibit area. Ultimately, design variables can be
related to effectiveness variables. Only when this is accomplished will it
be possible to put the development of scientific and technical exhibits on a
solid foundation.
SELECTED ITEMS FROM THE REVISED
EXHIBIT EFFECTIVENESS RATING SCALE

1. Not all subject matter lends itself to the exhibit medium. How suitable is this
subject matter for exhibit presentation?

Excellent    Good    Fair    Poor

WHY?

(NOTE: The above format was repeated for each item in the scale.)

2. How would you rate the following in terms of visitor ease of viewing?
   a. The exhibit's distance from the visitor
   b. Its physical layout (height of exhibit; amount of material displayed;
      placement and arrangement of material within the exhibit)

3. How would you rate this exhibit on its appropriate use of color?

4. How would you rate the main title of this exhibit from the standpoint of:
   a. Wording-content
   b. Design

5. By its very nature, an exhibit can never tell the "complete" story or show
"everything." To what extent does this exhibit incorporate material which
contributes most to the exhibit story and avoid including unrelated or unimportant material?

6. Consider "drawing power" for a moment. How would you rate the popularity of this exhibit?

7. Now consider "holding power." To what extent do you feel this exhibit will
hold visitor interest?

8. If this exhibit makes use of slide and film projection techniques, how appropriately are these devices used?

9. The appropriateness of the visual materials, the clarity of the textual
material, and the spatial arrangement of the various exhibit elements all
affect the exhibit's intelligibility. How would you rate the overall exhibit for
ease of comprehension by its intended audience?
REFERENCES
Bloomberg, Marguerite: An experiment in museum instruction. American Association of Museums, Washington, D.C., no. 8. 1929.
Borhegyi, S. F.: Visual communication in the science museum. Curator, vol. 6,
no. 1, pp. 45-57. 1963.
Bureau of Social Science Research: Audience reaction to two ICS cultural exhibits: report on the pretest of a questionnaire. Washington, D.C., no. 518.
1954.
Bureau of Social Science Research: The Japanese house: a study of its visitors
and their reactions. Washington, D.C., no. 518. 1956.
Bureau of Social Science Research: The people's capitalism exhibit: a study of
reactions of foreign visitors to the Washington preview. Washington, D.C. 1956.
Carmel, J. H.: Exhibition techniques-traveling and temporary. Reinhold Publishing Corporation, New York. 1962.
Dale, E.: Audio-visual methods in teaching. The Dryden Press, New York. 1946.
Gardner, J. and Caroline Heller: Exhibition and display. F. W. Dodge Corporation, New York. 1960.
Goins, A. E. and G. B. Griffenhagen: Psychological studies of museum visitors
and exhibits at the U.S. National Museum. Museologist, vol. 64, pp. 1-6. 1957.
The effect of location, and a combination of color, lighting, and artistic design
on exhibit appeal. Museologist, vol. 67, pp. 6-10. 1958.
Hull, T. G. and T. Jones: Scientific exhibits. Charles C. Thomas, Springfield,
Illinois. 1961.
Inverarity, R. B.: Museum and exhibits. Museologist, vol. 81, pp. 2-4. 1961.
Melton, A. W., N. G. Feldman, and C. W. Mason: Experimental studies of the
education of children in a museum of science. American Association of Museums, Washington, D.C., no. 15. 1936.
New York Museum of Science and Industry: Exhibition techniques. 1940.
Robinson, P. V.: An experimental study of exhibit arrangement and viewing
method to determine their effect upon learning of factual material. Unpublished doctoral dissertation, University of Southern California. 1960.
Seattle World’s Fair, 1962. Institute for Sociological Research, University of
Washington, Seattle. 1963.
Weiss, R. S. and S. Boutourline Jr.: A summary of fairs, exhibits, pavilions, and
their audiences. Robert S. Weiss. 1962.
The communication value of exhibits. Museum News, vol. 42, no. 3, pp. 23-. 1963.
Wright, G.: Some criteria for evaluating displays in museums of science and industry. Midwest Museum Quarterly, vol. 18, no. 3, pp. 62-71. 1958.