Information for Action: Perceptual Principles Related to Remote Sensing

M. M. Taylor

DCIEM

October 29, 1971

INTRODUCTION

This paper introduces a somewhat speculative account of the principles underlying the construction and use of natural sensory systems. These principles also apply directly to the design of artificial systems for enhancing human perception through the use of exotic spectral regions or sensors remote from the perceiver.

The concept central to the exposition is that perception has evolved as a method of coordinating the behaviour of the individual. He perceives in order to act, and correct perception of important things enables him to act correctly in order to survive. Survival is the survival of the entire organism, and accordingly this paper follows a systems approach to the analysis of the perceiving system. The argument will be that natural surveillance systems agree very well with systems that might be developed through a cost-effectiveness approach to their design.

The Problem of Natural and Remote Perception

There is an enormous amount of information in the physical world that surrounds us. We are bombarded from all sides with electromagnetic radiation; the air vibrates at a vast range of frequencies; surfaces react in different ways to physical and chemical probing; we touch and are touched by numerous objects and by the air and whatever is carried by the air. Potentially, one could detect and analyse all the information from all these sources, but we do not. Of all the things and events in the world, only a tiny fraction is of any interest to an individual or can in any way influence his behaviour. The problem facing any perceiving system is to discover the useful information and to organize it in such a way that the perceiver can act correctly and in time to ensure his (or the species') survival. The evolutionary process has presumably selected those species with perceptual systems adequate to the problems posed by their behavioural habits and their habitats.

The problems facing the natural perceptual systems are faced also by artificial sensory systems. There is too much information, and only a small fraction is relevant. Since the whole range of physical stimulation is available to remote sensing systems, and sensors can in principle be placed almost anywhere, the scope of the problem facing remote sensing systems is even greater than that posed to natural sensory systems.

In the foreseeable future, most acting systems will probably be controlled by a human. This human must be provided with the information appropriate to his potential for action, and the remote sensing equipment must therefore be considered as a part of his perceptual apparatus. The fact that a human is part of the total system has an important implication: since his perceptual equipment was developed to deal with a habitat and behaviour pattern very different from that implied by his technological environment, the machines must be designed to present him with information adapted to his evolved capabilities. Designers of remote sensing systems thus have two almost independent sets of problems. They must design systems that pick up the information that they want, and these same systems must translate the information into a form that the human can use in the task at hand.

Remote sensing systems which can continuously provide high information rates are relatively new. Limits to their physical capacities for providing raw data are unknown, but operational limits to their capabilities will probably be set by the limited capabilities of the human user who has to decide on relevant action. It is not clear what information the human will need in any particular situation.

The desired information almost certainly depends not on the sensor or the things being detected, but upon the range of actions open to the human.

There are two major approaches possible to the design of remote sensing systems: the engineering approach and the bionics approach. The engineering approach, as evidenced by currently operational remote sensing systems, tends to provide the human with as much information as the sensors can produce. The information is presented in some uniform manner, typically by a pictorial image, and the human is expected to sort out the useful from the useless. There may well be some tasks or some occasions for which a total visualization of the incoming data stream is the best way to present the relevant information; conversely, for other tasks, it may be a poor way.

The bionics approach is to examine what natural sensing systems do in response to the same problems, and why they perform the way they do. Engineering frequently points the way to understanding biological systems, and the insights gained from the biological systems point the way to further engineering developments. The development of sonar systems provides a good example of this. Porpoises and other water mammals seem to be able to detect and examine underwater objects by echolocation. Some of their behaviour seems curious, until theoretical studies show that they are actually solving a detection problem in an optimum way, given the circumstances. A recent article in Science (R.A. Altes, Computer derivation of some dolphin echolocation signals, Science, 1971, 173, 921-4) provides a case in point. It turns out that the peculiar but consistent waveform emitted when the dolphin is placed in a tank which already contains another dolphin is admirably suited for the detection and ranging of the other dolphin. As do bats, porpoises modify their echolocation signals according to the task at hand. Modern theory helps to discover why the modifications are the way they are.

Surveillance Systems

Some perceiving systems are surveillance systems. This means that they are used to detect states of the world and events that occur more or less outside the control of the perceiver. Vision and hearing come under this classification. So does touch, although the voluntary control of the observer is required in the touching process. Touch is used to determine what existing objects may be. Smell also has some surveillance characteristics, although to a very low degree in humans. One can tell by smell that there is a dead cat around somewhere, but not where it is to be found. Taste is not a surveillance sense. Tasting occurs only when the perceiver is eating or drinking something already found and under his control. One cannot discover new objects or detect events through taste. Usually, smell is operative only under the same conditions, and is not then a surveillance sense. Kinesthesis and proprioception have a peculiar status, in that they are not themselves surveillance senses, but the data they provide are crucial for the proper operation of the other senses.

Surveillance senses are not necessarily distance senses, although all distance senses are surveillance senses. Touch is not a distance sense. One touches or is touched by only those things which contact the skin. Yet unexpected touches signal potentially important events or objects, and touch can delineate important facts about objects whose existence may well have been determined by the other senses. In the following, touch will be considered as one of the surveillance senses. Its problems and potentialities provide a microcosm of the whole field of perceptual problems, and by itself it provides a fertile region for the study of problems relating to remote sensing systems.

For the rest of this paper, we shall be concerned only with surveillance systems. The more "personal" sensory systems, like taste or the body senses, will be ignored as components of perception.

Classes of Interesting Things in the World

There are two major kinds of interesting things in the world. Some things happen, and if you do not see them occur, they have gone forever. Other things stay around, and can be examined at leisure. These classes might be called dynamic and static things respectively, or events and objects. In the Hopi language, the distinction between events and objects is the only one corresponding to our distinction between verbs and nouns. An event, or verb, corresponds to things which last roughly as long as a cloud, or less, while objects or nouns correspond to things of more permanence. A lightning flash is an event, while a house is an object.

Interesting events in the world may be classified in another way besides the passive-active dichotomy. Both active and passive conditions in the world might be expected or they might be unexpected. The leopard's snapping of a twig is an unexpected active event; the sight of a new landscape involves a sequence of unexpected passive states; watching an airplane take off marks an expected active sequence of events; looking over your own living room deals with expected passive states.

The four types of environmental "interesting things" correspond to four different modes of perceptual operation. Unexpected active events activate alarm systems; unexpected passive states are mapped, in that the perceiver builds up a model of the new state of the world in which he might have to act; expected active events invoke tracking, as when one watches the airplane go by; and expected passive states serve for new input only when one is looking for something. The four modes of perception are thus "alarm," "mapping," "tracking," and "seeking." These four modes find their correspondence in remote sensing systems. They have quite different informational requirements, which means that hardware must be appropriately designed to accomplish the different modes of perception.
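
The correspondence can be made concrete in a small sketch (Python is used here purely for illustration; the names and the lookup structure are illustrative only, not features of any actual perceptual system):

    # The four-way classification described above, as a lookup from
    # (expectedness, activity) to a perceptual mode. The examples in the
    # comments are those given in the text.
    PERCEPTUAL_MODE = {
        ("unexpected", "active"):  "alarm",     # the leopard's snapping twig
        ("unexpected", "passive"): "mapping",   # the sight of a new landscape
        ("expected",   "active"):  "tracking",  # watching an airplane take off
        ("expected",   "passive"): "seeking",   # looking over one's own living room
    }

    def mode_for(expectedness, activity):
        return PERCEPTUAL_MODE[(expectedness, activity)]

    print(mode_for("unexpected", "active"))     # -> alarm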

The different modes of perception complement as well as correspond to the classifications of interesting things. Except for the tracking mode, which necessarily implies active participation by the observer, active events are passively detected while passive states are actively observed. This complementarity is not happenstance, as may become clearer in the review of the natural surveillance senses that follows.

Human Sensory Systems

The various human senses are usually categorized by the physical stimuli to which their receptors respond. The visual system responds to light, hearing to air vibrations, and so forth. While this is a valid categorization which will be followed here, it obscures the fact that natural objects and events usually have effects detected by more than one sensory system, and that the information from all sensory systems is integrated in the action decisions made by the perceiver. It also obscures the logical similarities among the functions of the different senses. With the viewpoint that perception implies information for action, the similarities among the natural senses, and between natural and remote systems, become more apparent than the differences.

The main problem facing perceptual systems is to select from the vast amount of available data that small fraction which might be currently of interest towards decisions about future action. To simplify the discussion, the part of the system that makes the policy decisions on action will be called the "central processor." The job of the various sensory systems is then one of minimizing the amount of work the central processor need do to interpret the incoming data, so as to maximize its capacity to work on the useful pieces of information. The central processor is primarily a thinking centre for the control of action, or for the coordination of complicated patterns of input data. The sensory systems may thus be described in terms of the techniques they use to reduce the information load on the central processor.
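
The division of labour might be sketched as follows, assuming nothing more than a simple predicate of "interest"; the fragment is a toy illustration, not a model of any particular sensory system:

    # Each sensory system acts as a preprocessor, passing to the central
    # processor only the small fraction of the data stream judged relevant.
    def preprocessor(data_stream, is_interesting):
        return (item for item in data_stream if is_interesting(item))

    def central_processor(events):
        # Policy decisions about action are made here, on a much-reduced stream.
        for event in events:
            print("attend to:", event)

    raw = [{"moving": False}, {"moving": True, "where": "periphery"}, {"moving": False}]
    central_processor(preprocessor(raw, lambda item: item["moving"]))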

Vision

The visual system is the human's great information gathering system. It performs all the functions of perception in easily seen ways. Our very language testifies to the omnipresence of visual ideas; we "see" answers to questions, we "observe" in auditory experiments, we say "I can't see Mr. X as President." But, in spite of all this, the visual system's most stringent task is the reduction of the information that a badly designed system might pass on to the central processor.

The primary means by which the visual system reduces the data load is simply to ignore almost the entire electromagnetic spectrum. Only the single octave between about 370 mμ and 740 mμ is detected. Why should this restriction have been determined by evolution, when energy is present over a vast range of frequencies? Firstly, the atmosphere is transparent only to a narrow range of frequencies in the region where the sun emits strongly, and the eye is sensitive to most of this range. There are other transparency windows in the infrared, but mammals are hot enough to radiate appreciable energy at these frequencies. The surface reflectances in the visible range serve pretty well to define the presence of objects. The expense of providing an infrared detector insensitive to its own internally produced radiation in order to detect the slightly stronger infrared reflections off objects in daylight is therefore probably unjustified. The pit viper, on the other hand, is sensitive to infrared. Its habit is to find mammals sleeping under cover in the cold desert night. Its targets are appreciably warmer than their surroundings and are also warmer than the infrared detector of this cold-blooded reptile.

Regions of the spectrum further removed from the visible are probably not utilized for similar reasons. The cost is not justified by the radiative flux available for detecting objects at radio frequencies, and the range of the radiation as well as the available flux does not make gamma-ray spectroscopy attractive for biological organisms. If they want to know important things about chemistry, they taste and smell.

The high frequency and small wavelength of visible light permit the resolution of locations as close as one minute of visual angle. Objects of sizes down to 10⁻⁷ meter and smaller can be resolved, far smaller than any object of interest to a mammal. Steps as small as this can be resolved by touch, but no other sensory system approaches the angular resolution of the visual system. The auditory system, for example, can resolve to no better than 1° of arc, 60 times worse than the visual resolution. This means that the pattern resolution of the eye gives a potential information rate 3600 times that of the ear. This high information density presents a formidable challenge to techniques of information reduction.
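
The arithmetic behind the figure of 3600 is simply the square of the linear ratio of resolutions:

    # Auditory resolution of about 1 degree versus visual resolution of about
    # 1 minute of arc: a factor of 60 in each linear dimension, hence 60 * 60
    # in areal (pattern) resolution.
    visual_resolution_arcmin = 1
    auditory_resolution_arcmin = 60            # 1 degree = 60 minutes of arc
    linear_ratio = auditory_resolution_arcmin / visual_resolution_arcmin
    print(int(linear_ratio ** 2))              # -> 3600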

Most objects in the visual world are static at any given moment. In the four-way classification, most visually interesting things fall into the two passive classes. This provides an immediate simplification for the task of the visual system. Since things are probably not going to change much from moment to moment, the world can be scanned at leisure. A small part can be dealt with at a time, and a model built up from the parts. This model gives the central processor a meaningful field within which it can plan the behaviour of the observer.

In the visual system, this concentration of capacity on a small region has been carried to the extreme of incorporating it in the "hardware" of the eye. The image on the retina is of good quality for a long way out from the centre. The retinal receptor density is as high 20° away from the central fovea as it is within the central fovea, but the density of optic nerve fibres declines sharply away from the fovea. In the central region, each receptor is provided with its own optic nerve fibre, whereas over the whole eye there is an average of one fibre for every 125 receptors. Visual acuity follows a similar trend. Twenty degrees away from the fovea, acuity is something like 1/10 as good as in the fovea, which implies a reduction factor of 1/100 in the information transmitted per unit area. The focal area, which in vision corresponds to the fovea, does the main part of the mapping and tracking, while the poorer pattern vision of the periphery can be used to give relative location information for the successive high resolution images. The coherence of the world model seems to be severely impaired if peripheral vision is inhibited by the use of a cardboard tube.
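
The 1/100 figure follows from treating acuity as a linear resolution, so that information per unit area scales as its square:

    # Acuity 20 degrees from the fovea is roughly 1/10 of foveal acuity;
    # information per unit area therefore falls by (1/10) squared.
    relative_acuity = 1 / 10
    print(relative_acuity ** 2)                # -> 0.01, a factor of 1/100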

It requires memory to build up a model of the world by successive focal images individually scanned. Hence we must admit an important connection between memory and the central processor, or at least between memory and the processor responsible for co-ordinating the successive focal images. A single focal image must provide input to the memory either in the form of raw data or as data processed to some level at which objects may be separated from one another. This stored focal image must then be coordinated with the next incoming image, and so forth, until a complete model is stored. When all interesting things are of the "expected passive" category, the stored model would theoretically be adequate for perception of the world, and we may perhaps attribute misperceptions such as failing to see a newly installed stop sign on a familiar route to this manner of operation. It presumably requires less effort on the part of the processor to deal with data already decoded and processed than to interpret an incoming data stream. Hence, when the situation is appropriate, we might expect the processor to use this "memorial model" mode.

"Memorial model" mode may be used in coordination with incoming data stream when one is looking for something. If the item is present in the image, it can probably be quickly found, since most of the decoding will have been done. But if it is not, then the seeker must make successive focal images which can be discarded if they do not contain the target. Seeking differs from mapping only in that the data must be immediately decoded to determine whether a target is present, and memory need not be invoked except insofar as it can be used to direct the scanning path. When one is building geographical maps to be printed, the data is brought home in as much detail as possible, and is all carefully analysed and recorded on a printed page. If one is looking for something that can be found on the map, there is no need to fly over the terrain again. The map is assumed adequate and one can use the "memorial model" mode. But if one is looking for a fishing fleet that has vanished from where it was and not reappeared where expected (Defence in the 70's task list, p.19), one must use the map to direct a new scanning flight, and the scan must be continued only until the fishing fleet has been identified.

Maps and models do not remain adequate. The world changes. Objects move and events occur. When such things happen, the map is invalid, and the central processor would be operating improperly if it relied on the laboriously built model. Events and moving objects are active elements of the environment, and require to be dealt with as they occur. It may be necessary only to update the model to take account of the change, or it may be necessary to react to the event.

At this point, we can reconsider the structure of the retina. Why are there so many receptors in a region with as few optic nerve fibres as exist 20° away from the fovea? If we look only at the output of the receptors themselves, the pattern information must be almost equally dense over this whole 40° diameter region. Yet by the time all the processing in the retina is complete, only 1/100 of the information from the edge of this region is sent up the optic nerve. It would not be compatible with the apparently good cost-effective adaptation of the perceptual system as a whole to assume that this information is merely discarded.

In addition, we know that the visual periphery is very sensitive to movement. It seems a reasonable hypothesis that the pattern information available at the receptor level in the visual periphery is largely translated into movement information by processing done right in the retina, and that only the information signalled by movement is transmitted up the optic nerve. This information signals the central processor that something potentially interesting is happening, and that the foveal high resolution region of the eye should be turned towards it. The movement system of the visual periphery is thus an "alarm" system.
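
The hypothesis can be caricatured in a few lines, with lists of numbers standing in for receptor outputs and an arbitrary threshold:

    # The periphery reduces dense pattern data to a movement signal by comparing
    # successive samples; only a change large enough to matter is passed upward.
    def peripheral_alarm(previous, current, threshold=5):
        change = sum(abs(a - b) for a, b in zip(previous, current))
        return change > threshold              # True: something moved over there

    frame1 = [10, 10, 10, 10]
    frame2 = [10, 30, 10, 10]                  # a local change, as a moving edge might cause
    print(peripheral_alarm(frame1, frame2))    # -> True: turn the fovea toward it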

The main characteristics of an "alarm" system are well demonstrated by the visual periphery. It must serve as a preprocessor, reducing greatly the information load on the central processor; it must operate passively, awaiting autonomous events in the environment and reacting to them regardless of what the central processor is doing at the time. An alarm system must be capable of drawing the attention of the central processor. When the central processor wants high resolution data about something in the environment, it should be able to get it. On the other hand, it should not be burdened with data it does not need.

Visual tracking does not seem to present a complex functional problem. When an alarm system signals that something is moving, the central processor must decide whether it is uninteresting, whether it is an event requiring reaction, or whether it signifies a potential change in the model of the world. If the latter, or if the movement promises to result later in something requiring reaction, then attention is kept on the "expected active" element in the environment (the moving object) until all necessary map changes have been completed, until the reaction has been made, or until future reaction seems unlikely to be needed and the movement becomes uninteresting to the central processor.

Vision embodies all four functions of perception: alarm, tracking, mapping, and seeking. These functions are defined only by the requirement that the information load on the central processor be kept as low as is compatible with effective behaviour of the whole organism. The same functions occur to greater or lesser degree in the other surveillance senses, although the actual ways of getting rid of unwanted data differ.

Hearing

Acoustic stimuli are compression waves in the atmosphere. Such waves range in length from atomic dimensions to several thousand miles, but we and other mammals hear only a very restricted range. Human hearing ranges in frequency from around 20 Hz to around 20 kHz, while some bats and sea mammals can hear as high as 160 kHz. In wavelength, we hear waves as long as 50 ft and as short as about 0.6 inches, while some bats can hear acoustic waves as short as 0.1 inches or a little shorter. Porpoises, in spite of their higher frequency sensitivity, apparently only hear wavelengths longer than about half an inch, much like the human.

In contrast to the visible energy which derives from sunlight and is reflected off passive environmental objects, most acoustical radiation comes from active events in the environment. One object hits another, an airflow develops vortices by passing over an edge, and so forth. Unless something is happening, there is nothing to hear. In contrast again to the passive reflection of light, which can occur at any wavelength, the acoustic wavelengths preferentially emitted by interacting objects depend strongly on their sizes. A large object tends to generate low frequencies, whether by impact or by interacting with an airstream. Furthermore, sound waves are not well reflected unless the reflecting object is larger than half the wavelength of the incident sound. The range of wavelengths to which we are sensitive thus corresponds roughly to the range of sizes of the objects we find interesting. The auditory system has apparently used the same device as the visual system, by restricting sensitivity to those frequencies most likely to yield data significant for our behavioural purposes.
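
The size-wavelength relation can be made concrete: wavelength is the speed of sound divided by frequency, and an object reflects well only if it is larger than about half a wavelength. A worked example, taking 343 m/s as the approximate speed of sound in air:

    # Smallest object that reflects sound of a given frequency reasonably well.
    def min_reflector_size_m(frequency_hz, speed_mps=343.0):
        wavelength = speed_mps / frequency_hz
        return wavelength / 2

    print(min_reflector_size_m(50))        # ~3.4 m: only large objects return 50 Hz
    print(min_reflector_size_m(20_000))    # ~0.0086 m: small objects return 20 kHz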

Since natural sounds occur independently of the perceiver, and do not usually wait for leisurely observation, the auditory system cannot use mapping and seeking modes very effectively, although they do occur under some circumstances. It is primarily adapted to alarm and tracking modes, because of the evanescent nature of its usual stimuli. Alarm mode is mainly used to reject incoming data as irrelevant, while tracking mode has the opposite function of dealing carefully with all the data from one source.

The commonly cited example which shows the functioning of the alarm mode is the fact that a mother who can sleep through the noise of heavy traffic will wake at the soft cry of her baby. Another example comes from experiments in dichotic listening. When a subject has two conflicting messages presented to the two ears, he can follow one and will lose most of the other. But if his own name is presented to the "unattended" channel, he probably will hear it. He has over the course of his life constructed a preprocessor pattern which responds to his name without the need of his paying attention to the source of the message. The same phenomenon is often observed in hearing a cafeteria paging system. Tracking mode, on the other hand, involves paying attention to one source. The "cocktail party" effect in which one can hear a single conversation out of dozens is an example of the use of tracking mode. In contrast to vision, where the focussing is mainly done in an obvious external way by moving the eye, auditory focussing is an internal matter, not readily accessible to experiment.

Auditory focussing may be done on the basis of location, pitch, or more complex characteristics of the source to which attention must be paid. In simple detection experiments using tonal signals, many studies have demonstrated the existence of a phenomenon known as the "critical band." According to the view presented here, the critical band is an expression of how finely auditory focussing can be directed by the frequency of the target. Noise within the critical band affects the detection of the tonal signal to a far greater extent than does noise outside the critical band. The effect of the relative location of attended source and interfering noise has also been demonstrated experimentally, as has the distinctiveness of noises interfering with an attended voice, or even the cohesion of the semantic content of one message with that of the interfering message.
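
On this view, a crude test of masking reduces to whether the noise lies within the band around the attended tone. The bandwidth fraction below is a rough assumed figure (critical bands run on the order of 15 to 20 percent of the centre frequency over much of the audible range), used only for illustration:

    # Does an interfering noise fall within the critical band around a tone?
    def within_critical_band(tone_hz, noise_hz, band_fraction=0.16):
        half_band = tone_hz * band_fraction / 2
        return abs(noise_hz - tone_hz) <= half_band

    print(within_critical_band(1000, 1050))    # True: expect strong masking
    print(within_critical_band(1000, 1500))    # False: expect little masking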

The human auditory system has at least one great preprocessor--the speech system. The speech system illustrates a use of preprocessors other than to operate alarms. It is a categorizer. Arbitrary waveforms come in at a rate of thousands of bits per second, and if the speech processor recognizes them as speech, it categorizes them into phonemes which represent information rates of not more than a few tens of bits a second. Categorization of this kind is a powerful way of reducing the information load on the central processor; it has, however, a side effect in that it tends to cut the central processor off from the raw data stream. Experiments on discrimination of waveforms representing the same or neighbouring phonemes show that we have great difficulty in discriminating between different representations of the same phoneme, even though the differences in waveform are greater than readily discriminated differences between waveforms representing different phonemes. It is as if the speech preprocessor pre-empted the incoming waveform and permitted the central processor access only to the categorical output. This, of course, is putting the case too strongly, but it does point up a potential difficulty with the use of preprocessors for categorization as well as alarm functions. They must categorize, to provide an effective alarm, but if their categorical output is habitually accepted as an adequate representation of the input, the central processor's ability to deal with the raw input in an unbiased manner may be affected.
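
The quoted rates can be checked with rough figures. The waveform numbers below are illustrative assumptions (telephone-quality digitization at 8000 samples per second and 8 bits per sample), and the phoneme figures assume roughly ten phonemes per second drawn from an inventory of about forty:

    import math

    # Raw waveform: e.g. 8000 samples/s at 8 bits/sample.
    waveform_bits_per_second = 8000 * 8                # 64000 bits/s
    # After categorization: ~10 phonemes/s from ~40 alternatives.
    phoneme_bits_per_second = 10 * math.log2(40)       # ~53 bits/s
    print(waveform_bits_per_second, round(phoneme_bits_per_second))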

Both bats and dolphins have evolved ingenious techniques to avoid ambiguities in their echo returns. For example, range and speed errors readily convert into each other unless the signal is carefully constructed. Good target acquisition is not best accomplished with the same signal that tells most about the target, and bats and porpoises change their signals after they detect a target.

Why do humans not echolocate very much? Humans can be trained to do rudimentary echolocation, as J.G. Taylor showed at this laboratory. The answer probably is that we have not evolved echolocation because we are terrestrial animals, and our prime targets would be found on the ground where their return echoes would be heavily masked by a strong ground return. All echolocating animals live in a three-dimensional world where their targets do not often appear against a background. Even bats have not been shown to be able to detect stationary food on the ground, although they can be exquisitely sensitive to the small movements that signal live bugs. In addition, good visual resolution is much easier to accomplish than is equally good auditory resolution, because of the wavelength of the signal. Our eyes provide most of the information we could otherwise obtain by echolocation, and we have not needed to develop the complicated extra processing equipment needed for effective echolocation. It seems purely a question of cost-effectiveness.

Touch

Touch is in many ways the most interesting of the surveillance senses, although it may not be as important as vision or hearing to the human. It is more closely analogous to the remote surveillance systems currently operational or proposed than are either of the more specialized distance senses. Touch is a complex of heterogeneous sensing devices closely interrelated in space. The same area of skin is sensitive to pressure, vibration, and temperature. These multimodal capabilities present problems akin to those presented by multimodal remote sensing systems in which different spectral regions are simultaneously sensed.

Touch can be used in all perceptual modes, alarm, tracking, mapping and seeking. Passive touch is probably the most immediate alarm system we have. It is permanently open for unexpected events. An unexpected touch could potentially signal immediate danger, and could be ignored only at the peril of the organism's life. An unexpected touch must be analysed and acted on immediately. Hence touch has a pre-emptive quality. Green (personal communication) has found touch to override vision and hearing in an attention sharing task. This is as it should be, from survival considerations.

The alarm function of touch implies preprocessing, in the same way as does the alarm function of vision or hearing. Most of the little touches to which we are subjected--clothes, chairs, and so forth--do not come to the attention of the central processor. They are expected, and discarded at a lower level. But let an intruder come unobserved and unheard on someone immersed in a book. The slightest touch will startle the reader out of his spell. On the other hand, if the intruder is noticed and identified as being friendly, the touch will be ignored or will evoke some socially reciprocal response. The alarm function will not then be called into play.

This last example demonstrates an important feature of alarm systems. They respond to total situations in the real world. If an event sets off an alarm in one modality, it probably will not set off an alarm in another, although attention may well be focussed in each modality on the stimulation to be expected from that event. A sound draws the eye in its direction as effectively as does a peripherally observed movement. This is one of the ways the senses act together.

While passive touch is wide-open, always available for alarm purposes, active touch is used only at the observer's discretion. Of all the senses, active touch relies most heavily for its effectiveness on observer behaviour. It is mainly a mapping sense which allows the observer to build a model of his world: here is a warm patch, there is a soft one, and here a velvety place. Active touch depends on overt exploration. The perceiver directs his fingers at will over the surfaces to be touched, pressing, stroking, and kneading. The toucher discerns properties not accessible to vision, such as temperature and thermal properties, and subsurface structure. Touch provides the solidity that vision cannot give to the world. By touch one can tell glass from plastic, wood-grain vinyl from real wood. The multimodal nature of touch provides the reality behind the visual facade. It is through such uses of multimodal capability that remote sensing systems may hope to penetrate the most clever camouflage.

Active touch requires the body location senses of kinaesthesis and proprioception to be effective. Taylor, Lederman and Gibson (Tactual Texture Perception, Chapter to be published in Carterette and Friedman (eds.) Handbook of Perception) have proposed a model for active touch perception which relates the behavioural feedback loops at different levels of the touching process. Touching strategies develop through the use of behavioural feedback paths, in which kinaesthesis appears very prominently, and these determine the interplay of the various touch modalities so as to permit the construction of a coherent model from a loosely sampled world.

The Body Location Senses

Kinaesthesis and proprioception are the names given to a complex of senses which provide the central processor with data relevant to both the positions and velocities of parts of the body. Together we may call them the body location senses. Although they are not surveillance senses in themselves, they are essential to the proper functioning of the mapping mode in vision and touch. Normally, no attention is paid to the body location senses in themselves, but their data are incorporated directly in the interpretation of mapping data at a lower level of the motor feedback system. It is interesting that Leo and Vitelli (1971) have found that this is a very effective way to build an artificial walking machine. Normally kinaesthetic data are not available to the central processor, since they are used at a lower stage in the control system; but a certain amount can be obtained if the central processor pays attention to the kinaesthetic input.

The body senses are interesting not for their contribution as information sources, but for their interactions with other senses. If one is properly to relate one focal image from a high resolution mapping system with another, he must be able to relate the positions from which the two images were observed. As was discussed under the heading of vision, the eye can relate nearby scenes because the focal part of one remains present in the low resolution periphery of its neighbour. Two neighbouring focal images are easily coordinated by visual information alone, but this is no longer true when the scenes are widely separated, as they might be when an auditory alarm causes a head movement so that the hearer can see what made the noise.

Changes of pattern from one scene to another can not compensate for consistent distortions or aberrations of the mapping device. Internal consistency between successive focal images does not ensure the correctness of the model constructed from those images. Such correction can be accomplished only by a change in viewpoint, and then only if the relationship between viewpoints is well specified. The body location senses provide the specification. This is one reason why fixed points are so important in geographic mapping by aerial photography. Without the ground reference points, it is impossible to tell where and in what orientation the plane was when the pictures were taken. Airplanes are not usually provided with sufficiently precise location devices to give these answers autonomously. Aerial mapping uses exactly the logic that the human perceptual system uses to remove distortions from its map, although the actual technique of establishing the viewpoint locations is different.

Role of the Central Processor: Vigilance and Attention

In the discussion of the individual senses, great stress was laid on the provision of only relevant information to the central processor. The central processor is taken to be the policy maker for actions undertaken by the whole organism. The information it requires is thus the information which may direct its future actions. Only by deliberately focussing on a small portion of the input data stream can it avoid being overwhelmed by data, and it can function best if it focuses on that portion of the data stream most likely to give useful information. Focussing can be called for by alarm systems which operate either independently or under general direction from the central processor, so that a particular part of the data stream is scrutinized only when an alarm has indicated that it might be profitable to do so.

In the absence of an alarm condition, it is probably optimal for the central processor to direct its focus on those parts of the data stream likely on a priori grounds to give interesting information. Since a data stream in which nothing novel happens is unlikely to yield much information in the near future, the best behaviour of the processor should be to shift from one part of the stream to another at a fairly rapid rate. In other words, attention should wander in the absence of alarm signals.

Since the world model becomes less likely to coincide with the true state of the world over time, it should be continually updated. This again dictates a wandering field of attention. Some time probably should be devoted to restructuring the world model and projecting the consequences of possible action, and during this time possibly no attention at all is paid to the incoming data streams. The observer is "lost in thought."

When the normal functioning of the central processor is considered in this way, the phenomenon known as the vigilance decrement is readily understood. It occurs when the perceiver is required to devote his attention continuously to a data stream which contains little or no useful new information about the environment that can influence his actions. This abnormal fixedness of attention cannot be long maintained, and performance soon suffers.

If maintenance of attention on an unrewarding data stream is the cause of the vigilance decrement, then possible remedies come to mind. One remedy would be to ensure that the useful information in vigilance tasks is presented to an alarm sense in a manner that will cause it to react and draw attention to the information which should be considered. Another, more difficult, approach would be to try to ensure that the observer is always kept busy at an active task whose accomplishment ensures the correct reaction to the appearance of the signal to be detected. This task would have to be of such a nature as to keep the central processor occupied, and the solution might not work because of the possibly overriding necessity for the processor to update its models.

An experiment is planned to test the first suggested solution in a radar simulation task. We intend to use the visual periphery as the alarm sense, and to stimulate it by presenting the targets as objects moving at a suitable speed. This can be done by the use of speeded historical displays. During any one second, the input from the last hundred seconds is displayed rapidly. A target that has moved in from the edge of the screen will be seen as a rapidly moving streak. During the next second, this streak will be repeated. The flashing and the movement should attract the attention of the observer, no matter what he is doing at the time, so long as the display is in his field of view. If this idea works, it might prove possible for one operator to monitor many displays simultaneously with greater efficiency than he now can attain with one.
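
The display logic is simple enough to sketch. The 100 second history and the one second replay come from the proposal above; draw_blip is a hypothetical stand-in for whatever routine actually paints the scope:

    import time

    def draw_blip(x, y):
        print("blip at", (x, y))               # stand-in for plotting on the display

    def replay_history(history, replay_seconds=1.0):
        # history: (x, y) target positions sampled once per second for ~100 s;
        # replayed in about one second, a slow target becomes a fast streak.
        dt = replay_seconds / max(len(history), 1)
        for x, y in history:
            draw_blip(x, y)
            time.sleep(dt)

    replay_history([(0, 0), (1, 1), (2, 2)], replay_seconds=0.03)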

Vigilance seems to be a product of the technological age. Most vigilance problems resolve to the fact that a mapping sense is being used for an alarm task, so that attention is kept unprofitably deployed. In the natural world, such task conflicts probably are not very frequent. With proper design, they should not arise in remote sensing systems, since alarm tasks should be presented to alarm perceptual modes. Proper design implies that the hardware designer should put considerable emphasis on preprocessing permitting the hardware to send an alarm signal, or on translation of the input data stream into a form that the human preprocessors can turn into alarm signals when necessary. The idea of speeding the radar display is simply to get the velocity of the interesting targets into a range to which the visual periphery is maximally sensitive.

Search Tasks

A task which has some kinship with the radar vigilance task is the search for a passive object. Here, however, the alarm senses cannot be used directly, since no event occurs to draw the attention to the target objects. Search can be accomplished only by performing what amounts to a mapping task without storing the successive focal images, and continuously evaluating the incoming focal images to see whether a pattern corresponding to the target object exists. The comparison pattern must be stored in memory in order that it can be compared with the incoming data stream. If there is no target, then the incoming data may be discarded unless it is used to update the old map.

Search is a difficult task, in that if the central processor is used to evaluate the incoming map data, it must maintain attention on that particular data stream. Although there is a great deal of incoming information, it does not result in any action decision until the search is successful or terminated. Hence the data might be regarded as unprofitable when considered as information for action, and a decrement like the vigilance decrement might be expected, although the reason is slightly different. The solutions worth investigating are at first glance similar to those suggested for the classical vigilance decrement.

Pattern preprocessing is the most obvious and technically the most difficult solution for the search decrement. The comparison pattern may, in effect, be used directly by a preprocessor, so that the preprocessor passes 'go/no-go' information rather than raw data. Anything the hardware designer can do to enhance the difference between a potential target and most non-targets should reduce the central processor's information load, particularly if the hardware includes an attention-drawing signal when a potential target is in view. An alternate approach is to provide simulated targets which require action. This approach has the flaw that search tends to terminate when the target has been found, and the appearance of the real target near a simulated target might cause the real target to be missed.
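
A minimal version of such a 'go/no-go' preprocessor might look as follows, with exact string matching standing in for whatever pattern similarity the hardware would actually compute:

    # The preprocessor holds the comparison pattern and passes the operator only
    # a binary decision (plus a location), never the raw image.
    def go_no_go(image_rows, target):
        for r, row in enumerate(image_rows):
            c = row.find(target)
            if c >= 0:
                return True, (r, c)            # "go": draw attention here
        return False, None                     # "no-go": nothing need be shown

    image = ["....", "..x.", "...."]
    print(go_no_go(image, "x"))                # -> (True, (1, 2))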

Pattern recognition machines seem to dominate the future of remote search devices. A pattern recognizer which permitted the observer's attention to wander even half the time would be a very valuable adjunct to a searching system. Only by severely reducing the amount of data to which the human central processor must react can remote searching systems hope to have any reasonable success. Threshold level detectors in infrared scanners are a step in this direction, although they leave much to be desired.

Summary

The enormous amount of information implicit in the environment contains little that is directly relevant to the survival of any individual. That little, however, is often of great importance, and it is the function of the perceptual systems to make sure that the central processor which decides on the organism's behavioural policy is properly informed of these interesting items. The main function of perceiving systems is seen in this light as one of reducing the information load on the central processor. As perceptual systems have evolved to fulfil this function, four main modes of operation have become differentiated. These may be described as "alarm," "tracking," "mapping" and "search."

The "alarm" perceptual mode is the main device for reducing the information load on the central processor. Systems that operate in this mode pass little or no information to the central processor most of the time, but wait passively for some significant environmental event to occur. When it does so, the alarm system passes a signal to the central processor, which is then able to access the incoming stream of data and establish the true significance of the "alarm condition." This ability of the alarm systems to reject most of the incoming stimulation as irrelevant implies that alarm systems must be able to perform some preprocessing and decision functions, such that they can categorize environmental events into at least two classes, interesting and uninteresting.

In the other three modes of perceptual function, the central processor focuses on some part of the incoming data stream from the receptors, in a process labelled "paying attention." The three attentive modes are differentiated by the manner in which attention is held on target and by the use of memory. In the tracking mode, attention follows and is controlled by some environmental activity, either in progress or anticipated. Memory may not be required. In the mapping mode, attention moves under the autonomous control of the central processor over the environmental data stream, with the object of obtaining a series of high resolution images of the environment, which sequentially and at leisure can be built into a model within which the processor can assess the probable effects of future actions. The mapping function corresponds most closely to the conventional idea of what is meant by "perception." The search function is like mapping in that attention roves under the autonomous control of the central processor, but the successive focal images need not be stored. Instead, the successive images are compared with some memory of a "target" and the question is posed whether or not a target pattern exists in the current data stream. In any actual search, search mode is probably combined with mapping mode, because it takes little extra effort to update the world model using the focal images already decoded for the purposes of the search. In some cases, it is possible that preprocessors with pattern matching capabilities may be used during the search.

The implications of this view of perception for the design of remote sensing systems are immediate. It seems pointless to require a human to perform an alarm task like initial radar target acquisition using a perceptual system evolved for mapping, like the visual fovea. Yet this is most typically the approach taken by designers of information-related systems. The system is designed to give the human the most detailed and veridical picture of the world that can be obtained with the available technology, when what the human needs is a respite from information, broken by alarms that direct him to use the finest picture that can be given him. Pattern recognition devices are of the essence of alarm systems, and can be most important even though they only identify small elements of potentially interesting patterns. A ten-fold reduction in processing load is appreciable, and a thousand-fold possible.

