Broadly speaking, the aims of procognitive systems are to promote and facilitate the acquisition, organization, and use of knowledge. Let us examine these broad aims, and some of the general requirements associated with them, before moving on to more specific discussion of plans and criteria.
The acquisition of knowledge -- the initial apprehension of increments to the fund of knowledge -- involves the recording and representation of events. It involves also a selective activity, directed from within the existing body of knowledge, and analyzing and organizing activities relating the increment to the existing body of knowledge. Both the acquisitive and the interpretive aspects are recognized, and seen to play strongly interactive roles, in "experience" and in "experimentation." However, although the interpretive aspects are included within it, the acquisitive aspects are largely excluded from the present-day concept of library. That is to say, when a library acquires an increment to its holding, it acquires the increment from a publisher, not from "primary nature."
The segmentation of the over-all cognitive process appears to have arisen, not because it was thought to be inherently desirable to turn one's back on the fund of knowledge while seeking out new knowledge to augment it, but because there was no way to make, or let, the acquisition process interact more directly with the processes of organization and maintenance of the main body. In thinking about new systems that may not have to suffer from that lack, we should keep in mind the possibility of developing stronger interactions between the acquisition process and the processes that deal with the knowledge that already exists. The idea is illustrated schematically in Fig. 1.
Fig. 1. (a) Schematic representation of the existing relation between acquisition of knowledge through experimentation and the library system. "Nature" is represented by N; the body of knowledge stored in the library, by K. A small part K1 of K is understood in the form of some cognitive structure C1 -- that is located in the experimenter and his laboratory -- by an experimenter who conducts an experiment T1 upon a small part N1 of N. The three lines connecting one figure with another represent an interaction constrained only by the nature of T1. When the experimenter has collected and interpreted his data (not shown), he may write a paper that adds something to K1. (b) Illustrating the elimination of the constraints and limitations imposed by the interposition of the C1 between the T1 and the K of diagram a. The experimenter may now interact with the whole of K, and particularly with all of K1, using other channels of interaction in addition to those provided in diagram a (and now subsumed under the broader T1 -- K interaction). The advantage of diagram b over diagram a depends, of course, upon the effectiveness of the added arrangements for interaction.
To anchor the foregoing general consideration in a slightly more specific context, let us consider acquisition of knowledge through laboratory experimentation. The laboratory and the library are physically separate and distinct. The only channels for interaction between them are the telephone, the experimenter himself, and the books he borrows from the library and examines in the laboratory. The part of the fund of knowledge that interacts with nature during an experiment, therefore, is only that part that is stored inside the experimenter's head, plus small amounts that come into his head from books he reads or from calls he makes to the library while his experiment is running, or that are implicit in the design of his experimental apparatus. Only after he has collected and analyzed his data does he go back to the library to investigate further their significance in relation to other parts of the body of knowledge. Thus the separation of library from laboratory forces the use of "batching" procedures in the acquisition of knowledge and leads, at best, to the collection -- in isolation from concurrent processes of acquisition, organization, and application -- of large, monolithic masses of data. At worst, the data are collected, not only in isolation from these concurrent processes, but also in isolation from one another, and the result is a chaos of miscellaneous individual cases. The difficulties of integrating the results of many simultaneous research projects that operate with very loose linkage to one another and to the body of knowledge are at present the object of much concern, particularly in the field of pharmaceutical research.
We have referred repeatedly to "the fund of knowledge," "the body of knowledge," and "the corpus." The most concrete schemata that are useful in shaping the concepts associated with those terms are the schemata that represent the strings of alphanumeric characters, and the associated diagrams, graphs, pictures, and so forth, that make up the documents that are preserved in recognized repositories. However, such simple, concrete schemata are not in themselves sufficient. Neuroanatomy and neurophysiology, together with human behavior, provide less definite, but nevertheless necessary, supplementary schemata that enrich the concept. These complex arrangements of neuronal elements and processes accept diverse stimuli, including spoken and printed sentences, and somehow process and store them in ways that support the drawing of inferences and the answering of questions; and though these responses are often imprecise, they are usually more appropriate to actual demands than mere reinstatement of past inputs could ever hope to be.
When we speak of organizing information into knowledge, we assume a set of concepts that involves many such schemata. The raw materials or inputs to the "organizer" are alphanumeric data, geometrical patterns, pictures, time functions, and the like. The outputs of the organized system are expressed in one or more of the input forms, but they are not mere reproductions or translations of particular inputs; they are suggestions, answers to questions, and made-to-order summaries of the kind that a good human assistant might prepare if he had a larger and more accurate memory and could process information faster. Concepts of the organizing process, and of the organization itself, are the objects of several of the studies that will be summarized in later pages.
In organizing knowledge, just as in acquiring knowledge, it would seem desirable to bring to bear upon the task the whole corpus, all at one time -- or at any rate larger parts of it than fall within the bounds of any one man's understanding. This aim seems to call for direct interactions among various parts of the body of knowledge, and thus to support the requirement, suggested in the Introduction, for an active or directly processible store.
One part of the concept of organization, called "memory organization," deals with the design of memory structures and systems, as distinct from structures and systems of information or knowledge. Its aim is to achieve two resonances or congruences: (1) between the memory and the information patterns that are likely to be stored in it, and (2) between the memory and the requests (e.g., questions) that are likely to be directed to it.
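The two resonances can be made concrete with a small sketch, assuming (purely for illustration) that requests are term-based questions: an inverted index is a memory structure shaped at once by the information patterns likely to be stored in it and by the questions likely to be directed to it. The documents and terms below are invented placeholders.

```python
from collections import defaultdict

# A toy "memory organization": an inverted index stores documents under the
# terms that requests are likely to mention, so the memory's structure is
# congruent both with the stored information and with anticipated questions.
# Document texts are hypothetical placeholders.

documents = {
    1: "acquisition of knowledge through experimentation",
    2: "organization of knowledge into memory structures",
    3: "application of knowledge in engineering",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def ask(*terms):
    """Answer a term-based request by set intersection over the index."""
    hits = [index[t] for t in terms]
    return set.intersection(*hits) if hits else set()

print(sorted(ask("knowledge")))               # → [1, 2, 3]
print(sorted(ask("organization", "memory")))  # → [2]
```

The point of the sketch is only the congruence: the same documents stored as a flat pile would answer the same questions far more slowly, because the memory's shape would match neither the stored patterns nor the requests.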
Knowledge is used in directing the further advancement and organization of knowledge, in guiding the development of technology, and in carrying out most of the activities of the arts and the professions and of business, industry, and government. That is to say, the fund of knowledge finds almost continual and universal application. Its recursive applications have been mentioned under the headings Acquisition of Knowledge and Organization of Knowledge. They require more direct lines of information flow than are now available, lines that may be controlled by, but do not flow exclusively through, human beings.
This same need seems even stronger and more evident in some of the nonrecursive uses -- external applications -- of knowledge, particularly in engineering. It should be possible, for example, to transfer an entire system of chemical formulas directly from the general fund of knowledge to a chemical process-control system, and to do so under human monitorship but not through human reading and key pressing. It should be possible for the logistics manager who wants to have in his "data base" the dimensions of the harbors of the world to connect his own information system, through a suitable retrieval filter, to the "Procognitive System of Congress." He should not have to assign a dozen employees to a week of searching, note taking, and card punching.
In general, as Fig. 2 suggests, it should be possible to transfer, directly from the general fund to the mechanism of a specific application, the various complexes or representations of knowledge required to support applications. The transfer should be requested and controlled through a process involving initial prescription, negotiated refinement of description, tests against various stated criteria, and human monitorship. To develop that general approach to application should be one of the main aims for procognitive systems.
Fig. 2. (a) Simplified schematic representation illustrating the flow of information in present-day applications of the fund of knowledge K. Two applications, A1 and A2, are represented, each made by a human being H1 working mainly through an application system S1. The thickness of the lines represents the amount of information flow. All the information flows through the human beings. (b) The situation that would prevail if, through the development of a procognitive system, the fund of knowledge were extended into intimate interactions (represented by the flared projections and their interfaces) with human users and their application systems. The dotted lines are control paths. Small amounts of control information are capable of directing the selection, transformation, and transmission of large amounts of substantive information. The human beings now function mainly as executives rather than mainly as relayers of information. For complex applications involving several or many men, schema b should be extended, of course, to provide communication and coordination through S1 and to let upper echelons exert control over lower-echelon channels.
In each of the three areas, acquisition, organization, and application, we are now greatly limited by the constraint that, whenever information flows into, within, or out of the main store of knowledge, it must pass through people. We shall not belabor the severity of the constraint. It is enough to note that a man, reading eight hours a day every work day, at a speed appropriate for novels, could just keep up with new "solid" contributions to a subfield of science or technology. It no longer seems likely that we can organize or distill or exploit the corpus by passing large parts of it through human brains. It is both our hypothesis and our conviction that people can handle the major part of their interaction with the fund of knowledge better by controlling and monitoring the processing of information than by handling all the detail directly themselves.
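The reading-rate claim is easy to check with round numbers. All the figures below are illustrative assumptions, not data from the text: novel-speed reading of roughly 300 words per minute, a working year of 250 eight-hour days, and a "solid" contribution of roughly 5,000 words.

```python
# Back-of-envelope check of the reading-load claim, using assumed round
# numbers (every figure here is an illustrative assumption):
words_per_minute = 300    # novel-speed reading
hours_per_day = 8
days_per_year = 250
words_per_paper = 5_000   # one "solid" contribution

words_read_per_year = words_per_minute * 60 * hours_per_day * days_per_year
papers_read_per_year = words_read_per_year // words_per_paper

print(words_read_per_year)   # → 36000000
print(papers_read_per_year)  # → 7200
```

On these assumptions a full-time reader absorbs on the order of a few thousand contributions per year, which is indeed the scale of the annual output of a single subfield, not of science as a whole.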
In order even to test the control-and-monitor approach, it is necessary first to externalize and make explicit the main procedures people employ -- together with other procedures of equal or greater effectiveness -- for dealing with stored information. It would doubtless be extremely difficult to accomplish that preliminary step if we included, among the main procedures, complete processes leading to insight and discovery. Eventually men may succeed in describing those "intelligent" processes completely and explicitly. If they do, we should like to incorporate the procedures into procognitive systems. However, the concept here under discussion does not depend upon complete programs for processes of such high sophistication. We are thinking in terms of lower-echelon procedures. The idea is merely to let people control the processing of the information in the body of knowledge by (1) applying named sequences or named hierarchal arrangements of procedures to named texts, graphs, and tables, (2) observing the results, and (3) intervening whenever a change or extension of plan is required.
We envision several different levels of abstraction in the control system and in its languages. At a procedure-oriented level, the system would be capable of implementing instructions such as the following:
- Limit domain A in subsequent processing to paragraphs that contain at least four words of list x or their synonyms in thesaurus y.
- Transform all the sentences of document B to kernel form.
- Search domain C for instances of the form u = v(w) or w = v'(u) in which u and w are any names, v is any function name in list z, and v' appears in list z as the inverse of v.
- If the information that meets the prescription can be displayed in three pages, display it now; otherwise display the number of pages required.
- Select from domain D and add to list t each sentence that deals in any way with an operation upon something that contains, or can contain, something else that is mentioned in the sentence.
- How many documents in the entire store have sections characterized by g profiles that correlate above 0.7 with the g profile of section 3 of document E?
- Change 0.7 in the foregoing to 0.8. How many?
In the foregoing example of instructions in a hypothetical procedure-oriented language, each term in italics is to be regarded as a particular value of a variable; other terms of the suggested class would be equally admissible. Terms such as "limit . . . to," "domain," "subsequent," "processing," "contain," "at least," "of," "their," "or," "synonym," "in," and "transform," would have standard meanings within the system. There would be very many such terms. Only specialists would learn all the terms and their specific meanings. However, the language would offer some flexibility in the use of synonyms and much flexibility in selection of syntactic forms, and it would not take many months to become a specialist. Instruction 1, for example, could equally well be given as:
1a. Exclude henceforth from domain A all paragraphs not containing four or more words that are in list x or that are thesaurus-y synonyms of words that are in list x.
To devise and implement such a language -- successful use of which demands substantive knowledge and clear thinking, but not rigid adherence to complex rules of format -- will require an extrapolation, but an achievable extrapolation, of computer-programming languages.
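As a sketch of what implementing instruction 1 might involve, the filter below restricts a domain to paragraphs containing at least four distinct words of list x, counting thesaurus-y synonyms as the words they stand for. The word list and thesaurus are invented placeholders, not part of the proposed system.

```python
# Sketch of instruction 1: "Limit domain A in subsequent processing to
# paragraphs that contain at least four words of list x or their synonyms
# in thesaurus y."  List and thesaurus contents are invented examples.

list_x = {"servo", "feedback", "gain", "stability", "loop"}
thesaurus_y = {"regulator": "servo", "return": "feedback",
               "amplification": "gain"}

def qualifying_words(paragraph):
    """Count distinct x-words, mapping thesaurus synonyms onto them."""
    hits = set()
    for word in paragraph.lower().split():
        word = word.strip(".,;:")
        if word in list_x:
            hits.add(word)
        elif word in thesaurus_y:
            hits.add(thesaurus_y[word])
    return len(hits)

def limit_domain(domain, minimum=4):
    """Keep only paragraphs with at least `minimum` qualifying words."""
    return [p for p in domain if qualifying_words(p) >= minimum]

domain_a = [
    "The servo loop gain sets the stability margin.",
    "A regulator with return amplification and a loop.",  # synonyms count
    "This paragraph mentions none of the listed terms.",
]
print(len(limit_domain(domain_a)))  # → 2
```

The equivalent phrasing 1a would compile to the same procedure; the flexibility the text asks for lies in the language's surface syntax, not in the underlying operation.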
With the aid of the language and procedures suggested in the preceding discussion, one could move onward to specialized languages, oriented toward particular fields or subfields of knowledge, that would be easier to learn and use. A servomechanisms engineer, for example, might employ a language in which instructions such as the following could be implemented:
- Convert all the Nyquist diagrams in set A to Bode plots.
- How many reports are there that contain transfer functions of human operators in nonlinear control systems?
- How many of the transfer functions are for stochastic inputs?
- Display the transfer functions one at a time on the screen.
- Transfer W. E. Smith's AJAX simulation to my Experiment C data base as simulation 2.
Obviously such a system must contain much substantive knowledge of its fields. A language for servo engineers will have to be developed in large part by servo engineers. Indeed, the only way to bring into being the many field-oriented languages required to support widespread use of procognitive systems will be (1) to attract leading members of the various substantive fields into pioneering work in the regions of overlap between the substantive fields and the information sciences, and (2) to provide them with ready-made component procedures, procedure-oriented languages designed to facilitate the development of field-oriented languages, and machines capable of putting the field-oriented languages to work and thus facilitating substantive research and application as soon as the languages are developed.
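The first servo-engineering instruction indicates the kind of substantive knowledge involved. As an illustrative sketch, assuming a simple first-order lag not drawn from the text: a Nyquist diagram is the locus of G(jω) in the complex plane, and the corresponding Bode plot re-expresses each point as magnitude in decibels and phase in degrees against frequency.

```python
import cmath
import math

# Sketch of "convert Nyquist diagrams to Bode plots" for one transfer
# function.  G(s) = 1 / (1 + s), a first-order lag, is an assumed example.

def G(s):
    return 1.0 / (1.0 + s)

def nyquist_point(w):
    """One point of the Nyquist locus: G evaluated at s = jw."""
    return G(complex(0.0, w))

def bode_point(w):
    """The same point re-expressed as (magnitude in dB, phase in degrees)."""
    g = nyquist_point(w)
    return 20.0 * math.log10(abs(g)), math.degrees(cmath.phase(g))

for w in (0.1, 1.0, 10.0):
    mag_db, phase_deg = bode_point(w)
    print(f"w={w:5.1f}  |G|={mag_db:7.2f} dB  phase={phase_deg:7.2f} deg")
```

At the corner frequency ω = 1 the sketch gives the familiar -3 dB, -45° point. The procognitive system would perform this conversion not on formulas but on the stored diagrams themselves, which is precisely why it must embody the field's substantive conventions.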
In any event, a basic part of the over-all aim for procognitive systems is to get the user of the fund of knowledge into something more nearly like an executive's or commander's position. He will still read and think and, hopefully, have insights and make discoveries, but he will not have to do all the searching himself nor all the transforming, nor all the testing for matching or compatibility that is involved in creative use of knowledge. He will say what operations he wants performed upon what parts of the body of knowledge, he will see whether the result makes sense, and then he will decide what to have done next. Some of his work will involve simultaneous interaction with colleagues and with the fund of stored knowledge. Nothing he does and nothing they do will impair the usefulness of the fund to others. * (* footnote: Except, of course, for the introduction of false information into the authenticated and organized core of the fund -- but the procognitive system will be better protected than the present system is against the introduction of false information, because of its more elaborate editing, correlating, and organizing procedures.) Hopefully, much that one user does in his interaction with the fund will make it more valuable to others.
The set of criteria that should or must be met in the design and development of procognitive systems includes economic elements and elements relating to technical feasibility as well as elements reflecting the needs and desires of potential users. It includes also some elements that will be governed mainly by quasi-philosophical attitudes toward courses to be followed and goals to be sought by man and civilization. Finally, it includes the consideration that there must be a way "to get there from here," whether the course be evolutionary (the expressed preference of many present-day system technologists) or revolutionary.
Economic criteria tend to be dominant in our society. The economic value of information and knowledge is increasing. By the year 2000, information and knowledge may be as important as mobility. We are assuming that the average man of that year may make a capital investment in an "intermedium" or "console" -- his intellectual Ford or Cadillac -- comparable to the investment he makes now in an automobile, or that he will rent one from a public utility that handles information processing as Consolidated Edison handles electric power. In business, government, and education, the concept of "desk" may have changed from passive to active: a desk may be primarily a display-and-control station in a telecommunication-telecomputation system * (* footnote: If a man wishes to get away from it all and think in peace and quiet, he will have merely to turn off the power. However, it may not be economically feasible for his employer to pay him at full rate for the time he thus spends in unamplified cerebration.) and its most vital part may be the cable ("umbilical cord") that connects it, via a wall socket, into the procognitive utility net. Thus our economic assumption is that interaction with information and knowledge will constitute 10 or 20 per cent of the total effort of the society, and the rational economic (or socioeconomic) criterion is that the society be more productive or more effective with procognitive systems than without.
Note that the allocation of resources to information systems in this projection covers interaction with bodies of information other than the body of knowledge now associated with libraries. The parts of the allocation that pay for user stations, for telecommunication, and for telecomputation can be charged in large part to the handling of everyday business, industrial, government, and professional information, and perhaps also to news, entertainment, and education. These more mundane activities will require extensive facilities, and parts of the neolibrary procognitive system may ride on their coattails.
Whether or not, even with such help, the procognitive system can satisfy the economic criterion within our time scale depends heavily upon the future courses of our technology and our social philosophy. As indicated earlier, the technological prospect can be viewed only through uncertain speculation, but the prospect is fairly bright if the main trends of the information technology hold. The same cannot be said for the philosophical prospect because it is not as clear what the trends are.
To some extent, of course, the severity of the criteria that procognitive systems will be forced to meet will depend upon whether the pro- or anti-intellectual forces in our society prevail. It seems unlikely that widespread support for the development of procognitive systems will stem from appreciation of "the challenge to mankind," however powerful that appreciation may be in support of space efforts. The facts that information-processing systems lack the sex-symbolizing and attention-compelling attributes of rockets, that information is abstract whereas the planets and stars are concrete, and that procognitive systems may be misinterpreted as rivaling man instead of helping him -- these facts may engender indifference or even hostility instead of support.
At the present time, in any event, not many people seem to be interested in intimate interaction with the fund of knowledge -- but, of course, not many have any idea what such interaction would be like. Indeed, it would not be like anything in common experience. The only widespread schemata that are relevant at all are those derived from schooling, and they suffer from lack of relevance on precisely the critical point, intimacy of interaction. The few who do have somewhat appropriate schemata for projection of the picture -- who have had the opportunity to interact intimately ("on line" in a good, flexible system) with a computer and its programs and data -- are excited about the prospect and eager to move into the procognitive future, but they are indeed few. Even if their number should grow as rapidly as opportunity for on-line interaction will permit, they will constitute a cadre of useful specialists rather than a broad community of eager supporters.
The foregoing considerations suggest that the economic criterion will be rigidly enforced, that procognitive systems will have to prove their value in dollars before they will find widespread demand. If so, procognitive systems will come into being gradually, first in the richest, densest areas of application, which will be found mainly in government and business, and only later in areas in which the store of information is poor or dilute. Close interaction with the general fund of knowledge, which is on the whole neither rich nor dense, will be deferred, if these assumptions are correct, until developments paid for by special procognitive applications have made the broader effort practicable. Such a "coattail" ride on a piecemeal carrier may not be the best approach for the nation or the society as a whole, but it seems to be the most probable one. In any event, it is beyond the present scope to determine an optimal course through the quasi-philosophical and socioeconomic waters.
The criteria that are clearly within our scope are those that pertain to the needs and desires of users. The main criteria in that group appear to be that the procognitive system:
- Be available when and where needed.
- Handle both documents and facts. * (* footnote: "Facts," used here in a broad sense, refers to items of information or knowledge derived from one or more documents and not constrained to the form or forms of the source passages. It refers also to items of information or knowledge in systems or subsystems that do not admit subdivision into documentlike units.)
- Permit several different categories of input, ranging from authority-approved formal contributions (e.g., papers accepted by recognized journals) to informal notes and comments.
- Make available a body of knowledge that is organized both broadly and deeply -- and foster the improvement of such organization through use.
- Facilitate its own further development by providing tool-building languages and techniques to users and preserving the tools they devise and by recording measures of its own performance and adapting in such a way as to maximize the measures.
- Provide access to the body of knowledge through convenient procedure-oriented and field-oriented languages.
- Converse or negotiate with the user while he formulates his requests and while responding to them.
- Adjust itself to the level of sophistication of the individual user, providing terse, streamlined modes for experienced users working in their fields of expertness, and functioning as a teaching machine to guide and improve the efforts of neophytes.
- Permit users to deal either with metainformation (through which they can work "at arms length" with substantive information), or with substantive information (directly), or with both at once.
- Provide the flexibility, legibility, and convenience of the printed page at input and output and, at the same time, the dynamic quality and immediate responsiveness of the oscilloscope screen and light pen.
- Facilitate joint contribution to and use of knowledge by several or many co-workers.
- Present flexible, wide-band interfaces to other systems, such as research systems in laboratories, information-acquisition systems in government, and application systems in business and industry.
- Reduce markedly the difficulties now caused by the diversity of publication languages, terminologies, and "symbologies."
- Essentially eliminate publication lag.
- Tend toward consolidation and purification of knowledge instead of, or as well as, toward progressive growth and unresolved equivocation. * (* footnote: It may be desirable to preserve, in a secondary or tertiary store, many contributions that do not qualify as "solid" material for the highly organized, rapidly accessible nucleus of the body of knowledge.)
- Evidence neither the ponderousness now associated with overcentralization nor the confusing diversity and provinciality now associated with highly distributed systems. (The user is presumably indifferent to the design decisions through which this is accomplished.)
- Display desired degree of initiative, together with good selectivity, in dissemination of recently acquired and "newly needed" knowledge.

To the foregoing criteria, it may be fair to add criteria that are now appreciated more directly by librarians than by the users of libraries. Some of the following criteria are, as they should be, largely implicit in the foregoing list, but it will do no harm to make them explicit.
- Systematize and expedite the cataloguing and indexing * (* footnote: "Indexing" is subsumed under "organization" in our use of the latter term in connection with documents or corpora.) of new acquisitions, forcing conformity to the system's cataloguing standards at the time of "publication" and distributing throughout the system the fruits of all labor devoted to indexing and other aspects of organization.
- Solve the problem of recovery of documents (mainly by eliminating it).
- Keep track of users' interests and needs and implement acquisition and retention policy (policy governing what to hold in local memories) for each local subsystem.
- Record all chargeable uses, and handle bookkeeping and billing. Also record all charges that the system itself incurs, and handle their bookkeeping and payment.
- Provide special facilities (languages, processors, displays) for use by system specialists and by teams made up of system and substantive specialists in their continual efforts to improve the organization of the fund of knowledge. (This professional, system-oriented work on organization is supplemented by the contributions toward organization made by ordinary users in the course of their substantive interaction with the body of knowledge.)
- Provide special administrative and judicial facilities (again languages, processors, displays) for use in arriving at and implementing decisions that affect overall system policies and rules.
The list of criteria ends with two considerations that we think many users will deem extremely important in a decade or two, but few would mention now:
- Handle formal procedures (computer programs, subroutines, and so forth, written in formal, machine-independent languages) as well as the conventional documents and facts mentioned in criterion 2.
- Handle heuristics (guidelines, strategies, tactics, and rules of thumb intended to expedite solution of problems) coded in such a way as to facilitate their association with situations to which they are germane.
The foregoing criteria are set forth, we recognize, essentially as absolute desiderata, and not -- as system criteria should be -- as scales of measurement with relative weights, interdependent cutoff points, or other paraphernalia for use in optimization. The reason for stopping so far short of an explicit decision procedure is partly that it is very difficult to set up such a procedure for so complex a system, but mainly that it is too early to foresee the details of interaction among the decision factors. The foregoing lists are intended not to provide a complete mechanism for the evaluation of plans, but merely to invite discussion and emendation and to furnish a context for examination of the "plan" that follows.
The plan to be presented here is not a plan to be implemented by a single organization. It is not a system design or a management plan. Rather, it is a rough outline of researches and developments, many of which will probably be carried out, plan or no plan, during the next several decades. The reason for setting forth such a plan is not to guide research and development, which would be presumptuous, but to provide a kind of checklist or scorecard for use in following the game. If the technology should take care of most of the items in the plan but fall behind on a few, then it might be worth while for an agency interested in the outcome to foster special efforts on the delinquent items.
Moreover, this plan is not a final plan or even a mature plan. Perhaps it should be regarded only as a set of suggestions, made by a small group without expertness in all the potentially contributory disciplines, toward the formulation of a plan for a system to facilitate man's interaction with the store of knowledge. For the sake of brevity, however, let us call it a plan. It will be convenient to discuss it in two parts:
- The structure and functions of the proposed system.
- Approaches to realization of the proposed system through research, technology development, and system development.
The proposed procognitive system has a hierarchical structure of the kind mentioned earlier: system, subsystem, . . . component. It seems at first glance to be hierarchical also in another way: it has a top-echelon or central subsystem, several second-echelon or regional subsystems, many third-echelon or local subsystems, and very many fourth-echelon subsystems or user stations. Actually, however, as Fig. 3 illustrates, there are departures from the simple, treelike paradigm of a true hierarchy. First, for the sake of reliability and what the military calls "survivability," the top-echelon subsystem should be replicated. However, it may not be possible, or even desirable, to give the replicates all the capabilities of the main subsystem. Second, each third-level subsystem may be connected to any higher-level subsystem, and to more than one higher-level subsystem at a time. Technically speaking, that makes the structure a lattice instead of a hierarchy. Perhaps it will be best to call it simply a "network."
Fig. 3. Over-all structure of the procognitive system. The circles and ellipses represent advanced and specialized computer systems. The squares represent man-computer interfaces, those of echelon 4 being stations or consoles for substantive users of the system. Most of the connections are switchable telecommunication links. Those shown as solid lines represent connections that might be established at a particular moment during operation. The dotted lines are introduced to suggest other connections that could be established. The centers of echelon 1 are concerned primarily with maintaining the total fund of knowledge, those of echelon 2 with organizing the corpora of fields or subfields of knowledge, and those of echelon 3 with the processing required by users in various localities. The user stations of echelon 4 provide input and output (control and display) facilities and perhaps some processing and memory associated with control and display. Except in echelon 1, the number of subsystems envisioned for the projected system is very much greater than the number shown.
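The departure from a true hierarchy can be made concrete with a toy graph, using invented node names: the moment any third-echelon subsystem holds connections to more than one higher-echelon subsystem, the structure is a lattice rather than a tree.

```python
# Sketch of the four-echelon structure of Fig. 3 as upward links.
# Node names are invented for illustration.

links = {
    "central-1": [],                        # echelon 1 (replicated)
    "central-2": [],
    "region-A":  ["central-1"],             # echelon 2
    "region-B":  ["central-1", "central-2"],
    "local-1":   ["region-A", "region-B"],  # echelon 3: two parents at once
    "local-2":   ["region-B"],
    "console-1": ["local-1"],               # echelon 4: user stations
    "console-2": ["local-1"],
}

def is_tree(up_links):
    """A true hierarchy gives every node at most one parent (roots have none)."""
    return all(len(parents) <= 1 for parents in up_links.values())

print(is_tree(links))  # → False: "local-1" has two parents, so it is a lattice
```

Switchable telecommunication links make the multiple parentage cheap: which connections are active at a given moment is an operational matter, not a structural one.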
The best schema available for thinking about the third and fourth echelons is provided by the multiple-console time-sharing computer systems recently developed, or under development, at Massachusetts Institute of Technology, Carnegie Institute of Technology, System Development Corporation, RAND Corporation, Bolt Beranek and Newman, and a few other places. In order to provide a good model, it is necessary to borrow features from the various time-sharing systems and assemble them into a composite schema. Note that the fourth-echelon subsystems are user stations and that the third-echelon subsystems are intended primarily to provide short-term storage and processing capability to local users, not to serve as long-term repositories.
The second-echelon subsystems are structurally more like computer systems than libraries or documentation centers, though they function more like libraries. A typical second-echelon subsystem is essentially a digital computer * (* footnote: It is possible that, before operationally significant procognitive systems are developed, another kind of information processor will displace, from its prime position in information technology, what we now recognize as the digital computer. It seems to us unlikely that devices of the perceptron type will best fulfill the purposes with which we are here concerned, but other schemata exist and still more are conceivable, and plenty of time remains for us to be openminded. In any event, the design of digital computers is departing from the Princeton paradigm, and the next decade may see as much diversity of structure among digital computers as the last decade saw homogeneity.) with many processors, memory blocks, and input-output units working in parallel and with a large and advanced memory hierarchy, plus a sophisticated digital communication terminal and stations for use by its own specialists in operating and in organizing. Each second-echelon subsystem handles one or more than one substantive field or subfield * (* footnote: At first, it will be possible only to handle subfields. As technology advances, it may become possible to bring related subfields together and to handle an entire field of knowledge in a single subsystem.) of knowledge. Two or three subsystems may work partly in parallel and partly in complement in the largest and most active fields or subfields.
The top-echelon subsystems are similar in general schema to the second-echelon subsystems. The top echelon is specialized (1) to preserve the body of knowledge, (2) to add to it progressively the distilled contributions received from second-echelon subsystems, (3) to transfer information to lower-echelon subsystems on request, and (4) to improve the organization of the over-all fund in ways complementary to those pursued in the second-echelon subsystems.
The top-echelon memory is, therefore, extremely large. Its design may have to sacrifice speed to achieve the necessary size. For several decades, indeed, it seems likely that the limitations on memory size will completely dominate the picture, and that there will be little hope of achieving a strongly interpenetrating organization of the over-all body of knowledge. In the interim, the top echelon will be limited essentially to the first three functions.
Until the top echelon can take up function (4) effectively, it may be desirable to "organize around the problem" in the following way: Use the top echelon, in the manner described, to fulfill the first three functions. Create several special second-echelon subsystems to deal with cross-field interactions, limiting them to fields (or subfields) that are judged likely to have important overlaps or significant interconnections. These special second-echelon subsystems may not be able to operate on the entire corpora of the fields or subfields with which they are concerned; they may have to use highly distilled representations. Even with such limitations, however, they should be able to make valuable contributions by fostering homogeneity of practice from field to field, detecting apparent duplications and complementations in related fields, and noting similarities of form or structure in models or other information structures employed in substantively diverse areas.
The number of centers in echelon 1 envisioned for a national * (* footnote: This discussion is focused on a system appropriate for the United States or perhaps for North America. The ways in which the structure of a world-wide system would differ depend critically on the future economics of intercontinental telecommunication.) system is approximately three, as shown in Fig. 3. In echelon 2, the number of centers should correspond roughly to the number of fields (approximately 100) or subfields (approximately 1000) into which knowledge is subdivided for deep analysis and organization. In echelon 3, the number of centers should correspond to the number of localities in which significant interaction with the body of knowledge occurs. "Localities" will be large areas if the economic advantage of large information-processing systems over small ones tends to outweigh the incremental cost (associated with the greater distances in larger areas) of communication between user stations and centers; they will be small if communication costs tend to dominate. Large organizations may maintain their own third-echelon centers and use them in processing proprietary information as well as information from or for the general fund. And the number of third-echelon centers will, of course, depend upon the demand. These considerations make projection of the number of third-echelon centers highly uncertain. It falls somewhere between 20 and 2000. We have already examined some aspects of the fourth-echelon user stations. There will be hundreds of thousands of user stations, though many of them will be used only intermittently.
Ordinarily, a user will dial his own nearby third-echelon center and use its processing and memory facilities. His center will probably be holding some of his personal data or procedures in its store, and, in addition, using the local center will keep down the transmission costs. However, when a user wishes to work with a distant colleague, and to pool his personal data with those of his colleague, he can dial the remote center and request transmission of his data to it. * (* footnote: Other arrangements for cooperative work may prove superior to the one suggested. Our purpose here is merely to note that the need will exist and can be met.)
Perhaps the best way to consolidate the picture that we have been projecting, one part at a time, is to describe a series of interactions, between the system and a user who is working on a substantive problem that requires access to, and manipulation of, the fund of knowledge. Let us choose an example that will exercise the system in several ways -- and try to compensate for the complexity thus necessarily introduced by describing the interaction in detail only in the first episode, and then moving to a higher level of abstraction. Let us, for the sake of brevity, refer to the system as "system" and to the user as "I." And, finally, let us use in the example a fairly straightforward descriptor-based approach to document retrieval, even though that facet of the art should be greatly advanced by 1994, and even though we shall not hesitate in the same example to assume a question-answering capability that is much farther advanced than the document-retrieval capability.
Friday afternoon -- I am becoming interested, let us say, in the prospect that digital computers can be programmed in such a way as to "understand" passages of natural language. (That is a 1964 problem, but let us imagine that I have available in 1964 the procognitive system of 1994.) In preparation for a session of study on Monday, I sit down at my console to place an advance order for study materials. I take this foresighted approach because I am not confident that the subject matter has been organized well in the store of the procognitive system, or even that the material I wish to examine is all in one subfield center.
Immediately before me on my console is a typewriter that is, in its essentials, quite like a 1964 office typewriter except that there is no direct connection between the keyboard and the marking unit. When I press a key, a code goes into the system, and the system then sends back a code (which may or may not be the one I sent), and the system's code activates the marking unit. To the right of the typewriter, and so disposed that I can get into position to write on it comfortably if I rotate my chair a bit, is an input-output screen, a flat surface 11" \times 14" on which the system and I can print, write, and draw to each other. It is easy to move this surface to a position above the typewriter for easy viewing while I type, but, because I like to write and draw my inputs to the system, I usually leave the screen in its horizontal position beside the typewriter. In a penholder beside the screen is a pen that can mark on the screen very much as an ordinary pen marks on paper, except that there is an "erase" mode. The coordinates of each point of each line marked on the screen are sensed by the system. The system then "recognizes" and interprets the marks. Inside the console is a camera-projector focused upon the screen. Above the chair is a microphone. The system has a fair ability to recognize speech sounds, and it has a working vocabulary that contains many convenient control words. Unfortunately, however, my microphone is out of order. There is a power switch, a microphone switch, a camera button, and a projector button. That is all. The console is not one of the high-status models with several viewing screens, a page printer, and spoken output.
The power is on, but I have not yet been in interaction with the system. I therefore press a typewriter key -- any key -- to let the system know the station is going into operation. The system types back, and simultaneously displays upon the screen:
14:23 13 November 1964 Are you J. C. R. Licklider?
(The system knows that I am the most frequent, but not the only, user of this console.) I type "y" for yes, and the system, to provide a neat record on the right-hand side of the page, types:
J. C. R. Licklider
and makes a carriage return. (When the system types, the typewriter operates very rapidly; it typed my name in a fifth of a second.) The display on the screen has now lost the "Are you . . ." and shows only the date and name. Incidentally, the typing that originates with me always appears in red; what originates in the computer always appears in black.
At this early stage of the proceedings, I am interacting with the local center, but the local center is also a subsystem of systems other than the procognitive system. Since I wish to use the procognitive system, I type
Procog
and receive the reply:
You are now in the Procognitive System.
To open the negotiation, I ask the procognitive system:
What are your descriptor expressions for:
computer processing of natural language
computer processing of English
natural-language control of computers
natural-language programming of computers
DIGRESS
At the point at which I wrote "DIGRESS," it occurred to me that I might in a short while be using some of the phrases repeatedly, and that it would be convenient to define temporary abbreviations. The typed strings were appearing on the display screen as well as on the paper. (I usually leave the console in the mode in which the information, when it will fit and not cause delay, is presented on both the typewriter and the screen.) I therefore type:
define temp
On recognizing the phrase, which is a frequently used control phrase, the system locks my keyboard and takes the initiative with:
via typewriter? via screen? define temporarily
I answer by swiveling to the screen, picking up the pen, and pointing to the word "screen" on the screen. I then point to the beginning and end of computer processing, then to the c and the p, and then to a little square on the screen labeled "end symbol." (Several such squares, intended to facilitate control by pointing, appear on the screen in each mode.)
In making the series of designations by pointing just described, I took advantage of my knowledge of a convenient format that is available in the mode now active. The first two pointings designate the beginning and end of the term to be defined, and the next pointings, up to "end symbol," spell out the abbreviation. (Other formats are available in the current mode, and still others in other modes.) If my microphone had been working, I should have said "Define cee pee abbreviation this" and drawn a line across computer processing as I said "this." The system would then have displayed on the screen its interpretation of the instruction, and then (after waiting a moment for me to intervene) implemented it. Next, I define abbreviations for "natural language" (nl), "computer" (comp), and "programming" (prog). (Unless instructed otherwise, the system uses the same abbreviation for singular and plural forms and for hyphenated and unhyphenated forms.) And finally, insofar as this digression is concerned, I touch a small square on the screen labeled "end digression," return to the typewriter, and type:
comp understanding of nl comp comprehension of semantic relations?§
The question mark terminates the query, and the symbol § tells the system not to wait for further input from me now.
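The temporary-abbreviation facility used in the digression above can be sketched very simply. The function names and the whole-word expansion rule are illustrative assumptions, not a description of the system's actual mechanism:

```python
# Hedged sketch of the "define temp" facility: session-local abbreviations
# expanded in subsequent typed input. Names and behavior are assumptions
# made for illustration, not the system's documented design.
import re

abbrev = {}  # temporary abbreviation table for this session

def define_temp(abbreviation, phrase):
    abbrev[abbreviation] = phrase

def expand(line):
    # Replace each whole word that matches a defined abbreviation;
    # all other words pass through unchanged.
    return re.sub(r"[A-Za-z]+", lambda m: abbrev.get(m.group(0), m.group(0)), line)

define_temp("cp", "computer processing")
define_temp("nl", "natural language")
define_temp("comp", "computer")

print(expand("comp understanding of nl"))  # -> "computer understanding of natural language"
```

A real console would of course resolve ambiguities (singular and plural, hyphenation) as the text notes; the sketch shows only the substitution step.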
Because the system's over-all thesaurus is very large, and since I did not specify any particular field or subfield of knowledge, I have to wait while the requested information is derived from tertiary memory. That takes about 10 seconds. In the interim, the system suggests that I indicate what I want left on the display screen. I draw a closed line around the date, my name, and the query. The line and everything outside it disappear. Shortly thereafter, the system tells me:
Response too extensive to fit on screen. Do you wish short version, multipage display, or typewriter-only display?
Being in a better position to type than to point, I type:
s
That is enough to identify my preference. The short version appears on the screen but not on the typewriter (I rather expected it there too):
Descriptor expressions:
- (natural language) ∧ (computer processing of)
- (natural language) ∧ (on-line man-computer interaction)
- (natural language) ∧ (machine translation of)
- (natural language) ∧ (computer programming)
- (computer program) ∧ (semantic net)
- (compiler) ∧ (semantic)

Descriptor inclusions:

- (natural language) includes (English)

Phrase equivalences:
- (control of computers) ≃ (on-line man-machine interaction)
- (programming of computers) ≃ (computer programming)
- (semantic relations) ≃ (semantic nets) [END]
I am happy with 1 and especially happy with 2. I am curious about 5. I think I should look into 4. Number 3 frightens me a little, because I suspect it will lead to too much semirelevant information. I think I recognize 6 as a false lead into the field of program compiling, in which field, although "syntax" seems to be used in essentially the linguist's sense, "semantics" is used in what appears to me to be an inscrutable kind of jargon. Accordingly, I type to the system:
Prepare bibliographies with abstracts (1, 2, 4, 5). Best 25 references each. When will they be available?
Before turning control over to the system, I decide to ask a substantive question, even though I realize I do not yet know enough about the subject to inquire intelligently. Using the control word, "QUESTION," to establish the mode, I ask:
QUESTION: Has any digital-computer system responded to questions phrased in ordinary English, or other natural language, by giving correct answers in natural language? §
The system replies first to "When will they be available?"
Requested bibliographies will be available at about 18:00. Do you want them delivered, or will you call, or will you request typeout?
Unfortunately, my office is not located near a pneumatic-tube station. I rely heavily on typeouts made by the system when I am away from the console. I reply:
Type them out as soon as they are available.
The system reminds me:
Please leave your console power on!
Then it replies to the "QUESTION":
Cannot give yes-no answer to QUESTION. Approaches to "yes" are described in the following:
- B. F. Green, A. K. Wolf, C. Chomsky, and K. Laughery, Baseball: An Automatic Question-Answerer, Proc. Western Joint Computer Conf., 19, 219-224, 1961.
- S. Klein and R. F. Simmons, A Computational Approach to Grammatical Coding of English Words, J. Assoc. Computing Machinery, 10, 334-347, 1963.
- S. Klein and R. F. Simmons, Syntactic Dependence and the Computer Generation of Coherent Discourse, Mechanical Translation (entering system).
The foregoing must suffice to suggest the nature of the interaction at the level of key pressing and pointing. The console hardware and procedure embody many features, worked out through the years, that maximize convenience and free the user from clerical routine. The formats and procedures are quite flexible. The user learns, through working with the system, what modes and techniques suit him best. Ordinarily, he gives the system rather terse, almost minimal instructions, relying on it to interpret them correctly and to do what he wishes and expects. When it misinterprets him or gets off the track of his thinking, as it sometimes does, he falls back on more explicit expression of commands and queries.
To continue with our example, let us move on to Monday. The reference citations and abstracts are ready for examination. The system has anticipated that I may want to see or process the full texts, and they are now available in secondary memory, having been moved up from the tertiary store along with a lot of other, somewhat less clearly relevant, material. I do not know exactly how much of such anticipatory preparation has gone on within the system, but I know that the pressure of on-line requests is low during the week-end, and I am counting on the system to have done a fair amount of work on my behalf. (I could have explicitly requested preparatory assembly and organization of relevant material, but it is much less expensive to let the system take the initiative. The system tends to give me a fairly high priority because I often contribute inputs intended to improve its capabilities.) Actually, the system has been somewhat behind schedule in its organization of information in the field of my interest, but over the week-end it retrieved over 10,000 documents, scanned them all for sections rich in relevant material, analyzed all the rich sections into statements in a high-order predicate calculus, and entered the statements into the data base of the question-answering subsystem.
It may be worthwhile to digress here to suggest how the system approached the problem of selecting relevant documents. The approach to be described is not advanced far beyond the actual state of the art in 1964. Certainly, a more sophisticated approach will be feasible before 1994.
All contributions to the system are assigned tentative descriptors when the contributions are generated. The system maintains an elaborate thesaurus of descriptors and related terms and expressions. The thesaurus recognizes many different kinds of relations between and among terms. It recognizes different meanings of a given term. It recognizes logical categories and syntactic categories. The system spends much time maintaining and improving this thesaurus. As soon as it gets a chance, it makes statistical analyses of the text of a new acquisition and checks the tentatively assigned descriptors against the analyses. It also makes a combined syntactic-semantic analysis of the text, and reduces every sentence to a (linguistic) canonical form and also to a set of expressions in a (logical) predicate calculus. If it has serious difficulty in doing any of these things, it communicates with the author or editor, asking help in solving its problem or requesting revision of the text. It tests each unexpectedly frequent term (word or unitary phrase) of the text against grammatical and logical criteria to determine its appropriateness for use as a descriptive term, draws up a set of working descriptors and subdescriptors, sets them into a descriptor structure, and, if necessary, updates the general thesaurus. * (* footnote: New entries into the general thesaurus are dated. They remain tentative until proven through use.)
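One step in the procedure just described, testing each "unexpectedly frequent" term of a text as a candidate descriptor, can be sketched under simplifying assumptions. The background frequency table and the ratio threshold below are invented for illustration; a real system would also apply the grammatical and logical criteria the text mentions:

```python
# Minimal sketch, assuming a background table of expected relative word
# frequencies. A term occurring far more often than expected in the new
# acquisition becomes a working-descriptor candidate. All numbers are
# illustrative, not from the original text.
from collections import Counter

# hypothetical background relative frequencies (per word of running text)
background = {"the": 0.06, "of": 0.03, "syntax": 0.0001, "parser": 0.00005}

def candidate_descriptors(text, ratio_threshold=10.0):
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    candidates = []
    for term, n in counts.items():
        expected = background.get(term, 0.00001) * total  # default: rare term
        if n / max(expected, 1e-9) >= ratio_threshold:
            candidates.append(term)
    return sorted(candidates)

text = "the syntax of the parser the syntax the syntax"
print(candidate_descriptors(text))  # -> ['parser', 'syntax']
```

Common function words occur about as often as expected and are passed over, while content terms like "syntax" exceed their expected counts by orders of magnitude.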
In selecting documents that may be relevant to a retrieval prescription, the system first sets up a descriptor structure for the prescription. This structure includes all the terms of the prescription that are descriptors or subdescriptors at any level in the thesaurus. It includes, also, thesaurus descriptors and subdescriptors that are synonymous to, or related in any other definite way to, terms of the prescription that are not descriptors or subdescriptors in the thesaurus. All the logical relations and modulations of the prescription are represented in its descriptor structure.
The descriptor structure of a document is comparable to the descriptor structure of a prescription. The main task of the system in screening documents for relevance, therefore, is to measure the degrees of correlation or congruence that exist between various parts of the prescription's structure and corresponding parts, if they exist, of each document's structure. This is done by an algorithm (the object of intensive study during development of the system) that determines how much various parts of one structure have to be distorted to make them coincide with parts of another. The algorithm yields two basic measures for each significant coincidence: (1) degree of match, and (2) size of matching substructure. The system then goes back to determine, for each significant coincidence, (3) the amount of text associated with the matching descriptor structure. All three measures are available to the user. Ordinarily, however, he works with a single index, which he is free to define in terms of the three measures. When the user says "best references," the system selects on the basis of his index. If the user has not defined an index, of course, the system defines one for him, knowing his fields of interest, who his colleagues are, how they defined their indexes, and so forth.
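The screening step can be sketched under a strong simplification: if descriptor structures are reduced to flat sets of descriptor terms, "congruence" collapses to set overlap. The three measures and the user-definable combined index follow the text; everything else (weights, data) is an illustrative assumption:

```python
# Sketch of relevance screening, assuming descriptor structures reduced to
# sets of terms. A real system would match structured, relational patterns;
# the three measures and the combined index are as described in the text,
# but all numbers and weights here are invented for illustration.

def match_measures(prescription, document_terms, text_lengths):
    common = prescription & set(document_terms)
    degree_of_match = len(common) / len(prescription) if prescription else 0.0
    matching_size = len(common)  # (2) size of matching substructure
    associated_text = sum(text_lengths.get(t, 0) for t in common)  # (3)
    return degree_of_match, matching_size, associated_text

def default_index(measures):
    # A stand-in for the user-defined single index over the three measures.
    degree, size, text = measures
    return degree * size + 0.001 * text

prescription = {"natural language", "computer processing of"}
doc_terms = ["natural language", "machine translation of"]
lengths = {"natural language": 1200, "machine translation of": 800}

m = match_measures(prescription, doc_terms, lengths)
print(m)                 # (0.5, 1, 1200)
print(default_index(m))  # 0.5*1 + 0.001*1200 = 1.7
```

Ranking documents by this index and taking the top 25 is one plausible reading of what "best 25 references" asks the system to do.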
We shall not continue this digression to examine the question-answering or other related facilities of the system. Discussions relevant to them are contained in Part II. Let us return now to the conclusion of the example.
I scan the lists of references and read the abstracts. I begin to get some ideas about the structure of the field, and to appreciate that it is in a fairly primitive stage. Evidently, it is being explored mainly by linguists, logicians, psychologists, and computer scientists, and they do not speak a uniform language. My interest is caught most strongly by developments in mathematical syntax. The bibliography contains references to work by Noam Chomsky, Ida Rhodes, A. G. Oettinger, V. E. Giuliano, V. H. Yngve, and others. I see that I was wrong in neglecting machine translation. I correct that error right away. The requested bibliography appears at once; the system had discovered the relevance and was prepared.
The first thing I wish to clear up is whether the syntactic and semantic parts of language are, or should be, handled separately or in combination in computer analysis of text. I give the system a series of questions and commands that includes:
Refer to bibliographies I requested last Friday. Do cited or related references contain explicit definitions of "syntax", "syntactics", or "semantic"? Do syntactic analyses of sentences yield many alternative parsings? Give examples showing alternatives. Give examples illustrating how many. Is there a description of a procedure in which an analysis based on mathematical syntax is used in tandem or in alternation with semantic analysis? Display the description. How long is Oettinger's syntactic analyzer? Do you have it available now? §
It turns out that Oettinger, Kuno and their colleagues have developed a series of syntactic analyzers and that the most recent one has been documented and introduced into the procognitive system (Oettinger and Kuno, 1962). I do not have to bother Oettinger himself -- at least not yet.
I request that the program be retrieved and prepared for execution by asking:
What arguments and control operations does the routine require? What formats? How do I test it? How do I apply the routine to a test sentence?
The system tells me that all I have to do to apply the routine to a short test sentence, now that the system has the routine all ready to go, is to type the sentence; but for long inputs there are rules that I can ask to see. I type a sentence discussed by Kuno and Oettinger (1963):
They are flying planes.
The result pours forth on the screen in the form of a table full of abbreviations. I think I can probably figure out what the abbreviations mean, but it irritates me when the system uses unexplained abbreviations in a field that I am just beginning to study. I ask the system to associate the spelled-out terms with the abbreviations in the table. It does so, in very fine print, and appends a note citing the program write-up that explains the notations. I can barely make out the fine print. Partly to make sure of my reading, and partly to exercise the system (which still has a certain amount of plaything appeal), I touch "IV" on the tree diagram with the stylus, and then hold the stylus a moment on the control spot labeled "magnify." The tree expands around the "IV," enlarging the print, and thereby lets me confirm my uncertain reading of "level one, predicative verb."
My next step is to test a sentence of my own. After that, I ask to see the other programs in the system that are most closely similar in function to the one just examined. The system gives me first a list of the names and abstracts of several syntax programs, and then as I call for them, the write-ups and listings, and it makes each program available for testing. I explore the programs, but not yet very deeply. I wish merely to gain an impression from direct interaction with them and then go back to a mixture of reading and asking questions of the system.
The foregoing is doubtless enough to suggest the nature of the interaction with the fund of knowledge that we think would be desirable. None of the functions involved in the interaction described in the example is very complex or profound. Almost surely the functions can be implemented in software * (* footnote: Computer programs, descriptions of procedure, dictionaries, instructional material, and so forth, as opposed to hardware, which is usually taken to include the processors, memories, display devices, communication equipment, and other such components of the system.) sooner than the hardware required to support them will be available. As the example suggests, we believe that useful information-processing services can be made available to men without the programming of computers to "think" on their own. We believe that much can be accomplished, indeed, without demanding many fundamental insights on the part of the initial designers of the system.
Perhaps we did not rely heavily enough, in the example and in the study, on truly sophisticated contributions from the inanimate components of the system. In respect of that possibility, we adopted a deliberately and admittedly conservative attitude. We expect that computers will be capable of making quite "intelligent" contributions by 1994, to take the date assumed in the example, but we prefer not to count on it. If valuable contributions can be made by "artificial intelligences" of that date, there will be room for them, as well as the men to monitor them, in our basic system schema. On the other hand, if it should turn out that the problems involved in developing significant artificial intelligence are extremely difficult, or that society rejects the whole idea of artificial intelligence as a defiance of God or a threat to man, then it will be good not to have counted on much help from software approaches that are not yet well enough understood to support extrapolation. This conservative attitude seems appropriate for the software area but not for the hardware area.
Our information technology is not yet capable of constructing a significant, practical system of the type we have been discussing. If it were generally agreed, as we think it should be, that such a system is worth striving for, then it would be desirable to have an implementation program. The first part of such a program should not concern itself directly with system development. It should foster advancement of relevant sectors of technology. * (* footnote: Science is also involved, of course, but for the sake of brevity "technology" is used in a very broad sense in this part of the discussion.)
Let us assume then -- though without insisting -- that it is in the interest of society to accelerate the advances. What particular things should be done?
One of the first things to do, according to our study, is to break down the barriers that separate the potentially contributory disciplines. Among the disciplines relevant to the development of procognitive systems are (1) the library sciences, including the part of information storage and retrieval associated with the field of documentation, (2) the computer sciences, including both hardware and software aspects and the part of information storage and retrieval associated with computing, (3) the system sciences, which deal with the whole spectrum of problems involved in the design and development of systems, and (4) the behavioral and social sciences, parts of which are somewhat (and should be more) concerned with how people obtain and use information and knowledge. (The foregoing is not, of course, an exhaustive list; it even omits mathematical linguistics and mathematical logic, both of which are fundamental to the analysis and transformation of recorded knowledge.) The barriers that separate the relevant disciplines appear to be strong. There is, of course, some multidisciplinary work, and a little of it is excellent. On the whole, however, the potentially contributory disciplines are not effectively conjoined. One of the most necessary steps toward realization of procognitive systems is to promote positive interaction among them.
A second fundamental step is to determine basic characteristics of the relevance network that interrelates the elements of the fund of knowledge. The information elements of a sentence are interrelated by syntactic structures and semantic links. The main syntactic structures are obviously local; they scarcely span the spaces between sentences. Correspondence between syntactic structures is of some help in determining the type and degree of relation between two widely separated segments of text, but the main clues to the relations that interconnect diverse parts of the corpus of recorded information are semantic.
There is, therefore, a need for an effective, formal, analytical semantics. With such a tool, one might hope to construct a network in which every element of the fund of knowledge is connected to every other element to which it is significantly related. Each link might be thought of as carrying a code identifying the nature of the relation. The nature might be analyzed into type and degree. * (* footnote: Here "degree" implies a formalization of the intuitive notion that some relations are direct and immediate (e.g., x is the mother of y) whereas others are indirect and mediate (e.g., x is a member of a club of the same type as a club of which y is a member). Low degree corresponds to direct and immediate.) Multiple-argument relations would be represented by multiple linkages. We use the term, "relevance network," to stand for this entire concept.
The magnitude of the task of organizing the corpus of recorded information into a coherent body of knowledge depends critically upon the average length of the links of the relevance network. To develop this idea, let us visualize the network as a reticulation of linkages connecting information elements in documents that are arranged spatially in a pattern corresponding to some classification system such as the Dewey Decimal. Now let us determine, for each element $i$, the number $N_{ij}$ of links of each degree $j$ that connect it to other elements, and determine, at the same time, the total length $L_{ij}$ of all its links of each degree $j$. The average length of all the links of degree $j$ in the network is
$$ L_j = \frac{\sum_i L_{ij}}{\sum_i N_{ij}} $$
If we weight the lengths by an inverse function, such as $1/j^2$, of their degrees, we have as an index for the average weighted length of the links:
$$ L = \sum_j \frac{L_j}{j^2} $$
In order to determine the foregoing quantities precisely, one would have to carry out much of the task of organizing the body of knowledge, but we are concerned here mainly with the abstract concept, and sampling experiments would, in any event, suffice to make it concrete. If at the outset we could fit the entire corpus into a giant random-access memory, we should not be concerned with the lengths of links. The total number of elements and the total number of links up to some cutoff degree would provide the bases for estimating the magnitude of the task of organizing the body of knowledge. However, as long as we can fit into processible memory only one part of the corpus at a time, it will be critical whether the linked elements of the relevance network cluster, and whether the memory will accept a typical cluster. The index L bears on that question. If L turns out to be small, then knowledge does indeed tend to cluster strongly, and part-by-part processing of the corpus will be effective. If L turns out to be large, then far-flung associations are prevalent, and we must await the development of a large memory.
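The sampling experiment just mentioned can be sketched in a few lines of code. The link data below are wholly invented, as the text supplies none; the sketch merely shows how the per-degree averages and the weighted index L would be computed from a sample of (degree, length) pairs.

```python
from collections import defaultdict

# Hypothetical sample: (degree j, length) for each link found in a
# sampling experiment over the spatially classified corpus.
links = [(1, 0.2), (1, 0.4), (2, 3.0), (2, 5.0), (3, 40.0)]

# Average length of the links of each degree j:
#   L_j = (sum_i L_ij) / (sum_i N_ij)
total_len = defaultdict(float)
count = defaultdict(int)
for j, length in links:
    total_len[j] += length
    count[j] += 1
L_by_degree = {j: total_len[j] / count[j] for j in total_len}

# Weighted index, discounting the more mediated (higher-degree) links:
#   L = sum_j L_j / j**2
L = sum(Lj / j ** 2 for j, Lj in L_by_degree.items())
```

With these sample figures the short, low-degree links dominate the index; a small L would indicate that knowledge clusters strongly enough for part-by-part processing.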
In the foregoing discussion, the index L was based upon "lengths" in a space defined to correspond with a linear classification scheme. Obviously, that assumption, and many other parts of the suggested picture, need to be sharpened. One should not adopt the first paradigm to come to mind, but should explore the implications of various alternative properties and metrics of the relevance space. Moreover, one should regard the lengths of links and the metrics of the space merely as preliminary working conveniences, for all the lengths within a part of the corpus become equal when that part is loaded into a random-access memory, and the distance of that part from the other parts may, for practical purposes, become infinite. It is of paramount importance not to think of relevance as a vague, unanalyzed relation, but rather to try to distinguish among definite types and degrees of relevance. With such development, the concept of relevance networks might progress from its present unelaborated form to a systematic, analytic paradigm for organization of the body of knowledge.
The most necessary hardware development appears to be in the area of memory, which we have already discussed. Procognitive systems will pose requirements for very large memories and for advanced memory organizations. Unless an unexpected breakthrough reconciles fast random access with very large capacity, there will be a need for memories that effect various compromises between those desiderata. They will comprise the echelons of the memory hierarchy we have mentioned. It will be necessary to develop techniques for transferring information on demand, and in anticipation of demand, from the slow, more voluminous levels of the hierarchy to the faster, more readily processible levels.
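The demand-plus-anticipation transfer scheme can be sketched as a toy two-echelon hierarchy. Everything here is a simplifying assumption: two levels instead of many, a one-item-ahead prefetch as the "anticipation of demand," and oldest-first eviction. It illustrates the mechanism, not any particular hardware design.

```python
class TwoLevelMemory:
    """Toy two-echelon memory hierarchy (all design choices hypothetical)."""

    def __init__(self, slow, fast_capacity=4):
        self.slow = slow        # large, slow echelon: address -> record
        self.fast = {}          # small, fast, readily processible echelon
        self.capacity = fast_capacity

    def _install(self, addr):
        """Transfer one record from the slow echelon to the fast one."""
        if addr in self.slow and addr not in self.fast:
            if len(self.fast) >= self.capacity:
                # Evict the oldest entry (dicts preserve insertion order).
                self.fast.pop(next(iter(self.fast)))
            self.fast[addr] = self.slow[addr]

    def read(self, addr):
        if addr not in self.fast:     # miss: transfer on demand...
            self._install(addr)
            self._install(addr + 1)   # ...and prefetch in anticipation
        return self.fast[addr]

mem = TwoLevelMemory({i: f"record-{i}" for i in range(100)})
mem.read(10)
assert 11 in mem.fast  # the next record was transferred in anticipation
```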
Insofar as memory media are concerned, current research and development present many possibilities. The most immediate prospects advanced for primary memories are thin magnetic films, coated wires, and cryogenic films. For the next echelons, there are magnetic disks and photographic films and plates. Farther distant are thermoplastics and photosensitive crystals. Still farther away -- almost wholly speculative -- are protein molecules and other quasi-living structures. All these possibilities will be explored by industry without special prodding, but it may in some instances be difficult for industry, unassisted, to move from demonstrations of feasibility in the laboratory into efficient production.
Associative, or content-addressable, memories are beginning to make their appearance in the computer technology. The first generation is, of course, too small and too expensive for applications of the kind we are interested in here, but the basic schema seems highly relevant. One version of the schema has three kinds of registers: a mask register, a comparison register, and many memory registers. All the registers have the same capacity except that each memory register has a special marker cell not found in the mask and comparison registers. The contents of the mask register are set to designate the part of the comparison and memory registers upon which attention is to be focused. The comparison and memory registers contain patterns. Suppose that "golf" falls within the part of the comparison register designated as active by the mask. When the "compare" instruction is given, the marker is set to 1 in the marker cell of every memory register that contains "golf" in the part designated by the mask, and the marker is set to 0 in the marker cell of every other memory register. This is done almost simultaneously in all the memory registers in one cycle of processing. The ordinary, time-consuming procedure of searching for matching patterns is thus short-circuited.
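The mask-and-compare schema can be simulated directly. In the sketch below, registers are fixed-length character strings and the mask marks active cells with "1"; a real associative memory would set all the markers in parallel in one cycle, whereas this simulation loops over the memory registers.

```python
def associative_compare(mask, comparison, memory_registers):
    """Simulate one 'compare' cycle of the mask/comparison/marker schema.

    Returns the marker bit for each memory register: 1 if the register
    matches the comparison register in every cell the mask designates
    as active, 0 otherwise.
    """
    active = [i for i, m in enumerate(mask) if m == "1"]
    markers = []
    for reg in memory_registers:
        match = all(reg[i] == comparison[i] for i in active)
        markers.append(1 if match else 0)
    return markers

# Eight-cell registers; the mask focuses attention on cells 0-3.
mask       = "11110000"
comparison = "golfxxxx"   # "golf" in the active part; the rest is ignored
memory = ["golfclub", "golfball", "poloball"]

print(associative_compare(mask, comparison, memory))  # [1, 1, 0]
```

Every register containing "golf" in the masked part gets its marker set to 1, without any serial search through the patterns themselves.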
Our earlier discussion of retrieval with the aid of descriptors and thesauri suggested that searching for matching patterns is likely to be a prevalent operation in procognitive systems. Associative memories are therefore likely to be very useful. However, the simple schema just described is not capable of handling directly the highly complex and derivative associations (e.g., A is associated with D through B and C if E equals F) that will be encountered. It seems desirable, therefore, to explore more advanced associative schemata. These should be studied first through simulation on existing computers. Only when the relative merits of various associative-memory organizations are understood in relation to various information-handling problems, we believe, should actual hardware memories be constructed.
In the body of knowledge, relations of high order appear to prevail over simple associations between paired elements. That consideration suggests that we should not content ourselves with simple associative memories, but should press forward in an attempt to understand and devise high-order relational memories.
Memory, of course, is only part of the picture. With each development in memory structure must come a development in processors. For example, now that "list processing" has been employed for several years, computers are appearing on the market with instruction codes that implement directly most of the manipulations of list structures that were formerly handled indirectly through programming. It will be desirable eventually to have special instructions for manipulating "relational nets" or whatever information structures prove most useful in representing and organizing the fund of knowledge.
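What a "relational net" instruction set might need to support can be previewed in software. The structure below is a hypothetical sketch, not a design from the text: it stores subject-relation-object facts and makes the manipulations a future processor might implement directly (asserting a relation, following it, composing two relations) into indexed lookups rather than searches.

```python
class RelationalNet:
    """Minimal sketch of a 'relational net' store (hypothetical design)."""

    def __init__(self):
        self.by_subject = {}   # subject -> list of (relation, object)

    def assert_fact(self, subj, rel, obj):
        """Add one subject-relation-object triple to the net."""
        self.by_subject.setdefault(subj, []).append((rel, obj))

    def follow(self, subj, rel):
        """All objects reachable from subj via one link of type rel."""
        return [o for r, o in self.by_subject.get(subj, []) if r == rel]

    def compose(self, subj, rel1, rel2):
        """Two-step traversal: subj -rel1-> x -rel2-> result."""
        return [o for mid in self.follow(subj, rel1)
                  for o in self.follow(mid, rel2)]

net = RelationalNet()
net.assert_fact("paper-A", "cites", "paper-B")
net.assert_fact("paper-B", "authored-by", "Smith")

# A derivative, high-order association recovered by composition:
print(net.compose("paper-A", "cites", "authored-by"))  # ['Smith']
```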
Some of the projected devices that promise to facilitate interaction between men and the body of knowledge were described on pp. 45-46. Most of the capabilities that were assumed in the example can be demonstrated now, but only crudely, and one feature at a time. It will require major research and engineering efforts to implement the several functions with the required degrees of convenience, legibility, reliability, and economy. Industry has not devoted as much effort to development of devices and techniques for on-line man-computer interaction as it has to development of other classes of computer hardware and software. It seems likely, indeed, that industry will require more special prodding and support in the display-control area than in the other relevant areas of computer technology.
The design of special-purpose languages is advancing rapidly, but it has a long way to go. There are now several procedure-oriented languages for the preparation of computer programs (1) to solve scientific problems, (2) to process business data, and (3) to handle military information. Examples are: (1) ALGOL, FORTRAN, MAD, MADTRAN, SMALGOL, BALGOL, and DECAL; (2) COBOL and FACT; and (3) JOVIAL and NELIAC. In addition, there are languages oriented toward (4) exploitation of list processing, (5) simulation techniques, and (6) data bases. Examples are: (4) IPL-V, LISP, KLS, and SLIP; (5) SIMSCRIPT, SIMPAC, CLS, MILITRAN, SOL, SIMULA, and GPSS; and (6) ADAM, COLINGO, and LUCID. Finally, there are languages oriented toward the problems of particular fields of research and engineering, for example, COGO and STRESS (for civil engineering), and Sketchpad and APT (for mechanical design).
It will be absolutely necessary, if an effective procognitive system is ever to be achieved, to have excellent languages with which to control processing and application of the body of knowledge. There must be at least one (and preferably there should be only one) general, procedure-oriented language for use by specialists. There must be a large number of convenient, compatible field-oriented languages for the substantive users. From the present point of view, it seems best not to have an independent language for each one of the various processing techniques and memory structures that will be employed in the system, but to embed all such languages within the procedure-oriented and field-oriented languages -- as SLIP (for list processing) is embedded within FORTRAN (Weizenbaum, 1963).
To what extent should the language employed in the organization, direction, and use of procognitive systems resemble natural languages such as English? That question requires much study. If the answer should be, "Very closely," the implementation will require much research. Indeed, much research on computer processing of natural language will be required in any event, for the text of the existing corpus is largely in the form of natural language, and the body of knowledge will almost surely have to be converted into some more compact form in the interests of economy of storage, convenience of organization, and effectiveness of retrieval.
In the organization of the corpus, moreover, it will surely be desirable to be able to translate from one natural language to another. Research and development in machine translation is, therefore, relevant to our interests. At present, students of machine translation seem to be at the point of realizing that syntactic analysis and large bilingual dictionaries are not enough, that developments in the field of semantics must be accomplished before smooth and accurate translations can be achieved by machine. Thus machine translation faces the same problem we face in the attempt, upon which we have touched several times, "to organize information into knowledge."
There appear to be two promising approaches to the rationalization of semantics. The first, which we have already mentioned briefly, involves formalization of semantic relations. The second, not yet mentioned, involves (1) the amassing of vast stores of detailed information about objects, people, situations, and the use of words, and (2) the development of heuristic methods of bringing the information to bear on the interpretation of text. As we see it now, researches along both these approaches should be fostered. The first is more likely to lead to compact representations and economic systems. Perhaps, however, only the second will prove capable of handling the "softer" half of the existing corpus.
The central role, in procognitive systems, of multiple access to large computers was emphasized in an earlier section. It seems vitally important to press on with the development of multiple-console computer systems, particularly in organizations in which creative potential users abound. As soon as it is feasible, moreover, multiple-console computer systems should be brought into contact with libraries. Perhaps they should be connected first to the card catalogues. Then they should be used in the development of descriptor-based retrieval systems. Almost certainly, the most promising way to develop procognitive systems is to foster their evolution from multiple-console computer systems -- to arrange things in such a way that much of the conceptual and software development will be carried out by substantive users of the systems.