Author: Louis Rosenfeld / Peter Morville / Jorge Arango
Publisher: O'Reilly Media
Subtitle: For the Web and Beyond
Publication date: 2015-10-11
Pages: 486
Price: USD 44.99
Binding: Paperback
ISBN: 9781491911686
- I. Introducing Information Architecture
- II. Basic Principles of Information Architecture
- 5. The Anatomy of an Information Architecture
- 6. Organization Systems
- 7. Labeling Systems
- 8. Navigation Systems
- 9. Search Systems
- 10. Thesauri, Controlled Vocabularies, and Metadata
- III. Getting Information Architecture Done
Information architecture (IA) is a design discipline that is focused on making information findable and understandable.
- Historically, information has shown a tendency to dematerialize, going from having a one-to-one relationship with its containers to being completely detached from its containers (as is the case with our digital information).
- This has had two important effects in our time: information is more abundant than ever before, and we have more ways of interacting with it than ever before.
- Information architecture is focused on making information findable and understandable. Because of this, it is uniquely well suited to address these issues.
- It does this by asking the designer to think about problems through two important perspectives: that our products and services are perceived as places made of information, and that they function as ecosystems that can be designed for maximum effectiveness.
- That said, information architecture doesn’t operate solely at the level of abstractions: for it to be effective, it needs to be defined at various levels.
Let’s start by clarifying what we mean by information architecture:
- The structural design of shared information environments
- The synthesis of organization, labeling, search, and navigation systems within digital, physical, and cross-channel ecosystems
- The art and science of shaping information products and experiences to support usability, findability, and understanding
- An emerging discipline and community of practice focused on bringing principles of design and architecture to the digital landscape
Figure 2-6. The infamous three circles of information architecture
Figure 3-1. The “too-simple” model of information needs
Or, expressed as a simple algorithm:
- User asks a question.
- Something happens (i.e., searching or browsing).
- User receives the answer.
- Fin.
When you’re hoping to make the perfect catch, you usually know what you’re looking for, what to call it, and where you’ll find it; this is called known-item seeking. An example is when you search the staff directory to find a colleague’s phone number.
When you’re hoping to find a few useful items in your traps, you’re doing something called exploratory seeking. In this case, you’re not exactly sure what you’re looking for. In fact, whether you realize it or not, you’re looking to learn something from the process of searching and browsing. For example, a user may go to his employer’s human resources site to learn something about retirement plans that the company offers.
When you want everything, you’re performing exhaustive research. You’re looking for everything available on a particular topic, hoping to leave no stone unturned. In this case, the user often has many ways to express what she’s looking for, and may have the patience to construct her search using all those varied terms. For example, someone who is trying to learn more about a friend’s medical condition might execute multiple searches for “AIDS,” “HIV,” “acquired immuno-deficiency syndrome,” and so forth.
Finally, our failing memories and busy schedules continually force us to engage in refinding pieces of useful information that we’ve happened upon before. For example, while you’re at work, you might surf for a few minutes and stumble on a great but long explanation of Django Reinhardt’s guitar technique. Naturally, you won’t read it now and risk losing your job. You’ll refind it later instead, or use a “read later” service such as Instapaper to return to it at a more convenient time.
Figure 3-2 illustrates these four different types of information needs.
Figure 3-2. Four common information needs
Searching, browsing, and asking are all methods for finding, and these are the basic building blocks of information-seeking behavior.
There are two other major aspects to seeking behaviors: integration and iteration. We often integrate searching, browsing, and asking in the same finding session. Figure 3-3 shows how you might search your corporate intranet for guidelines on traveling abroad.
Figure 3-3. Integrated browsing, searching, and asking over many iterations
In this model (shown in Figure 3-4), users start with an information need, formulate an information request (a query), and then move iteratively through an information system along potentially complex paths, picking bits of information (“berries”) along the way. In the process, they modify their information requests as they learn more about what they need and what information is available from the system.
Figure 3-4. The “berry-picking” model of how users move through an information system
Another useful model is the “pearl-growing” approach. Users start with one or a few good documents that are exactly what they need. They want to get “more like this one.”
Corporate websites and intranets often utilize a “two-step” model. Confronted with a site consisting of links to perhaps hundreds of departmental subsites, users first need to know where to look for the information they need. They might search or browse through a directory until they find a good candidate or two, and then perform the second step: looking for information within those subsites.
- IA starts with people and the reason they use your product or service: they have an information need.
- There are different models of what happens when people look for information.
- The simplest of these is problematic, because it doesn’t accurately represent what actually happens when people have an information need.
- Information needs are like fishing: sometimes people know exactly what they’re looking for, but often they’re casting a wider net.
- People act on these information needs through various information-seeking behaviors.
- There are various research methods that allow us to learn about these behaviors.
When we talk about digital media, we use metaphors that betray a sense of place: we “go” online, “visit” a website, “browse” Amazon.com. Increasingly, these environments are also taking over many of the functions we’ve traditionally associated with physical places: we meet with our friends in WhatsApp, pay our bills in our bank’s website, learn in Khan Academy. As with physical places, we experience them as contexts that differ from one another, supporting different needs.
Figure 4-2. Banks and hospitals serve different information needs; their website navigation structures highlight the differences between them, and you understand the information they present in the context of the roles and functions these organizations serve in society
Figure 4-3. The content-centric “How to Cook Everything” iPad app feels more like a recipe book than like a place
Building architecture aims to produce physical environments that can serve and communicate their social functions effectively, and information architecture aims to do the same for information environments. The main difference is that instead of defining compositions of forms, spaces, and objects such as walls, roofs, and furniture, information architecture defines compositions of semantic elements such as navigation labels, section headings, and keywords, and produces the design principles, goals, and guidelines that capture the intended feeling of the place (e.g., is this a serious, solitary place, or a fun, social space?).
- The structure of information environments influences more than how we find stuff: it also changes how we understand it.
- We experience information environments as places where we go to transact, learn, and connect with other people, among many other activities.
- When designing information environments, we can learn from the design of physical environments.
- Some organizing principles that carry over to information environments from physical environments include structure and order, rhythm, typologies, and modularity and extensibility.
- Organization systems present the site’s information to us in a variety of ways, such as content categories that pertain to the entire campus (e.g., the top bar and its “Academics” and “Admission” choices), or to specific audiences (the block on the middle left, with such choices as “Future Students” and “Staff”).
- Navigation systems help users move through the content, such as with the custom organization of the individual drop-down menus in the main navigation bar.
- Search systems allow users to search the content; when the user starts typing in the site’s search bar, a list of suggestions is shown with possible matches for the user’s search term.
- Labeling systems describe categories, options, and links in language that (hopefully) is meaningful to users; you’ll see examples throughout the page (e.g., “Admission,” “Alumni,” “Events”).
We refer to this as top-down information architecture (Figure 5-3), and the Gustavus main page addresses many common “top-down” questions that users have when they land on a site, including:
- Where am I? (1)
- I know what I’m looking for; how do I search for it? (2)
- How do I get around this site? (3)
- What’s important and unique about this organization? (4)
- What’s available on this site? (5)
- What’s happening there? (6)
- How do I engage with them via various other popular digital channels? (7)
- How can I contact a human? (8)
- What’s their address? (9)
- How can I access my account? (10)
Figure 5-3. The Gustavus site’s main page is crammed with answers to users’ questions
This is bottom-up information architecture; content structure, sequencing, and tagging help you answer such questions as:
- Where am I?
- What’s here?
- Where can I go from here?
Figure 5-5 shows a slightly different example of a bottom-up information architecture: images stored in one of this book’s authors’ iCloud account, as displayed in the iOS Photos app.
Figure 5-5. Image collections in the iOS Photos app
It provides context for the content, and tells us what we can do while we’re here:
- The information architecture tells us where we are (in the Photos app, looking at “Collections,” which are defined as ranges of dates in a particular geographic region).
- It helps us move to other closely related views (e.g., by switching to “Albums,” collections of photos we’ve defined).
- It helps us move through the information hierarchically (e.g., we can choose to view collections of images grouped by the year they were saved, instead of by more granular ranges of dates and locations) and contextually (e.g., by clicking on the city in which they were shot, we can see them arranged spatially over a map).
- It allows us to search the content based on various criteria, such as different time periods and locations.
- It allows us to share the content with others.
Figure 5-6. BBC search results include three “Editor’s Choice” links
What’s different is that the “Editor’s Choice” results are manually created: some people at the BBC decided that “ukraine” is an important term and that some of the BBC’s best content is not news stories, which normally come up at the top of most retrieval sets. So they applied some editorial expertise to identify three highly relevant pages and associated them with the term “ukraine,” thereby ensuring that these three items are displayed when someone searches for “ukraine.” Users might assume these search results are automatically generated, but humans are manually modifying the information architecture in the background; this is another example of invisible information architecture.
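The manual association described above can be pictured as a small lookup table that is consulted before the regular retrieval results. The following is a minimal sketch under stated assumptions, not the BBC's actual implementation: the `BEST_BETS` table, the `search` function, and all page titles are hypothetical.

```python
# Hypothetical editor-maintained table: query term -> hand-picked pages.
BEST_BETS = {
    "ukraine": [
        "Ukraine country profile",
        "Ukraine crisis timeline",
        "Ukraine crisis in maps",
    ],
}


def search(query, automatic_results):
    """Return curated best bets first, then the automatically ranked results."""
    curated = BEST_BETS.get(query.strip().lower(), [])
    # Skip automatic results that editors already promoted, to avoid duplicates.
    remaining = [r for r in automatic_results if r not in curated]
    return curated + remaining


results = search("Ukraine", ["Latest news story", "Ukraine crisis timeline"])
print(results)
```

Because the table is keyed by query term, editors can promote content for a handful of important queries without touching the retrieval algorithm itself.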
- Organization systems: How we categorize information (e.g., by subject or chronology); see Chapter 6
- Labeling systems: How we represent information, for example, using scientific terminology (“Acer”) or lay terminology (“maple”); see Chapter 7
- Navigation systems: How we browse or move through information (e.g., clicking through a hierarchy); see Chapter 8
- Searching systems: How we search information (e.g., executing a search query against an index); see Chapter 9
- Organization systems: Also known as taxonomies and hierarchies, these are the main way of categorizing or grouping content (e.g., by topic, by task, by audiences, or by chronology); user-generated tags are also a form of organization system
- General navigation systems: Primary navigation systems that help users understand where they are and where they can go within an information environment
- Local navigation systems: Primary navigation systems that help users understand where they are and where they can go within a portion of an information environment (e.g., a subsite)
- Sitemaps/tables of contents: Navigation systems that supplement primary navigation systems; provide a condensed overview of and links to major content areas within the environment, usually in outline form
- Indices: Supplementary navigation systems that provide an alphabetized list of links to the contents of the environment
- Guides: Supplementary navigation systems that provide specialized information on specific topics, as well as links to related subsets of content
- Walkthroughs and wizards: Supplementary navigation systems that lead users through sequential sets of steps; may also link to related subsets of content
- Contextual navigation systems: Consistently presented links to related content; often embedded in text and generally used to connect highly specialized content within an information environment
- Search interface: The means of entering and revising a search query, typically with information on how to improve your query, as well as other ways to configure your search (e.g., selecting from specific search zones)
- Query language: The grammar of a search query; query languages might include Boolean operators (e.g., AND, OR, NOT), proximity operators (e.g., ADJACENT, NEAR), or ways of specifying which field to search (e.g., AUTHOR=“Shakespeare”)
- Query builders: Ways of enhancing a query’s performance; common examples include spell checkers, stemming, concept searching, and drawing in synonyms from a thesaurus
- Retrieval algorithms: The part of a search engine that determines which content matches a user’s query; Google’s PageRank is perhaps the best-known example
- Search zones: Subsets of site content that have been separately indexed to support narrower searching (e.g., searching the tech support area within a software vendor’s site)
- Search results: Presentation of content that matches the user’s search query; involves decisions about what types of content should make up each individual result, how many results to display, and how sets of results should be ranked, sorted, and clustered
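To make the query language component more concrete, here is a minimal sketch of how Boolean operators can map onto set operations over an inverted index. The three-document corpus and the helper names (`build_index`, `term`, `q_and`, `q_or`, `q_not`) are assumptions made for illustration; real search engines add parsing, ranking, and much more.

```python
# Hypothetical three-document corpus, keyed by document id.
DOCS = {
    1: "hamlet is a tragedy by shakespeare",
    2: "the tempest is a comedy by shakespeare",
    3: "doctor faustus is a tragedy by marlowe",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Look up the documents containing a single word."""
    return INDEX.get(word.lower(), set())


# Boolean operators expressed as set operations.
def q_and(a, b):
    return a & b


def q_or(a, b):
    return a | b


def q_not(a):
    return set(DOCS) - a


# Equivalent to the query: tragedy AND NOT marlowe
print(q_and(term("tragedy"), q_not(term("marlowe"))))
```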
- Headings: Labels for the content that follows them
- Embedded links: Links within text; these label (i.e., represent) the content they link to
- Embedded metadata: Information that can be used as metadata but must first be extracted (e.g., in a recipe, if an ingredient is mentioned, this information can be indexed to support searching by ingredient)
- Chunks: Logical units of content; these can vary in granularity (e.g., sections and chapters are both chunks) and can be nested (e.g., a section is part of a book)
- Lists: Groups of chunks or links to chunks; these are important because they’ve been grouped together (e.g., they share some trait in common) and have been presented in a particular order (e.g., chronologically)
- Sequential aids: Clues that suggest where the user is in a process or task, and how far he has to go before completing it (e.g., “step 3 of 8”)
- Identifiers: Clues that suggest where the user is in an information system (e.g., a logo specifying what site she is using, or a breadcrumb explaining where she is)
- Controlled vocabularies and thesauri: Predetermined vocabularies of preferred terms that describe a specific domain (e.g., auto racing or orthopedic surgery); typically include variant terms (e.g., “brewski” is a variant term for “beer”). Thesauri are controlled vocabularies that generally include links to broader and narrower terms, related terms, and descriptions of preferred terms (aka “scope notes”). Search systems can enhance queries by extracting a query’s synonyms from a controlled vocabulary.
- Retrieval algorithms: Used to rank search results by relevance; retrieval algorithms reflect their programmers’ judgments on how to determine relevance.
- Best bets: Preferred search results that are manually coupled with a search query; editors and subject matter experts determine which queries should retrieve best bets and which documents merit best bet status.
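The idea of enhancing a query with synonyms from a controlled vocabulary can be sketched in a few lines. This is an illustrative sketch only: the `VOCABULARY` table and the `expand_query` function are hypothetical, and apart from the “brewski”/“beer” pair mentioned above, the terms are made up.

```python
# Hypothetical controlled vocabulary: preferred term -> variant terms.
VOCABULARY = {
    "beer": ["brewski", "suds"],
    "sofa": ["couch"],
}

# Reverse lookup so any variant resolves to its preferred term.
PREFERRED = {
    variant: preferred
    for preferred, variants in VOCABULARY.items()
    for variant in variants
}


def expand_query(term):
    """Return the preferred term followed by its variants, or the term alone."""
    key = term.strip().lower()
    preferred = PREFERRED.get(key, key)
    return [preferred] + VOCABULARY.get(preferred, [])


print(expand_query("brewski"))
```

A search system could then run the query against every term in the expanded list, so that a user who types a variant still retrieves content indexed under the preferred term.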
- You’ll probably need to explain information architecture to others, so it’s important that you help them visualize it.
- You can visualize information architecture from the top down, or from the bottom up.
- There are various ways of categorizing IA components, but here we’ll be looking at four categories: organization systems, labeling systems, navigation systems, and searching systems.
Classification systems are made of language, and language is ambiguous: words are capable of being understood in more than one way.
Heterogeneity refers to an object or collection of objects composed of unrelated or unlike parts.
An old-fashioned library card catalog is relatively homogeneous. It organizes and provides access to books. It does not provide access to chapters in books or collections of books. It may not provide access to magazines or videos. This homogeneity allows for a structured classification system. Each book has a record in the catalog. Each record contains the same fields: author, title, and subject.
Most digital information environments, on the other hand, are highly heterogeneous in many respects. For example, websites often provide access to documents and their components at varying levels of granularity. A site might present articles and journals and journal databases side by side. Links might lead to pages, sections of pages, or other websites. And websites typically provide access to documents in multiple formats.
The heterogeneous nature of information environments makes it difficult to impose any single structured organization system on the content.
The fact is that labeling and organization systems are intensely affected by their creators’ perspectives. We see this at the corporate level with websites organized according to internal divisions or org charts, with groupings such as marketing, sales, customer support, human resources, and information systems. How does a customer visiting this website know where to go for technical information about a product she just purchased? To design usable organization systems, we need to escape from our own mental models of content labeling and organization.
As a designer, you must be sensitive to your organization’s political environment. In certain cases, you must remind your colleagues to focus on creating an architecture that works for the users. In others, you may need to make compromises to avoid serious political conflict. Politics raise the complexity and difficulty of creating usable information architectures.
Organization systems are composed of organization schemes and organization structures.
- An organization scheme defines the shared characteristics of content items and influences the logical grouping of those items.
- An organization structure defines the types of relationships between content items and groups. Both organization schemes and structures have an important impact on the ways information is found and understood.
Organization is closely related to navigation, labeling, and indexing.
- The organization structures of information environments often play the part of the primary navigation system.
- The labels of categories play a significant role in defining the contents of those categories.
- Manual indexing or metadata tagging is ultimately a tool for organizing content items into groups at a very detailed level.
For example, country names are usually listed in alphabetical order. If you know the name of the country you are looking for, navigating the scheme is easy. “Chile” is in the Cs, which are after the Bs but before the Ds. This is called known-item searching.
Most address book applications organize contacts alphabetically by last name, as shown in Figure 6-2.
Figure 6-2. The OS X Contacts application (image: https://www.apple.com/osx/apps/#contacts)
Press release archives are obvious candidates for chronological organization schemes (see Figure 6-3). The date of announcement provides important context for the release. However, keep in mind that users may also want to browse the releases by title, product category, or geography, or to search by keyword. A complementary combination of organization schemes is often necessary.
Figure 6-3. Press releases in reverse chronological order
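As a rough illustration of combining a chronological scheme with a complementary one, the sketch below orders a set of press releases newest first and also groups the same releases by category. The release data and the function names `newest_first` and `by_category` are invented for this example.

```python
from datetime import date

# Hypothetical press releases; titles, dates, and categories are made up.
RELEASES = [
    {"title": "Quarterly results", "date": date(2015, 4, 21), "category": "Finance"},
    {"title": "New product launch", "date": date(2015, 9, 9), "category": "Products"},
    {"title": "Annual report", "date": date(2015, 1, 15), "category": "Finance"},
]


def newest_first(releases):
    """Chronological scheme: order by announcement date, most recent first."""
    return sorted(releases, key=lambda r: r["date"], reverse=True)


def by_category(releases):
    """Complementary scheme: group the same releases by category."""
    groups = {}
    for release in newest_first(releases):
        groups.setdefault(release["category"], []).append(release["title"])
    return groups


print([r["title"] for r in newest_first(RELEASES)])
print(by_category(RELEASES))
```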
Figure 6-4 shows an example of a geographical organization scheme from Craigslist. The user can select her nearest local directory. If her browser supports geolocation, the site navigates directly to it.
Figure 6-4. A geographical organization scheme with geolocation
There’s a simple reason why people find ambiguous organization schemes so useful: we don’t always know what we’re looking for. As we mentioned in Chapter 3, information seeking is often iterative and interactive.
While few information environments are organized solely by topic, most should provide some sort of topical access to content. In designing a topical organization scheme, it is important to define the breadth of coverage.
Figure 6-5. A topical taxonomy showing categories and subcategories
Task-oriented schemes organize content and applications into collections of processes, functions, or tasks.
Figure 6-6. Like many apps, Microsoft Word on iOS features a task-oriented organization scheme
Task-oriented schemes are usually embedded within specific subsites or integrated into hybrid task/topic navigation systems, as we see in Figure 6-7.
Figure 6-7. Task, topic, and audience coexist on the Smithsonian home page
Audience-oriented schemes break a site into smaller, audience-specific mini-sites, thereby allowing for clutter-free pages that present only the options of interest to that particular audience. CERN, shown in Figure 6-8, presents an audience-oriented organization scheme that invites users to self-identify.
Figure 6-8. CERN invites users to self-identify
You need not look further than your desktop computer with its folders, files, and trash can or recycle bin for an example.
Figure 6-10. A hybrid organization scheme
In cases where multiple schemes must be presented on one page, you should communicate to designers the importance of preserving the integrity of each scheme. As long as the schemes are presented separately on the page, they will retain the powerful ability to suggest a mental model for users. For example, a look at the main menu in the Stanford University website in Figure 6-11 reveals a topical scheme, an audience-oriented scheme, and a search function. By presenting them separately, Stanford provides flexibility without causing confusion.
Figure 6-11. Stanford provides multiple organization schemes
- You should be aware of, but not bound by, the idea that hierarchical categories should be mutually exclusive. Within a single organization scheme, you will need to balance the tension between exclusivity and inclusivity. Hierarchies that allow cross-listing are known as polyhierarchical. Ambiguous organization schemes in particular make it challenging to divide content into mutually exclusive categories.
- It is important to consider the balance between breadth and depth in your hierarchy. Breadth refers to the number of options at each level of the hierarchy. Depth refers to the number of levels in the hierarchy.
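Breadth and depth can be measured on any draft hierarchy before it is built. The sketch below assumes a hierarchy represented as nested dictionaries and, as a simplification, reports breadth as the largest number of options offered at any single node; the sample categories and both function names are hypothetical.

```python
# Hypothetical hierarchy as nested dicts: category -> subcategories.
HIERARCHY = {
    "Home": {
        "Products": {"Sofas": {}, "Chairs": {}, "Tables": {}, "Beds": {}},
        "Support": {"Manuals": {}},
        "About": {},
    }
}


def depth(tree):
    """Number of levels in the hierarchy (an empty tree has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Largest number of options presented at any single node."""
    if not tree:
        return 0
    return max(len(tree), *(max_breadth(children) for children in tree.values()))


print(depth(HIERARCHY), max_breadth(HIERARCHY))
```

Comparing these two numbers across alternative drafts gives a quick, if crude, view of whether a hierarchy leans narrow and deep or broad and shallow.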
They consisted of rolls of physical cards, with each card representing an individual contact: a record in the system. Each record contains several fields, such as name, address, and telephone number. Each field may contain data specific to that contact. The collection of records is a database.
Figure 6-15. The printed card Rolodex is a simple database
Most of the heavy-duty databases we use are built upon the relational database model. In relational database structures, data is stored within a set of relations or tables. Rows in the tables represent records, and columns represent fields.
Figure 6-16. A relational database schema (image: http://bit.ly/relational_model).
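The mapping of rows to records and columns to fields can be seen in a very small relational table. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `contacts` table, its columns, and the sample contacts are assumptions chosen to echo the Rolodex example.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, address TEXT, phone TEXT)")

# Each row is one record; each column is one field.
conn.executemany(
    "INSERT INTO contacts (name, address, phone) VALUES (?, ?, ?)",
    [
        ("Ben Sample", "2 Oak Ave", "555-0101"),
        ("Ada Example", "1 Main St", "555-0100"),
    ],
)

# Fields make the records searchable and sortable in a structured way.
rows = conn.execute("SELECT name, phone FROM contacts ORDER BY name").fetchall()
print(rows)
conn.close()
```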
So why are database structures important to information architects? In a word, metadata.
For example, the entity relationship diagram (ERD) in Figure 6-17 illustrates a structured approach to defining a metadata schema. Each entity (e.g., Resource) has attributes (e.g., Name, URL). These entities and attributes become records and fields.
Figure 6-17. An entity relationship diagram showing a structured approach to defining a metadata schema (courtesy of Peter Wyngaard of Interconnect of Ann Arbor)
A hypertext system involves two primary types of components: the items or chunks of information that will be linked, and the links between those chunks.
Figure 6-18. A network of hypertextual connections
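The two components of a hypertext system, chunks and links, map naturally onto a directed graph. The sketch below is one possible representation, with hypothetical chunk ids and a hypothetical `reachable` helper that follows links to find everything a user could get to from a starting chunk.

```python
# Hypothetical chunks of content, keyed by id.
CHUNKS = {
    "home": "Welcome page",
    "travel-policy": "Guidelines on traveling abroad",
    "expenses": "How to file an expense report",
    "contacts": "Who to ask for help",
}

# Directed links between chunks; the structure is a network, not a tree.
LINKS = {
    "home": ["travel-policy", "contacts"],
    "travel-policy": ["expenses", "contacts"],
    "expenses": ["travel-policy"],
    "contacts": [],
}


def reachable(start):
    """Return every chunk a user can reach by following links from start."""
    seen = set()
    stack = [start]
    while stack:
        chunk = stack.pop()
        if chunk in seen:
            continue
        seen.add(chunk)
        stack.extend(LINKS.get(chunk, []))
    return seen


print(sorted(reachable("expenses")))
```

Note that "home" is not reachable from "expenses" in this sample data, which is the kind of dead end a purely hypertextual structure can hide.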
Free tagging, also known as collaborative categorization, mob indexing, and ethnoclassification, is a simple yet powerful tool.
Figure 6-19. The “Discover” and “Trending” features in Twitter, which allow you to discover new and potentially interesting content, are driven by user-generated tags
Figure 6-20. LinkedIn allows you to “endorse” your contacts as having certain professional skills, from a set of predefined tags
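One simple way that user-generated tags could drive a discovery feature is by counting how often each tag is used. The sketch below is only an illustration of that idea and makes no claim about how Twitter or LinkedIn actually work; the posts, tags, and the `trending` function are invented.

```python
from collections import Counter

# Hypothetical posts, each tagged freely by its author.
POSTS = [
    {"id": 1, "tags": ["ia", "ux"]},
    {"id": 2, "tags": ["ux", "design"]},
    {"id": 3, "tags": ["ux", "ia", "search"]},
]


def trending(posts, top_n=2):
    """Rank tags by how many posts use them."""
    counts = Counter(tag for post in posts for tag in post["tags"])
    return counts.most_common(top_n)


print(trending(POSTS))
```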
- Our understanding of the world is informed by how we classify things.
- Classifying things is not easy; we have to deal with ambiguity, heterogeneity, differences in perspective, and internal politics, among other challenges.
- We can organize things using exact organization schemes or ambiguous organization schemes.
- Exact organization schemes include alphabetical, chronological, and geographical groupings.
- Ambiguous organization schemes include topical, task-based, audience-based, metaphorical, and hybrid groupings.
- The structure of organization schemes also plays an important role in the design of information environments.
- Social classification has emerged as an important tool for organizing information in shared digital environments.
- Contextual links: Hyperlinks to chunks of information on other pages or to other locations on the same page
- Headings: Labels that simply describe the content that follows them, just as print headings do
- Navigation system choices: Labels representing the options in navigation systems
- Index terms: Keywords, tags, and subject headings that represent content for searching or browsing
Labels describe the hypertext links within the body of a document or chunk of information, and naturally occur within the descriptive context of their surrounding text.
Because GOV.UK (Figure 7-4) is a site dedicated to providing information to the entire population of the UK, contextual links need to be straightforward and meaningful. GOV.UK’s contextual link labels, such as “Benefits,” “Money and tax,” and “Disabled people,” are representational, and draw on surrounding text and headings to make it clear what type of help you’ll receive if you click through. These highly representational labels are made even clearer by their context: explanatory text, clear headings, and a site that itself has a few straightforward uses.
Figure 7-4. The contextual links on the GOV.UK home page are straightforward and meaningful
On the other hand, contextual links on a blog aren’t necessarily so clear. The author is among friends and can assume that her regular readers possess a certain level of background (or, really, contextual) knowledge.
In Figure 7-5, the author expects us to know who “Dr. Drang” is—perhaps s/he’s been mentioned in this blog before. Or the author knows that we’ll recognize the label “Dr. Drang” as a person, and provides some mysterious context—“Your favorite snowman and mine”—to entice the user to click through.
Figure 7-5. These contextual links aren’t very representational, but that’s acceptable when there is a high degree of trust in the author
Headings, as shown in Figure 7-6, are often used to establish a hierarchy within content.
Figure 7-6. Layout, typographic treatment, and whitespace help the reader distinguish labels and hierarchy in the Windows Store
To successfully navigate a process, it’s typically necessary for users to complete each step along the way, so heading labels have to be obvious and must also convey sequence. Figure 7-8 shows a page in the process to sign up to become a Google Play Developer, which clearly describes the actions required in each step.
Figure 7-8. Clear sequential labeling in the Google Play Developer signup process
There are no standards, but some common variants exist for many navigation system labels. You should consider selecting one from each of these categories and applying it consistently, as these labels are already familiar to most web users. Here is a nonexhaustive list:
- Main, Main Page, Home
- Search, Find, Browse, Search/Browse
- Site Map, Contents, Table of Contents, Index
- Contact, Contact Us
- Help, FAQ, Frequently Asked Questions
- News, News & Events, News & Announcements, Announcements
- About, About Us, About <company name>, Who We Are
The index of the SFGate website shown in Figure 7-10 is generated from index term labels, which in turn are used to identify content from many different sections of the site.
Figure 7-10. The SFGate site index
For example, a furniture manufacturer’s website might list the following index terms in the <meta> tags of records for its upholstered items:
<meta name="keywords" CONTENT="upholstery, upholstered, sofa, couch, loveseat, love seat, sectional, armchair, arm chair, easy chair, chaise lounge">
A search on “sofa” would then retrieve the page with these index terms even if the term “sofa” doesn’t appear anywhere in the page’s text.
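The behavior described here, where a page is retrieved by an index term that never appears in its visible text, can be sketched with Python's standard `html.parser` module. The `KeywordParser` class, the `matches` function, and the shortened sample page are assumptions for illustration; a real search engine would index many pages and may treat keywords differently.

```python
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collect the terms listed in a page's keywords <meta> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so CONTENT -> content.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                term.strip().lower() for term in content.split(",") if term.strip()
            )


# Hypothetical page: the body text never mentions "sofa".
PAGE = (
    '<html><head><meta name="keywords" CONTENT="upholstery, upholstered, '
    'sofa, couch, loveseat"></head>'
    "<body>Our new three-seater collection.</body></html>"
)

parser = KeywordParser()
parser.feed(PAGE)


def matches(query):
    """True if the query equals one of the page's index terms."""
    return query.strip().lower() in parser.keywords


body_text = PAGE.split("<body>")[1]
print(matches("sofa"), "sofa" in body_text)
```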
Even so, iconic labels are still a risky proposition in terms of whether or not they can represent meaning. Figure 7-12 shows navigation tiles on the Microsoft Band fitness tracker.
Figure 7-12. Icons from the Microsoft Band’s navigation system (image: https://www.microsoft.com/microsoft-band/en-us)
(They are, respectively: Mail, Run, Calendar, Exercise, Sleep, Messaging, and Finance.)
What can we do to make sure our labels are less ambiguous and more representational? The following two guidelines may help.
Labeling is easier if your content, users, and context are kept simple and focused.
Consistency is affected by many issues:
- Style: Haphazard usage of punctuation and case is a common problem within labeling systems, and can be addressed, if not eliminated, by using style guides. Consider hiring a proofreader and purchasing a copy of Strunk & White.
- Presentation: Similarly, consistent application of fonts, font sizes, colors, whitespace, and grouping can help visually reinforce the systematic nature of a group of labels.
- Syntax: It’s not uncommon to find verb-based labels (e.g., “Grooming Your Dog”), noun-based labels (e.g., “Diets for Dogs”), and question-based labels (e.g., “How Do You Paper Train Your Dog?”) all mixed together. Within a specific labeling system, consider choosing a single syntactical approach and sticking with it.
- Granularity: Within a labeling system, it can be helpful to present labels that are roughly equal in their specificity. Exceptions (such as site indexes) aside, it’s confusing to encounter a set of labels that cover differing levels of granularity, for example, “Chinese restaurants,” “Restaurants,” “Taquerias,” “Fast Food Franchises,” “Burger Kings.”
- Comprehensiveness: People can be tripped up by noticeable gaps in a labeling system. For example, if a clothing retailer’s website lists “trousers,” “ties,” and “shoes,” while somehow omitting “shirts,” we may feel like something’s wrong. Do they really not carry shirts? Or did they make a mistake? Aside from improving consistency, a comprehensive scope also helps people do a better job of quickly scanning and inferring the environment’s content.
- Audience: Mixing terms like “lymphoma” and “tummy ache” in a single labeling system can also throw people off, even if only temporarily. Consider the languages of your environment’s major audiences. If each audience uses a very different terminology, you may have to develop a separate labeling system for each audience, even if these systems are describing exactly the same content.
Arranging labels in a table provides a more condensed, complete, and accurate view of navigation labels as a system.
Figure 7-13 shows labeling systems from United, Delta, Virgin America, and American Airlines, all competing in the airline business. Just a glance shows how much variation there is in terms of the number of labels (from five to as many as nine). Some use the “My...” approach, and some use brand-specific labels (e.g., “AAdvantage”). Task-based labels (e.g., “Book a trip”) are less common than one would expect, as is the use of a “Home” or “Main” option.
Figure 7-13. Labeling systems from United, Delta, Virgin America, and American Airlines
A good example of a specific controlled vocabulary is the Educational Resources Information Center (ERIC) Thesaurus. This thesaurus was designed, as you’d guess, to describe the domain of education. An entry in the ERIC Thesaurus for “scholarship” is shown in Figure 7-14.
Figure 7-14. Controlled vocabularies and thesauri are rich sources of labels
Try these excellent resources as you hunt for sources of labels:
- We label things all the time.
- Labeling is the most obvious way to show our organization schemes across multiple systems and contexts.
- We must try to design labels that speak the same language as our environment’s users, while also reflecting its content.
- Textual labels are the most common type we encounter in our work; they include contextual links, headings, navigation system options, and index terms.
- Iconic labels are less common, but the widespread adoption of devices with less screen real estate means that they are an important component of many information environments.
- Designing labels is one of the most difficult aspects of information architecture.
- That said, there are various sources of inspiration—such as your existing information environment and search log analysis—that can help inform your labeling choices.
First, we have the global, local, and contextual navigation systems that are integrated within site pages or app screens.
Figure 8-1. Global, local, and contextual embedded navigation systems
Second, we have supplemental navigation systems such as sitemaps, indexes, and guides that exist outside the content-bearing pages. These are shown in Figure 8-2.
Figure 8-2. Supplemental navigation systems
The design of navigation systems takes us deep into the gray area between information architecture, interaction design, information design, visual design, and usability engineering, all of which we might loosely classify under the umbrella of user experience design.
For example, most Mac OS X applications feature a menu bar with a standard organization scheme that includes the application name as the first menu item, and “File” and “Edit” menus as the second and third items, respectively (Figure 8-3).
Figure 8-3. Most applications in Mac OS X feature a menu bar with a standard organization scheme
The navigation system should also present as much as possible of the structure of the information hierarchy in a clear and consistent manner, and indicate the user’s current location.
If you have an existing website, we suggest running a few users through a navigation stress test.
- Ignore the home page and jump directly into the middle of the site.
- For each random page, can you figure out where you are in relation to the rest of the site? What major section are you in? What is the parent page?
- Can you tell where the page will lead you next? Are the links descriptive enough to give you a clue what each is about? Are the links different enough to help you choose one over another, depending on what you want to do?
In Gopherspace, you were forced to move up and down the tree structures of content hierarchies (see Figure 8-5). It was impractical to encourage or even allow jumps across branches (lateral navigation) or between multiple levels (vertical navigation) of a hierarchy.
Figure 8-5. The pure hierarchy of Gopherspace
If the system is so enabled, users can get to anywhere from anywhere. However, as you can see in Figure 8-6, things can get confusing pretty quickly. It begins to look like an architecture designed by M.C. Escher.
Figure 8-6. A hypertextual web can completely bypass the hierarchy
Global navigation bars come in all shapes and sizes. Consider the examples shown in Figure 8-7.
Figure 8-7. Global navigation bars from Dell, Apple, and Acer
Most global navigation bars provide a link to the home page, usually represented as the organization’s logo.
Mega-menus are like traditional drop-down menus: usually rendered at the top of a page, they provide access to second- and third-level elements when the user clicks on a first-level element. However, mega-menus are much richer than the simple lists of links of yesteryear; they often feature sophisticated typographic layouts, images, and other cues to give the user insight into the content and structure of the system (Figure 8-8).
Figure 8-8. Adidas’s mega-menus give insights into the content and structure of the site
Fat footers are abridged sitemaps rendered at the bottom of web pages. They provide direct access to the most important sections of the site (Figure 8-9).
Figure 8-9. Microsoft.com is a large site, with multiple subsites and sub-brands; a fat footer on many of the site’s pages gives users a consistent way to get around
Some tightly controlled sites integrate global and local navigation into a consistent, unified system.
Figure 8-10. Local navigation at usatoday.com
In contrast, large sites like GE.com (Figure 8-11) often provide multiple local navigation systems that may have little in common with one another or with the global navigation system.
Figure 8-11. Local navigation at GE.com
Some relationships don’t fit neatly into the structured categories of global and local navigation. This demands the creation of contextual navigation links specific to a particular page, document, or object. In online stores, these “see also” links can point users to related products and services. On an educational site, they might point to similar articles or related topics.
As you can see in Figure 8-13, Adorama includes contextual navigation links to related products—in this case based on user views—in the layout of each page.
Figure 8-13. External contextual navigation links
In practice, this usually involves representing words or phrases within sentences or paragraphs (i.e., prose) as embedded or “inline” hypertext links. A page from Stanford University’s site, shown in Figure 8-12, provides an example of carefully chosen inline contextual navigation links.
Figure 8-12. Inline contextual navigation links
Figure 8-14. Navigation can drown out the content
Supplemental navigation systems (shown back in Figure 8-2) include sitemaps, indexes, and guides. These are external to the basic hierarchy of a website and provide complementary ways of finding content and completing tasks.
A typical sitemap (Figure 8-16) presents the top few levels of the information hierarchy. It provides a broad view of the content in the system and facilitates random access to segmented portions of that content via graphical or text-based links.
Figure 8-16. Apple’s sitemap
The design of a sitemap significantly affects its usability. When working with a graphic designer, make sure she understands the following rules of thumb:
- Reinforce the information hierarchy so the user becomes increasingly familiar with how the content is organized.
- Facilitate fast, direct access to the contents of the site for those users who know what they want.
- Avoid overwhelming the user with too much information. The goal is to help, not scare, the user.
In Figure 8-17, The United Nations website presents a comprehensive alphabetical index. Handcrafted links within the index lead directly to destination pages.
Figure 8-17. The UN’s comprehensive alphabetical site index
Comcast’s XFINITY website presents a simple site index alongside a sitemap that mirrors the site’s navigation structure (Figure 8-18).
Figure 8-18. Comcast’s XFINITY site index
Another example is shown in Figure 8-19, where the Centers for Disease Control and Prevention’s two-step site index features term rotation and see/see-also references.
Figure 8-19. The Centers for Disease Control and Prevention’s site index
Yet another interesting example is Michigan State University’s site index, shown in Figure 8-20, which takes hundreds of the site’s best bet results and renders them as an alphabetical list.
Figure 8-20. Michigan State University’s site index
A useful trick in designing an index involves term rotation, also known as permutation. A permuted index rotates the words in a phrase so that users can find the phrase in two places in the alphabetical sequence. For example, in the CDC index, users will find listings for both “Abuse, Elder” and “Elder Maltreatment.” This supports the varied ways in which people look for information.
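Term rotation can be sketched in a few lines of Python. The `rotations` and `permuted_index` functions below are hypothetical helpers written for this illustration, and the phrase “Elder Abuse” is an assumed input rather than an entry taken from the CDC index.

```python
def rotations(phrase):
    """Return every rotation of a phrase, written index-style ("Abuse, Elder")."""
    words = phrase.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])   # the words that move to the front
        rest = " ".join(words[:i])   # the words that trail after the comma
        entries.append(f"{lead}, {rest}" if rest else lead)
    return entries


def permuted_index(phrases):
    """Build an alphabetical list of (index entry, original phrase) pairs."""
    pairs = [(entry, phrase) for phrase in phrases for entry in rotations(phrase)]
    return sorted(pairs, key=lambda pair: pair[0].lower())


for entry, phrase in permuted_index(["Elder Abuse"]):
    print(f"{entry}  ->  {phrase}")
```

Each phrase now appears once under every word it contains, so a user scanning the A entries and a user scanning the E entries both reach the same destination.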
The IRS Withholding Calculator, shown in Figure 8-21, provides an example: it consists of a highly editorialized selection of important links wrapped in helpful (and clearly structured) copy.
Figure 8-21. The introduction to the IRS Withholding Calculator
Rules of thumb for designing guides include:
- The guide should be short.
- At any point, the user should be able to exit the guide.
- Navigation (Previous, Home, Next, swiping gestures) should be consistent so that users can easily step back and forth through the guide.
- The guide should be designed to answer questions.
- Screenshots should be crisp, clear, and optimized, with enlarged details of key features.
- If the guide includes more than a few pages, it may need its own table of contents.
Sophisticated configurators, like Motorola’s Moto Maker, shown in Figure 8-22, allow the user to easily traverse complicated decision-making processes.
Figure 8-22. The Moto Maker configurator
Often, users don’t have a clear understanding of the impact of the choices that affect the configuration process. It is desirable to provide them with contextual clues that help them make sense of the various options available. For example, the iOS Apple Store application (Figure 8-23) includes product images that show changes to the product based on the user’s selected color finish, and includes text that explains the impact of more technical options on the product.
Figure 8-23. The iOS Apple Store application on the iPad
Personalization involves serving up information to the user based upon a model of the behavior, needs, or preferences of that individual. In contrast, customization involves giving the user direct control over some combination of presentation, navigation, and content options. In short, with personalization, we guess what the user wants, and with customization, the user tells us what he wants.
The reality is that personalization and customization:
- Typically play important but limited roles
- Require a solid foundation of structure and organization
- Are really difficult to do well
- Can make it more difficult to collect metrics and analyze user behavior
It’s when Amazon starts trying to recommend products based on past purchases that the system breaks down (see Figure 8-24).
Figure 8-24. Amazon’s personalized recommendations
For example, Gmail allows the user to set the visibility and order of labels—a critical element in the structuring of the user’s mail in the system—by dragging and dropping them within a global navigation structure (Figure 8-25).
Figure 8-25. Customization in Gmail
Visualization has proven most useful when the user must select among a result set of elements that she knows by their looks, as in the case of shopping for physical goods (Figure 8-26).
Figure 8-26. Google Shopping’s visual search results
Reddit, a content aggregation and discovery service, employs such a voting system—in fact, it is its primary differentiator (Figure 8-27).
Figure 8-27. The sequence in which stories are presented on Reddit’s home page is defined by the up- or down-votes of registered site users
Other systems depend on much richer and more complex social algorithms. For example, many of Facebook’s navigation structures consist of dynamically generated lists of content items: from the sequence of posts that appear in the user’s main timeline to lists of suggested pages and other Facebook users you may know (Figure 8-28).
Figure 8-28. Facebook presents the user with a variety of algorithmically generated lists of navigation links that are influenced by the social graph; the ad selection is also algorithmically determined based on the user’s profile (Facebook knows that Jorge is in the San Francisco Bay area, and that it is Valentine’s Day)
- There are various types of navigation systems; three common ones are global, local, and contextual systems.
- Global navigation systems are intended to be present on every page or screen in the information environment.
- Local navigation systems complement global ones, and allow users to explore the immediate area where they are.
- Contextual navigation systems occur in context of the content being presented in the environment, and support associative learning by allowing users to explore the relationships between items.
- Building context—allowing users to locate their positions within the system—is a critical function of navigation systems.
- There are also various supplemental navigation systems we can use, such as sitemaps, indexes, and guides.
Figure 9-1. The basic anatomy of a search system (image adapted from Search Patterns: Design for Discovery, by Peter Morville and Jeffery Callender)
In Windows 8.1, shown in Figure 9-2, users can select search zones based on the type of content they are looking for (Settings, Files) and—somewhat awkwardly—by its location (Web images, Web videos), with “Web” implying that the “Settings” and “Files” options refer to settings and files on your computer. (Note that “Everywhere” is the default selection.)
Figure 9-2. Search zones in Windows 8.1
Destination pages contain the actual information you want: sports scores, book reviews, software documentation, and so on. Navigation pages may include main pages, search pages, and pages that help you browse the environment. The primary purpose of navigation pages is to get you to the destination pages.
Of course, indexing similar content isn’t always easy, because “similar” is a highly relative term. It’s not always clear where to draw the line between navigation and destination pages—in some cases, a page can be considered both. That’s why it’s important to test out navigation/destination distinctions before actually applying them.
The Library of Michigan has three primary audiences: members of the Michigan state legislature and their staffs, Michigan libraries and their librarians, and the citizens of Michigan. The information needed from this site is different for each of these audiences; for example, each has a very different circulation policy.
So we created four indexes: one for each of the three audiences, and one unified index of the entire site in case the audience-specific indexes didn’t do the trick for a particular search. Table 9-1 shows the results from running a query on the word “circulation” against each of the four indexes.
Table 9-1. Query results
Index | Documents retrieved | Retrieval reduced by |
---|---|---|
Unified | 40 | — |
Legislature area | 18 | 55% |
Libraries area | 24 | 40% |
Citizens area | 9 | 78% |
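The “Retrieval reduced by” column follows directly from the document counts. The short Python sketch below recomputes it; the function name `retrieval_reduction` is an assumption made for this example, and the counts are the ones reported in Table 9-1.

```python
import math


def retrieval_reduction(zone_hits, unified_hits):
    """Percentage by which a search zone shrinks the unified result set."""
    percent = (unified_hits - zone_hits) * 100 / unified_hits
    return math.floor(percent + 0.5)  # round half up to a whole percent


unified_hits = 40
zones = {"Legislature area": 18, "Libraries area": 24, "Citizens area": 9}

for zone, hits in zones.items():
    print(f"{zone}: {hits} documents, reduced by {retrieval_reduction(hits, unified_hits)}%")
```

For the citizens' index, 9 of 40 documents means 31 fewer results, or 77.5%, which rounds to the 78% shown in the table.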
For example, if you’re looking for a doctor to help with your rehabilitation, you might select the “Doctors & Medical Staff” search zone, as shown in Figure 9-3.
Figure 9-3. Executing a search against the “Doctors & Medical Staff” search zone
The search interface of the New York Times provides a useful illustration of filtering by date range (Figure 9-4).
Figure 9-4. There are many ways to narrow your New York Times search by date
In the Yelp business listing shown in Figure 9-5, there are more content components than meet the eye. There is a business name, operating hours, images, a link to the business’s website, and some attributes that are invisible to users. There are also content components that we don’t want to search, such as the reviews and tips toward the bottom of the screen.
Figure 9-5. Yelp’s business listings are jam-packed with various content components, some visible and some not
There is another reason to exploit a document’s structure. Content components aren’t useful only for enabling more precise searches; they can also make the format of search results much more meaningful. In Figure 9-6, Yelp’s search results include category and listing titles (“Boulevard Burger,” “Burgers, Breakfast & Brunch”), snippets of reviews (“My wife & I came in last night for dinner...”), number of reviews, average ratings, and locations.
Figure 9-6. Title, rating, and location are content components displayed for each result
We bring up the topic because it’s important to realize that a retrieval algorithm is essentially a tool, and just like other tools, specific algorithms help solve specific problems. And as retrieval algorithms are at the heart of search engines, it’s important to note that there is absolutely no single search engine that will meet all of your users’ information needs.
Query builders are tools that can soup up a query’s performance. They are often invisible to users, who may not understand their value or how to use them. Common examples include:
- Spell checkers: These allow users to misspell terms and still retrieve the right results by automatically correcting search terms. For example, “accomodation” would be treated as “accommodation,” ensuring retrieval of results that contain the correct term.
- Phonetic tools: Phonetic tools (the best-known of which is “Soundex”) are especially useful when searching for a name. They can expand a query on “Smith” to include results with the term “Smyth.”
- Stemming tools: Stemming tools allow users to enter a term (e.g., “lodge”) and retrieve documents that contain variant terms with the same stem (e.g., “lodging,” “lodger”).
- Natural language processing tools: These can examine the syntactic nature of a query (for example, is it a “how to” question or a “who is” question?) and use that knowledge to narrow retrieval.
- Controlled vocabularies and thesauri: Covered in detail in Chapter 10, these tools leverage the semantic nature of a query by automatically including synonyms within the query.
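Two of these query builders are simple enough to sketch in Python. The example below is a minimal illustration, not how any particular search engine works: `correct_spelling` leans on the standard library's `difflib.get_close_matches`, `soundex` is a compact version of the classic American Soundex coding, and the vocabulary list is invented for the example.

```python
import difflib


def correct_spelling(term, vocabulary, cutoff=0.8):
    """Return the closest known term, or the original if nothing is close."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def soundex(name):
    """Compact American Soundex: first letter plus three digits."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


vocabulary = ["accommodation", "lodging", "hotel"]  # hypothetical index terms
print(correct_spelling("accomodation", vocabulary))
print(soundex("Smith"), soundex("Smyth"))
```

Because “Smith” and “Smyth” receive the same code, a query on one name can be expanded to match documents containing the other.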
Display less information to users who know what they’re looking for, and more information to users who aren’t sure what they want.
A variant on that simple approach is to show users who are clear on what they’re looking for only representational content components, such as a title or author, to help them quickly distinguish the result they’re seeking. Users who aren’t as certain of what they’re looking for will benefit from descriptive content components such as a summary, part of an abstract, or keywords to get a sense of what their search results are about.
For example, the Yelp iPad app allows the user to view search results as listings, a location map, or images (Figure 9-11).
Figure 9-11. The Yelp iPad app allows users to select three different ways of viewing search results: as listings, as locations on a map, or as images
We suggest that you let users know the total number of retrieved documents so they have a sense of how many documents remain as they sift through search results. Also consider providing a results navigation system to help them move through the results. In Figure 9-15, Reuters provides such a navigation system, displaying the total number of results and enabling users to move through the result set 10 items at a time.
Figure 9-15. Reuters allows you to jump ahead through screens of 10 results at a time
There are two common methods for listing retrieval results: sorting and ranking. Retrieval results can be sorted chronologically by date, or alphabetically by any number of content component types (e.g., by title, by author, or by department). They can also be ranked by a retrieval algorithm (e.g., by relevance or popularity).
Figure 9-16. Baseball-Reference.com displays search results in alphabetical order
Figure 9-17. The Washington Post’s default list ordering is by reverse chronological order...
Relevance-ranking algorithms (there are many flavors) are typically based on one or more of the following:
- How many of the query’s terms occur in the retrieved document
- How frequently those terms occur in that document
- How close together those terms occur (e.g., are they adjacent, in the same sentence, or in the same paragraph?)
- Where the terms occur (e.g., a document with the query term in its title may be more relevant than one with the query term in its body)
- The popularity of the document where the query terms appear (e.g., is it linked to frequently, and are the sources of its links themselves popular?)
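To make a few of these factors concrete, here is a minimal Python sketch that scores documents on how many query terms they contain, how often those terms occur, and whether they appear in the title. The weights, the function names, and the sample documents are all assumptions made for illustration; proximity and popularity are left out to keep the example short.

```python
def relevance(query, title, body, title_weight=3.0):
    """Score one document against a query using three simple signals."""
    terms = set(query.lower().split())
    title_words = title.lower().split()
    body_words = body.lower().split()
    # How many of the query's terms occur anywhere in the document.
    coverage = sum(1 for t in terms if t in title_words or t in body_words)
    # How frequently those terms occur in the body.
    frequency = sum(body_words.count(t) for t in terms)
    # Where the terms occur: a title match counts for more.
    title_hits = sum(1 for t in terms if t in title_words)
    return coverage * 2.0 + frequency * 0.5 + title_hits * title_weight


def rank(query, documents):
    """Return document titles ordered from most to least relevant."""
    scored = sorted(documents, key=lambda d: relevance(query, d[0], d[1]), reverse=True)
    return [title for title, _ in scored]


documents = [  # hypothetical (title, body) pairs
    ("Garden tips", "Water your plants weekly"),
    ("Cat care", "A dog may chase a cat"),
    ("Dog grooming basics", "Grooming your dog keeps its coat healthy"),
]
print(rank("dog grooming", documents))
```

The page with both terms in its title ranks first, the page that merely mentions one term in its body comes second, and the unrelated page comes last.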
Indexing by humans is another means of establishing relevance. Keyword and descriptor fields can be searched, leveraging the value judgments of human indexers. For example, manually selected recommendations, popularly known as “best bets,” can be returned as relevant results. In Figure 9-19, the first set of results was associated with the query “Ukraine” in advance.
Figure 9-19. A search of the BBC’s site retrieves a set of manually tagged documents as well as automatic results; the recommendations are called “Editor’s Choice” rather than “best bets”
Google is successful in large part because it ranks results by which ones are the most popular. It does so by factoring in how many links there are to a retrieved document. Google also distinguishes the quality of these links: a link from a site that itself receives many links is worth more than a link from a little-known site. This algorithm, which is part of Google’s “secret sauce” for presenting search results, is known as PageRank.
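The idea that a link from a well-linked page is worth more can be illustrated with the simplified textbook form of PageRank, computed by repeated iteration. The Python sketch below is only that: the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions for the example, and it should not be read as Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {  # hypothetical site: every page links back to "home"
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph “home” ends up with the highest score because every other page links to it, while “blog”, which nothing links to, ends up with the lowest.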
User ratings can be used as the basis of retrieval result ordering. In the case of Yelp (see Figure 9-20), these ratings—based on users’ reviews of businesses listed in the system—are integral to helping users judge the value of an item, and form the foundation of an entire information economy.
Figure 9-20. User ratings fuel the ranking of these Yelp results
Advertising has become the predominant business model for publishing online, so it is no surprise that pay-for-placement (PFP) has become commonplace in many search systems.
How can we cluster results? The obvious ways are, unfortunately, the least useful: we can use existing metadata, like document type (e.g., .doc, .pdf) and file creation/modification date, to allow us to divide search results into clusters. Much more useful are clusters derived from manually applied metadata, like topic, audience, language, and product family. Unfortunately, approaches based on manual effort can be prohibitively expensive.
In Figure 9-21, Forrester contextualizes the query “user experience” with roles such as “Marketing Leadership” and specific date ranges.
Figure 9-21. Forrester contextualizes search results for the query “user experience”
Figure 9-22. Search results in the iOS App Store include a “GET” button (which lists the app’s price when it is not free)
Figure 9-23. The San Francisco Public Library allows users to add search results to three “shelves”: “Completed,” “In Progress,” and “For Later”
Note that the example in Figure 9-23 includes a “Save Search” link in the upper-right corner of the search results display; the user can name saved search sets for later retrieval.
Your system is likely to have the ubiquitous search box, as shown in Figure 9-24.
Figure 9-24. The ubiquitous search box (in this case, from Apple)
Users make assumptions about how search interfaces work, and you may want to test for those as you design your own search system. Some common user assumptions include:
- “I can just type terms that describe what I’m looking for and the search engine will do the rest.”
- “I don’t have to type in those funny AND, OR, or NOT thingies.”
- “I don’t have to worry about synonyms for my term; if I’m looking for dogs, I just type ‘dogs,’ not ‘canine’ or ‘canines.’”
- “Fielded searching? I don’t have time to learn which fields I can search.”
- “My query will search the entire site.”
You certainly could provide a “help” page that explains how to create more advanced, precise queries, but users may rarely visit this page. Instead, look for opportunities to educate users when they’re ready to learn.
For example, if you search the eBay app for “watches” (see Figure 9-25), you’ll likely get a few more results than you’d like.
Figure 9-25. The eBay app’s search results provide opportunities to revise your search...
“Perhaps this is too many? If that’s the case, consider revising your search using our souped-up ‘Refine’ interface, which allows you to narrow your search. Or, select from a list of categories to narrow your results further” (see Figure 9-26).
Figure 9-26. ...including the ability to refine your search by specifying various category-specific facets
Unless your system’s search functionality truly requires more than one field—as is the case with many travel-related services—it is best to keep search limited to a single box. (If more than one field is required, it’s important that they be clearly labeled, as illustrated in Figure 9-27.)
Figure 9-27. Kayak’s flight search form features clearly labeled fields
Figure 9-28. Like many airlines, Lufthansa presents a list of airports that match the first few characters the user types into the origin and destination search boxes
For example, the US Congress website allows knowledgeable users to configure extremely sophisticated searches using Boolean operators (Figure 9-29).
Figure 9-29. Congress.gov allows advanced users to build complex searches using Boolean operators
Displaying the initial search within the search box (as in Figure 9-30) can be quite useful: it restates the search that was just executed, and allows the user to modify it without reentering it.
Figure 9-30. In the Netflix Android app, the query is displayed on the results page and can be revised and reexecuted
It’s useful to make clear what content was searched, especially if your search system supports multiple search zones (see Figure 9-31).
Figure 9-31. The iOS iTunes Store app search system shows you where you searched (i.e., “All”), and makes it easy to reach results from other search zones
Explaining “what happened” can include the two guidelines just mentioned, as well as:
- Restating the query
- Describing what content was searched
- Describing any filters that might be in place (e.g., date ranges)
- Showing implicit Boolean or other operators, such as a default AND
- Showing other current settings, such as the sort order
- Mentioning the number of results retrieved
In Figure 9-32, the New York Times site provides an excellent example of explaining to the user what just happened.
Figure 9-32. All aspects of the search are restated as part of these search results
Figure 9-33. Searching leads to browsing: a search for “2001 a space odyssey” on the Barnes & Noble site retrieves categories as well as documents
Figure 9-34. And browsing leads to searching: navigate to the “Movies & TV” section, and you’ll find the search box set to search that zone
It is still useful to provide some instruction on how to narrow search results, as shown in Figure 9-35.
Figure 9-35. Congress.gov provides advice on how to narrow down searches
You can also help users narrow their results by allowing them to search within their current result sets. In Figure 9-36, the initial search for hotels in New York City retrieved over 600 results; we can “filter by hotel name” for particular brands to narrow our retrieval.
Figure 9-36. Priceline.com allows users to search within the result set
At the other end of the spectrum, zero hits is a bit more frustrating for users and challenging for information architects. We suggest you adopt a “no dead ends” policy to address this problem. “No dead ends” simply means that users always have another option, even if they’ve retrieved zero results. The options might include:
- A means of revising the search
- Search tips or other advice on how to improve the search
- A means of browsing (e.g., including the site’s navigation system or sitemap)
- A human contact if searching and browsing won’t work
- Choosing what to index in your information environment is an important step when configuring your search system.
- There are many different types of search algorithms.
- There are also various different ways of presenting results back to the user.
- All of these factors—what to search, what to retrieve, and how to present the results—come together in the search interface.
Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval. The keywords attribute of the HTML <meta> tag used by many websites provides a simple example.
<meta name="keywords" content="information architecture, content management, knowledge management, user experience">
At its simplest, a controlled vocabulary is a list of equivalent terms in the form of a synonym ring, or a list of preferred terms in the form of an authority file. Define relationships between terms (e.g., broader, narrower), and you’ve got a classification scheme. Model associative relationships between concepts (e.g., See Also, See Related), and you’re working on a thesaurus. Figure 10-1 illustrates the relationships between different types of controlled vocabularies.
Figure 10-1. Types of controlled vocabularies
A synonym ring (see Figure 10-2) connects a set of words that are defined as equivalent for the purposes of retrieval.
Figure 10-2. A synonym ring
When a user enters a word into the search engine, that word is checked against the text file. If the word is found, then the query is “exploded” to include all of the equivalent words. For example, in Boolean logic:
(kitchenaid) becomes (kitchenaid or "kitchen aid" or blender or "food processor" or cuisinart or cuizinart)
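The “explosion” step can be sketched as a small Python function. Here the synonym ring lives in an in-memory list rather than a text file, and the names `SYNONYM_RINGS` and `explode` are hypothetical choices for this illustration.

```python
# Each ring is a set of terms treated as equivalent for retrieval.
SYNONYM_RINGS = [
    ["kitchenaid", "kitchen aid", "blender", "food processor", "cuisinart", "cuizinart"],
]


def explode(term, rings=SYNONYM_RINGS):
    """Expand a single search term into a Boolean OR of its synonym ring."""
    term = term.strip().lower()
    for ring in rings:
        if term in ring:
            ordered = [term] + [t for t in ring if t != term]
            # Multi-word terms are quoted so they are searched as phrases.
            quoted = [f'"{t}"' if " " in t else t for t in ordered]
            return "(" + " or ".join(quoted) + ")"
    return f"({term})"  # no ring found: the query is left unchanged


print(explode("kitchenaid"))
print(explode("toaster"))
```

A term that belongs to no ring passes through untouched, so the expansion only broadens queries the vocabulary actually knows about.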
As you might guess, synonym rings can dramatically improve recall. In one study conducted at Bellcore in the 1980s, the use of synonym rings, or “unlimited aliasing,” within a small test database increased recall from 20% to 80%. However, synonym rings can also reduce precision.
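The trade-off is easier to see with the standard definitions: recall is the share of all relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The Python sketch below uses invented document IDs, chosen only so that recall moves from 20% to 80%; they are not data from the Bellcore study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical relevant documents

# Without the synonym ring: one result, and it is relevant.
print(precision_recall(["d1"], relevant))

# With the synonym ring: more relevant hits, but also more noise.
print(precision_recall(["d1", "d2", "d3", "d4", "d8", "d9", "d10", "d11"], relevant))
```

In this made-up case the expanded query finds four of the five relevant documents instead of one, but half of what it returns is irrelevant.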
Strictly defined, an authority file is a list of preferred terms or acceptable values.
At Drugstore.com, only the brand names are included in the index (see Figure 10-7); equivalent terms like “tilenol” don’t show up. This keeps the index relatively short and uncluttered and, in this example, reinforces the brand names. However, a trade-off is involved. In cases where the equivalent terms begin with different letters (e.g., aspirin and Bayer), there is value in creating pointers:
Aspirin see Bayer
Otherwise, when users look in the index under A for aspirin, they won’t find Bayer. The use of pointers is called term rotation.
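One way to picture an authority file with pointers is as a mapping from variant terms to preferred terms. The Python sketch below is a hypothetical illustration: the `AUTHORITY_FILE` entries and both function names are assumptions, and pointers are generated only when a variant begins with a different letter than its preferred term, following the trade-off described above.

```python
# Variant term (lowercase) -> preferred term. Entries are illustrative only.
AUTHORITY_FILE = {
    "bayer": "Bayer",
    "aspirin": "Bayer",
    "tylenol": "Tylenol",
    "tilenol": "Tylenol",
}


def preferred_term(term):
    """Map whatever the user typed to the preferred term, if one is known."""
    return AUTHORITY_FILE.get(term.strip().lower())


def index_entries():
    """Preferred terms plus 'see' pointers for variants filed under another letter."""
    entries = []
    for variant, preferred in AUTHORITY_FILE.items():
        if variant == preferred.lower():
            entries.append(preferred)
        elif variant[0] != preferred[0].lower():
            entries.append(f"{variant.capitalize()} see {preferred}")
    return sorted(entries)


print(preferred_term("tilenol"))
print(index_entries())
```

The misspelling “tilenol” still resolves to the preferred term in search, yet it stays out of the browsable index, which keeps the index short while the “Aspirin see Bayer” pointer covers users who start under A.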
Figure 10-7. Brand index at Drugstore.com
In Figure 10-8, users looking for “Tylenol” on the US Food and Drug Administration website are guided to the generic term “acetaminophen.”
Figure 10-8. A site index with term rotation
We use classification scheme to mean an arrangement of preferred terms. These days, many people prefer to use taxonomy instead. Either way, it’s important to recognize that these arrangements can take different shapes and serve multiple purposes, including:
- A frontend, browsable hierarchy that’s a visible, integral part of the user interface
- A backend tool used by authors and indexers for organizing and tagging documents
Consider, for example, the Dewey Decimal Classification (DDC). First published in 1876, the DDC is now “the most widely used classification scheme in the world.” In its purest form, the DDC is a hierarchical listing that begins with 10 top-level categories and drills down into great detail within each:
000 Computers, information, & general reference
100 Philosophy & psychology
200 Religion
300 Social sciences
400 Language
500 Science
600 Technology
700 Arts & recreation
800 Literature
900 History & geography
Classification schemes can also be used in the context of searching. You can see in Figure 10-10 that Walmart’s search results present “Departments” categories, which reinforces users’ familiarity with Walmart’s classification scheme.
Figure 10-10. Category Matches at Walmart.com
The Oxford English Dictionary defines thesaurus as “a book that lists words in groups of synonyms and related concepts.”
For the purposes of this book, a thesaurus is:
A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval.
As you can see from Figure 10-11, each preferred term becomes the center of its own semantic network. The equivalence relationship is focused on synonym management. The hierarchical relationship enables the classification of preferred terms into categories and subcategories. The associative relationship provides for meaningful connections that aren’t handled by the hierarchical or equivalence relationships.
Figure 10-11. Semantic relationships in a thesaurus
The core terminology includes the following:
- Preferred Term (PT): Also known as the accepted term, acceptable value, subject heading, or descriptor. All relationships are defined with respect to the Preferred Term.
- Variant Term (VT): Also known as entry terms or non-preferred terms, Variant Terms have been defined as equivalent to or loosely synonymous with the Preferred Term.
- Broader Term (BT): The Broader Term is the parent of the Preferred Term. It’s one level higher in the hierarchy.
- Narrower Term (NT): A Narrower Term is a child of the Preferred Term. It’s one level lower in the hierarchy.
- Related Term (RT): The Related Term is connected to the Preferred Term through the associative relationship. The relationship is often articulated through use of See Also. For example, Tylenol See Also Headache.
- Use (U): Traditional thesauri often employ the following syntax as a tool for indexers and users: Variant Term Use Preferred Term. For example, Tilenol Use Tylenol. Many people are more familiar with See, as in Tilenol See Tylenol.
- Used For (UF): This indicates the reciprocal relationship of Preferred Term UF Variant Term(s). It’s used to show the full list of variants on the Preferred Term’s record. For example, Tylenol UF Tilenol.
- Scope Note (SN): The Scope Note is essentially a specific type of definition of the Preferred Term, used to deliberately restrict the meaning of that term in order to rule out ambiguity as much as possible.
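One way to picture how these terms fit together is as a small data structure in which each preferred term carries its own record. The sketch below is only an illustration: the `Term` and `Thesaurus` classes and their method names are assumptions, “Pain relievers” is a made-up broader term, and treating the associative relationship as symmetric is a simplifying choice.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred-term record in a thesaurus."""
    preferred: str                               # PT
    used_for: set = field(default_factory=set)   # UF: variant terms
    broader: set = field(default_factory=set)    # BT: parent terms
    narrower: set = field(default_factory=set)   # NT: child terms
    related: set = field(default_factory=set)    # RT: See Also
    scope_note: str = ""                         # SN


class Thesaurus:
    def __init__(self):
        self.terms = {}  # preferred term -> Term record
        self.use = {}    # variant term -> preferred term (the Use pointer)

    def term(self, name):
        """Return the record for `name`, creating it if needed."""
        return self.terms.setdefault(name, Term(name))

    def add_variant(self, variant, preferred):
        """Record 'variant Use preferred' and its reciprocal 'preferred UF variant'."""
        self.use[variant] = preferred
        self.term(preferred).used_for.add(variant)

    def add_narrower(self, broader, narrower):
        """Record 'broader NT narrower' and its reciprocal 'narrower BT broader'."""
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_related(self, a, b):
        """Record the associative link in both directions (an assumption here)."""
        self.term(a).related.add(b)
        self.term(b).related.add(a)


t = Thesaurus()
t.add_variant("Tilenol", "Tylenol")          # Tilenol Use Tylenol
t.add_related("Tylenol", "Headache")         # Tylenol See Also Headache
t.add_narrower("Pain relievers", "Tylenol")  # hypothetical BT, for illustration
print(t.terms["Tylenol"])
```

Storing each relationship together with its reciprocal means a lookup from either side (variant to preferred, or parent to child) never requires scanning the whole vocabulary.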
Figure 10-12. Semantic relationships in a wine thesaurus
PubMed provides a simpler public interface with free access to citations, but without access to the full text of the journal articles.
Let’s say we’re studying African sleeping sickness. We enter that phrase into the PubMed search engine and are rewarded with the first 20 results out of 5,758 total items found (Figure 10-13).
Figure 10-13. Search results on PubMed
In fact, we didn’t search the full-text articles at all. Instead, we searched the metadata records for these articles, which include a combination of abstracts and subject headings (Figure 10-14).
Figure 10-14. Sample record with abstract in PubMed
When we select another item from our search results, we find a record with subject headings (“MeSH Terms”) but no abstract (Figure 10-15).
Figure 10-15. Sample record with index terms in PubMed
Take a look at the MeSH Browser, an interface for navigating the structure and vocabulary of MeSH (Figure 10-16).
Figure 10-16. The MeSH Browser
The MeSH Browser enables us to navigate by browsing the hierarchical classification schemes within the thesaurus or by searching. If we try a search on “African sleeping sickness,” we’ll see why the article “Wolbachia. A tale of sex and survival” was retrieved in our search. “African sleeping sickness” is actually an entry term for the preferred term or MeSH heading, “Trypanosomiasis, African” (see Figure 10-17). When we searched PubMed, our variant term was mapped to the preferred term behind the scenes.
Figure 10-17. MeSH record for trypanosomiasis (top and bottom of page)
It would be nice, for example, to turn all of those MeSH terms in our sample record into live links and provide enhanced searching and browsing capabilities, similar to those provided by Amazon, as shown in Figure 10-18.
Figure 10-18. Amazon’s use of structure and subject headings for enhanced navigation
One of the advantages to using a thesaurus is that you have tremendous power and flexibility to shape and refine the user interface over time. You can’t take advantage of all the capabilities at once, but you can user-test different features, learning and adjusting as you go.
Figure 10-19. Types of thesauri
A classic thesaurus is used at the point of indexing and at the point of searching. Indexers use the thesaurus to map variant terms to preferred terms when performing document-level indexing. Searchers use the thesaurus for retrieval, whether or not they’re aware of the role it plays in their search experience. Query terms are matched against the rich vocabulary of the thesaurus, enabling synonym management, hierarchical browsing, and associative linking. This is the full-bodied, fully integrated thesaurus we’ve referred to for much of this chapter.
However, building a classic thesaurus is not always necessary or possible. Consider a scenario in which you have the ability to develop a controlled vocabulary and index documents, but you’re not able to build the synonym-management capability into the search experience. Perhaps another department owns the search engine and won’t work with you, or perhaps the engine won’t support this functionality without major customization.
Whatever the case, you’re able to perform controlled vocabulary indexing, but you’re not able to leverage that work at the point of searching and map users’ variant terms to preferred terms.
An indexing thesaurus positions you nicely to take the next step up to a classic thesaurus.
A searching thesaurus leverages a controlled vocabulary at the point of searching but not at the point of indexing. For example, when a user enters a term into the search engine, a searching thesaurus can map that term onto the controlled vocabulary before executing the query against the full-text index. The thesaurus may simply perform equivalence term explosion, as we’ve seen in the case of synonym rings, or it may go beyond the equivalence relationship, exploding down the hierarchy to include all narrower terms (traditionally known as “posting down”). These methods will obviously enhance recall at the expense of precision.
You also have the option of giving more power and control to the users—asking them whether they’d like to use any combination of preferred, variant, broader, narrower, or associative terms in their queries. When integrated carefully into the search interface and search result screens, this can effectively arm users with the ability to narrow, broaden, and adjust their searches as needed.
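“Posting down,” as described above, amounts to walking the hierarchy from the query term to all of its descendants. The following is a minimal sketch under stated assumptions: the `NARROWER` mapping is invented for illustration (only Bird NT Magpie appears in this chapter), and `post_down` is a hypothetical helper name.

```python
# Hypothetical hierarchy: preferred term -> its narrower terms (NT).
NARROWER = {
    "Bird": ["Magpie", "Raptor"],
    "Raptor": ["Eagle", "Hawk"],
}


def post_down(term, narrower):
    """Return `term` followed by all of its descendants, depth-first.

    A `seen` set guards against duplicates, which matters when a term has
    more than one parent (polyhierarchy).
    """
    expanded, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return expanded


print(post_down("Bird", NARROWER))
# ['Bird', 'Magpie', 'Raptor', 'Eagle', 'Hawk']
```

A query for “Bird” would then be run against all five terms, which is exactly why recall goes up and precision goes down.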
Figure 10-20. The equivalence relationship
Our goal is to group terms defined as “equivalent for the purposes of retrieval.” This may include synonyms, near-synonyms, acronyms, abbreviations, lexical variants, and common misspellings. For example:
Preferred term
Apple Watch Sport
Variant terms (equivalents)
Apple Watch, iWatch, Smart watch, Smartwatch, Wearable computer, Galaxy Gear, Moto 360
Figure 10-21. The hierarchical relationship
There are three subtypes of hierarchical relationship:
- Generic: This is the traditional class–species relationship we draw from biological taxonomies. Species B is a member of Class A and inherits the characteristics of its parent. For example, Bird NT Magpie.
- Whole-part: In this hierarchical relationship, B is a part of A. For example, Foot NT Big Toe.
- Instance: In this case, B is an instance or example of A. This relationship often includes proper names. For example, Seas NT Mediterranean Sea.
Figure 10-22. The associative relationship
Table 10-1. Examples of relationship subtypes
Relationship subtype | Example |
---|---|
Field of Study and Object of Study | Cardiology RT Heart |
Process and its Agent | Termite Control RT Pesticides |
Concepts and their Properties | Poisons RT Toxicity |
Action and Product of Action | Eating RT Indigestion |
Concepts Linked by Causal Dependence | Celebration RT New Year’s Eve |
Table 10-2. Issues covered in the ANSI/NISO thesaurus standard
Topic | Our interpretation and advice |
---|---|
Grammatical form | The standard strongly encourages the use of nouns for preferred terms. This is a good default guideline, because users are better at understanding and remembering nouns than verbs or adjectives. However, in the real world, you’ll encounter lots of good reasons to use verbs (i.e., task-oriented words) and adjectives (e.g., price, size, variety, color) in your controlled vocabularies. |
Spelling | The standard notes that you can select a “defined authority,” such as a specific dictionary or glossary, or you can choose to use your own “house style.” You might also consider the most common spelling forms employed by your users. The most important thing here is that you make a decision and stick to it. Consistency will improve the lives of your indexers and users. |
Singular and plural form | The standard recommends using the plural form of “count nouns” (e.g., cars, roads, maps). Conceptual nouns (e.g., math, biology) should remain in singular form. Search technology has rendered this less important than in the past. Once again, consistency is the goal in this case. |
Abbreviations and acronyms | The guidelines suggest defaulting to popular use. For the most part, your preferred terms will be the full words. But in cases such as RADAR, IRS, 401K, MI, and TV, it may be better to use the acronym or abbreviation. You can always rely on your variant terms to guide users from one form to the other (e.g., Internal Revenue Service See IRS). |
Figure 10-23. Hierarchy and polyhierarchy
Figure 10-24. Polyhierarchy in MEDLINE
At the foot of most Wikipedia articles is a box with links to the higher levels of the hierarchy under which that particular article is listed (Figure 10-25).
Figure 10-25. Polyhierarchy in Wikipedia
Figure 10-26. Single hierarchy versus multiple (faceted) hierarchies
The mega-menu shown in Figure 10-27 presents various ways to browse, providing multiple paths to the same information.
Figure 10-27. Faceted classification at Wine.com
The Advanced Wine Search, shown in Figure 10-28, provides the ability to combine facets into the rich type of query we usually express in natural language.
Figure 10-28. Advanced Wine Search at Wine.com
Note that not only are we able to leverage facets in the search, but we can also use the facets to sort results. Wine.com has added ratings from several magazines (RP = Robert Parker’s The Wine Advocate, WS = Wine Spectator) as yet another facet.
Figure 10-29. Flexible search and results display
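Combining facets in a query and then sorting on another facet can be sketched as a simple filter over catalog records. Everything in this example is hypothetical: the `WINES` records, the facet names, and the `faceted_search` function are assumptions for illustration and do not reflect Wine.com’s data or implementation.

```python
# Hypothetical catalog records; facet names and values are illustrative only.
WINES = [
    {"name": "Wine A", "type": "Red", "region": "California", "price": 25, "ws_rating": 92},
    {"name": "Wine B", "type": "Red", "region": "France", "price": 40, "ws_rating": 95},
    {"name": "Wine C", "type": "White", "region": "California", "price": 18, "ws_rating": 88},
    {"name": "Wine D", "type": "Red", "region": "California", "price": 60, "ws_rating": 90},
]


def faceted_search(items, sort_by=None, descending=False, **facets):
    """Keep items matching every requested facet value, optionally sorted.

    Each keyword argument in `facets` names a facet and the value it must
    have; `sort_by` names the facet used to order the results.
    """
    matches = [
        item for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]
    if sort_by is not None:
        matches.sort(key=lambda item: item[sort_by], reverse=descending)
    return matches


results = faceted_search(
    WINES, type="Red", region="California", sort_by="ws_rating", descending=True
)
print([wine["name"] for wine in results])
# ['Wine A', 'Wine D']
```

Because each facet is an independent dimension, adding a new one (such as a magazine rating) only means adding a field to the records; the same function can then filter or sort on it.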
- Thesauri, controlled vocabularies, and metadata operate on the backend of an information environment to enable a more seamless and satisfying experience on the frontend.
- Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval.
- Controlled vocabularies are subsets of natural language; they include synonym rings, authority files, classification schemes, and thesauri.
- These systems allow you to structure and map language so that people can more easily find information.
- Faceted classification and polyhierarchy allow you to make information available in more than one way, allowing people to find their own routes to the stuff they’re looking for.
Figure 11-1. The process of information architecture development
- The research phase begins with a review of existing background materials and meetings with the strategy team, aimed at gaining a high-level understanding of the goals and business context, the existing information architecture, the content, and the intended audiences. It then quickly moves into a series of studies, employing a variety of methods to explore the information ecology.
- This research provides a contextual understanding that forms the foundation for development of an information architecture strategy. From a top-down perspective, this strategy defines the highest two or three levels of the information environment’s organization and navigation structures. From a bottom-up perspective, it suggests candidate document types and a rough metadata schema. This strategy provides a high-level framework for the information architecture, establishing a direction and scope that will guide the project through implementation.
- Design is where you shape a high-level strategy into an information architecture, creating detailed sitemaps, wireframes, and metadata schema that will be used by graphic designers, programmers, content authors, and the production team. The design phase is obviously where most of the work of information architecture is done. That said, quantity cannot drive out quality. Poor design execution can ruin the best strategy.
- Implementation is where your designs are put to the test as the system is built, tested, and launched. This phase involves organizing and tagging documents, testing and troubleshooting, and developing documentation and training programs to ensure that the information architecture can be maintained effectively over time.
- And last but not least is administration, the continuous evaluation and improvement of the system’s information architecture. Administration includes the daily tasks of tagging new documents and weeding out old ones. It also requires monitoring usage and user feedback, identifying opportunities to improve through major or minor adjustments. Effective administration can make a good information environment great.