Chapter 14 Choosing Names
Selecting names for variables, methods, and other entities is one of the most underrated aspects of software design. Good names are a form of documentation: they make code easier to understand. They reduce the need for other documentation and make it easier to detect errors. Conversely, poor name choices increase the complexity of code and create ambiguities and misunderstandings that can result in bugs. Name choice is an example of the principle that complexity is incremental. Choosing a mediocre name for a particular variable, as opposed to the best possible name, probably won’t have much impact on the overall complexity of a system. However, software systems have thousands of variables; choosing good names for all of these will have a significant impact on complexity and manageability.
为变量,方法和其他实体选择名称是软件设计中被低估的方面之一。良好的名字是一种文档形式:它们使代码更易于理解。它们减少了对其他文档的需求,并使检测错误更加容易。相反,名称选择不当会增加代码的复杂性,并造成可能导致错误的歧义和误解。名称选择是复杂度是递增的原理的一个示例。为特定变量选择一个平庸的名称,而不是最好的名称,这可能不会对系统的整体复杂性产生太大影响。但是,软件系统具有数千个变量。为所有这些选择好名字将对复杂性和可管理性产生重大影响。
Sometimes even a single poorly named variable can have severe consequences. The most challenging bug I ever fixed came about because of a poor name choice. In the late 1980’s and early 1990’s my graduate students and I created a distributed operating system called Sprite. At some point we noticed that files would occasionally lose data: one of the data blocks suddenly became all zeroes, even though the file had not been modified by a user. The problem didn’t happen very often, so it was exceptionally difficult to track down. A few of the graduate students tried to find the bug, but they were unable to make progress and eventually gave up. However, I consider any unsolved bug to be an intolerable personal insult, so I decided to track it down.
有时,即使是一个名称不正确的变量也会产生严重的后果。我曾经修复过的最具挑战性的错误是由于名称选择不当造成的。在 1980 年代末和 1990 年代初,我的研究生和我创建了一个名为 Sprite 的分布式操作系统。在某个时候,我们注意到文件偶尔会丢失数据:即使用户未修改文件,数据块之一突然变为全零。该问题并不经常发生,因此很难追踪。一些研究生试图找到该错误,但他们未能取得进展,最终放弃了。但是,我认为任何未解决的错误都是无法忍受的个人侮辱,因此我决定对其进行跟踪。
It took six months, but I eventually found and fixed the bug. The problem was actually quite simple (as are most bugs, once you figure them out). The file system code used the variable name block for two different purposes. In some situations, block referred to a physical block number on disk; in other situations, block referred to a logical block number within a file. Unfortunately, at one point in the code there was a block variable containing a logical block number, but it was accidentally used in a context where a physical block number was needed; as a result, an unrelated block on disk got overwritten with zeroes.
花了六个月的时间,但我最终找到并修复了该错误。这个问题实际上很简单(就像大多数错误一样,一旦找出它们)。文件系统代码将变量名块用于两个不同的目的。在某些情况下,块是指磁盘上的物理块号。在其他情况下,块是指文件中的逻辑块号。不幸的是,在代码的某一点上有一个包含逻辑块号的块变量,但是在需要物理块号的情况下意外地使用了它。结果,磁盘上无关的块被零覆盖。
While tracking down the bug, several people, including myself, read over the faulty code, but we never noticed the problem. When we saw the variable block used as a physical block number, we reflexively assumed that it really held a physical block number. It took a long process of instrumentation, which eventually showed that the corruption must be happening in a particular statement, before I was able to get past the mental block created by the name and check to see exactly where its value came from. If different variable names had been used for the different kinds of blocks, such as fileBlock and diskBlock, it’s unlikely that the error would have happened; the programmer would have known that fileBlock couldn’t be used in that situation.
在跟踪该错误时,包括我自己在内的几个人阅读了错误的代码,但我们从未注意到问题所在。当我们看到可变块用作物理块号时,我们反身地假设它确实拥有物理块号。经过很长时间的检测,最终显示出腐败一定是在特定的语句中发生的,然后我才能越过该名称所创建的思维障碍,并查看其价值的确切来源。如果对不同种类的块(例如 fileBlock 和 diskBlock)使用了不同的变量名,则错误不太可能发生;程序员会知道在那种情况下不能使用 fileBlock。
Unfortunately, most developers don’t spend much time thinking about names. They tend to use the first name that comes to mind, as long as it’s reasonably close to matching the thing it names. For example, block is a pretty close match for both a physical block on disk and a logical block within a file; it’s certainly not a horrible name. Even so, it resulted in a huge expenditure of time to track down a subtle bug. Thus, you shouldn’t settle for names that are just “reasonably close”. Take a bit of extra time to choose great names, which are precise, unambiguous, and intuitive. The extra attention will pay for itself quickly, and over time you’ll learn to choose good names quickly.
不幸的是,大多数开发人员没有花太多时间在思考名字。他们倾向于使用想到的名字,只要它与匹配的名字相当接近即可。例如,块与磁盘上的物理块和文件内的逻辑块非常接近;这肯定不是一个可怕的名字。即使这样,它仍然要花费大量时间来查找一个细微的错误。因此,您不应该只选择“合理接近”的名称。花一些额外的时间来选择准确,明确且直观的好名字。额外的注意力将很快收回成本,随着时间的流逝,您将学会快速选择好名字。
When choosing a name, the goal is to create an image in the mind of the reader about the nature of the thing being named. A good name conveys a lot of information about what the underlying entity is, and, just as important, what it is not. When considering a particular name, ask yourself: “If someone sees this name in isolation, without seeing its declaration, its documentation, or any code that uses the name, how closely will they be able to guess what the name refers to? Is there some other name that will paint a clearer picture?” Of course, there is a limit to how much information you can put in a single name; names become unwieldy if they contain more than two or three words. Thus, the challenge is to find just a few words that capture the most important aspects of the entity.
选择名称时,目标是在读者的脑海中创建一幅关于被命名事物的性质的图像。一个好名字传达了很多有关底层实体是什么,以及同样重要的是,不是什么的信息。在考虑特定名称时,请问自己:“如果有人孤立地看到该名称,而没有看到其声明,文档或使用该名称的任何代码,他们将能够猜到该名称指的是什么?还有其他名称可以使画面更清晰吗?” 当然,一个名字可以输入多少信息是有限制的。如果名称包含两个或三个以上的单词,则会变得笨拙。因此,面临的挑战是仅找到捕获实体最重要方面的几个单词。
Names are a form of abstraction: they provide a simplified way of thinking about a more complex underlying entity. Like other forms of abstraction, the best names are those that focus attention on what is most important about the underlying entity while omitting details that are less important.
名称是一种抽象形式:名称提供了一种简化的方式来考虑更复杂的基础实体。像其他形式的抽象一样,最好的名字是那些将注意力集中在对底层实体最重要的东西上,而忽略那些次要的细节。
Good names have two properties: precision and consistency. Let’s start with precision. The most common problem with names is that they are too generic or vague; as a result, it’s hard for readers to tell what the name refers to; the reader may assume that the name refers to something different from reality, as in the block bug above. Consider the following method declaration:
良好名称具有两个属性:精度和一致性。让我们从精度开始。名称最常见的问题是名称太笼统或含糊不清。结果,读者很难说出这个名字指的是什么。读者可能会认为该名称所指的是与现实不符的事物,如上面的代码错误所示。考虑以下方法声明:
/**
* Returns the total number of indexlets this object is managing.
*/
int IndexletManager::getCount() {...}
The term “count” is too generic: count of what? If someone sees an invocation of this method, they are unlikely to know what it does unless they read its documentation. A more precise name like getActiveIndexlets or numIndexlets would be better: with one of these names, readers will probably be able to guess what the method returns without having to look at its documentation.
术语“计数”太笼统了:计数什么?如果有人看到此方法的调用,除非他们阅读了它的文档,否则他们不太可能知道它的作用。像 getActiveIndexlets 或 numIndexlets 这样的更精确的名称会更好:使用这些名称之一,读者可能无需查看其文档就能猜测该方法返回的内容。
Here are some other examples of names that aren’t precise enough, taken from various student projects:
以下是来自其他学生项目的一些名称不够精确的示例:
-
A project building a GUI text editor used the names x and y to refer to the position of a character in the file. These names are too generic. They could mean many things; for example, they might also represent the coordinates (in pixels) of a character on the screen. Someone seeing the name x in isolation is unlikely to think that it refers to the position of a character within a line of text. The code would be clearer if it used names such as charIndex and lineIndex, which reflect the specific abstractions that the code implements.
建立 GUI 文本编辑器的项目使用名称 x 和 y 来引用字符在文件中的位置。这些名称太笼统了。他们可能意味着很多事情;例如,它们也可能代表屏幕上字符的坐标(以像素为单位)。单独看到名称 x 的人不太可能会认为它是指字符在一行文本中的位置。如果使用诸如 charIndex 和 lineIndex 之类的名称来反映代码实现的特定抽象,该代码将更加清晰。
-
Another editor project contained the following code:
另一个编辑器项目包含以下代码:
// Blink state: true when cursor visible. private boolean blinkStatus = true;
The name blinkStatus doesn’t convey enough information. The word “status” is too vague for a boolean value: it gives no clue about what a true or false value means. The word “blink” is also vague, since it doesn’t indicate what is blinking. The following alternative is better:
名称 blinkStatus 无法传达足够的信息。“状态”一词对于布尔值来说太含糊了:它不提供关于真值或假值含义的任何线索。“闪烁”一词也含糊不清,因为它并不表示闪烁的内容。以下替代方法更好:
// Controls cursor blinking: true means the cursor is visible, // false means the cursor is not displayed. private boolean cursorVisible = true;
The name cursorVisible conveys more information; for example, it allows readers to guess what a true value means (as a general rule, names of boolean variables should always be predicates). The word “blink” is no longer in the name, so readers will have to consult the documentation if they want to know why the cursor isn’t always visible; this information is less important.
名称 cursorVisible 传达了更多信息;例如,它允许读者猜测一个真值的含义(通常,布尔变量的名称应始终为谓词)。名称中不再包含“ blink”一词,因此,如果读者想知道为什么光标不总是可见,则必须查阅文档。此信息不太重要。
-
A project implementing a consensus protocol contained the following code:
一个实施共识协议的项目包含以下代码:
// Value representing that the server has not voted (yet) for // anyone for the current election term. private static final String VOTED_FOR_SENTINEL_VALUE = "null";
The name for this value indicates that it’s special but it doesn’t say what the special meaning is. A more specific name such as NOT_YET_VOTED would be better.
此值的名称表示它是特殊的,但没有说明特殊含义是什么。使用更具体的名称(例如 NOT_YET_VOTED)会更好。
-
A variable named result was used in a method with no return value. This name has multiple problems. First, it creates the misleading impression that it will be the return value of the method. Second, it provides essentially no information about what it actually holds, except that it is some computed value. The name should provide information about what the result actually is, such as mergedLine or totalChars. In methods that do actually have return values, then using the name result is reasonable. This name is still a bit generic, but readers can look at the method documentation to see its meaning, and it’s helpful to know that the value will eventually become the return value.
在没有返回值的方法中使用了名为 result 的变量。这个名字有多个问题。首先,它会产生误导性的印象,即它将作为方法的返回值。其次,除了它是一些计算值外,它实际上不提供有关其实际持有内容的任何信息。该名称应提供有关实际结果是什么的信息,例如 mergedLine 或 totalChars。在实际上确实具有返回值的方法中,使用名称结果是合理的。该名称仍然有点通用,但是读者可以查看方法文档以了解其含义,这有助于知道该值最终将成为返回值。
img Red Flag: Vague Name img
If a variable or method name is broad enough to refer to many different things, then it doesn’t convey much information to the developer and the underlying entity is more likely to be misused.
如果变量或方法的名称足够广泛,可以引用许多不同的事物,那么它不会向开发人员传达太多信息,因此底层实体很可能会被滥用。
Like all rules, the rule about choosing precise names has a few exceptions. For example, it’s fine to use generic names like i and j as loop iteration variables, as long as the loops only span a few lines of code. If you can see the entire range of usage of a variable, then the meaning of the variable will probably be obvious from the code so you don’t need a long name. For example, consider the following code:
像所有规则一样,有关选择精确名称的规则也有一些例外。例如,只要循环仅跨越几行代码,就可以将通用名称(如 i 和 j)用作循环迭代变量。如果您可以看到一个变量的整个用法范围,那么该变量的含义在代码中就很明显了,因此您不需要长名称。例如,考虑以下代码:
for (i = 0; i < numLines; i++) {
...
}
It’s clear from this code that i is being used to iterate over each of the lines in some entity. If the loop gets so long that you can’t see it all at once, or if the meaning of the iteration variable is harder to figure out from the code, then a more descriptive name is in order.
从这段代码中很明显,我正被用来迭代某个实体中的每一行。如果循环时间太长,以至于您无法一次看到全部内容,或者如果很难从代码中找出迭代变量的含义,那么应该使用更具描述性的名称。
It’s also possible for a name to be too specific, such as in this declaration for a method that deletes a range of text:
名称也可能太具体,例如在此声明中删除一个文本范围的方法:
void delete(Range selection) {...}
The argument name selection is too specific, since it suggests that the text being deleted is always selected in the user interface. However, this method can be invoked on any range of text, selected or not. Thus, the argument name should be more generic, such as range.
参数名称的选择过于具体,因为它建议始终在用户界面中选择要删除的文本。但是,可以在任意范围的文本(无论是否选中)上调用此方法。因此,参数名称应更通用,例如范围。
If you find it difficult to come up with a name for a particular variable that is precise, intuitive, and not too long, this is a red flag. It suggests that the variable may not have a clear definition or purpose. When this happens, consider alternative factorings. For example, perhaps you are trying to use a single variable to represent several things; if so, separating the representation into multiple variables may result in a simpler definition for each variable. The process of choosing good names can improve your design by identifying weaknesses.
如果您发现很难为精确,直观且时间不长的特定变量命名,那么这是一个危险信号。这表明该变量可能没有明确的定义或目的。发生这种情况时,请考虑其他因素。例如,也许您正在尝试使用单个变量来表示几件事;如果是这样,将表示形式分成多个变量可能会导致每个变量的定义更简单。选择好名字的过程可以通过识别弱点来改善您的设计。
img Red Flag: Hard to Pick Name img
If it’s hard to find a simple name for a variable or method that creates a clear image of the underlying object, that’s a hint that the underlying object may not have a clean design.
如果很难为创建基础对象清晰图像的变量或方法找到简单的名称,则表明基础对象可能没有简洁的设计。
The second important property of good names is consistency. In any program there are certain variables that are used over and over again. For example, a file system manipulates block numbers repeatedly. For each of these common usages, pick a name to use for that purpose, and use the same name everywhere. For example, a file system might always use fileBlock to hold the index of a block within a file. Consistent naming reduces cognitive load in much the same way as reusing a common class: once the reader has seen the name in one context, they can reuse their knowledge and instantly make assumptions when they see the name in a different context.
名誉的第二个重要属性是一致性。在任何程序中,都会反复使用某些变量。例如,文件系统反复操作块号。对于每种常见用法,请选择一个用于该目的的名称,并在各处使用相同的名称。例如,文件系统可能总是使用 fileBlock 来保存文件中块的索引。一致的命名方式与重用普通类的方式一样,可以减轻认知负担:一旦读者在一个上下文中看到了该名称,他们就可以重用其知识并在不同上下文中看到该名称时立即做出假设。
Consistency has three requirements: first, always use the common name for the given purpose; second, never use the common name for anything other than the given purpose; third, make sure that the purpose is narrow enough that all variables with the name have the same behavior. This third requirement was violated in the file system bug at the beginning of the chapter. The file system used block for variables with two different behaviors (file blocks and disk blocks); this led to a false assumption about the meaning of a variable, which in turn resulted in a bug.
一致性具有三个要求:首先,始终将通用名称用于给定目的;第二,除了给定目的外,切勿使用通用名称;第三,确保目的足够狭窄,以使所有具有名称的变量都具有相同的行为。在本章开头的文件系统错误中违反了此第三项要求。文件系统使用块来表示具有两种不同行为的变量(文件块和磁盘块);这导致对变量含义的错误假设,进而导致错误。
Sometimes you will need multiple variables that refer to the same general sort of thing. For example, a method that copies file data will need two block numbers, one for the source and one for the destination. When this happens, use the common name for each variable but add a distinguishing prefix, such as srcFileBlock and dstFileBlock.
有时您将需要多个变量来引用相同的一般事物。例如,一种复制文件数据的方法将需要两个块号,一个为源,一个为目标。发生这种情况时,请对每个变量使用通用名称,但要添加一个可区分的前缀,例如 srcFileBlock 和 dstFileBlock。
Loops are another area where consistent naming can help. If you use names such as i and j for loop variables, always use i in outermost loops and j for nested loops. This allows readers to make instant (safe) assumptions about what’s happening in the code when they see a given name.
循环是一致性命名可以提供帮助的另一个领域。如果将诸如 i 和 j 之类的名称用于循环变量,则始终在最外层循环中使用 i,而在嵌套循环中始终使用 j。这使读者可以在看到给定名称时对代码中发生的事情做出即时(安全)假设。
Not everyone shares my views about naming. Some of the developers of the Go language argue that names should be very short, often only a single character. In a presentation on name choice for Go, Andrew Gerrand states that “long names obscure what the code does.”1 He presents this code sample, which uses single-letter variable names:
并非所有人都同意我对命名的看法。一些使用 Go 语言的开发人员认为,名称应该非常简短,通常只能是一个字符。在关于 Go 的名称选择的演示中,Andrew Gerrand 指出“长名称模糊了代码的作用。” 1 他介绍了此代码示例,该示例使用单字母变量名:
func RuneCount(b []byte) int {
i, n := 0, 0
for i < len(b) {
if b[i] < RuneSelf {
i++
} else {
_, size := DecodeRune(b[i:])
i += size
}
n++
}
return n
}
and argues that it is more readable than the following version, which uses longer names:
并认为它比以下使用更长名称的版本更具可读性:
func RuneCount(buffer []byte) int {
index, count := 0, 0
for index < len(buffer) {
if buffer[index] < RuneSelf {
index++
} else {
_, size := DecodeRune(buffer[index:])
index += size
}
count++
}
return count
}
Personally, I don’t find the second version any more difficult to read than the first. If anything, the name count gives a slightly better clue to the behavior of the variable than n. With the first version I ended up reading through the code trying to figure out what n means, whereas I didn’t feel that need with the second version. But, if n is used consistently throughout the system to refer to counts (and nothing else), then the short name will probably be clear to other developers.
就个人而言,我发现第二版本比第一版本更难阅读。如果有的话,与 n 相比,名称计数为变量的行为提供了更好的线索。在第一个版本中,我最终通读了代码,试图弄清楚 n 的含义,而第二个版本中我并没有这种需要。但是,如果在整个系统中一致地使用 n 来引用计数(而没有其他内容),那么其他开发人员可能会清楚知道该短名称。
The Go culture encourages the use of the same short name for multiple different things: ch for character or channel, d for data, difference, or distance, and so on. To me, ambiguous names like these are likely to result in confusion and error, just as in the block example.
Go 文化鼓励在多个不同的事物上使用相同的短名称:ch 用于字符或通道,d 用于数据,差异或距离,等等。对我来说,像这样的模棱两可的名称很可能导致混乱和错误,就像在示例中一样。
Overall, I would argue that readability must be determined by readers, not writers. If you write code with short variable names and the people who read it find it easy to understand, then that’s fine. If you start getting complaints that your code is cryptic, then you should consider using longer names (a Web search for “go language short names” will identify several such complaints). Similarly, if I start getting complaints that long variable names make my code harder to read, then I’ll consider using shorter ones.
总的来说,我认为可读性必须由读者而不是作家来决定。如果您使用简短的变量名编写代码,并且阅读该代码的人很容易理解,那么很好。如果您开始抱怨代码很含糊,那么您应该考虑使用更长的名称(在网络上搜索“ go language short name”(使用语言简称)会识别出几种此类抱怨)。同样,如果我开始抱怨长变量名使我的代码难以阅读,那么我会考虑使用较短的变量名。
Gerrand makes one comment that I agree with: “The greater the distance between a name’s declaration and its uses, the longer the name should be.” The earlier discussion about using loop variables named i and j is an example of this rule.
Gerrand 发表一个我同意的评论:“名称声明与使用之间的距离越大,名称就应该越长。” 前面有关使用名为 i 和 j 的循环变量的讨论是此规则的示例。
Well chosen names help to make code more obvious; when someone encounters the variable for the first time, their first guess about its behavior, made without much thought, will be correct. Choosing good names is an example of the investment mindset discussed in Chapter 3: if you take a little extra time up front to select good names, it will be easier for you to work on the code in the future. In addition, you will be less likely to introduce bugs. Developing a skill for naming is also an investment. When you first decide to stop settling for mediocre names, you may find it frustrating and time-consuming to come up with good names. However, as you get more experience you’ll find that it becomes easier; eventually, you’ll get to the point where it takes almost no extra time to choose good names, so you will get the benefits almost for free.
精心选择的名称有助于使代码更明显。当某人第一次遇到该变量时,他们对行为的第一次猜测是正确的。选择好名字是第 3 章讨论的投资思维方式的一个示例:如果您花一些额外的时间来选择好名字,那么将来您将更容易处理代码。此外,您不太可能引入错误。培养命名技巧也是一项投资。当您第一次决定停止为平庸的名字定居时,您会发现想出好名字的过程既令人沮丧又耗时。但是,随着您获得更多的经验,您会发现它变得更加容易。最终,您将几乎不需要花费额外的时间来选择好名字,因此您几乎可以免费获得好处。