Ross Esmond

Code, Prose, and Mathematics.


Documentation Proximity Principle

The probability that documentation will be updated to account for changes to code depends on how close the documentation is to the software it describes. A comment next to a line of code is likely to be updated when the line is changed, whereas a comment in a wiki that exists entirely separate from the codebase is unlikely to be updated after a minor code change. This effect stems from the time it takes the developer to find the documentation that corresponds to the code being changed. A comment several lines above the target code takes almost no time to find, but a single comment in a wiki might take a prohibitive amount of time to find, as the developer may have to search through unrelated documentation before discovering it. The issue may therefore be alleviated if the code houses Breadcrumbs to the documentation, such as a comment with a link to the section of the wiki that requires updating.
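A Breadcrumb of this kind might look like the following minimal sketch. The wiki URL, the `RETRY_POLICY_DOCS` constant, and the `retry_delay_seconds` function are all hypothetical names invented for illustration.

```python
# Hypothetical wiki location, used only for illustration.
RETRY_POLICY_DOCS = "https://wiki.example.com/payments/retry-policy"


def retry_delay_seconds(attempt: int) -> int:
    """Return the wait before the given retry attempt (1-based).

    Breadcrumb: the retry schedule is also described at RETRY_POLICY_DOCS.
    Update that wiki section whenever the schedule below changes.
    """
    # Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
    return min(2 ** (attempt - 1), 60)
```

Anyone who edits the backoff schedule now passes the pointer to the wiki section on the way, so finding the distant documentation costs roughly as much as finding a nearby comment.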

Affecting the rate of degradation

We may construct a rough approximation of the number of errors in documentation based on the rate at which the documentation requires updates, $r$, the time since the documentation was first written, $t$, and the probability that the developers will fail to make any given update, $p$. The number of errors in any documentation is then $$e = prt.$$ Since, to the best of my knowledge, we cannot hinder the passage of time, we can only reduce the errors in our documentation by either reducing the rate at which it requires updates, or by increasing the chances that a developer will perform the required update. The rate of required changes is, of course, dependent on the rate at which the software is updated, but if the developers update the software infrequently, they likely expect to update the documentation just as infrequently. It is rare to find software where a majority of the development time is spent on documentation, and where that is the case, the developers are likely dedicated enough to accurate documentation to counteract the Documentation Proximity Principle.
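To make the approximation concrete, here is a small sketch that evaluates $e = prt$ for two placements of the same documentation. The function name and every number below are assumptions chosen for illustration, not measurements.

```python
def expected_doc_errors(p: float, r: float, t: float) -> float:
    """Estimate the number of documentation errors as e = p * r * t.

    p: probability that a required update is missed (0 to 1)
    r: required updates per unit of time
    t: time elapsed since the documentation was written
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * r * t


# Illustrative values: 2 required updates per month over 12 months.
# Assume a distant wiki page is missed half the time, a nearby comment
# one time in eight.
wiki_errors = expected_doc_errors(p=0.5, r=2.0, t=12.0)        # 12.0
comment_errors = expected_doc_errors(p=0.125, r=2.0, t=12.0)   # 3.0
```

With $r$ and $t$ held fixed, the estimate scales linearly with $p$, which is the term that proximity is meant to reduce.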

The rate at which documentation requires updates, $r$, is dependent on the type of documentation that you are writing and what aspect of the software it captures. If the documentation captures details of the software precisely (for example, the signature of a method), then it must be updated with any change to the parameters or return value. If, however, the documentation only captures a high-level overview of the design of the software, then minor software changes will not necessitate a documentation update. It is not advisable to change what documentation you write in order to reduce the rate of errors, but you can use the nature of the documentation to determine how close to the code it needs to be.

Of course, the most obvious conclusion of this article is that you may lower the rate of documentation errors by placing the documentation closer to the code, or by making the path to find the corresponding documentation more obvious.

Levels of proximity

Proximity to code may be stratified into several categories. The first is self-documenting code, which may not seem like proper documentation, but I am including it as it serves as a direct alternative to documentation. Next, we have code comments, which may be split into inline comments and annotations. Finally, we have any documentation that exists entirely separate from the code, which may be in readme text files in the repository, wiki entries in a secondary repository, or on a documentation site. As a general rule, documentation should always be put as close to the code as is reasonable.

Self-documenting code

Self-documenting code uses carefully chosen names to describe its own purpose. To write self-documenting code, consider splitting the code into named sections so that each section carries a descriptive, human-readable title. Nested expressions may be separated into immutable variables so that the purpose of each expression is captured as a variable name, whereas blocks of statements may be separated into their own function so that the purpose of the statements is captured as a function name. Functions have the added benefit of allowing for annotation comments that may further describe the purpose of the code, which we will discuss in a later section.
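The following sketch contrasts a terse computation with a self-documenting version of the same logic. The `LineItem` type, the function names, and the tax rate are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Sequence


class LineItem(NamedTuple):
    unit_price: float
    quantity: int


SALES_TAX_RATE = 0.25  # hypothetical rate, for illustration only


def total_terse(items: Sequence[LineItem]) -> float:
    # Correct, but the reader must decode the expression and the bare 0.25.
    return sum(i.unit_price * i.quantity for i in items) * (1 + 0.25)


def subtotal(items: Sequence[LineItem]) -> float:
    # A block of logic extracted into a function named after its purpose.
    return sum(item.unit_price * item.quantity for item in items)


def total_with_tax(items: Sequence[LineItem]) -> float:
    # Each nested expression becomes a named intermediate value.
    pre_tax = subtotal(items)
    tax = pre_tax * SALES_TAX_RATE
    return pre_tax + tax
```

Both versions return the same result. The second needs no comment to explain what the `0.25` means or which part of the expression is the subtotal.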

Self-documenting code does not always require that you write the functions or variables ad hoc. Choosing to build code from higher-level libraries or with ready-made domain-specific languages also produces better self-documenting code. HTML5, for instance, introduced self-documenting tags as part of the push toward the Semantic Web. Instead of building pages almost entirely with div tags, HTML5 affords programmers access to the header, article, nav, and section tags, such that elements describe their own purpose. These tags have the added benefit of being easier for browsers and screen readers to understand, in addition to being more human-readable.

Comments

Inline comments are made amongst the code they document, usually above or at the end of the line. Self-documenting strategies are generally better than inline comments, but there are reasons to opt for a comment. An obvious reason is that not all comments fit into a variable name. If, for example, you want to explain that you used 15 digits of pi because that’s how many NASA uses, that would be difficult to fit into a variable name, but would be perfect for a comment. This approach fits with the advice that many developers give about comments, that they shouldn’t document what the code is doing, but why the code is doing it. Comments may also be used to introduce Breadcrumbs into the code, when a programmatic method is too burdensome.
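The pi example might be written as in the sketch below, where the comment records the reason for the precision rather than restating the code. The constant and function names are hypothetical.

```python
# Why, not what: 15 decimal places of pi matches the precision NASA is
# said to use, so additional digits would add no practical value here.
PI_15_DIGITS = 3.141592653589793


def circumference(radius: float) -> float:
    return 2 * PI_15_DIGITS * radius
```

The variable name says what the value is, and the comment carries the justification that would not fit in a name.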

Annotation comments exist in most general-purpose programming languages and allow the developer to attach machine-readable documentation to software entities. This documentation may be used in three primary ways. It may be automatically scraped to produce documentation using a Documentation Generator. It may be scraped by the IDE to be presented to developers as necessary. And, of course, it may be read by maintainers from the original source file, as with inline comments.
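In Python, the docstring plays the role of an annotation comment. The sketch below attaches one to a hypothetical function and then reads it back through the standard library's `inspect.getdoc`, which is roughly the kind of access a generator or an IDE relies on. The function itself and the layout of the docstring are assumptions made for illustration.

```python
import inspect


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit.

    Args:
        celsius: Temperature in degrees Celsius.

    Returns:
        The equivalent temperature in degrees Fahrenheit.
    """
    return celsius * 9 / 5 + 32


# Tools read the same text that a maintainer sees in the source file.
summary_line = inspect.getdoc(celsius_to_fahrenheit).splitlines()[0]
```

Because the documentation sits directly above the function body, a change to the parameters or return value is made within sight of the text that describes them.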

Using Documentation Generators is an excellent way to improve the proximity of documentation to code, as it allows for wiki-style documentation to be housed next to the code that each section documents. This makes the process of finding the documentation for any particular method, class, or component trivially easy, as the documentation couldn’t be closer. It is difficult to come up with a reason not to use a Documentation Generator for any documentation with a one-to-one relationship to the codebase. Readmes and wiki entries may then be used to write higher-level overviews of the design and usage of the software, which would not have a specific place as a code comment.

Many IDEs are able to co-opt these comments into their Intelligent Code Completion system so that the documentation is presented on demand as developers interact with the documented entities. A comment on a method, for instance, may then be presented alongside the method name during code completion, or when a developer hovers over a reference to the method. This in-context presentation alleviates the need to search for the documentation page on the web or to visit the source of the method for insight into its purpose.

As with all comments, annotations are still available to be read directly from the source, creating a hybrid approach to documentation that functions even without advanced tools to find and reformat the comments.

Long-form documentation

Separate documentation should be reserved for descriptions of the software without a one-to-one correlation to the software. Changes to the software might still necessitate a change to the documentation, but less frequently than with any form of in-file documentation. This form of documentation is best suited for high-level overviews of design and software usage, like tutorials. Of course, if the tutorial references a function that is deprecated, you will still need to update the tutorial. Therefore, it is often best to only reference stable parts of the codebase in long-form documentation, such that you only need to update the documentation on known breaking changes. You may also use static analysis or specialized code runners like rundoc for Python to verify the correctness of the documentation. These tools tend to be necessary since developers will frequently forget to update separate documentation due to its distance from the code.
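As one way to picture this kind of verification, the sketch below uses Python's standard `doctest` module, not rundoc, to execute the examples embedded in a piece of tutorial text and count how many no longer match the code. The `slugify` function, the tutorial text, and the `tutorial_failures` helper are hypothetical.

```python
import doctest

# Hypothetical tutorial text, as it might appear in a readme or wiki page.
TUTORIAL_SNIPPET = '''
To turn a title into a URL slug, call ``slugify``:

>>> slugify("Documentation Proximity Principle")
'documentation-proximity-principle'
'''


def slugify(title: str) -> str:
    return "-".join(title.lower().split())


def tutorial_failures(text: str) -> int:
    """Run the examples found in `text` and return how many failed."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {"slugify": slugify}, "tutorial", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count is needed here.
    results = runner.run(test, out=lambda message: None)
    return results.failed
```

Running a check like `tutorial_failures` in continuous integration would flag the tutorial as soon as a change to `slugify` invalidates its example, which compensates for the tutorial's distance from the code.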

Long-form documentation also has its own levels of proximity to the code. In particular, a readme file in the same folder as the code being documented is still more likely to be updated than an online wiki, but any distinct file is likely to be forgotten by developers on occasion. To increase the likelihood that a maintainer will find the documentation, you may wish to leave Breadcrumbs to the documentation itself, though this often means that the documentation would have been better as an annotation.