Ross Esmond

Code, Prose, and Mathematics.

portrait of myself, Ross Esmond
Written

Deterministic Data Structure

A Deterministic Data Structure (DDS) is one whose values have not yet been computed but may be realized by a deterministic process that depends only on the location of the value in the data structure, such that the complete data structure is determined at creation. A list of sequential prime numbers is a deterministic data structure if the process to compute and add the next prime is built to always select the next highest prime than the last item in the list. Since the uncomputed values are deterministic, the data structure represents the realized portion of a larger, hypothetical, complete data structure, which may be infinitely large.

The hypothetical data structure is then split between extant data in memory and missing data. The deterministic process to compute missing data is referred to as realizing the data. Extant data is allowed to be deleted, as long as its absence does not affect the determinism of realization. Importantly, the process to decide when to realize or delete data is allowed to be nondeterministic, which allows for control code that attempts to balance performance and memory conservation concerns.

As a memoization cache

One use for a DDS is as a hidden cache for the results of a deterministic function. If the computation of a function is expensive or slow, memoization may be used to avoid redundant computations. Since the function is deterministic, subsequent computations with equivalent inputs will have equivalent results, and so the function need not be run to realize these results as long as they were saved. Conventional memoization only saves the last computed result, which is either returned or discarded depending on whether or not the inputs of consecutive calls match. This style of memoization will miss many redundant calls, however, as any call before the very last is lost. A smarter process to retain results may use the frequency of access to those values, the age of their last access, or the memory cost of the values themselves, but the structure of retained values will be best maintained as a deterministic data structure.