Ross Esmond

Code, Prose, and Mathematics.


Data Languages

Data languages are programming languages designed specifically to hold information. Prominent examples include JSON, XML, YAML, TOML, and RON.

XML

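XML has been in use for a long time, so many programmers are familiar with it and its tooling is well established, but it is not human friendly. Its syntax is verbose and cluttered, with every element needing both an opening and a closing tag, which makes it difficult to read or write by hand. It also offers two ways to attach data to an element, attributes and child elements, and authors must choose between them.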
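A minimal sketch of that clutter, using Python's standard xml.etree.ElementTree module. The element and attribute names (server, port) are made up for illustration. The same value is stored once as an attribute and once as a child element, and each form needs its own access call.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: the same port stored as an attribute...
as_attribute = '<server port="8080"></server>'
# ...and as a child element.
as_element = "<server><port>8080</port></server>"


def port_from_attribute(xml_text: str) -> str:
    """Read the port from an attribute on the root element."""
    return ET.fromstring(xml_text).get("port")


def port_from_element(xml_text: str) -> str:
    """Read the port from a child element of the root."""
    return ET.fromstring(xml_text).find("port").text


print(port_from_attribute(as_attribute))
print(port_from_element(as_element))
```

Both calls return the same string, but a consumer has to know in advance which of the two forms the author chose.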

JSON

JSON fixes some of these problems by having only one kind of nested data, properties, and by using braces and square brackets for nesting. There are, however, still three main issues with JSON. First, the specification does not allow comments, which is an absolute deal breaker in some environments. Second, a string must sit on a single line, with line breaks written as \n escapes, which makes long-form text unfriendly to readers. Finally, deeply nested addresses can be very difficult to find. If I’m looking for foo.bar.baz, I might have to scan line by line, carefully tracking my current location and every opening and closing brace, to find my target. The structure also makes it difficult to reach the location with a single text search, because the full address never appears in the file, though repeated searches work if you are patient and deliberate.
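The address problem can be sketched in Python with the standard json module. The document contents and the `lookup` helper are hypothetical, written only to illustrate the point: reaching foo.bar.baz means walking one level at a time, and the address itself is nowhere in the text to search for.

```python
import json

# Hypothetical document with a deeply nested value and a long string.
# JSON strings cannot contain raw line breaks, so the break is written as \n.
json_text = r'''
{
  "foo": {
    "bar": {
      "baz": "first line\nsecond line"
    }
  }
}
'''


def lookup(document: dict, address: str):
    """Follow a dotted address such as "foo.bar.baz" one key at a time."""
    node = document
    for key in address.split("."):
        node = node[key]
    return node


config = json.loads(json_text)
print(lookup(config, "foo.bar.baz"))

# The full address never appears in the file, so a plain text search misses it.
print("foo.bar.baz" in json_text)
```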

YAML

YAML fixes two of the three issues I have with JSON. It has comments and multiline strings, but its addresses are still based on nesting. It also introduces a new problem similar to my problem with attributes in XML: it has too many features, which makes it unnecessarily complex. A particularly striking example is the ability to use sequences (arrays) as keys. For most projects, finding a use for all of these features would be a chore. And because YAML, as a data language, must be consumed by a program to be useful, these added features actually get in the way of using it.
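To illustrate why sequence keys are awkward for the consuming program, here is a small sketch in plain Python. No YAML library is used; the YAML snippet appears only in a comment. The point is that Python's built-in dict cannot hold a list as a key, so a loader has to reject such a document or invent a conversion. The tuple conversion shown is my own assumption, not the documented behavior of any particular library.

```python
# A YAML mapping whose key is a sequence (shown for reference, not parsed here):
#
#   ? [left, right]
#   : both sides


def try_list_key() -> str:
    """Attempt to store a list as a dict key, as a naive loader might."""
    try:
        {["left", "right"]: "both sides"}
    except TypeError as error:
        return f"rejected: {error}"
    return "accepted"


def with_tuple_key() -> dict:
    """One possible workaround: convert the sequence to a hashable tuple."""
    return {("left", "right"): "both sides"}


print(try_list_key())
print(with_tuple_key())
```

Every language that consumes the file has to make a similar decision, which is extra work for a feature most projects never need.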

TOML

TOML is my favorite configuration language. It fixes all of my issues with JSON. In addition to comments and multiline strings, each table in TOML (an object in JSON) can state its full address, and the parser merges the tables into one nested structure, which is what you get if you convert the file to JSON. This means that I can write [foo.bar.baz] directly in the file and find it later with a single text search. TOML also has an extremely minimal syntax, one that most developers could learn in under ten minutes.
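The merging behavior can be sketched in a few lines of Python. This is not a TOML parser: the `merge_tables` helper and the sample tables are hypothetical, and a real project would use a proper TOML library. The sketch takes tables that each declare their full address and folds them into the nested structure that JSON requires.

```python
import json

# Hypothetical TOML source, kept as text so it can be searched directly.
toml_text = """
[foo.bar.baz]
enabled = true

[foo.bar.qux]
retries = 3
"""

# The same tables as (full address, contents) pairs, as a parser might see them.
tables = [
    ("foo.bar.baz", {"enabled": True}),
    ("foo.bar.qux", {"retries": 3}),
]


def merge_tables(tables: list) -> dict:
    """Fold fully addressed tables into one nested dictionary."""
    root: dict = {}
    for address, contents in tables:
        node = root
        for key in address.split("."):
            # Reuse the existing level if an earlier table already created it.
            node = node.setdefault(key, {})
        node.update(contents)
    return root


# A single text search lands on the table header.
print("[foo.bar.baz]" in toml_text)
print(json.dumps(merge_tables(tables), indent=2))
```

The two tables share the foo.bar prefix, so they end up as siblings under one nested object even though each was written as a flat, searchable header.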

RON

RON, which stands for Rusty Object Notation, is a data language that mimics the object notation of the Rust programming language. In short, RON is to Rust as JSON is to JavaScript. Like YAML, RON has comments and multiline strings but keeps the nesting-based address problem. Unlike YAML, however, RON is a well-designed, well-documented, consistent language. RON also provides some built-in data types that are missing in JSON, like optionals, tuples, and structs. I would happily use RON if given the opportunity.

Data Languages with Programming Components

Some data languages include elements commonly associated with general-purpose programming languages, like variables and string templates. Examples include Jsonnet and CUE. JT Archie argues in favor of these languages in the article Programming Language over Data Languages.
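As a rough illustration of what variables and string templates add, here is a sketch in Python using the standard string.Template class. It mimics the idea only; it is not Jsonnet or CUE syntax, and the keys, values, and `render` helper are all hypothetical.

```python
from string import Template

# Hypothetical shared values, defined once.
variables = {"env": "staging", "region": "eu"}

# Templates refer to the shared variables instead of repeating literal values.
templates = {
    "host": "api.$env.example.com",
    "bucket": "logs-$region-$env",
}


def render(templates: dict, variables: dict) -> dict:
    """Expand each template string using the shared variables."""
    return {
        key: Template(text).substitute(variables)
        for key, text in templates.items()
    }


print(render(templates, variables))
```

Changing the env variable in one place updates every value derived from it, which plain JSON or TOML cannot express.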

Conclusion

I will likely continue to use JSON regularly in my career, if for no other reason than that it is the native data language of the web. But I would prefer another option in almost every other context. If I needed a data store, I would prefer SQLite, as I know it will be fast and comes with excellent tooling for that purpose. And if I needed a human-readable config file, I would certainly prefer TOML, or even just a multiline text file if my needs were simple enough.