Ross Esmond

Code, Prose, and Mathematics.

Complexities of Typesetting

Typesetting is the act of positioning the letters of a written work for legibility and visual appeal. Before the printing press was invented, typesetting was performed by scribes and was a skill akin to handwriting. Today most typesetting is performed automatically by computers, but computers are in many ways at a disadvantage when dealing with its complexities. The problem was apparent enough that Donald Knuth spent years developing TeX, the typesetting system on which LaTeX is built, in large part to improve upon computer-controlled typesetting. These complexities may be summarized as four items:

  • Justification
  • Hyphenation
  • Weighing justification against hyphenation
  • Weighing across entire paragraphs

Justification

Text justification is the horizontal positioning of characters such that the left and right edges of each line align with the left and right margins, creating a more appealing presentation of the text. Virtually all books have justified text, as it is widely considered to be the superior option. Digital documents, on the other hand, are often left- or right-aligned, depending on the language. This can be attributed to two complexities: the algorithmic complexity of text justification, and the difficulty of creating a rich-text editor with justified content.

Left-aligned text is exceedingly simple to position. Each word may be placed one after the other without ever needing to adjust its location. Once a word's size has been determined, it may need to be moved to the next line of text, but once placed its position never changes. Justified content, on the other hand, requires that an entire line's worth of text be measured before any word on it can be positioned, because aligning the text with both margins means adjusting the spaces between words. The amount of space allocated to each of those gaps depends on the space remaining after the words are measured, so positioning must take place retroactively. This becomes even more complex when you implement inter-character spacing, that is, adjustable space between the characters of words. Inter-word spacing is preferred to inter-character spacing, as the reader already expects space between words, but inter-character spacing may be used to reduce the severity of the gaps between words.
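To make the difference concrete, here is a minimal sketch in Python. It assumes the words on a line have already been measured, and the function names and the fixed `space_width` parameter are my own inventions for illustration, not part of any real layout engine. The left-aligned version can place each word as soon as it is measured; the justified version cannot compute a single position until it knows the width of every word on the line.

```python
def left_align_positions(word_widths, space_width):
    """Place words one after another; each position is final once computed."""
    positions = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + space_width
    return positions


def justify_positions(word_widths, line_width):
    """Place words so the first starts at 0 and the last ends at line_width.

    The gap size depends on the total width of every word on the line,
    so nothing can be positioned until the whole line has been measured.
    """
    if len(word_widths) < 2:
        # A single word has no gaps to stretch; leave it at the left margin.
        return [0.0] * len(word_widths)
    spare = line_width - sum(word_widths)
    gap = spare / (len(word_widths) - 1)
    positions = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + gap
    return positions
```

For three words of widths 30, 20, and 50 on a 120-unit line, the justified gaps work out to 10 units each, so the last word ends exactly at the right margin.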

If you’re using a modern browser, the text on the left is using inter-character text spacing, whereas the text on the right is using inter-word text spacing.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Inter-character spacing is far more unpleasant than inter-word spacing, and should only be used to alleviate the adverse effects of other, worse alignment choices. For this reason, a computer algorithm needs to weigh the options against each other to determine how much each feature should be used to achieve a pleasant alignment. The algorithm must measure the text to be positioned, count the number of spaces on the line, and split the extra space between the word spaces and the gaps between letters.
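One possible way to split the space is sketched below in Python. The policy is an assumption of mine rather than an established algorithm: the word gaps absorb the spare space up to a cap, and only the overflow is spread across the gaps between letters. The function name, the `max_word_gap` parameter, and the single `char_width` shared by every character are all hypothetical simplifications.

```python
def split_spacing(words, char_width, line_width, max_word_gap):
    """Return (word_gap, letter_gap) so that the line fills line_width.

    Assumed policy: prefer inter-word space, and fall back to
    inter-character space only once word gaps would exceed max_word_gap.
    """
    word_gaps = len(words) - 1
    letter_gaps = sum(len(word) - 1 for word in words)
    text_width = char_width * sum(len(word) for word in words)
    spare = line_width - text_width
    if word_gaps < 1 or spare < 0:
        raise ValueError("line needs at least two words that fit the width")
    if letter_gaps == 0:
        # Only single-letter words: word gaps must take all of the space.
        return spare / word_gaps, 0.0
    word_gap = min(spare / word_gaps, max_word_gap)
    overflow = spare - word_gap * word_gaps
    return word_gap, overflow / letter_gaps
```

A real implementation would weigh the two kinds of space against each other more gradually instead of using a hard cap, but the cap keeps the trade-off easy to see.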

Hyphenation

TODO