Paul Caton: Six terms fundamental to modelling transcription

Department of Digital Humanities Lunchtime Seminar. This is a preview of a paper Paul will deliver at the Digital Humanities Conference in Lausanne later this month.

Paul uses the HSM model[1] to understand transcriptions. He defines a series of abstractions, which can help us understand the process of transcription, the objects involved in the process and its agents.


Surface

A physical manifestation of an object that contains marks.


Mark

An alteration on a surface performed by an agent, e.g. scratches, prints, etc. Marks are perceptible by an agent.


Reading

The process by which an agent attempts to discover and establish a type sequence of marks on a surface. A reading can be entirely speculative. A reading assigns token status to marks, and the reading agent must comprehend the concept of writing. A positive result state occurs when the agent assigns token status to at least one mark with certainty greater than 0. A negative result state occurs when the agent assigns token status to no mark with certainty greater than 0. A zero result state occurs when the agent has no certainty either way as to whether a mark is or is not a token.
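The three result states can be sketched in code. This is only an illustration and not part of Paul's model: the names Judgement, ReadingResult and classify_reading are hypothetical, and the sketch assumes that an agent holds, for each mark, one certainty that the mark is a token and a separate certainty that it is not. Under that assumption, a negative result needs some certainty that marks are not tokens, while a zero result means no certainty in either direction.

```python
from dataclasses import dataclass
from enum import Enum


class ReadingResult(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    ZERO = "zero"


@dataclass(frozen=True)
class Judgement:
    """An agent's view of a single mark (hypothetical representation)."""
    is_token: float = 0.0      # certainty that the mark is a token
    is_not_token: float = 0.0  # certainty that the mark is not a token


def classify_reading(judgements):
    """Classify a reading from the agent's per-mark judgements."""
    # Positive: at least one mark given token status with certainty > 0.
    if any(j.is_token > 0 for j in judgements):
        return ReadingResult.POSITIVE
    # Negative (assumed interpretation): no mark given token status, but the
    # agent is to some degree certain that at least one mark is not a token.
    if any(j.is_not_token > 0 for j in judgements):
        return ReadingResult.NEGATIVE
    # Zero: no certainty either way for any mark.
    return ReadingResult.ZERO
```

For example, a reading in which one mark is tentatively taken as a token is positive even if every other mark is rejected, which matches the point that readings can be entirely speculative.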

Token sequence

A token sequence must contain at least one token. A token sequence is not right or wrong; it simply exists. Transcription depends on the token sequence produced by the reading. A T token sequence is the result of a process of transcription.


Exemplar

An exemplar is a combination of a surface and marks on which an act of reading is attempted, and is the basis for a transcription. The status of being an exemplar is relative: for example, if one person makes a transcription FOO of exemplar BAR, and then another person wants to make a transcription BAZ of FOO, then FOO has the status of exemplar with respect to BAZ.
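The relative status of an exemplar can be illustrated with a small sketch. The class SurfaceWithMarks and the function is_exemplar_for are hypothetical names chosen for this example; the only assumption is that each transcription records the surface-mark combination it was made from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurfaceWithMarks:
    """A surface-mark combination; `exemplar` is what it was transcribed from."""
    name: str
    exemplar: Optional["SurfaceWithMarks"] = None


def is_exemplar_for(candidate, transcription):
    """Exemplar status is a relation between two objects, not a fixed property."""
    return transcription.exemplar is candidate


bar = SurfaceWithMarks("BAR")
foo = SurfaceWithMarks("FOO", exemplar=bar)  # FOO is a transcription of BAR
baz = SurfaceWithMarks("BAZ", exemplar=foo)  # BAZ is a transcription of FOO
```

Here FOO is a transcription with respect to BAR and an exemplar with respect to BAZ, while BAR is not the exemplar of BAZ.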


Document

When a positive reading result occurs, a token sequence is identified, and the token sequence is recognised as type, then, and only then, can a surface-mark combination be considered a document. An agent attempts a reading when it believes that a surface-mark combination is a document, or at least that it might be a document.
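The three conditions for document status can be written as a single predicate. This is a minimal sketch with hypothetical names (ReadingOutcome, is_document); it assumes the outcome of a reading can be summarised by a flag for a positive result, the identified token sequence, and a flag for whether that sequence was recognised as type.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReadingOutcome:
    """Summary of one reading of a surface-mark combination (hypothetical)."""
    positive: bool                        # the reading had a positive result
    token_sequence: Tuple[str, ...] = ()  # tokens identified by the reading
    recognised_as_type: bool = False      # the sequence was recognised as type


def is_document(outcome):
    """All three conditions must hold; a token sequence needs at least one token."""
    return (
        outcome.positive
        and len(outcome.token_sequence) >= 1
        and outcome.recognised_as_type
    )
```

A positive reading whose token sequence is not recognised as type therefore does not yield a document, even though a transcription of it may still be attempted.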

Paul observes that the process of transcription must involve intention if it is to differ from reproduction or copying.

A document is not necessary for the act of transcription to occur; only the intention of recognising token sequences from marks is required.


