Paul Caton: Six terms fundamental to modelling transcription

Department of Digital Humanities Lunchtime Seminar. This is a preview of a paper Paul Caton will deliver at the Digital Humanities conference in Lausanne later this month.

Paul uses the HSM model[1] to understand transcriptions. He defines a series of abstractions that help us understand the process of transcription, the objects involved in it, and its agents.


Surface

A physical manifestation of an object which contains marks.


Mark

An alteration on a surface performed by an agent, e.g. a scratch or a print. Marks are perceptible by an agent.


Reading

The process by which an agent attempts to discover and establish a type sequence of marks on a surface. A reading assigns token status to marks, and it can be entirely speculative. The reading agent must comprehend the concept of writing. A positive result state occurs when the agent assigns token status to at least one mark with certainty greater than 0. A negative result state occurs when the agent assigns token status to no mark with certainty greater than 0. A zero result state occurs when the agent has no certainty either way as to whether a mark is or is not a token.
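The three result states can be sketched in a few lines of Python. This is only an illustration of one way to read the definitions, not part of Paul's model: the names `ReadingResult` and `reading_result` are hypothetical, and the sketch assumes each mark receives a signed certainty between -1.0 and 1.0, where a value above 0 means "this mark is a token", a value below 0 means "this mark is not a token", and exactly 0 means no certainty either way.

```python
from enum import Enum


class ReadingResult(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    ZERO = "zero"


def reading_result(judgements):
    """Classify a reading from the agent's per-mark judgements.

    Assumption (not from the talk): each judgement is a signed certainty
    in [-1.0, 1.0]. Above 0 the agent assigns token status to the mark,
    below 0 the agent denies it, and 0 means no certainty either way.
    """
    # Positive: at least one mark is given token status with certainty > 0.
    if any(j > 0 for j in judgements):
        return ReadingResult.POSITIVE
    # Negative: no mark is given token status, and the agent has some
    # certainty that at least one mark is not a token.
    if any(j < 0 for j in judgements):
        return ReadingResult.NEGATIVE
    # Zero: the agent has no certainty either way about any mark.
    return ReadingResult.ZERO


# One reading per result state, using made-up certainties.
speculative = reading_result([0.0, 0.1, -0.4])
rejected = reading_result([-0.9, 0.0])
undecided = reading_result([0.0, 0.0, 0.0])
```

Under this interpretation a single weakly held judgement is enough for a positive result, which matches the remark that readings can be entirely speculative.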

Token sequence

A token sequence must contain at least one token. It is neither right nor wrong; it simply exists. Transcription depends on the token sequence produced by the reading. A T token sequence is the result of a process of transcription.


Exemplar

An exemplar is a combination of a surface and marks on which an act of reading is attempted, and it is the basis for a transcription. The status of being an exemplar is relative: if one person makes a transcription FOO of exemplar BAR, and another person then wants to make a transcription BAZ of FOO, then FOO has the status of exemplar with respect to BAZ.
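The relative nature of exemplar status can be shown with a small sketch. The `Transcription` record and the `exemplars_of` helper are hypothetical names introduced only for illustration. The sketch assumes each act of transcription can be recorded as a pair of labels: the thing read and the thing produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    """One act of transcription (hypothetical record, for illustration)."""
    exemplar: str    # the surface-mark combination that was read
    transcript: str  # the result produced from it


def exemplars_of(item, acts):
    """Return the items that have exemplar status with respect to `item`."""
    return {act.exemplar for act in acts if act.transcript == item}


# The chain from the text: FOO transcribes BAR, then BAZ transcribes FOO.
acts = [
    Transcription(exemplar="BAR", transcript="FOO"),
    Transcription(exemplar="FOO", transcript="BAZ"),
]

foo_sources = exemplars_of("FOO", acts)
baz_sources = exemplars_of("BAZ", acts)
bar_sources = exemplars_of("BAR", acts)
```

Here FOO appears once as a transcript (of BAR) and once as an exemplar (for BAZ), so "exemplar" is a role an item plays in a particular act of transcription, not a fixed property of the item.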


Document

When a positive reading result occurs, a token sequence is identified, and the token sequence is recognised as type, then, and only then, can a surface-mark combination be considered a document. An agent attempts a reading when a surface-mark combination is believed to be a document, or at least when there is the possibility that it might be one.

Paul observes that the process of transcription must involve intention to be different from reproduction or copying.

A document is not necessary for the act of transcription to occur; only the intention of recognising token sequences from marks is required.


References

Sperberg-McQueen, C. M., Claus Huitfeldt, and Allen Renear (2001). Meaning and interpretation of markup. Markup Languages: Theory & Practice 2.3: 215–234. On the Web at

Huitfeldt, Claus, and C. M. Sperberg-McQueen (2008). What is transcription? Literary & Linguistic Computing 23.3: 295–310.

Caton, Paul (2009). Lost in Transcription: Types, Tokens, and Modality in Document Representation. Paper given at Digital Humanities 2009, University of Maryland, College Park, June 2009.

Sperberg-McQueen, C. M., Claus Huitfeldt, and Yves Marcoux (2009). What is transcription? Part 2. Talk given at Digital Humanities, College Park, Maryland. Slides on the Web at

Huitfeldt, Claus, Yves Marcoux, and C. M. Sperberg-McQueen (2010). Extension of the type/token distinction to document structure. Paper presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 – 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5. doi:10.4242/BalisageVol5.Huitfeldt01. On the Web at

Caton, Paul (2012). On the Term ‘Text’ in Digital Humanities. Literary & Linguistic Computing 28.2: 209–220.

Caton, Paul (2013). Pure transcriptional encoding. Paper given at Digital Humanities 2013, Lincoln, Nebraska.

Sperberg-McQueen, C. M., Yves Marcoux, and Claus Huitfeldt (2014). Transcriptional implicature: a contribution to markup semantics. Paper to be given at Digital Humanities 2014, Lausanne, Switzerland.

  1. Transcription model based on work by Huitfeldt and Sperberg-McQueen (2008) and continued jointly with Marcoux (2009, 2010).
