Category Archives: DH projects

This category comprises projects in the DH world that we run into, maybe like, or think are at least worth pointing out for discussion.

Working to define “factoid prosopography”

The Department of Digital Humanities (DDH) (previously the Centre for Computing in the Humanities, CCH) at King’s College London first developed the idea of factoid prosopography back in about 1995. It grew out of the need at the time to think about how to do prosopography in a structured-data (database) environment. Since those early days, CCH/DDH has been involved in six prosopographical projects (the latest finished in 2016) that consciously took up the factoid approach. Furthermore, the idea of a factoid prosopography has generated interest from historians around the world who wish to do their own projects independently of DDH.

Although papers have been published on the idea of the factoid in prosopography (see Bradley and Short 2005 and Pasin and Bradley 2015), they provide only a rather high-level overview of what the structural implications of a factoid prosopography might be. Hence this new site, developed by me, at:

http://factoid-dighum.kcl.ac.uk/

entitled Factoids: A site that introduces Factoid Prosopography.

Why have I created this site? I was not the originator of the idea of the factoid (it was developed by Gordon Gallacher and Dion Smyth in 1995, before I was at KCL).  However, I am the only one at DDH to have been involved (ranging from junior developer in the early days, to senior developer and then co-investigator more recently) in all six of the prosopographical projects that took a factoid approach and involved DDH.  This sense of continuity alone made it seem natural that I should be the one to describe the factoid approach as we have developed it at DDH.

The Factoids site contains three sections:

  1. a brief document that describes what I think factoid prosopography is all about,
  2. links to the various projects that CCH/DDH (and in particular, I) have been involved in that describe themselves as factoid prosopographies (along with two prosopographies done with the participation of CCH/DDH and myself that do not!), and
  3. a first attempt at a formal ontology (called the “Factoid Prosopography Ontology”: FPO) that is meant to capture what seems to me to be some of the important formalisms that have underpinned the CCH/DDH factoid prosopographies.

Item 3 deserves a brief explanation. This formal ontology is expressed as an RDFS/OWL ontology (and is, in fact, distributed through GitHub).  Why is this useful? Well, as one of the earliest thinkers about computer ontologies put it, an ontology is “an explicit specification of a conceptualization” (Gruber 1995, p. 908), and, as Noy and McGuinness say, it allows one:

  • To share common understanding of the structure of information among people or software agents
  • To enable reuse of domain knowledge
  • To make domain assumptions explicit
  • To separate domain knowledge from operational knowledge
  • To analyze domain knowledge (Noy and McGuinness, undated)

Prosopography data from different projects, by its very nature, is likely to gradually link together.  A structured data prosopography of Anglo-Saxon England is likely to connect through common persons and places with prosopographies for, say, Scandinavia that explore material of the same time period.  The more consistent the structure that they share, the more straightforward and stronger the connection that can be made between them.  As Gruber says, “[w]e use common ontologies to describe ontological commitments for a set of agents so that they can communicate about a domain of discourse without necessarily operating on a globally shared theory.  We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology.” (p 908)

By proposing FPO in the languages of the Semantic Web, RDFS and OWL, I was able to think of FPO as a formulation of a formal core for factoid prosopography, a core that could be naturally expanded, using the range of techniques that RDFS and OWL enable, to meet the differing needs of the various projects that implement it.
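
To make this a little more concrete, here is a minimal sketch, written in Python with rdflib, of what a single factoid might look like when expressed in RDF. The class and property names (Factoid, hasReferent, hasSource, assertsOffice) and the example URIs are my own illustrative assumptions, not the published FPO terms; the ontology on GitHub should be consulted for the actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces: the real FPO IRIs and term names may differ.
FPO = Namespace("http://example.org/fpo#")
EX = Namespace("http://example.org/prosopography/")

g = Graph()
g.bind("fpo", FPO)
g.bind("ex", EX)

# A factoid records a source's assertion about a person, rather than a
# bare "fact": here, that a particular charter asserts that a person
# held a particular office.
factoid = EX["factoid/1"]
g.add((factoid, RDF.type, FPO.Factoid))
g.add((factoid, FPO.hasReferent, EX["person/uhtred"]))    # who the assertion is about
g.add((factoid, FPO.hasSource, EX["source/charter-42"]))  # where the assertion is made
g.add((factoid, FPO.assertsOffice, EX["office/sheriff"])) # what the source asserts

g.add((EX["person/uhtred"], RDF.type, FPO.Person))
g.add((EX["person/uhtred"], RDFS.label, Literal("Uhtred")))

print(g.serialize(format="turtle"))
```

Because everything lives in RDF, an individual project can extend a core of this kind (say, with a richer typology of offices or new factoid subtypes) using ordinary RDFS/OWL mechanisms, which is exactly the sort of expansion described above.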

I think of FPO as still being rather preliminary, and as the title of this blog suggests, it should most definitely be thought of as work in progress. Indeed, for this reason I have assigned it a version number of 0.2! Furthermore, although other projects can choose to commit to the view of factoid prosopography that FPO represents, DDH’s view of the formal structure that enables and constitutes factoid prosopography, as presented in FPO, needn’t be the only possible view, of course.  Others are most definitely free to take up some part of the idea of the factoid to suit the needs of their own project and yet implement a quite different approach to modelling their prosopography.  However, I think it fair to say that DDH has perhaps the longest and broadest experience of working successfully with the factoid approach, and for that reason alone it is worthwhile presenting, in some detail, what DDH’s views on these matters might be.  This is what the new Factoids site and the FPO prototype ontology aim to achieve.

References

Bradley, John and Harold Short (2005). “Texts into databases: the Evolving Field of New-style Prosopography” in Literary and Linguistic Computing Vol. 20 Suppl. 1:3-24.

Gruber, Thomas R. (1995). “Toward principles for the design of ontologies used for knowledge sharing?” In International Journal of Human-Computer Studies. Vol. 43, No. 5–6, November 1995, pp. 907-928.

Noy, Natalya F. and Deborah L. McGuinness (undated). “Ontology Development 101: A Guide to Creating Your First Ontology”. Online http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

Pasin, Michele and John Bradley (2015). “Factoid-based prosopography and computer ontologies: Towards an integrated approach”. In Digital Scholarship in the Humanities. Vol. 30, No. 1, pp. 86-97. Published online June 29, 2013, doi:10.1093/llc/fqt037.

Registration Opens for DigiPal V: Wednesday 2nd September 2015

Dear all,

It is with great delight that the DigiPal team at the Department of Digital Humanities (King’s College London) invite you to attend the fifth DigiPal Symposium at King’s on Wednesday 2nd September 2015.

As usual, the focus of the Symposium will be the computer-assisted study of medieval handwriting and manuscripts. Papers will cover online learning resources for palaeography, crowdsourcing Ælfric, image-processing techniques for studying manuscripts, codicology, the Exon Domesday book and medieval Scottish charters.

Speakers will include:

  •  Ben Albritton (Stanford): “Digital Abundance, or: What Do We Do with All this Stuff?”
  • Francisco J. Álvarez López (Exeter/King’s College London): “Scribal Collaboration and Interaction in Exon Domesday: A DigiPal Approach”
  • Stewart Brookes (King’s College London): “Charters, Text and Cursivity: Extending DigiPal’s Framework for Models of Authority”
  • Ainoa Castro Correa (King’s College London): “VisigothicPal: The Quest Against Nonsense”
  • Orietta Da Rold (Cambridge): “‘I pray you that I may have paupir, penne, and inke’: Writing on Paper in the Late Medieval Period”
  • Christina Duffy (British Library): “Effortless Image Processing: How to Get the Most Out of your Digital Assets with ImageJ”
  • Kathryn Lowe (Glasgow)
  • Maayan Zhitomirsky-Geffet (Bar-Ilan University) and Gila Prebor (Bar-Ilan University): “Towards an Ontopedia for Hebrew Manuscripts”
  • Leonor Zozaya: “Educational Innovation: New Digital Games to Complement the Learning of Palaeography”
  • Plus a roundtable with Arianna Ciula (Roehampton), Peter Stokes (King’s College London) and Dominique Stutzmann (Institut de recherche et d’histoire des textes).

Registration is free and includes refreshments and sandwiches.
It’s easy: just sign up with Eventbrite: https://digipal-v.eventbrite.com

For further details, please visit http://www.digipal.eu/blog/digipal2015/

And, in case that wasn’t enough palaeography for one early September, the following day there’s also “The Image of Cursive Handwriting: A One Day Workshop”, with David Ganz, Teresa Webber, Irene Ceccherini, David Rundle and Marc Smith. To register, visit http://www.modelsofauthority.ac.uk/blog/cursivity-workshop/

Very much looking forward to seeing you in September, at one or both events,

Stewart Brookes and Peter Stokes

Dr Stewart J Brookes
Department of Digital Humanities
King’s College London

Digital Classicist seminar by MA DH students (Friday July 3)

The Pedagogical Value of Postgraduate Involvement in Digital Humanities Departmental Projects

Francesca Giovannetti, Asmita Jain, Ethan Jean-Marie, Paul Kasay, Emma King, Theologis Strikos, Argula Rublack, Kaijie Ying (King’s College London)

Digital Classicist London & Institute of Classical Studies seminar 2015

Friday July 3rd at 16:30, in Room 212, 26-29 Drury Lane, King’s College London, WC2B 5RL

The SNAP (Standards for Networking Ancient Prosopographies) Project at King’s College London, funded by the Arts and Humanities Research Council (AHRC) under the Digital Transformations big data scheme, seeks to act as a centralized portal for the study of ancient prosopographies. It links together dispersed, heterogeneous prosopographical datasets into a single collection. It will model a simple structure using Web and Linked data technologies to represent relationships between databases and to link from references in primary texts to authoritative lists of persons and names. By doing so it particularly addresses the issue of overlapping data between different prosopographical indexes. It has used as its starting point three large datasets from the classical world – the Lexicon of Greek Personal Names, Trismegistos, and the Prosopographia Imperii Romani – and aims to eventually be a comprehensive focal point for prosopographical information about the ancient world.

A team of volunteer postgraduate students from the Department of Digital Humanities at King’s College London has been involved in the further development of certain parts of the project, building upon the skills learnt in the Masters degrees the department offers. Their work includes coding tasks with Python, RDF and SPARQL queries, improvements to the final HTML pages, as well as administrative tasks such as communicating and negotiating with potential contributors to expand the dataset.
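
As a rough illustration of the kind of Python/RDF/SPARQL work mentioned above (this is not the project’s actual code: the file name and the FOAF vocabulary are stand-ins for whatever the SNAP data really uses), a query over a small local extract of person data might look like this:

```python
from rdflib import Graph

# Illustrative only: "snap-sample.ttl" is a hypothetical local extract of
# person records, and FOAF stands in for the graph's actual vocabulary.
g = Graph()
g.parse("snap-sample.ttl", format="turtle")

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""

# Print each person URI alongside the name recorded for it.
for person, name in g.query(query):
    print(person, name)
```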

This initiative provides the students with the opportunity to apply these skills to a large-scale project beyond the usual scope of the assignments in the Masters degrees. It also gives them the chance to experience how a team of digital humanists works towards a common objective, offering a more well-rounded perspective on how the different components involved in a digital humanities project interact with and mutually support each other. The talk will analyse the pedagogical value of these initiatives for postgraduate students approaching the world of work or continued academic study.

ALL WELCOME

The seminar will be followed by wine and refreshments.

Invitation to Explore the Digital Humanities

The following survey from Clare Hooper (IT Innovation Centre, Southampton), who spoke in the Digital Humanities seminar this afternoon, will contribute to her ongoing work analysing the disciplinary and thematic contributions to DH through a combination of quantitative study of published papers and responses from experts. Full survey at https://www.isurvey.soton.ac.uk/14422

An Invitation to Explore the Digital Humanities

Can you spare time to help our understanding of the Digital Humanities? I’m doing a disciplinary analysis of research contributions in DH. As part of the work, I’m seeking expert input on what disciplines are represented by certain keywords. I’d be most grateful for your input.

If you have any questions, please contact me, Clare Hooper, via email: cjh@it-innovation.soton.ac.uk. Please also let me know if you’d like to be kept informed about the results of this work.

Many thanks for your time!

—Clare Hooper

Digital Codex Mendoza online

Ernesto Miranda, a former student on the MA in Digital Humanities here at DDH, has just published a digital edition of the Codex Mendoza, a sixteenth-century manuscript that is now one of our most important sources for pre-Hispanic culture in Mexico. The project began life as an assignment for one of his MA modules, ‘Material Culture of the Book’, for which students had to plan how they would digitise a book or set of books. After graduating, Ernesto took his plan to Mexico’s National Institute of Anthropology and History, the Bodleian Library in Oxford, University of California Press, and King’s College London, and convinced them all to help him actually do it.

The edition is now freely available both online and as an app on the iTunes store, and has already been featured in the New York Times (among others). It allows you not only to view the pages of this famous and fascinating manuscript, but also to see in situ transcriptions, translations and supplementary material. See, for instance, this page on daily life (drag your mouse over the image to see the translation), or this one with annotations on territorial expansion.

Part of the project press-release is quoted below which gives some more background to the project. But now go, explore and enjoy!

The digital resource was created in collaboration with the Bodleian Library, Oxford (where the manuscript has been held since 1659), King’s College London and University of California Press. It was developed in 2014, under the curatorial direction of Frances Berdan and Baltazar Brito.

The Codex Mendoza was created under the orders of Viceroy Antonio de Mendoza in 1542 to evoke an economic, political, and social panorama of the recently conquered lands. It has 72 illustrated pages glossed in Nahuatl, and 63 corresponding pages with Spanish glosses.

The Digital Codex Mendoza is part of INAH’s effort to highlight the importance of Mexican Codices for national history. This effort began in September, 2014, with the opening of the unprecedented exhibition, Códices de México, Memorias y Saberes, where 44 codices were shown for the first time to the general public. Codices are extremely sensitive documents in terms of preservation, so very few people have access to them. This is why the exhibition and the digital edition of codices held outside Mexico, such as Digital Codex Mendoza, are so important.

This effort is the first of a series that will virtually repatriate essential Mexican documents. It serves as a milestone regarding academic digital editions in Mexico and Latin America. Through this work the Instituto Nacional de Antropología e Historia (INAH), or National Institute of Anthropology and History, demonstrates the broad-based utility of this type of edition and the need to seek new forms of representation for such complex systems of knowledge. At the same time, the effort furthers the permanent calling of the INAH to study, preserve, and spread awareness of the cultural patrimony of the Mexican people, and create new ways of engagement with cultural heritage.

DH2014: SNAP:DRGN poster

Standards for Networking Ancient Prosopographies: Data and Relation in Greco-Roman Names

(This poster was also presented in the Ontologies for Prosopography pre-conference workshop on Tuesday July 8.)

In the poster session on Thursday July 10, this was up for two hours, was photographed several times, and Sebastian Rahtz and I mostly chatted with the many people who came over and expressed an interest in it. At least two, possibly three, of these people will turn out to be new project partners whom we wouldn’t have known about otherwise, so I call this a win!

If you want to see the poster in full-size, you can find it on the wall in Drury Lane, in the corridor opposite room 220.

SNAP:DRGN consultation workshop

Last week we held the first workshop of the SNAP:DRGN (Standards for Networking Ancient Prosopographies: Data and Relations in Greco-Roman Names) project, here at King’s College London.

As announced in our press release, the SNAP:DRGN project aims to recommend standards for linking together basic identities and information about the entities in various person-databases relating to the ancient world, with a view to facilitating the production of a federated network including millions of ancient person-records, compatible with the Linked Ancient World Data graph. At this workshop (see Workshop slides and recap) we presented our preliminary proposals, data models and ontology for feedback to a representative group of scholars from both the classical prosopography/onomastics and Linked Open Data communities. We also spoke to several people with large datasets that might be contributed to the SNAP graph.

It was decided that SNAP:DRGN will attempt to address recommendations to five key use-cases of networked prosopographical data:

  1. Putting prosopographical data online, including stable URIs and openly-accessible data and metadata in standard formats (not defined by us).
  2. Contributing a summary of said data, including identifiers for all persons and a simplified subset of core identifying information about each entity, to the SNAP graph so that it can be built upon and referred to by other projects.
  3. Annotating SNAP entities to establish alignment and identify co-references between related datasets.
  4. Marking up online documents so as to link personal names within them to persons identified in the SNAP graph and its constituent databases.
  5. Adding relationships between persons, both within and between databases: person X is the daughter of person Y; person A in one database was killed in battle by person B in another database.

The SNAP:DRGN project will continue to work on the “Cookbook”, the summary of recommendations and examples for these five use-cases, over the coming months, in the run-up to adding several new datasets to the graph. We are also experimenting in a modest way with tools and implementations for working with the vast graph of ancient persons created: named entity recognition (NER) workflows for finding new personal names in texts; co-reference resolution for finding overlap and links between datasets; and search and browse tools and APIs. This work will be reported on the SNAP:DRGN blog, and in conferences and seminars throughout the year.
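
By way of illustration of use-case 3 above, here is a minimal sketch (in Python with rdflib) of asserting that two person records in different contributing databases co-refer. The URIs are invented and skos:exactMatch is simply one plausible alignment property; SNAP:DRGN’s own annotation model may well differ.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

# Hypothetical URIs for "the same" ancient person as recorded in two
# different databases; the real identifiers and vocabulary may differ.
lgpn_person = URIRef("http://example.org/lgpn/person/12345")
tm_person = URIRef("http://example.org/trismegistos/person/67890")

g = Graph()
g.bind("skos", SKOS)
g.add((lgpn_person, SKOS.exactMatch, tm_person))  # assert that the two records co-refer

print(g.serialize(format="turtle"))
```

A handful of such alignment triples, contributed back to the shared graph, is what allows overlapping prosopographical indexes to be navigated as one network.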

People of Medieval Scotland database launch

A public launch of the AHRC-funded People of Medieval Scotland (PoMS) database by a Scottish Cabinet Secretary was held at the Sir Charles Wilson Lecture Theatre at the University of Glasgow on Wednesday, 5 September. The official launch by the Education and Lifelong Learning Secretary Michael Russell last week capped work carried out over five years by John Bradley and Michele Pasin from KCL’s Department of Digital Humanities (DDH) with historians from three institutions. Dauvit Broun, the lead historian at the University of Glasgow on the PoMS project, said at the public launch that the database “demonstrated [a] potential to transform History as a public discipline” through “the new techniques of digital humanity”, noting that it has been a “privilege and a pleasure” to work with the team’s “exceptional people”.


One of the highlights of the launch was the brand new PoMS Labs section. This is an innovative and thought-provoking area of the site that features tools and visualizations aimed at letting users gain new perspectives on the database materials. For example, such tools allow you to browse incrementally the network of relationships linking persons and institutions to other persons and institutions; to compare the different roles played by two agents in the context of their common events; or to browse iteratively through transactions and the witnesses associated with them.


In general, PoMS Labs aims at addressing the needs of non-expert users (e.g. learners) – who can simultaneously access the data and get a feeling for the meaningful relations among them – and experts (e.g. academic scholars) alike – who can be supported in analysing data within predefined dimensions, so as to highlight patterns of interest that would otherwise be hard to spot. For these reasons, the Labs were welcomed warmly both by the academics present at the launch and by the minister, who felt that tools of this kind could revolutionise the teaching of history in schools.


Seminar: the Role of Digital Humanities in a Natural Disaster


As part of the New Directions in the Digital Humanities series, this week we had a very inspiring presentation from Dr Paul Millar, Associate Professor and Head of the Department of English, Cinema and Digital Humanities at the University of Canterbury (NZ).

The talk focused on the CEISMIC project, with which Millar and his team intend to ‘crowdsource’ a digital resource to preserve the record of the earthquakes’ impacts, document the long-term process of recovery, and discover virtual solutions to issues of profound heritage loss.

Tagore digital editions and Bengali textual computing

Professor Sukanta Chaudhuri yesterday gave a very interesting talk on the scope, methods and aims of ‘Bichitra’ (literally, ‘the various’), the ongoing project for an online variorum edition of the complete works of Rabindranath Tagore in English and Bengali. The talk (part of this year’s DDH research seminar) highlighted a number of issues I personally wasn’t very familiar with, so in this post I summarise them briefly and then offer a couple of suggestions.

Sukanta Chaudhuri is Professor Emeritus at Jadavpur University, Kolkata (Calcutta), where he was formerly Professor of English and Director of the School of Cultural Texts and Records. His core specializations are in Renaissance literature and in textual studies: he published The Metaphysics of Text from Cambridge University Press in 2010. He has also translated widely from Bengali into English, and is General Editor of the Oxford Tagore Translations.

Rabindranath Tagore (1861–1941), Asia’s first Nobel laureate, was arguably the most important icon of the modern Indian Renaissance. This recent project on the electronic collation of Tagore texts, called ‘the Bichitra project’, is being developed as part of the national commemoration of the 150th anniversary of the poet’s birth (here’s the official page). This is how the School of Cultural Texts and Records summarizes the project’s scope:

The School is carrying out pioneer work in computer collation of Tagore texts and creation of electronic hypertexts incorporating all variant readings. The first software for this purpose in any Indian language, named “Pathantar” (based on the earlier version “Tafat”), has been developed by the School. Two pilot projects have been carried out using this software, for the play Bisarjan (Sacrifice) and the poetical collection Sonar Tari (The Golden Boat). The CD/DVDs contain all text files of all significant variant versions in manuscript and print, and their collation using the “Pathantar” software. The DVD of Sonar Tari also contains image files of all the variant versions. These productions are the first output of the series “Jadavpur Electronic Tagore”.
Progressing from these early endeavours, we have now undertaken a two-year project entitled “Bichitra” for a complete electronic variorum edition of all Tagore’s works in English and Bengali. The project is funded by the Ministry of Culture, Government of India, and is being conducted in collaboration with Rabindra-Bhavana, Santiniketan. The target is to create a website which will contain (a) images of all significant variant versions, in manuscript and print, of all Tagore’s works; (b) text files of the same; and (c) collation of all versions applying the “Pathantar” software. To this end, the software itself is being radically redesigned. Simultaneously, manuscript and print material is being obtained and processed from Rabindra-Bhavana, downloaded from various online databases, and acquired from other sources. Work on the project commenced in March 2011 and is expected to end in March 2013, by which time the entire output will be uploaded onto a freely accessible website.

 

A few interesting points

 

  • Tagore, as Sukanta noted, “wrote voluminously and revised extensively”. From a DH point of view this means that creating a comprehensive digital edition of his works would require a lot of effort – much more than we could easily pay people for, if we wanted to mark up all of this text manually. For this reason it is essential to find some kind of semi-automatic method for aligning and collating Tagore’s texts, e.g. the “Pathantar” software. A screenshot of the current collation interface follows.

    Tagore digital editions

  • The Bengali language, which Tagore used, is widely spoken in the world (it is actually one of the most widely spoken languages, with nearly 300 million total speakers). However, this language poses serious problems for a DH project. In particular, the writing system is extremely difficult to parse using traditional OCR technologies: its vowel graphemes are mainly realized not as independent letters but as diacritics attached to its consonant letters. Furthermore, clusters of consonants are represented by different and sometimes quite irregular forms, so learning to read is complicated by the sheer size of the full set of letters and letter combinations, numbering about 350 (from Wikipedia).
  • One of the critical points that emerged during the discussion had to do with the visual presentation of the results of the collation software. Given the large number of text editions they are dealing with, and the potentially vast amount of variation between one edition and the others, a powerful and interactive visualization mechanism seems to be strongly needed. However, it is not clear what the possible approaches on this front might be.
  • Textual computing, Sukanta pointed out, is not as developed in India as it is in the rest of the world. As a consequence, in the context of the “Bichitra” project the widely used approaches based on TEI and XML technologies haven’t really been investigated enough. The collation software mentioned above obviously marks up the text in some way; however, this markup remains hidden from the user and most likely is not compatible with other standards. More work would thus be desirable in this area – in particular within the Indian subcontinent.
Food for thought

     

  • On the visualization of the results of a collation. Some inspiration could be found in the type of visualizations normally used in version-control software systems, where multiple and alternative versions of the same file must be tracked and shown to users. For example, we could think of the visualizations available on GitHub (a popular code-sharing site), which are described in this blog post and demonstrated via an interactive tool on this webpage. Here’s a screenshot:

    GitHub code visualization

    The situation is strikingly similar – or not? Would it be feasible to reuse one of these approaches with textual sources? (A minimal sketch of this kind of line-level comparison, using Python’s difflib, appears at the end of this post.)
    Another relevant visualization is the one used by popular file-comparison software (e.g. File Merge on a Mac) for showing differences between two files:

    File Merge code visualization

  • On using language technologies with Bengali. I did a quick tour of what’s available online, and (quite unsurprisingly, considering the reputation Indian computer scientists have) found several research papers which seem highly relevant. Here are a few of them:
    – Asian language processing: current state-of-the-art [text]
    – Research report on Bengali NLP engine for TTS [text]
    – The Emile corpus, containing fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages [homepage]
    – A complete OCR system for continuous Bengali characters [text]
    – Parsing Bengali for Database Interface [text]
    – Unsupervised Morphological Parsing of Bengali [text]
  • On open-source software that appears to be usable with Bengali text. Not a lot of stuff, but more than enough to get started (the second project in particular seems pretty serious):
    – Open Bangla OCR – a BDOSDN (Bangladesh Open Source Development Network) project to develop a Bangla OCR
    – Bangla OCR project, mainly focused on the research and development of an Optical Character Recognizer for Bangla / Bengali script
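
Finally, as promised in the visualization point above, here is a minimal sketch of line-level comparison between two versions of a passage using Python’s standard difflib module. The two versions are invented; the point is only to suggest how the file-comparison idiom might carry over to textual variants, not to stand in for what Pathantar actually does.

```python
import difflib

# Two invented versions of the same passage, line by line.
version_a = [
    "The boat drifts on the golden river,",
    "carrying the harvest of the year.",
]
version_b = [
    "The boat drifts on the golden river,",
    "bearing away the harvest of the year.",
]

# unified_diff marks removed lines with "-" and added lines with "+",
# much like the views offered by version-control tools.
for line in difflib.unified_diff(version_a, version_b,
                                 fromfile="manuscript", tofile="print",
                                 lineterm=""):
    print(line)
```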

Any comments and/or ideas?