Tag Archives: seminar

Seminar: the Role of Digital Humanities in a Natural Disaster


As part of the New Directions in the Digital Humanities series, this week we had a very inspiring presentation from Dr Paul Millar, Associate Professor and Head of the Department of English, Cinema and Digital Humanities at the University of Canterbury (NZ).

The talk focused on the CEISMIC project, through which Millar and his team set out to ‘crowdsource’ a digital resource to preserve the record of the earthquakes’ impacts, document the long-term process of recovery, and discover virtual solutions to issues of profound heritage loss.

Seminar: the National Archives Online


Last Thursday (8th March), as part of the New Directions in the Digital Humanities seminar series, we hosted a very interesting talk by Emma Bayne and Ruth Roberts about the most recent developments in the National Archives’ online presence.

Discovery and the Digital Challenge

Emma Bayne and Ruth Roberts talked about the changes to the National Archives’ online services. These include the development of a new service – the Discovery service – which is based on a new architecture and allows improved access to the National Archives Catalogue and digitised material. Features include a new taxonomy-based search and an API that allows bulk downloads of data.
They also discussed some of the challenges facing the National Archives in delivering large quantities of digital images of records online: moving from a gigabyte scale to a petabyte scale in a short period of time.
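For readers curious about what a catalogue search API with bulk access might look like in practice, here is a minimal sketch of querying a search endpoint and reading the JSON results. The URL, parameter names and response fields below are illustrative placeholders rather than the documented Discovery API, so they should be checked against the National Archives’ own developer documentation.

```python
# Illustrative sketch only: the endpoint, parameters and response shape are
# placeholders, not the documented Discovery API.
import requests

BASE_URL = "https://discovery.nationalarchives.gov.uk/API/search/records"  # hypothetical route


def search_catalogue(query: str, page: int = 1) -> dict:
    """Fetch one page of catalogue results for a free-text query."""
    params = {"query": query, "page": page}  # illustrative parameter names
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    results = search_catalogue("Domesday")
    for record in results.get("records", []):  # assumed response shape
        print(record.get("id"), record.get("title"))
```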

Recording of the seminar:

If you’re interested you can listen again to the seminar (~1hr) by clicking here.

Relevant Links:

National Archives main site
National Archives Labs
Discovery Service

DDH Internal Research Seminar: Tablet apps, or the future of Digital Scholarly Editions

At yesterday’s (23 November) Internal Research Seminar, Elena Pierazzo and Miguel Vieira presented Tablet apps, or the future of Digital Scholarly Editions, a preview of the paper that they will give tomorrow at the study day “The Future of the Book”.

The paper discussed the opportunities that tablet devices could offer for the digital publication of scholarly editions. This work stemmed from the MA dissertation of Patricia Searl, who completed her Digital Humanities MA at DDH last year.

The main issue arises from the apparent lack of use of digital scholarly editions published on the web. The speakers found it particularly worrying that these editions are never part of undergraduate syllabi, even though they usually offer high-quality scholarly texts with free, open access.

Tablet devices are user-friendly, portable and create a stronger sense of ownership than websites do. This makes for an experience closer to reading a book, but would the same be true for digital scholarly editions? Would it work for editions that need sophisticated ways of presenting historical evidence and editorial work? The presenters believe that the eBook model would probably not be sufficient, but the “App” paradigm might.

 

Enhanced eBooks already exploit this idea by introducing a highly interactive, almost ludic component to the digital edition. Nonetheless, none of these apps has been connected to scholarly work so far: the speakers noted, for example, that it is impossible to find out who edited the enhanced eBook of T.S. Eliot’s The Waste Land.

Finally, the paper discussed issues that will be familiar to any smartphone or tablet user, such as cross-device compatibility, keeping up to date with new operating systems, and heavily controlled app “markets”. These issues limit an edition’s real reach and, above all, complicate development quite substantially (even more than, for example, dealing with cross-browser issues).

The paper was followed by a lively discussion. There was general agreement that scholarly editing should get involved in tablet computing; the best way of doing so, however, is yet to be fully understood and provides fertile ground for an exciting new research area.

The DDH Internal Research Seminar series aims to give DDH staff a space to present and discuss their research in an informal environment.

Tagore digital editions and Bengali textual computing

Professor Sukanta Chaudhuri yesterday gave a very interesting talk on the scope, methods and aims of ‘Bichitra’ (literally, ‘the various’), the ongoing project for an online variorum edition of the complete works of Rabindranath Tagore in English and Bengali. The talk (part of this year’s DDH research seminar series) highlighted a number of issues I personally wasn’t very familiar with, so in this post I summarise them briefly and then offer a couple of possible suggestions.

Sukanta Chaudhuri is Professor Emeritus at Jadavpur University, Kolkata (Calcutta), where he was formerly Professor of English and Director of the School of Cultural Texts and Records. His core specializations are in Renaissance literature and in textual studies: he published The Metaphysics of Text from Cambridge University Press in 2010. He has also translated widely from Bengali into English, and is General Editor of the Oxford Tagore Translations.

Rabindranath Tagore (1861–1941), Asia’s first Nobel laureate, was arguably the most important icon of the modern Indian Renaissance. This recent project on the electronic collation of Tagore’s texts, the ‘Bichitra’ project, is being developed as part of the national commemoration of the poet’s 150th birth anniversary (here’s the official page). This is how the School of Cultural Texts and Records summarizes the project’s scope:

The School is carrying out pioneer work in computer collation of Tagore texts and creation of electronic hypertexts incorporating all variant readings. The first software for this purpose in any Indian language, named “Pathantar” (based on the earlier version “Tafat”), has been developed by the School. Two pilot projects have been carried out using this software, for the play Bisarjan (Sacrifice) and the poetical collection Sonar Tari (The Golden Boat). The CD/DVDs contain all text files of all significant variant versions in manuscript and print, and their collation using the “Pathantar” software. The DVD of Sonar Tari also contains image files of all the variant versions. These productions are the first output of the series “Jadavpur Electronic Tagore”.
Progressing from these early endeavours, we have now undertaken a two-year project entitled “Bichitra” for a complete electronic variorum edition of all Tagore’s works in English and Bengali. The project is funded by the Ministry of Culture, Government of India, and is being conducted in collaboration with Rabindra-Bhavana, Santiniketan. The target is to create a website which will contain (a) images of all significant variant versions, in manuscript and print, of all Tagore’s works; (b) text files of the same; and (c) collation of all versions applying the “Pathantar” software. To this end, the software itself is being radically redesigned. Simultaneously, manuscript and print material is being obtained and processed from Rabindra-Bhavana, downloaded from various online databases, and acquired from other sources. Work on the project commenced in March 2011 and is expected to end in March 2013, by which time the entire output will be uploaded onto a freely accessible website.

 

A few interesting points

 

  • Tagore, as Sukanta noted, “wrote voluminously and revised extensively”. From a DH point of view this means that creating a comprehensive digital edition of his works would require a great deal of effort – much more than we could easily pay people for if we wanted to mark up all of this text manually. For this reason it is essential to find semi-automatic methods for aligning and collating Tagore’s texts, e.g. the “Pathantar” software (a minimal collation sketch is given after this list). A screenshot of the current collation interface follows.

    Tagore digital editions

  • The Bengali language, in which Tagore wrote, is widely spoken: it is actually one of the most spoken languages in the world, with nearly 300 million speakers. However, it poses serious problems for a DH project. In particular, the writing system is extremely difficult to parse using traditional OCR technologies: its vowel graphemes are mainly realized not as independent letters but as diacritics attached to consonant letters. Furthermore, clusters of consonants are represented by distinct and sometimes quite irregular forms, so learning to read is complicated by the sheer size of the full set of letters and letter combinations, numbering about 350 (from Wikipedia). The code sketch after this list shows how a single syllable decomposes into separate code points.
  • One of the critical points that emerged during the discussion had to do with the visual presentation of the results of the collation software. Given the large number of editions being dealt with, and the potentially vast amount of variation between one edition and another, a powerful and interactive visualization mechanism seems to be badly needed. However, it is not clear what the possible approaches on this front might be.
  • Textual computing, Sukanta pointed out, is not as developed in India as it is in the rest of the world. As a consequence, in the context of the “Bichitra” project, widely used approaches based on TEI and XML technologies haven’t really been investigated enough. The collation software mentioned above obviously marks up the text in some way; however, this markup remains hidden from the user and is most likely not compatible with other standards. More work would thus be desirable in this area – in particular within the Indian subcontinent.
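
As a very rough illustration of the first two points above (and of the diff-style displays discussed under “Food for thought” below), here is a minimal sketch of collating two versions of a text with Python’s standard difflib module, plus a look at how a Bengali syllable decomposes into code points. This is only a sketch of the general idea, not the “Pathantar” software, whose internals were not discussed in detail.

```python
# A minimal, illustrative collation sketch (not the Pathantar software):
# compare two versions of a text line by line with Python's standard difflib.
import difflib
import unicodedata


def collate(version_a: str, version_b: str) -> None:
    """Print a unified diff of two versions of a text."""
    # NFC normalisation matters for Bengali, where the same syllable can be
    # encoded as different sequences of code points.
    a = unicodedata.normalize("NFC", version_a).splitlines()
    b = unicodedata.normalize("NFC", version_b).splitlines()
    for line in difflib.unified_diff(a, b, fromfile="manuscript", tofile="print", lineterm=""):
        print(line)
    # A side-by-side HTML view, comparable to FileMerge-style displays, can be
    # produced with difflib.HtmlDiff().make_file(a, b).


# Why Bengali is awkward for character-level tools: dependent vowel signs are
# separate combining code points, not standalone letters.
for ch in "\u0995\u09bf":  # BENGALI LETTER KA + BENGALI VOWEL SIGN I ("ki")
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```
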
Food for thought

  • On the visualization of the results of a collation. Some inspiration could be found in the kind of visualizations normally used in version control systems, where multiple alternative versions of the same file must be tracked and shown to users. We could think, for example, of the visualizations available on GitHub (a popular code-sharing site), which are described in this blog post and demonstrated via an interactive tool on this webpage. Here’s a screenshot:

    Github code visualization

    The situation is strikingly similar – or is it? Would it be feasible to reuse one of these approaches with textual sources?
    Another relevant visualization is the one used by popular file-comparison software (e.g. FileMerge on a Mac) for showing differences between two files:

    File Merge code visualization

  • On using language technologies with Bengali. I did a quick tour of what’s available online and (quite unsurprisingly, considering the reputation Indian computer scientists have) found several research papers that seem highly relevant. Here are a few of them:
    – Asian language processing: current state-of-the-art [text]
    – Research report on Bengali NLP engine for TTS [text]
    – The Emile corpus, containing fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages [homepage]
    – A complete OCR system for continuous Bengali characters [text]
    – Parsing Bengali for Database Interface [text]
    – Unsupervised Morphological Parsing of Bengali [text]
  • On open-source software that appears to be usable with Bengali text. There is not a lot out there, but more than enough to get started (the second project in particular seems pretty serious); a sketch of the general OCR workflow follows this list:
    – Open Bangla OCR – a BDOSDN (Bangladesh Open Source Development Network) project to develop a Bangla OCR
    – Bangla OCR project, mainly focused on the research and development of an optical character recognizer for Bangla/Bengali script
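
As promised above, here is a sketch of the general OCR workflow, using the off-the-shelf Tesseract engine with its Bengali (“ben”) language data via pytesseract. It is purely an illustration of the pipeline, not the Open Bangla OCR or BanglaOCR projects listed above (which have their own interfaces), and accuracy on conjuncts and vowel signs would need careful evaluation.

```python
# Illustrative OCR sketch using Tesseract's Bengali model via pytesseract.
# Not the Open Bangla OCR / BanglaOCR projects named above.
# Requires Tesseract with the "ben" traineddata, plus the pytesseract and Pillow packages.
from PIL import Image
import pytesseract


def ocr_bengali_page(image_path: str) -> str:
    """Return the recognised Unicode text for one scanned page image."""
    page = Image.open(image_path)
    # lang="ben" selects the Bengali language model; the output still needs
    # manual proofreading, especially for conjunct consonants.
    return pytesseract.image_to_string(page, lang="ben")


if __name__ == "__main__":
    print(ocr_bengali_page("scanned_page.png"))  # hypothetical file name
```
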

Any comments and/or ideas?