Category Archives: Tools and Software

Discussions, descriptions, and recommendations of software relevant to digital humanists.

Registration Opens for DigiPal V: Wednesday 2nd September 2015

Dear all,

It is with great delight that the DigiPal team at the Department of Digital Humanities (King’s College London) invite you to attend the fifth DigiPal Symposium at King’s on Wednesday 2nd September 2015.

As usual, the focus of the Symposium will be the computer-assisted study of medieval handwriting and manuscripts. Papers will cover on-line learning resources for palaeography, crowdsourcing Ælfric, image processing techniques for studying manuscripts, codicology, the Exon Domesday book and medieval Scottish charters.

Speakers will include:

  • Ben Albritton (Stanford): “Digital Abundance, or: What Do We Do with All this Stuff?”
  • Francisco J. Álvarez López (Exeter/King’s College London): “Scribal Collaboration and Interaction in Exon Domesday: A DigiPal Approach”
  • Stewart Brookes (King’s College London): “Charters, Text and Cursivity: Extending DigiPal’s Framework for Models of Authority”
  • Ainoa Castro Correa (King’s College London): “VisigothicPal: The Quest Against Nonsense”
  • Orietta Da Rold (Cambridge): “‘I pray you that I may have paupir, penne, and inke’: Writing on Paper in the Late Medieval Period”
  • Christina Duffy (British Library): “Effortless Image Processing: How to Get the Most Out of your Digital Assets with ImageJ”
  • Kathryn Lowe (Glasgow)
  • Maayan Zhitomirsky-Geffet (Bar-Ilan University) and Gila Prebor (Bar-Ilan University): “Towards an Ontopedia for Hebrew Manuscripts”
  • Leonor Zozaya: “Educational Innovation: New Digital Games to Complement the Learning of Palaeography”
  • Plus a roundtable with Arianna Ciula (Roehampton), Peter Stokes (King’s College London) and Dominique Stutzmann (Institut de recherche et d’histoire des textes).

Registration is free and includes refreshments and sandwiches.
It’s easy: just sign up with Eventbrite:

For further details, please visit

And, in case that wasn’t enough palaeography for one early September, the following day there’s also “The Image of Cursive Handwriting: A One Day Workshop”, with David Ganz, Teresa Webber, Irene Ceccherini, David Rundle and Marc Smith. To register, visit

Very much looking forward to seeing you in September, at one or both events,

Stewart Brookes and Peter Stokes

Dr Stewart J Brookes
Department of Digital Humanities
King’s College London

Registration Opens for “Digital Approaches to Hebrew Manuscripts” at KCL…


We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Registration for the conference is free. As places are limited, we recommend registering early to avoid disappointment. To register, please click on this link:

Refreshments will be provided, but attendees should make their own arrangements for lunch.

Very much looking forward to seeing you in May,

Stewart Brookes, Debora Matos, Andrea Schatz and Peter Stokes

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

Call for Posters
Are you involved in an interesting project in the wider field of Jewish Studies? Would you like to have a presence at the conference even though you’re not giving a paper? If so, then you might like to consider submitting a poster which summarises the objectives, significance and outcomes of your research project. We’ll display posters throughout the conference, and if you attend with your poster, then you can talk about your work with attendees during the lunch breaks. Display space is limited, so please send a brief summary (max. 100 words) of your research/project. The deadline for the receipt of submissions is Thursday 30th April 2015. Notice of acceptance will be sent as soon as possible after that date.

Conference Programme 

Monday 18th May 2015

8.45 – Coffee and registration

9.15 – Welcome

  • Stewart Brookes and Débora Matos (King’s College London)

9.30 – Keynote lecture

  • Chair: Andrea Schatz (King’s College London)
  • Colette Sirat (École Pratique des Hautes Études): The Study of Medieval Manuscripts in a Technological World

10.30 – Coffee/Tea

11.00 – Session 1: Digital Libraries: From Manuscripts to Images

  • Chair: tbc
  • Ilana Tahan (British Library): The Hebrew Manuscripts Digitisation Project at the British Library: An Assessment
  • César Merchán-Hamann (Bodleian Library): The Polonsky Digitisation Project: Hebrew Materials
  • Emile Schrijver (Bibliotheca Rosenthaliana/University of Amsterdam): The Real Challenges of Mass Digitization for Hebrew Manuscript Research

12.30 – Lunch break

13.30 – Session 2: (Roundtable): Digital Images: Scale and Scope

  • Chair: Jonathan Stökl (King’s College London)
  • Rahel Fronda (University of Oxford): From Micrography to Macrography: Digital Approaches to Hebrew Script
  • Ilana Wartenberg (UCL): Digital Images in the Research of Medieval Hebrew Scientific Treatises
  • Estara Arrant (University of Oxford): Foundations, Errors, and Innovations: Jacob Mann’s Genizah Research and the Use of Digitised Images in Hebrew Manuscript Analysis
  • Dalia-Ruth Halperin (Talpiot College of Education, Holon): Choreography of the Micrography

15.00 – Coffee/Tea

15.30 – Session 3: Digital Space: Joins and Links

  • Chair: Paul Joyce (King’s College London)
  • Sacha Stern (UCL): The Calendar Dispute of 921/2: Assembling a Corpus of Manuscripts from the Friedberg Genizah Project
  • Israel Sandman (UCL): Manuscript Images: Revealing the History of Transmission and Use of Literary Works
  • Judith Kogel (CNRS, Paris): How to Use Internet and Digital Resources to Identify Hebrew Fragments

17.00 – Keynote lecture

  • Chair: Stewart Brookes (King’s College London)
  • Judith Olszowy-Schlanger (École Pratique des Hautes Études): The Books Within Books Database and Its Contribution to Hebrew Palaeography

Tuesday 19th May 2015

9.15 – Keynote lecture

  • Chair: Peter Stokes (King’s College London)
  • Malachi Beit-Arié (Hebrew University of Jerusalem): The SfarData Codicological Database: A Tool for Dating and Localizing Medieval Codices, Historical Research and the Study of Book Production – Methodology and Practice

10.15 – Session 4: Digital Palaeography: Tools and Methods

  • Chair: Julia Crick (King’s College London)
  • Débora Matos (King’s College London): Building Digital Tools for Hebrew Palaeography: The SephardiPal Database
  • Stewart Brookes (King’s College London): A Test-Case for Extending SephardiPal: The Montefiore Mainz Mahzor

11.15 – Coffee/Tea

11.45 – Session 5: Digital Corpora: Analysis and Editing

  • Chair: Eyal Poleg (Queen Mary University of London)
  • Ben Outhwaite (Cambridge University Library): Beyond the Aleppo Codex: Why the Hebrew Bible Deserves a Better Internet
  • Daniel Stökl Ben Ezra (École Pratique des Hautes Études), co-author Hayim Lapin (University of Maryland): A Digital Edition of the Mishna: From Images to Facsimile, Text and Grammatical Analysis
  • Nachum Dershowitz (Tel Aviv University), co-author Lior Wolf (Tel Aviv University): Computational Hebrew Manuscriptology

13.15 – Lunch break

14.30 – Keynote lecture

  • Chair: Débora Matos (King’s College London)
  • Edna Engel (The Hebrew Palaeography Project, Israel): Hebrew Palaeography in the Digital Age

15.30 – Session 6: Data and Metadata

  • Chair: tbc
  • Sinai Rusinek (The Polonsky Academy at the Van Leer Jerusalem Institute): Digitally Reading from Right to Left
  • Yoed Kadary (Ben Gurion University): The Challenges of Metadata Mining in Digital Humanities Projects

16.30 – Concluding roundtable

17.00 – Refreshments

The conference convenors would like to thank the Departments of Digital Humanities and Theology & Religious Studies as well as the Faculty of Arts & Humanities and the Centre for Late Antique and Medieval Studies at King’s College London for their generous support. With thanks to the Free Library of Philadelphia Rare Book Department for permission to use the image from Lewis O 140 (The Masoretic Bible of Portugal). Photograph courtesy of Débora Matos.




Improve performance with jQuery best practices

Nowadays we include jQuery almost by default in most of our DDH projects. It offers so much to both designers and developers that it would be very difficult to complete a project without it.

As with most libraries, jQuery includes an incredibly large collection of elements and behaviours, more than we will ever use in one single development.

  • The good bits: it’s ready to use, highly customisable and, more often than not, cross-browser compatible.
  • The bad bit: it might not be as performant as a custom-built JS library.

Performance is key to a successful website, even more so when dealing with a large amount of data.

Although we might not want to go down the route of custom-built JS libraries (time-consuming), we can adopt some good practices to keep performance at its best while enjoying every bit of this jQuery magic.

This document covers some of the common standards worth looking at: jQuery Coding Standards & Best Practices.
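To make one of the principles concrete, here is a small illustrative sketch (not taken from the linked document) of why selector caching, one of the most commonly cited jQuery tips, pays off. A plain function stands in for `$(...)` so the cost of repeated lookups can be counted without a browser:

```javascript
// A fake query function standing in for $( ... ): each call simulates
// a full DOM traversal, which is what makes repeated selection costly.
let lookups = 0;
function query(selector) {
  lookups++;
  return { selector, addClass() { return this; } };
}

// Uncached: four traversals for four operations.
lookups = 0;
for (let i = 0; i < 4; i++) {
  query('.item').addClass('highlight');
}
const uncached = lookups; // 4

// Cached: one traversal, the result is reused.
lookups = 0;
const $items = query('.item');
for (let i = 0; i < 4; i++) {
  $items.addClass('highlight');
}
const cached = lookups; // 1

console.log(uncached, cached);
```

The same reasoning lies behind tips such as chaining methods on one selection and delegating events to a container rather than binding handlers to every matched element.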

Kiln development: convenience features

Until recently, Kiln has felt to me like a bare-bones developer’s tool, providing a simple but powerful framework for doing whatever XML publishing you might want, without any hand-holding beyond some common elements added on to Cocoon. That has changed somewhat, to the extent that a new user, completely unfamiliar with XSLT, is now guided in a new project to a site displaying TEI as HTML, complete with faceted search, without touching any code.

This, coupled with the new tutorial, hopefully makes Kiln a much more attractive proposition to those who are not so technically inclined (or who have not been forced to use and learn it by working at DDH!). Now I’ve gone back to adding in elements that aid the developer, and I’ll describe three of those.

Continue reading Kiln development: convenience features

People of Medieval Scotland database launch

A public launch of the AHRC-funded People of Medieval Scotland (PoMS) database by a Scottish Cabinet Secretary was held at the Sir Charles Wilson Lecture Theatre at the University of Glasgow on Wednesday, 5 September. The official launch by the Education and Lifelong Learning Secretary Michael Russell last week capped work carried out over 5 years by John Bradley and Michele Pasin from KCL’s Department of Digital Humanities (DDH) with historians from 3 institutions. Dauvit Broun, the lead historian at the University of Glasgow on the PoMS project, said at the public launch that the database “demonstrated [a] potential to transform History as a public discipline” through “the new techniques of digital humanity”, noting that it has been a “privilege and a pleasure” to work with the team’s “exceptional people”.


One of the highlights of the launch was the brand new PoMS Labs section. This is an innovative and thought-provoking area of the site that features tools and visualizations aimed at letting users gain new perspectives on the database materials. For example, such tools allow you to browse incrementally the network of relationships linking persons/institutions to other persons/institutions; to compare the different roles played by two agents in the context of their common events; or to browse iteratively transactions and the witnesses associated with them.


In general, PoMS Labs aims to address the needs both of non-expert users (e.g., learners), who can simultaneously access the data and get a feeling for the meaningful relations among them, and of experts (e.g., academic scholars), who can be supported in analysing data within predefined dimensions, so as to highlight patterns of interest that would otherwise be hard to spot. For these reasons, the Labs were warmly welcomed both by the academics present at the launch and by the minister, who felt that these kinds of tools could revolutionise the teaching of history in schools.

More info:

Digital Humanities Software Developers Workshop

This post was co-edited by Geoffroy Noël and Miguel Vieira.

The first workshop for Digital Humanities Software Developers took place at the University of Cologne from the 28th to the 29th of November. The main aim of the workshop was to bring together DH developers, discuss ideas, and work collaboratively on projects. Around 40 participants attended the workshop, the large majority of them developers, along with some researchers. Continue reading Digital Humanities Software Developers Workshop

You SPILt my code: a modular approach to creating web front ends

One of the projects I’ve been working on since starting at DDH in May is a review of the front-end development framework we’re currently using to build websites, sUPL or the Simple Unified Presentation Layer. The aim of sUPL was to be a lightweight markup scheme — lightweight both in terms of using minimal HTML markup and short class and ID names (commonly used to apply CSS styles and to trigger Javascript-based interactivity).

Whilst sUPL had served the department well for a number of projects, I wanted to update it to reflect recent changes in the front-end development world and also to put the emphasis back on the “simple” in sUPL. After reading around and trying out a number of existing front-end frameworks (e.g. Blueprint, YUI, HTML5 Boilerplate, OOCSS and 320 and Up) I felt our own framework should be updated along the following lines:

  • Be written in HTML5, to make use of new structural elements and prepare the ground for the use of HTML5 APIs;
  • Move away from terse class names to longer but more “human readable” ones;
  • Employ the Object Oriented CSS (OOCSS) methodology of maximising the reuse of CSS code by only applying CSS styles to classes, not IDs;
  • Use the OOCSS concept of “objects”, that is, reusable chunks of HTML, CSS and Javascript code, to build common design patterns.

Welcome to SPIL: the Simple Presentation and Interface Library

There are quite a few frameworks out there, so why create another one? Most of the existing frameworks have been created either for highly specific purposes (e.g. YUI) or to be more generic (e.g. the HTML5 Boilerplate). sUPL’s successor, SPIL (Simple Presentation and Interface Library), can be thought of as a toolkit (or Lego!) for constructing web pages and applications, providing both a generic structure for page layout and the ability to “plug in” interface design patterns which will work “out of the box”.


SPIL makes use of the new HTML5 structural elements such as header, footer, section, nav and aside, loading in the Modernizr Javascript library to provide support for older, less capable browsers such as Internet Explorer pre-version 9. Of course, relying on Javascript to provide this functionality may not always be appropriate, so SPIL provides some alternative markup in the form of reliable old-fashioned divs should you want to use XHTML 1.0. For instance, if we were marking up a primary navigation element in HTML5 we would use:

<nav class="primary"> … </nav>

But should we want to stick with XHTML we could use:

<div class="nav primary"> … </div>


The development of SPIL has been heavily influenced by Object Oriented CSS (OOCSS) — both the concept and the CSS library. OOCSS encourages the reuse of code in order to enhance performance and keep down CSS file size (approximating the DRY — Don’t Repeat Yourself — principle in software engineering). One way to do this is to style only on classes — which can be used any number of times on a page — and not IDs — which can only be used once and also interfere with style inheritance. Class names can be chained together to combine styling effects, reusing predefined styles.
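To illustrate the chaining idea with hypothetical class names (these are not part of SPIL itself), three single-purpose classes can be combined on one element:

```html
<!-- Hypothetical reusable classes, each styling one concern -->
<style>
  .box     { padding: 1em; }
  .rounded { border-radius: 4px; }
  .shadow  { box-shadow: 0 1px 3px rgba(0, 0, 0, 0.3); }
</style>
<!-- Chaining them combines the effects without any new CSS -->
<div class="box rounded shadow">Styled by three reusable classes</div>
```

Because each class does one job, the same rules can be reused across many design patterns instead of being duplicated per component.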

A useful concept in OOCSS is that of “modules”. Although SPIL’s implementation differs from that of the OOCSS library, the ideas are very similar. For instance, we can create a module object for a common design pattern, a tabbed display that can be plugged straight into a page template:

<div class="mod tabs">
 <ul class="tabControls inline">
  <li class="tabControlHeader"><a href="#tab1">Tab 1</a></li>
  <li class="tabControlHeader"><a href="#tab2">Tab 2</a></li>
 </ul>
 <div class="tabPanes">
  <section class="tabPaneItem" id="tab1">Tab 1 content</section>
  <section class="tabPaneItem" id="tab2">Tab 2 content</section>
 </div>
</div>

The structure for this module within the identifying div is built around what jQuery Tools’ implementation of tabs would expect, but the class names could also be applied to other implementations. To use this code with jQuery Tools we would simply include a line in our Javascript file, e.g.:

$(".tabControls").tabs(".tabPanes > section");

An advantage of taking a modularised approach to code is that we can start to build a library of predefined code snippets that can be slotted into place by anyone involved in interface building, from UI designers and programmers wanting to create a functional prototype through to front-end developers working on the final site build.

Development of SPIL

SPIL is being developed iteratively alongside new web projects within DDH. We’re feeding the work straight into an open source project which we hope will be available for release soon. If you have any comments, or if there’s anything you’d like to see in the framework, why not let us know via the comments?

Geocoding your data

In many projects, the collection of data results in a list of items whose distribution can be shown spatially. The process of geocoding assigns a location (or set of locations) to an item of data: perhaps the site of a battle, the source of a text or the home of a notable person. Such visualisations allow for new perspectives on the relationships of data, spatial or otherwise. A long-winded way of geocoding would be simply to go through the data, record by record, and assign each item a set of coordinates using a third-party resource. If the list is short, or if a very precise location is needed, then this may be a practical solution; however, it is easy to underestimate how long it takes to go through what may at first appear to be a short list.

Alternatively, data can be captured directly into a Geographic Information System (GIS) such as ArcGIS or the freely available Quantum GIS, ensuring a location point is recorded with each record added to the data set, though this may be impractical in a dark, dusty archive or may not lend itself well to your workflow. Often projects don’t require the sort of pinpoint accuracy that might be needed in a scientific project, and regional or town-level locations are suitable and even preferable.

In many cases these research records end up as spreadsheets with a single column dedicated to recording location, and given the ease with which data formats can be converted and imported into other systems, this is an efficient way of systematically recording and organising.

If you have the know-how, and perhaps some special data requirements, building your own solution for geocoding is fairly straightforward. Given a list of postcodes, for example, you will quickly be able to create a geocoded dataset with a simple data table join. Matching multiple fields, expecting multiple matches and ranking the results is more tricky.
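Such a table join can be sketched in a few lines (the place names, postcodes and coordinates below are invented for illustration):

```javascript
// A gazetteer mapping postcodes to coordinates, standing in for a
// resource such as the Ordnance Survey postcode point data.
// All values here are made up for illustration.
const gazetteer = {
  'AB1 2CD': { lat: 51.5115, lon: -0.1160 },
  'EF3 4GH': { lat: 55.8721, lon: -4.2882 },
};

const records = [
  { id: 1, name: 'Archive A', postcode: 'AB1 2CD' },
  { id: 2, name: 'Archive B', postcode: 'EF3 4GH' },
  { id: 3, name: 'Unknown place', postcode: 'XX1 1XX' },
];

// The join: attach coordinates where the postcode matches, and leave
// the rest flagged for manual resolution.
const geocoded = records.map(r => ({
  ...r,
  coords: gazetteer[r.postcode] || null,
}));

const unmatched = geocoded.filter(r => r.coords === null);
console.log(unmatched.length); // records still needing manual geocoding
```

The harder cases mentioned above — matching on several fields at once, or ranking multiple candidate matches — start from the same join but need a scoring step in place of the single lookup.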

Fortunately, there seems to be an ever-increasing number of freely available resources with which to geocode your data. The Ordnance Survey of Great Britain last year released postcode point data and a national gazetteer that can be used to resolve a placename to coordinates. Going beyond the UK, Nominatim is a geolocation service offered by OpenStreetMap, and the Google Maps API also offers geocoding (though this is usually limited to a restricted number of requests per day). Pamela Fox has created a great Google gadget for use with Google Docs that will take your spreadsheet, query the Google geocoding service and return a list of coordinates for you to incorporate into new columns. An especially nice feature is that even when you (inevitably) have a few records left over that couldn’t be matched, they can be physically dragged and dropped on a familiar Google map, and these too are given coordinates.

The problem with making the best use of these resources is that very little thought is given to how location is recorded at the moment of capture so that it might be used programmatically at a later stage; usually the location field is treated as free text rather than a discrete data type. The spreadsheet column for place or location may contain extraneous words and punctuation which prevent automatic matching. The hierarchy of location information is rarely considered. Sometimes, secondary or tertiary candidate locations are recorded in the same column. In order to use any automatic geocoding process there is usually a need for extensive data cleaning, which must often be done at least partially manually.

To avoid this situation, a few simple guidelines should be considered before embarking on a spreadsheet data acquisition that you anticipate may be geocoded.

  1. Always record the best location you can, regardless of your requirements – this will give you far more options for geocoding later on. If you have a postcode, use it; a house number is even better.
  2. Always split the location components across several columns – don’t mix cities with villages and colloquial names. Have a hierarchy in mind, split it across several columns and stick to it, e.g. House, Road, Town, County, Country. You don’t have to fill in every field for each record, but keep the schema consistent. Don’t worry about presentational considerations, as these values can be concatenated in another column, and the data will be far more easily manipulable in this form.
  3. Don’t merge several locations in one field – if, as is often the case, there is more than one candidate for a location, record them in separate columns. A GIS technician will be able to associate several points with one record if necessary. If you are worried about the spreadsheet becoming unwieldy, put these columns to the far right of the sheet, as they may not be needed very often.
  4. Avoid abbreviations and colloquial names – they are hard to match up in geocoding exercises.
  5. Avoid punctuation – question marks and exclamation marks will mess up the match, and even humble commas should be avoided, as many formats will use them as column delimiters.
  6. Avoid ambiguity – add more detail than you might think immediately necessary, and spare a thought for the poor geocoder who, less familiar than yourself with the dataset, may need to choose from one of the 11 different Newports in the UK or more than 30 worldwide!
  7. Keep it on one line – other data fields may naturally lend themselves to multiple-row entries, but try to stick to the rule of one row, one record. If you need more space in a cell, turn on word wrapping and make the cell higher and wider.
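The concatenation mentioned in guideline 2 — keeping components in separate columns and joining them on demand — can be sketched as follows (the field names are illustrative):

```javascript
// One spreadsheet row with location components split across columns,
// in a fixed hierarchy. Empty fields are allowed; the schema stays
// consistent across records.
const row = {
  house: '',
  road: 'University Avenue',
  town: 'Glasgow',
  county: '',
  country: 'UK',
};

// Join only the fields that are filled in, in hierarchy order.
// This gives a presentable string (or a geocoding query) without
// ever mixing the components in one free-text cell.
function locationString(r) {
  return ['house', 'road', 'town', 'county', 'country']
    .map(k => r[k])
    .filter(v => v && v.trim() !== '')
    .join(', ');
}

console.log(locationString(row)); // "University Avenue, Glasgow, UK"
```

Because the components stay separate, you can later geocode on the town column alone, on town plus county, or on the full concatenation, whichever gives the best match rate.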

Following these guidelines will allow you to make the best use of your data when using tools like the Google Docs gadget mentioned above. You can try geocoding on different columns, or combinations of columns, or using the best available column in each record.



Tagore digital editions and Bengali textual computing

Professor Sukanta Chaudhuri yesterday gave a very interesting talk on the scope, methods and aims of ‘Bichitra’ (literally, ‘the various’), the ongoing project for an online variorum edition of the complete works of Rabindranath Tagore in English and Bengali. The talk (part of this year’s DDH research seminar) highlighted a number of issues I personally wasn’t very familiar with, so in this post I’ll summarise them briefly and then highlight a couple of possible suggestions.

Sukanta Chaudhuri is Professor Emeritus at Jadavpur University, Kolkata (Calcutta), where he was formerly Professor of English and Director of the School of Cultural Texts and Records. His core specializations are in Renaissance literature and in textual studies: he published The Metaphysics of Text from Cambridge University Press in 2010. He has also translated widely from Bengali into English, and is General Editor of the Oxford Tagore Translations.

Rabindranath Tagore (1861–1941), the first Nobel laureate of Asia, was arguably the most important icon of the modern Indian Renaissance. This recent project on the electronic collation of Tagore’s texts, called ‘the Bichitra project’, is being developed as part of the national commemoration of the 150th birth anniversary of the poet (here’s the official page). This is how the School of Cultural Texts and Records summarizes the project’s scope:

The School is carrying out pioneer work in computer collation of Tagore texts and creation of electronic hypertexts incorporating all variant readings. The first software for this purpose in any Indian language, named “Pathantar” (based on the earlier version “Tafat”), has been developed by the School. Two pilot projects have been carried out using this software, for the play Bisarjan (Sacrifice) and the poetical collection Sonar Tari (The Golden Boat). The CD/DVDs contain all text files of all significant variant versions in manuscript and print, and their collation using the “Pathantar” software. The DVD of Sonar Tari also contains image files of all the variant versions. These productions are the first output of the series “Jadavpur Electronic Tagore”.
Progressing from these early endeavours, we have now undertaken a two-year project entitled “Bichitra” for a complete electronic variorum edition of all Tagore’s works in English and Bengali. The project is funded by the Ministry of Culture, Government of India, and is being conducted in collaboration with Rabindra-Bhavana, Santiniketan. The target is to create a website which will contain (a) images of all significant variant versions, in manuscript and print, of all Tagore’s works; (b) text files of the same; and (c) collation of all versions applying the “Pathantar” software. To this end, the software itself is being radically redesigned. Simultaneously, manuscript and print material is being obtained and processed from Rabindra-Bhavana, downloaded from various online databases, and acquired from other sources. Work on the project commenced in March 2011 and is expected to end in March 2013, by which time the entire output will be uploaded onto a freely accessible website.


A few interesting points


  • Tagore, as Sukanta noted, “wrote voluminously and revised extensively“. From a DH point of view this means that creating a comprehensive digital edition of his works would require a lot of effort – much more than what we could easily pay people for, if we wanted to mark up all of this text manually. For this reason it is fundamental to find some type of semi-automatic method for aligning and collating Tagore’s texts, e.g. the “Pathantar” software. A screenshot of the current collation interface follows.

    Tagore digital editions

  • The Bengali language, in which Tagore wrote, is widely spoken in the world (it is actually one of the most spoken languages, with nearly 300 million total speakers). However, this language poses serious problems for a DH project. In particular, the writing system is extremely difficult to parse using traditional OCR technologies: its vowel graphemes are mainly realized not as independent letters but as diacritics attached to its consonant letters. Furthermore, clusters of consonants are represented by different and sometimes quite irregular forms, and thus learning to read is complicated by the sheer size of the full set of letters and letter combinations, numbering about 350 (from Wikipedia).
  • One of the critical points that emerged during the discussion had to do with the visual presentation of the results of the collation software. Given the large volume of text editions they’re dealing with, and the potentially vast amount of variation between one edition and the others, a powerful and interactive visualization mechanism seems to be strongly needed. However, it is not yet clear what the possible approaches on this front might be.
  • Textual computing, Sukanta pointed out, is not as developed in India as it is in the rest of the world. As a consequence, in the context of the “Bichitra” project, widely used approaches based on TEI and XML technologies haven’t really been investigated enough. The collation software mentioned above obviously marks up the text in some way; however, this markup remains hidden from the user and most likely is not compatible with other standards. More work would thus be desirable in this area – in particular within the Indian subcontinent.
Food for thought


  • On the visualization of the results of a collation. Some inspiration could be found in the type of visualizations normally used in version control software systems, where multiple and alternative versions of the same file must be tracked and shown to users. For example, we could think of the visualizations available on GitHub (a popular code-sharing site), which are described on this blog post and demonstrated via an interactive tool on this webpage. Here’s a screenshot:

    Github code visualization

    The situation is strikingly similar – or is it? Would it be feasible to reuse one of these approaches with textual sources?
    Another relevant visualization is the one used by popular file-comparison software (e.g. FileMerge on a Mac) for showing differences between two files:

    File Merge code visualization

  • On using language technologies with Bengali. I did a quick tour of what’s available online and (quite unsurprisingly, considering the reputation Indian computer scientists have) found several research papers which seem highly relevant. Here are a few of them:
    – Asian language processing: current state-of-the-art [text]
    – Research report on Bengali NLP engine for TTS [text]
    – The EMILLE corpus, containing fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages [homepage]
    – A complete OCR system for continuous Bengali characters [text]
    – Parsing Bengali for Database Interface [text]
    – Unsupervised Morphological Parsing of Bengali [text]
  • On open-source software that appears to be usable with Bengali text. Not a lot of stuff, but more than enough to get started (the second project in particular seems pretty serious):
    – Open Bangla OCR – a BDOSDN (Bangladesh Open Source Development Network) project to develop a Bangla OCR
    – Bangla OCR project, mainly focused on the research and development of an Optical Character Recognizer for Bangla/Bengali script

    Any comments and/or ideas?
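Returning to the question of visualising collation results: at the core of both the version-control and the file-comparison views mentioned above is a pairwise diff. Here is a minimal word-level sketch based on the standard longest-common-subsequence algorithm (this is an illustration, not the actual Pathantar implementation):

```javascript
// Word-level diff of two versions of a text: '=' marks shared words,
// '-' marks words only in the first version, '+' words only in the second.
function diff(a, b) {
  const m = a.length, n = b.length;
  // L[i][j] = length of the longest common subsequence of a[i:] and b[j:]
  const L = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--)
    for (let j = n - 1; j >= 0; j--)
      L[i][j] = a[i] === b[j]
        ? L[i + 1][j + 1] + 1
        : Math.max(L[i + 1][j], L[i][j + 1]);
  // Walk the table to emit an edit script.
  const out = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (a[i] === b[j]) { out.push(['=', a[i]]); i++; j++; }
    else if (L[i + 1][j] >= L[i][j + 1]) { out.push(['-', a[i]]); i++; }
    else { out.push(['+', b[j]]); j++; }
  }
  while (i < m) out.push(['-', a[i++]]);
  while (j < n) out.push(['+', b[j++]]);
  return out;
}

// Two invented "versions" of a line, split into words.
const v1 = 'the golden boat sails'.split(' ');
const v2 = 'the boat sails away'.split(' ');
console.log(diff(v1, v2));
// [['=','the'], ['-','golden'], ['=','boat'], ['=','sails'], ['+','away']]
```

A visual collation tool essentially renders this kind of edit script with colour and layout; the hard part, as noted above, is scaling the presentation to many witnesses at once rather than just two.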