All posts by jamie

Why rewrite EATS using Topic Maps?

There are a few reasons I am rewriting EATS as a Topic Maps application. The first impetus was wanting to have the option of treating some or all of the infrastructural data (such as authorities, entity types, and name types) as entities in the system. As it stands, there is a strict separation between the two: infrastructural data allows one to make statements about entities, but it is not possible to make statements about the infrastructural data.

The most obvious use case here is with authorities. If I am using EATS to keep track of books and publishers, and I myself am a publisher, I will want to be both an authority and an entity, and it would be silly if those two elements were not linked (if not actually the same).

Now, I could easily extend the existing EATS to link these two, and the same for the other pieces of infrastructural data, but it’s hardly an elegant solution. The EATS model is tied directly into the underlying database storage model, making this sort of extension invasive and ugly. By building the EATS model as a topic map, I’m adding in a layer that isolates the EATS view of the world from the specifics of storing the data. This frees me to be able to extend or modify the model arbitrarily.
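To make the publisher example concrete, here is a minimal Python sketch of a single topic being both an entity and an authority. It is not EATS or TMAPI code. The `Topic` and `TopicMap` classes, the `instances_of` method and the example.org IRIs are all hypothetical. Types are held as bare IRIs, although the type IRIs are also registered as topics, standing in for the way Topic Maps types are themselves topics. The point is only that "entity" and "authority" become types a topic can carry, so nothing about storage has to change to link the two.

```python
from dataclasses import dataclass, field

# Hypothetical IRIs for the EATS "entity" and "authority" types (illustration only).
ENTITY = "http://example.org/eats/entity"
AUTHORITY = "http://example.org/eats/authority"


@dataclass
class Topic:
    """A bare-bones topic: one identifying IRI plus the IRIs of its type topics."""
    subject_identifier: str
    types: set = field(default_factory=set)


class TopicMap:
    """An in-memory store; code using it never sees how topics are held."""

    def __init__(self):
        self._topics = {}

    def topic(self, iri):
        # Return the topic for this IRI, creating it on first reference.
        return self._topics.setdefault(iri, Topic(iri))

    def instances_of(self, type_iri):
        # Every topic typed with type_iri, in the order the topics were created.
        return [t for t in self._topics.values() if type_iri in t.types]


tm = TopicMap()
# The infrastructural types are topics too, so statements can be made about them.
tm.topic(ENTITY)
tm.topic(AUTHORITY)

publisher = tm.topic("http://example.org/publishers/me")
publisher.types.update({ENTITY, AUTHORITY})  # one topic, both an entity and an authority
book = tm.topic("http://example.org/books/1")
book.types.add(ENTITY)

entities = [t.subject_identifier for t in tm.instances_of(ENTITY)]
authorities = [t.subject_identifier for t in tm.instances_of(AUTHORITY)]
print(entities)     # ['http://example.org/publishers/me', 'http://example.org/books/1']
print(authorities)  # ['http://example.org/publishers/me']
```

Because callers only go through `topic` and `instances_of`, the dictionary inside `TopicMap` could be swapped for a database without touching the code that asks "which topics are entities?"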

For example, I want to be able to handle the passage in Through the Looking-Glass, and What Alice Found There where the Knight explains that the name of the song is called one thing, while its name is another, and the song itself is some third identifier. This is easy to do if, as in a topic map, almost everything is a topic: the EATS system then simply needs to be told that a particular topic is typed as an entity, and that topic can be edited and displayed as such. Marvellous!
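One way to model the Knight's puzzle relies on reification: in the Topic Maps Data Model a topic name can be reified by a topic, and that topic can then carry names of its own. The Python sketch below follows one reading of the passage, in which "The Aged Aged Man" is the song's name and "Haddocks' Eyes" is what that name is called. The `Name` and `Topic` classes and the example.org IRIs are hypothetical, chosen for illustration rather than taken from EATS or TMAPI.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical IRI for the EATS entity type (illustration only).
ENTITY = "http://example.org/eats/entity"


@dataclass
class Name:
    value: str
    reifier: Optional[Topic] = None  # a topic standing for this name itself


@dataclass
class Topic:
    subject_identifier: str
    types: set = field(default_factory=set)
    names: list = field(default_factory=list)

    def add_name(self, value: str) -> Name:
        name = Name(value)
        self.names.append(name)
        return name


# The song itself: what it "really is" is given by its identifier, not by a name.
song = Topic("http://example.org/songs/a-sitting-on-a-gate", {ENTITY})

# The name of the song really is "The Aged Aged Man"...
song_name = song.add_name("The Aged Aged Man")

# ...and reifying that name turns it into a topic, so the name can be named too:
# it is called "Haddocks' Eyes". Typing it as an entity lets EATS treat it as one.
song_name.reifier = Topic("http://example.org/names/the-aged-aged-man", {ENTITY})
song_name.reifier.add_name("Haddocks' Eyes")

print(song.names[0].value)                   # The Aged Aged Man
print(song.names[0].reifier.names[0].value)  # Haddocks' Eyes
```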

Another reason for using Topic Maps is that the standard defines a procedure for automatically merging topic maps. This is very useful for a system that is all about collecting information from multiple sources. If my EATS system has a record for an entity, and your EATS system has a record for that same entity, it would be great if I could just merge in the information from your record (or all the records from your system) and sort everything out, removing duplicate information and so on. Again, I could implement this in the old EATS, but it would be an EATS-specific feature: I wouldn’t have the benefits of being able to use other people’s code, merge in non-EATS data, and the like.
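The merging idea can be sketched in a few lines. The Topic Maps Data Model merges two topics when they share an identity, for example an equal subject identifier. The Python below is a simplification that applies only that subject-identifier rule. It ignores item identifiers, subject locators, reification, associations and occurrences, and it treats names as plain strings so that duplicates collapse in a set. The `Topic` class, the `merge_maps` function and the IRIs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    subject_identifiers: set
    names: set = field(default_factory=set)


def merge_maps(*maps):
    """Merge topics from several maps that share any subject identifier.

    Simplified: only subject identifiers are compared. Merging is transitive,
    so a topic sharing one IRI with A and another with B pulls A and B together.
    """
    merged = []  # invariant: no two topics here share a subject identifier
    for topic in (t for tm in maps for t in tm):
        current = Topic(set(topic.subject_identifiers), set(topic.names))
        survivors = []
        for other in merged:
            if other.subject_identifiers & current.subject_identifiers:
                # Same subject: pool identifiers and names (the set drops duplicates).
                current.subject_identifiers |= other.subject_identifiers
                current.names |= other.names
            else:
                survivors.append(other)
        merged = survivors + [current]
    return merged


mine = [Topic({"http://example.org/people/alice"}, {"Alice Liddell"})]
yours = [
    Topic({"http://example.org/people/alice", "http://example.net/id/42"},
          {"Alice Liddell", "Alice Pleasance Liddell"}),
    Topic({"http://example.net/id/7"}, {"Charles Dodgson"}),
]

for t in merge_maps(mine, yours):
    print(sorted(t.subject_identifiers), sorted(t.names))
```

Because identity rests on shared IRIs rather than on anything EATS-specific, the same procedure would merge in a topic from a non-EATS source, provided it uses the same IRI for the same subject.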

Finally, in doing this rewrite, I’ve been forced to take a step back and reconsider the application as a whole. This has led to some insights (authority records are useless, and the way they are used in old EATS is flat out wrong) and refinements (I’m not going to bother with anything other than IRIs as identifiers). In turn, these changes should simplify usage and hopefully make it easier to interlink EATS systems and whatever they might point to.

Topic Maps with Django

As part of a rewrite of the Entity Authority Tool Set, I have written an implementation of the Topic Maps API in Django, cunningly titled TMAPI in Django.

As it stands, it passes its 288 unit tests, but has no UI, since I only need it for the internal use of EATS. It would of course be useful to have both a visualisation and an editing interface added to it, but I won’t be doing it any time soon without some inducement.

I mention it here as a spur to encourage CCH (and anyone else in the DH community) to start seriously working with these sorts of technologies.

Programming and Digital Humanities

In August this year, there were two almost consecutive threads on the Humanist mailing list that I found rather disturbing. The first, with the subject “getting involved”[0], seemed to reach a semi-consensus among its participants that digital humanists should be able to do some programming, at the least. In the second, with the subject “designing an academic DH department?”[1], people gave their views on the ideal makeup of such a department. Here’s part of one response, from amsler@cs.utexas.edu[2]:

I would see it as involving two clusters of people. The digital humanists and the computer technologists / engineers who were employed within the digital humanities group as dedicated to that group itself. Roughly, I’d see a ratio of 1 computer / engineering professional to 5 or so digital humanists…

[I]n the traditional university environment, where you’d have separate computer science/engineering departments and a digital humanities department […] you’d wind up with the digital humanities having just computer support staff and the computer science/engineering departments having the truly creative people. That is, the best minds in computing & engineering wouldn’t be thinking about digital humanities ideas unless for some reason the computer science / engineering departments happened to pick up someone with those “outside” interests.

So this department would have five digital humanists getting the one creative computer scientist to implement all of their ideas, while s/he also does his/her own research and implementation? What exactly makes the humanists “digital”, then? And why would a creative computer scientist want to be the dogsbody of the group?

And from Darren Harkness[3]:

For an active DH department of 12-24 scholars, I would likely recommend a minimum of two developers, ideally split between highly structured languages such as Java and Python and less structured languages like PHP and perl as a way to cover most of your faculty’s needs. I would likely recruit a senior Java developer and a junior web developer with good research skills.

Am I simply reading these messages incorrectly, or is actual development/programming not what digital humanists do, in spite of the rhetoric? The ratios, in particular, don’t speak of the technical people being considered as anything like equal colleagues. This feels, to me, like a big problem.

[0] Both this and the later thread are available in the Humanist archive, though why a mailing list in this day and age doesn’t have threaded, dated, and searchable archives is beyond me. The first post in the “getting involved” thread is dated 19 August 2010.

[1] First post dated 1 September 2010.

[2] Message-Id: <20100825214431.B9F356395B@woodward.joyent.us>

[3] Message-Id: <20100827004713.D2F8D63E68@woodward.joyent.us>