Seminar: Text Mining for Digital Humanities

Professor Timo Honkela (presented by Tuula Pääkkönen)
National Library of Finland, Helsinki
Tuesday, 11 November 2014, 6.00 pm
Anatomy Museum, Strand Building 6th Floor,
King’s College London, Strand, London WC2R 2LS
With the increased availability of texts in electronic form, text mining has become a commonplace way of extracting interesting, relevant and/or novel information from text collections in an automatic or semi-automatic manner. Text mining tasks include, for example, categorization, clustering, topic modelling, named entity recognition, taxonomy and conceptual model creation, sentiment analysis, and document summarization. The majority of text mining research has focused on corpora that were born digital. However, for the humanities and social sciences, the digitisation and analysis of originally printed or handwritten documents is essential. These documents may contain a large proportion of OCR errors, which must be taken into account in the subsequent analytical processes. In this presentation, text mining of historical documents is discussed in some detail. Attention is paid to the methodological challenges caused by noisy data, and to the future possibilities related to multilinguality and context-sensitive analysis of large collections.
Since the beginning of 2014, Professor Timo Honkela has worked in the area of digital humanities at the Department of Modern Languages, University of Helsinki, and at the National Library of Finland, Center for Preservation and Digitisation. Before this he was the head of the Computational Cognitive Systems research group at Aalto University School of Science. With close to 200 scientific publications, Honkela has long experience in applying statistical machine learning methods to the modelling of linguistic and socio-cognitive phenomena. Specific examples include leading the development of the GICA method for analysing subjectivity of understanding, an initiating role in the development of the Websom method for visual information retrieval and text mining, and collaboration with Professor George Legrady in creating Pockets Full of Memories, an interactive museum installation. Lesser-known work includes statistical analysis of Shakespeare’s sonnets, historical interviews, and climate conference talks, and analysis of philosophical and religious conceptions.
(Unfortunately, at the last minute Prof Honkela finds himself unable to be with us for his presentation. It will instead be given by his colleague Tuula Pääkkönen.)

