Dataset refining and simplified queries

A current project to better understand the evolution of the Celtic language is making use of interactive maps that bring together many different data sets for comparison.

The project is making use of and refining existing data that often exists in a flat tabular form and creating a more scalable relational database that can be used to perform more complex queries than was previously possible.

For instance, one data set contains a list of burial sites with details including the nature of the burial, the type of pottery and other artifacts found in association with the individuals at the site and, where possible, the gender and approximate age of each individual.

When these data were originally collected a spread sheet was used to classify each site but often the entries followed a free text format and it was perhaps not considered that this form of recording would be hard to use programmatically in the future. For instance, a typical summary description of the remains found at a site recorded in a spread sheet cell might be:

“adult male burial asso with adult female and baby”.

Whilst this is descriptive and can be easily understood by a human, it is hard to make use of in a structured query. For this reason much effort has been expended to normalise the spread sheet data into a flexible relational database form. In the previous example, whilst we a still retain one site record, we link this site to three separate buried individuals. Each individual in turn has its own strictly controlled categorisations:

Individual1, Adult Male

Individual2, Adult Female

Individual3, Neonate.

Each of these individuals may have descriptions of the manner in which they were interred, for example orientation of the long axis of the body, position of the head or the side of the body it is lying on. Additionally, the individual grave goods are associated with the correct individual to aid interpretation. Wherever possible, all such free text fields have been converted into discrete lists of valid options.

This structure allows us to use queries like ‘Return all burial sites where there are more than five individuals, all male, regardless of age, found with a Cairn, but not a Pit, where at least one individual is oriented NW-SE’.

This is an admittedly complex example and one that is possibly unlikely to yield results, but it demonstrates the sort of complexity available.

If this sort of query were expressed in SQL, it would be verbose and probably unintelligible to a non-technically minded researcher.

To help researchers assemble similar queries, the data sets will be carefully pre indexed to expose the most useful facets of the data. Each record can exhibit none or many of the data facets indexed. These options are presented in simple checkbox lists to either include or exclude particular facets and to continually narrow the results.

This screen grab from a functional prototype shows a user requesting burials from the database that are of type “Cist”, have been “Disturbed” and DO NOT include individuals aligned “East to West”.

facet

Although these results could then be displayed to the user in a tabular form, they are presumably interested in the geographic distribution of the results. The results are mapped so it can be seen where these sorts of similar burials are.

A common problem when the result set is large, or is particularly concentrated in a small area, is that map markers tend to overlay each other and the relative density of clusters cannot be easily seen or individual markers easily identified. To deal with this potential issue, the mapped results are clustered and styled so that at a glance, the researcher can see where her results are concentrated. Intensity of colour reflects an increasing count in these two example screen grabs (the second being the same result set as the first at a higher zoom level). A useful side effect of this approach is that the web browser doesn’t have to cope with rendering too many markers which can impact performance.

map1       map2

Leave a Reply

Your email address will not be published. Required fields are marked *