heritage.data.gov.uk …?

The Linking Open Data dataset cloud by Fenng

Exciting news for UK data this week as the new UK data website, www.data.gov.uk, had its official launch. It’s been in beta for a while but is now fully functional and open for business, providing access to a range of datasets. Importantly, as well as the more traditional download of files in formats such as comma-separated values (CSV) text files, the site promises to provide information in the form of Linked Data. This is a massive advance towards the semantic web, with data freely available to be used and reused by all manner of web apps: graphed, mapped and mashed up in a myriad of ways, with virtually limitless potential.

This news follows hot on the heels of the consultation document on the future of Ordnance Survey data which promises to make more high quality map resources far more widely available. So in addition to having access to government data such as crime, education and health statistics, we will soon (assuming the consultation goes the way it ought to) have access to basemaps to plot it all on and administrative area boundaries to analyse by.

Of course, this is not the end of the story, just the beginning. Whilst the Linked Data approach works well for simple discrete datasets such as numeric/statistical publications, and the accessibility of map data is largely a political issue, there is still some way to go before much more complex datasets will be available in this way. Many datasets can be provided as Linked Data and mapping can be delivered as WMS/WFS, both providing mechanisms for exposing data and making it interoperable, but there are a couple of big outstanding issues pertaining to cultural heritage information.

Firstly, there is the semantic clarity with which we record, and have recorded, information. Concepts are frequently complex and compound, and often indistinct or inconsistent; the notion of period (e.g. Bronze Age), for example, encapsulates both a spatial element and a temporal element. For resource discovery, disambiguating the colour bronze from the material bronze may be essential; not all bronze (material) objects are bronze (colour), whilst some non-bronze (material) objects are bronze in colour. What one expert refers to as a bronze knife is to another a bronze dagger. And that’s only the tip of the iceberg, given the variation in classifications used, the levels of atomicity in recording schemas and the legacy of how heritage datasets came into being. So simply converting a digital site archive or other archaeological record to a pile of RDF triples, providing URIs and exposing the data will not magically take us into a world of semantically interoperable heritage data. With some care and attention to detail, though, it is possible to see how it can work for some heritage datasets, such as OASIS records, records of archaeological activities, events and publications, or many of the discrete project archives lodged with the Archaeology Data Service (ADS). There will undoubtedly need to be some additional mediating frameworks to facilitate access to data, conceptual frameworks such as ontologies capable of rationalising semantic conflicts and terminological differences, but such frameworks are being developed and implemented (for example the CIDOC Conceptual Reference Model) and can help to resolve some of the semantic issues associated with heritage data.
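To make the bronze example concrete, here is a minimal sketch of how separate material and colour concepts keep the two senses of "bronze" apart when records are held as subject/predicate/object triples. Everything in it is an assumption for illustration: the example.org namespace, the property names, the concept identifiers and the three sample objects are all hypothetical, and plain Python tuples stand in for a real RDF store.

```python
# Hypothetical namespace, properties and concepts: none of these come from
# a published heritage vocabulary; they only illustrate the idea.
EX = "http://example.org/heritage/"

MATERIAL = EX + "material"
COLOUR = EX + "colour"
BRONZE_MATERIAL = EX + "concept/bronze-material"
BRONZE_COLOUR = EX + "concept/bronze-colour"

# Each record is a (subject, predicate, object) triple. The sample objects
# are invented for this sketch.
triples = {
    (EX + "object/1", MATERIAL, BRONZE_MATERIAL),
    (EX + "object/1", COLOUR, EX + "concept/green"),
    (EX + "object/2", MATERIAL, EX + "concept/gold"),
    (EX + "object/2", COLOUR, BRONZE_COLOUR),
    (EX + "object/3", MATERIAL, BRONZE_MATERIAL),
    (EX + "object/3", COLOUR, BRONZE_COLOUR),
}


def subjects(predicate, obj):
    """Return the sorted subjects that have the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Objects made of bronze, whatever their colour.
bronze_made = subjects(MATERIAL, BRONZE_MATERIAL)

# Objects that are bronze in colour, whatever their material.
bronze_coloured = subjects(COLOUR, BRONZE_COLOUR)

print(bronze_made)
print(bronze_coloured)
```

A free-text search for "bronze" would return all three objects for either question; with distinct concept URIs the two queries give different, correct answers. It does nothing, of course, for the knife/dagger problem, which needs agreement on terminology rather than just identifiers.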

The other major obstacle for much heritage data is politics. Currently, the primary sources of archaeological data for sites and monuments are the Historic Environment Records (HER) or Sites and Monuments Records (SMR), typically held within local authorities. Increasingly, these resources have implemented strict data licensing conditions and charge for the supply of data. This runs counter to the whole open data concept, but the income streams generated by restricting and controlling access must be appealing to local authorities as a way of making heritage resources self-funding rather than a drain on limited resources. The result is that local authorities are going to protect what they see as their intellectual property, which has monetary value. Sound familiar? Where HER/SMR data is made more freely available, it is typically through local authority online portals, in order that control can still be exerted over the data with respect to content, modes of presentation and interaction. Even the Heritage Gateway, the exemplar of opening up heritage data, which has tremendous potential and is an amazing resource, only provides limited access to partial datasets from participating originators, with individual originators retaining control over what is presented and how. Why can’t we all have access to the data held by HER/SMRs, and indeed the National Monuments Record (NMR), using techniques similar to those being used for other government datasets, aiming towards heritage.data.gov.uk as successor to the Heritage Gateway…?

So, it looks like attitudes towards government data are changing; data will become more openly available, leveraging recent developments such as Linked Data and heading some way towards a semantic web of sorts. With any luck, heritage data will get carried along and we will start to see more opportunities for novel heritage research off the back of this. But the semantics of heritage data are far more complex than anything emerging to date, and making this information truly interoperable will present considerable challenges before we can one day have a semantic web of the entire corpus of knowledge regarding how we got to where we are today.