Nick Poole has recently written a proposal for the development of a SPECTRUM Data Standard. While I fully support this idea, I think that the concept should be broadened, as outlined below.
A standard for assertions
The proposed SPECTRUM Data Standard would be “a modelling of the SPECTRUM Units of Information using the CIDOC CRM”. As such, it will allow statements to be made about collections objects, their history and management processes applied to them. While Nick’s position paper states that the SPECTRUM Data Standard will not in itself be an interchange standard, it will nonetheless require some concrete syntax. Given the intention to use the ResearchSpace work as a starting point, I think it is reasonable to assume that this concrete syntax will be RDF. This means that each SPECTRUM Unit of Information will be represented by one or more CIDOC CRM classes or properties, expressed as a template for a Linked Data assertion.
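To make this concrete, here is a minimal sketch of what such a template-based assertion might look like in Turtle, mapping a single SPECTRUM unit of information ("Object name / Title") onto CIDOC CRM classes and properties. The CRM classes and properties (E22_Man-Made_Object, P102_has_title, E35_Title) are real; the `ex:` URIs are invented purely for illustration.

```turtle
# Hypothetical sketch: one SPECTRUM unit of information ("Title")
# expressed as a CIDOC CRM assertion in RDF (Turtle syntax).
# All ex: identifiers are invented for illustration.
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/museum/> .

ex:object42 a crm:E22_Man-Made_Object ;
    crm:P102_has_title ex:object42-title .

ex:object42-title a crm:E35_Title ;
    rdfs:label "Portrait of an Unknown Woman" .
```

The point of the Data Standard would be to fix this pattern (which classes and properties represent which SPECTRUM unit), so that every institution emits structurally identical assertions.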
The need for shared Linked Data “instances”
While the proposed Data Standard will allow different institutions to make statements like “this object was donated by person X” in a consistent way, this will only be of marginal benefit if every institution defines “person X” in a different way. The point Nick makes about needing agreement on field names such as “Title” applies equally to data values held within those fields.
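A sketch of the "this object was donated by person X" statement shows the problem. The CRM classes and properties used (E8_Acquisition, P24_transferred_title_of, P23_transferred_title_from, E21_Person) are real; the identifiers are invented, and note that the donor here is still nothing more than a name string attached to an anonymous node:

```turtle
# Hypothetical sketch: a donation modelled as a CIDOC CRM acquisition
# event. The donor is represented only by a string value, so nothing
# links this "person X" to the same person in another museum's data.
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/museum/> .

ex:acquisition7 a crm:E8_Acquisition ;
    crm:P24_transferred_title_of ex:object42 ;
    crm:P23_transferred_title_from [
        a crm:E21_Person ;
        rdfs:label "Light, Richard"
    ] .
```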
There are two levels of problem here. The first is that almost all existing data is stored in collections management systems as string values (e.g. “Light, Richard”). While it is perfectly allowable to include these strings in your RDF output, and valid to say that they conform to the SPECTRUM Data Standard, in practice the results are ambiguous. (Look for “Richard Light” on YouTube – that guy isn’t me!) To be clear which person called Richard Light you mean, you need to employ a Linked Data approach and create a unique, persistent identifier for that person. The standard way of doing this is to mint a URL for each person, and back it up with facts about the person (expressed, again, as RDF), so that others can determine which Richard Light you mean. There is therefore a requirement on collections management systems to provide cultural heritage institutions with the tools to convert their string-value data to URLs.
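Minting such an identifier might look like the following sketch, in which the string value is replaced by a URL backed up with facts (the URI is invented; E21_Person and rdfs:label are real):

```turtle
# Hypothetical sketch: a minted, persistent identifier for a person,
# backed up with disambiguating facts. The ex: URI is invented.
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/museum/> .

ex:person-rlight a crm:E21_Person ;
    rdfs:label "Light, Richard" .
    # Further assertions (occupation, affiliations, publications)
    # would be added here to support disambiguation.
```

Any acquisition record can now point at `ex:person-rlight` rather than repeating an ambiguous name string.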
The second problem is that inventing URLs for people (etc.) will not bring any shared benefit if each institution mints its own set of URLs. While it is possible to “co-reference” URLs (i.e. to assert that they refer to the same individual), this becomes a fruitless exercise in stable-door-shutting once the number of systems to co-reference rises much above three. Much more effective would be the shared use of a standard Linked Data authority, so that everyone can use the same URLs when describing the same entity. This raises the question: where are these standard authorities to come from? In some cases they already exist: there are Linked Data authorities for geographical information (Geonames; OpenStreetMap; Ordnance Survey) which could simply be adopted. In other cases, the specialized needs of the cultural heritage sector have already been met – in particular, the Getty vocabularies (AAT, ULAN, TGN) are being prepared for publication as Linked Data resources. However, there will be other cases where no central authority exists. In my view, it should be part of the SPECTRUM Data Standard brief (a) to identify and promote the use of existing Linked Data authorities which are suitable for cultural heritage use, and (b) to enable the creation and maintenance/development of new authorities, where required by the community.
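The contrast between the two approaches can be sketched as follows. The `owl:sameAs` property is real, as are the Geonames and Getty URI patterns; the specific identifiers and institutional namespaces are invented for illustration:

```turtle
# Hypothetical sketch: co-referencing versus a shared authority.
# Institutional namespaces and identifiers are invented.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex1: <http://museum-one.example/> .
@prefix ex2: <http://museum-two.example/> .

# (a) Co-referencing after the fact: each institution minted its own
# URL, and every new system multiplies the pairs to be reconciled.
ex1:person99 owl:sameAs ex2:agent-0042 .

# (b) The shared-authority approach: both institutions simply reuse
# the same authority URL from the outset, e.g. (illustrative IDs):
#   <http://sws.geonames.org/2643743/>        (a Geonames place)
#   <http://vocab.getty.edu/ulan/500012345>   (a ULAN person)
```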
Textual resources
Nick mentions in passing the different needs of a conservation description and a web site description. These are both textual resources. The scenario he outlines suggests that these descriptions are simple string values, i.e. that they have no internal structure. Such string values are often found in Linked Data applications (e.g. the multilingual abstracts which form a central part of dbpedia descriptions), but cultural heritage institutions produce a great deal of material which has a more complex structure than this. Even a simple exhibition label will typically have a heading, several paragraphs, and a reference number.
I think that a major opportunity will have been lost if the SPECTRUM Data Standard project does not make some attempt to apply the COPE (Create Once, Publish Everywhere) philosophy to richly-structured textual resources. Museums have a wealth of formally published material (think exhibition catalogues, wall texts) as well as “grey literature” (e.g. conservation reports, correspondence, email) which is increasingly held in digital form. Whereas in the past this material would have been stored in an impenetrable binary format, it is now increasingly possible to access it in a machine-processable form (typically some sort of XML). This opens up the possibility of applying COPE to such resources as they stand, or (more realistically) of converting them to a standard format which can then be used for COPE delivery. An example of such a format is the Text Encoding Initiative, which is a well-established framework for encoding humanities documents of any kind.
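As an illustration, the simple exhibition label described above might be encoded in TEI along these lines. The elements used (div, head, p, idno) are standard TEI, but the attribute values and content are invented, and a real encoding would sit inside a full TEI document with a header:

```xml
<!-- Hypothetical sketch: an exhibition label encoded in TEI, giving the
     heading, paragraphs and reference number explicit structure that a
     COPE workflow can then repurpose. Content is invented. -->
<div type="exhibition-label" xml:id="label-1963-42">
  <head>Portrait of an Unknown Woman</head>
  <p>Oil on canvas, painted around 1870 by an unidentified artist.</p>
  <p>Purchased with the support of the Friends of the Museum.</p>
  <idno type="accession">1963.42</idno>
</div>
```

Once the structure is explicit, a web site can take just the heading and first paragraph, while a catalogue takes the whole label, from the same single source.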
As well as allowing all or part of these textual resources to be freely repurposed using COPE, their publication as a set of stable web-accessible resources would enable cross-referencing and annotation, using standards such as Open Annotation.
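A minimal Open Annotation sketch shows the idea: a stable, addressable textual resource becomes something that other statements can point at. The oa: vocabulary terms are real; the resource URIs are invented for illustration:

```turtle
# Hypothetical sketch: an Open Annotation linking a conservation note
# to part of a published textual resource. URIs are invented.
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/museum/> .

ex:anno1 a oa:Annotation ;
    oa:hasBody   ex:conservation-note-17 ;
    oa:hasTarget <http://example.org/texts/catalogue-1963#entry-42> .
```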