Linked Open Data - Benefits and Challenges

Created date: 2019-08-28

Last update: 2020-10-22

Abstract

This working document gives an overview of the benefits of opening up and semanticizing data, as well as the challenges an institution might face in doing so.

Linked Open Data (LOD) for Cultural Institutions

At the moment, CHIN is developing a model, DOPHEDA, intended for Canadian collections of artefacts. Its facet dedicated to people and groups is currently being developed and will be tested shortly. Its Objects facet is intended to align with linked.art’s model for art institutions.

If, as an institution, you want to semanticize your data, CHIN would be happy to collaborate with you on this matter and advise you as best we can. As a general rule, you should take the following main elements into consideration:

The use of open licenses on your data: knowing that you can choose which data to make available and that different licenses can be applied to different data (although an open license is always preferable from an LOD standpoint). For example, you could decide to make all information pertaining to an object available in LOD without providing the image of that object.
The cleaning of your data: knowing that messy data can still be published, but will not be as semantically sound. There are tools to semi-automate this process (OpenRefine and the Getty’s extension to it, for example). CHIN can advise you on this if need be. Keep in mind that if you want to publish rich LOD, the data cleaning process must be integrated with a semantic model that suits your needs. This will largely depend on the semantic valuation you are aiming to reach.
The development of a cultural heritage semantic model is most often based on CIDOC CRM, and this will be the case with CHIN’s model. The easiest way for an institution to semanticize its data is to use a pre-existing model rather than develop one of its own. You are invited to use CHIN’s model once it is available, and should you wish to use linked.art’s version, CHIN will be happy to put you in contact with them.
The publication of the semanticized and enriched data does not amount to its visualisation. As a result, the development of interface(s) is the next important step in a digital data strategy that is specific to your institution, should you want to make the data available to the public online. In most cases, the model you use or develop should not be determined by your intended visual displays (interfaces). Rather, it should be selected or developed according to your needs and use cases (such as domain experts’ questions that could eventually become queries).
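
As an illustration of the first element above (choosing which data to open and under which license), here is a minimal Python sketch. The record, its field names and the licensing decisions are all hypothetical; the point is only that a per-field decision can keep an image out of the published record while the rest of the information is released.

```python
# Hypothetical collection record; field names are invented for illustration.
record = {
    "id": "1984.12.3",
    "title": "Untitled landscape",
    "maker": "Unknown",
    "image_url": "https://example.org/images/1984-12-3.jpg",
}

# Assumed per-field licensing decisions. None means "do not publish".
field_licenses = {
    "id": "CC0-1.0",
    "title": "CC0-1.0",
    "maker": "CC0-1.0",
    "image_url": None,  # the image is withheld in this example
}


def publishable(record, field_licenses):
    """Keep only the fields that have a license, and attach that license."""
    return {
        field: {"value": value, "license": field_licenses[field]}
        for field, value in record.items()
        if field_licenses.get(field) is not None
    }


open_record = publishable(record, field_licenses)
```

Fields without an entry in `field_licenses` are treated the same way as fields explicitly set to `None`, so nothing is published by accident.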

You will find below a list of benefits and challenges that we have identified as part of our research. Keep in mind that many of the challenges can be mitigated by using a strategic approach.

Benefits

Linked open data offer a number of advantages, especially when it comes to accessibility and visibility online. LOD are a set of tools and principles that can benefit heritage institutions because they can:

Increase the discoverability rate of:
- Institutions and their collections;
- Artefacts and actors (people and groups) represented in the dataset;
- Anyone who openly contributes.
Foster more nuanced data (online and offline) by:
- Generating new knowledge;
- Creating new results that original authors/owners of data were not looking for/into;
- Showing errors that might have gone unnoticed.
Contribute to greater knowledge and understanding of the data by:
- Helping disseminate new ideas more rapidly and widely, which in turn triggers new research studies and serves as an impetus for knowledge;
- Making this knowledge widely known through reuse and publication, which can be put to immediate use in teaching;
- Enabling citizen advocacy groups and researchers to analyze data, producing new and better insights.
Diminish the financial and human resources needed for day-to-day tasks by:
- Distributing the maintenance of data across the network when it comes to researching, gathering and presenting heritage data;
- Minimizing the risk of using old/outdated metadata.
Offer an opportunity to engage stakeholders as well as citizens:
- Researchers and academics might be interested in micro-data;
- Decision makers and the public might be interested in higher-level aggregates;
- More people can access information, including those who would otherwise not have access to institutions and their databases, etc.;
- Citizens and others can familiarize themselves with the collections so that the museum’s reach and societal impact can be much broader, especially as a contributor to an open, knowledgeable and creative society, considering how people increasingly expect transparency from museums;
- Institutions can themselves use the datasets to further engage their own audience.
Standardize data which:
- Diminishes the risk of data loss through multiple conversions;
- Enables manipulation and analysis of data, making it more easily usable and visualizable;
- Renders heritage information more accessible to search engines.
Encourage socio-economic development by:
- Adopting transparency and accountability principles when it comes to engaging audiences;
- Making data re-usable for profit and non-profit organizations alike by giving broad access to the most recent data, which organizations can then build on;
- Offering better documentation and statistics when asking for private or public funding (or, in turn, when evaluating such proposals on the part of the public body).

Institutions that do enter the open-access arena usually do so for the following reasons:

The cost of administering rights and permission fees for artworks that are subject to copyright is comparable to or greater than that of paying fees for these works (although this is highly dependent on the collection);
As a result of the remix culture of the Internet, audiences now expect open access from museums;
Open-access principles are considered to be a mission-serving imperative of the 21st century;
Open access fosters community engagement and expands the reach and scalability of online collections.

Challenges

The value of the data catalogue is realized only when people use it, so it relies on user engagement more than on the availability of data:
- Users should be in a position to discover the data they are anticipating and be equipped to use it;
- Rigorous work might be devalued because it takes longer to produce and much more resources to promote than “noisy” content (such as a big controversy or chatter about no specific content).
The passage to LOD entails a paradigm shift when it comes to assessing and commenting on data:
- It entails acquiring new expertise or networks of advisors who are knowledgeable about LOD;
- Institutions are often fearful that they will lose their ability to sell images, hence cutting themselves off from significant revenue and financial independence (image revenues, however, are usually minimal, especially in Canada, where the market is relatively small; in addition, it is possible to open only select data, thus excluding images if necessary);
- Who is considered to have authority and knowledge over information (as opposed to data, which remains strictly under the umbrella of its host institution) might change as more information is generated;
- The decentralization of information implies subjecting data to public scrutiny and questioning the authority of institutions, especially in the case of conflicting or problematic data for sensitive datasets.
The catalogue has to be built according to who the users will be, which might involve:
- A re-evaluation of the needs of the community following a change in the data management landscape (where the users of the data will no longer be solely cataloguers);
- A need for the data to be not only structured and classified, but also meaningfully and consistently organized (i.e. not only must the information be retrievable, but the path to reach it and its place within the structure must be meaningful as well);
- A transparent data production/contribution process where users expect to have access to original information, be able to scrutinize it and have a way to manipulate it themselves.
There is a risk of users misinterpreting or misrepresenting data either deliberately or through a lack of understanding:
- This might generate intense debates with no single authority to adjudicate who is knowledgeable and who is not. However, the reverse is also true, as opening up data exposes it to scrutiny by a wider set of experts that the host institution might not have known about;
- Everyone must be able to use, reuse and redistribute the data easily, but provisions to communicate with data contributors (at all stages, namely production, storage and distribution) must also be offered to users.
Opening data is generally not a priority for stakeholders:
- Maintaining, cleaning and opening data can be resource-intensive;
- Fear of criticism when it comes to problematic, incomplete or inaccurate datasets is a real concern for institutions;
- Converting an existing dataset to an LOD portal can be daunting, especially as information technologies and management systems have been developed without considering public use or the groups that are now likely to mobilize the data.

Feasibility Guidelines

In an interview with Jason Bailey, Neal Stimler suggested the following process to navigate the opening of your data (Bailey 2019: 1-2):

Perform a thorough rights assessment using relevant resources.
Consult with licensed legal counsel.
Build tools to provide mass self-serve access to data and digital asset sets. These tools typically come in the form of:
- A museum’s collection on a website;
- A public application programming interface (API);
- A GitHub repository of data in the .CSV and .JSON formats. Data should be offered with the same permissions and legal frameworks as associated image assets. The API serves application developers and partners, while .CSV and .JSON formatted data mainly support researchers and scholars.
Ensure open-access content is hosted in partnership with crucial aggregation platforms such as Wikidata, Wikimedia Commons and Internet Archive.
Ensure decisions are evaluated and made with respect to cultural and ethical considerations of open access in collaboration with communities and scholars.
An internal working group or project team from relevant areas across the organization should be assembled. The internal group would be directed by a project manager who leads the project vision and has ultimate decision-making authority. Partnering with allied organizations engaged with an institution’s users and working directly with Creative Commons are strongly recommended to implement best practices.
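
The .CSV and .JSON files mentioned in the process above can be produced from the same source records. The following Python sketch, which uses only the standard library, is one possible way to do so; the records and their `license` column are hypothetical and simply show the permissions travelling with the data.

```python
import csv
import io
import json

# Hypothetical records; in practice these would come from the collection system.
records = [
    {"id": "1984.12.3", "title": "Untitled landscape", "license": "CC0-1.0"},
    {"id": "1990.4.1", "title": "Birchbark basket", "license": "CC0-1.0"},
]


def to_csv(records):
    """Serialize the records as CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records as JSON text, keeping non-ASCII characters."""
    return json.dumps(records, indent=2, ensure_ascii=False)


csv_text = to_csv(records)
json_text = to_json(records)
```

The two strings could then be written to files and committed to a repository, so that researchers can download them without going through an API.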

Best Practices for Publishing LOD

Additional steps necessary for the production of LOD have also been identified by Bernadette Hyland, Ghislain A. Atemezing and Boris Villazón-Terrazas (Hyland, Atemezing, and Villazón-Terrazas 2014):

Prepare stakeholders: Because LOD is by nature a collective endeavour, its principles must be understood by practitioners and stakeholders, and it is preferable to identify the roles of each party in an LOD ecosystem as well as the benefits of such an environment.
Select a dataset: LOD is a step-by-step process that is best understood when working with a well-known dataset that can be useful to your organization, partners or the public.
Model the data: This step includes many questions such as which semantic models to use or how to properly use a semantic model to ensure the proper aggregation of content. CHIN can advise you on these matters, especially if you are using DOPHEDA.
Specify an appropriate license: Choosing the right license is a crucial part of using and producing LOD so that data is both manageable for your organization and reusable.
Assign good URIs for linked data: An organization producing LOD must assign unique identifiers called Uniform Resource Identifiers (URIs) to its data. These URIs must be based on the HTTP protocol and be stable, machine- and human-readable, and dereferenceable (accessible in different representations such as HTML or JSON-LD). The best way to generate and maintain URIs will depend on the producing organization’s infrastructure and resources.
Use standard vocabularies: The institution must reuse external vocabularies’ URIs as much as possible in order to foster interoperability of the content. Selecting the proper vocabularies should be based on the terms’ definitions and their usage by the institution’s partners. CHIN can advise you on this matter should you wish to manage URIs yourselves.
Convert data: There are several tools on the market that allow you to transform tabular data into RDF formats following ontological patterns.
Provide machine access to data: Ideally, these new files should reside in a triple store (a database for LOD) that supports queries on the RDF data in SPARQL (a query language comparable to SQL, but for RDF). However, the data can also be made accessible through a file download system.
Announce to the public: It is crucial to advertise your work to let stakeholders know that your content is now available as LOD so that potential users are aware of it; this could be achieved through mailing lists or by adding your dataset to the LOD cloud, for example.
Abide by the social contract you enter as an LOD publisher: As an LOD publisher, you must recognize your responsibility in maintaining your data so that it is up to date and accessible. In order to do so, you could for instance create a discussion channel to keep track of issues submitted by users or help decide which model to implement.
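
To make the more technical steps above concrete (assigning URIs, reusing external vocabularies, converting tabular data and providing machine access), here is a minimal Python sketch using only the standard library. It is not CHIN’s conversion pipeline. The namespace `https://data.example.org/`, the SPARQL endpoint and the sample table are hypothetical, and the CIDOC CRM and Getty AAT identifiers are assumptions to be checked against the versions you adopt.

```python
import csv
import io
from urllib.parse import quote, urlencode

BASE = "https://data.example.org/object/"  # hypothetical stable HTTP namespace
ENDPOINT = "https://data.example.org/sparql"  # hypothetical SPARQL endpoint

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# CIDOC CRM identifiers as assumed here; verify against the release you use.
E22 = "http://www.cidoc-crm.org/cidoc-crm/E22_Human-Made_Object"
P2_HAS_TYPE = "http://www.cidoc-crm.org/cidoc-crm/P2_has_type"
# A Getty AAT URI, used as an example of a reused external vocabulary term.
AAT_TERM = "http://vocab.getty.edu/aat/300033618"

# Hypothetical tabular export from a collection management system.
TABLE = f"""id,title,type_uri
1984.12.3,Untitled landscape,{AAT_TERM}
"""


def mint_uri(local_id):
    """Build an HTTP URI from a local identifier, percent-encoding unsafe characters."""
    return BASE + quote(local_id, safe="")


def literal(text):
    """Serialize a plain string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(table_text):
    """Convert each table row into three RDF statements in N-Triples syntax."""
    lines = []
    for row in csv.DictReader(io.StringIO(table_text)):
        subject = f"<{mint_uri(row['id'])}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{E22}> .")
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['title'])} .")
        lines.append(f"{subject} <{P2_HAS_TYPE}> <{row['type_uri']}> .")
    return "\n".join(lines) + "\n"


# A SPARQL query listing the objects typed with the external vocabulary term.
QUERY = f"""SELECT ?object ?label WHERE {{
  ?object <{P2_HAS_TYPE}> <{AAT_TERM}> ;
          <{RDFS_LABEL}> ?label .
}}"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request URL (query sent as `query`)."""
    return endpoint + "?" + urlencode({"query": query})


ntriples = to_ntriples(TABLE)
request_url = sparql_get_url(ENDPOINT, QUERY)
```

The resulting N-Triples text could be loaded into a triple store or simply offered as a file download. In practice, a dedicated RDF library or conversion tool would normally replace the hand-written serialization shown here.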

Selected Bibliography

Bailey, Jason. 2019. « Solving Art’s Data Problem - Part One, Museums ». Artnome (blog). April 29, 2019. https://www.artnome.com/news/2019/4/29/solving-arts-data-problem-part-one-museums.

Open Art Data. 2018. « Museums: Interactive Map with Wikidata ». Open Art Data (blog). December 16, 2018. https://www.openartdata.org/2018/12/museums-map-wikidata.html.

Edson, Michael Peter. 2019. « Wikimania 2019 Keynote Address ». Keynote presented at Wikimania 2019, Stockholm, SE, April 29. https://www.youtube.com/watch?v=9NBonp9KLz8.

Goldman, Kathryn. 2018. « Open Access Images of Public Domain Work ». Creative Law Center (blog). 2018. https://creativelawcenter.com/museums-open-access-images/.

Hyland, Bernadette, Ghislain A. Atemezing, and Boris Villazón-Terrazas. 2014. « Best Practices for Publishing Linked Data ». W3C Working Group Note. January 9, 2014. https://www.w3.org/TR/ld-bp/.

Kela, Riitta. 2019. « Opening Collections as Open Data: Challenges and Possibilities ». In Documenting Culture: A Culture of Documentation. International Council of Museums (ICOM). Tokyo, JP.

McCarthy, Douglas. 2019. « Licensing Policy and Practice in Open Glam ». Medium, May 30, 2019. https://medium.com/open-glam/licensing-policy-and-practice-in-open-glam-49c867b49de8.

Oomen, Johan, Enno Meijers, and Wilbert Helmus. 2016. « Network Digital Heritage: Towards A Distributed Network of Heritage Information ». International Conference on Digital Preservation (IPRES). Amsterdam, NL: Dutch Digital Heritage Network. https://www.netwerkdigitaalerfgoed.nl/wp-content/uploads/2018/02/NDE_PositionPaper_NetworkHeritageInformation-EN-v2.pdf.

Open GLAM. 2020. « Declaration on Open Access for Cultural Heritage ». January 21, 2020. https://docs.google.com/document/d/1CpDGlWLgkEYJC5A2HJ_Os8XYEv7ONOIBYAobSFzWm14/edit?usp=embed_facebook.

Open Knowledge Foundation. 2012. « Resources ». OpenGLAM. November 27, 2012. https://openglam.org/resources/.

Openness: Politics, Practices, Poetics. 2017. Living Archives. Malmö, SE: Malmö University. http://muep.mau.se/bitstream/handle/2043/23606/openness_final.pdf?sequence=2&isAllowed=y#page=14.

Sanderhoff, Merete, ed. 2014. Sharing Is Caring: Openness and Sharing in The Cultural Heritage Sector. Translated by Néné La Beet and René Lauritsen. Copenhagen, DK: Statens Museum for Kunst. https://www.smk.dk/en/article/the-sharing-is-caring-anthology/.

Schrier, Bill. 2014. « Government Open Data: Benefits, Strategies, and Use ». The Evans School Review, Alumni Perspective, 4 (1): 12‑27.

Stimler, Neal, and Louise Rawlinson. 2019. « Where Are The Edit and Upload Buttons? Dynamic Futures for Museum Collections Online ». In MuseWeb. Boston, MA: MuseWeb 2019. https://mw19.mwconf.org/paper/where-are-the-edit-and-upload-buttons-dynamic-futures-for-museum-collections-online/.

Stinson, Alex. 2018. « Wikidata in Collections: Building a Universal Language for Connecting GLAM Catalogs ». Medium (blog). April 9, 2018. https://medium.com/freely-sharing-the-sum-of-all-knowledge/wikidata-in-collections-building-a-universal-language-for-connecting-glam-catalogs-59b14aa3214c.

Vathana, Anly, and Dev Pramil Audsin. 2013. « An Open Analysis on Open Data ». Submission paper. In Open Data on the Web, 4. London, GB: W3C. https://www.w3.org/2013/04/odw/odw13_submission_33.pdf.

Wallace, Andrea. 2017. « Access and the Digital Surrogate: Openness as a Philosophy ». Presented at National Digital Forum, Wellington, NZ, November 27. https://www.youtube.com/watch?v=crKUIxIX3sY.