Requirements for Dataset Register Implementations

Living Standard

This version:
https://docs.nde.nl/requirements-dataset-register-implementations/
Issue Tracking:
GitHub
Editor:
David de Boer (Netwerk Digitaal Erfgoed)
Version:
0.1.1

Abstract

This document describes requirements for implementing the Dataset Register API in collection management systems. By following these requirements, vendors enable users to efficiently publish dataset descriptions from their collection management system.

1. Introduction

This section is non-normative.

The Requirements for Collection Management Systems include a general obligation for vendors to implement the Dataset Register API in their collection management systems. However, feedback from users of these systems, as well as questions from vendors, has shown the need for more detailed guidance. The requirements below provide such guidance, specifying in more detail the steps needed for performant and user-friendly implementations of the Dataset Register. The goal is to help both users and vendors.

Heritage institutions can use these requirements during procurement as a checklist to evaluate collection management systems, or to encourage their current vendor to offer the functionality described here.

This document is a counterpart to [NDE-NETWORK-OF-TERMS-IMPLEMENTATIONS], which provides similar guidance for implementing the Network of Terms API.

2. Definitions

Collection manager

A person who creates metadata records and dataset descriptions in a collection management system, and who is not expected to have technical knowledge.

Collection management system

A software application for creating, editing, and publishing metadata records that implements the Dataset Register API. The system may include a separate publication component or layer that wraps the core application. For the purpose of this document, the system is treated as a whole: it does not matter whether the dataset description, publication, and registration with the Dataset Register are handled by the core application itself or by a publication layer or component on top of it.

Dataset Register

An NDE service that allows institutions to register and publish dataset descriptions, making them findable through the Dataset Register API and visible on the Dataset Register website.

Registration URL

A URL submitted to the Dataset Register API that points to one or more dataset descriptions. The Dataset Register fetches and validates the descriptions found at this URL, and periodically re-crawls it to pick up changes.

Vendor

An organization that supplies a collection management system.

3. Requirements

These requirements are organized into two groups: requirements for managing dataset descriptions (§ 3.1) and requirements for distributions (§ 3.2).

3.1. Manage dataset descriptions

3.1.1. Create datasets

Collection managers MUST be able to create datasets by themselves in the collection management system without requiring technical knowledge.

They MUST be able to select heritage objects to be included in datasets, either manually or through filters (such as terms) that automatically include matching objects.

This gives collection managers the ability to determine the boundaries of their datasets, and to decide between (e.g.) creating a single large dataset or any number of smaller datasets. Dynamic selections (e.g. filters based on terms or object types) allow newly matching objects to be included in the dataset automatically.
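The difference between manual and dynamic selection can be illustrated with a minimal sketch. The names HeritageObject, DatasetSelection, and members are hypothetical and not part of any NDE API; the sketch assumes each object carries a set of term URIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HeritageObject:
    """Hypothetical minimal record: an identifier plus the terms assigned to it."""
    identifier: str
    terms: frozenset[str] = frozenset()


@dataclass
class DatasetSelection:
    """Hypothetical selection: hand-picked objects plus term filters."""
    manual_ids: set[str] = field(default_factory=set)
    filter_terms: set[str] = field(default_factory=set)

    def includes(self, obj: HeritageObject) -> bool:
        # An object belongs to the dataset if it was picked manually
        # or if it carries at least one of the filter terms.
        return obj.identifier in self.manual_ids or bool(self.filter_terms & obj.terms)


def members(selection: DatasetSelection, objects: list[HeritageObject]) -> list[HeritageObject]:
    """Evaluate the selection against the current objects, in their given order."""
    return [obj for obj in objects if selection.includes(obj)]
```

Because the filter is evaluated each time the dataset is assembled, an object that receives a matching term later is included without any action from the collection manager.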

3.1.2. Edit dataset descriptions

Collection managers MUST be able to edit all required and recommended attributes for datasets and organizations without requiring technical knowledge.

This gives collection managers full control over their dataset descriptions: required properties are needed to create valid dataset descriptions right now; recommended properties create richer descriptions and enable future-proofing, because they may become required in the future.

3.1.3. Required fields

Collection management systems MUST clearly indicate to collection managers which attributes are required.

This helps collection managers distinguish between fields that are needed for a valid dataset description and fields that are recommended but optional.
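One possible way to carry the required/recommended distinction through a form is sketched here. The FormField type, the field names, and the required flags are illustrative assumptions; the authoritative list of required and recommended attributes is in [NDE-DATASETS].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    """Hypothetical form field definition."""
    name: str
    label: str
    required: bool


# Illustrative fields only; the flags here are placeholders, not the NDE list.
FIELDS = [
    FormField("name", "Title", required=True),
    FormField("description", "Description", required=False),
]


def display_label(form_field: FormField) -> str:
    """Label shown next to the input, making the field's status explicit."""
    status = "required" if form_field.required else "recommended"
    return f"{form_field.label} ({status})"


def missing_required(fields: list[FormField], values: dict[str, str]) -> list[str]:
    """Names of required fields that are absent or empty in the submitted values."""
    return [f.name for f in fields if f.required and not values.get(f.name)]
```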

3.1.4. Enforce valid input

Collection management systems MUST ensure that only valid attribute values can be entered. In particular:

3.1.5. Validate dataset descriptions

Collection management systems MUST NOT allow invalid dataset descriptions to be published. They MUST run the current NDE SHACL shapes against each dataset description and MUST surface any violations and warnings to collection managers, translated to plain language and mapped to the relevant form field. In particular, they MUST communicate upcoming requirement changes (such as newly required properties or changed data types) to collection managers in advance by surfacing SHACL warnings. Using the SHACL shapes and/or validation API provided by NDE is RECOMMENDED.

Whether validation happens during editing or at publish time is left to the vendor.

Violations indicate issues that must be fixed now to meet current requirements. They should rarely occur in collection management systems that implement § 3.1.3 Required fields, § 3.1.4 Enforce valid input, and § 3.1.2 Edit dataset descriptions, as the form already prevents them. When they do appear, they typically concern existing descriptions: legacy data that predates the current form, or data affected by shape rules added after the form was last updated.

Warnings indicate issues that must be fixed before the next major version of the requirements. They are expected to occur on new data as well, as they represent behaviour that the Dataset Register tolerates today but will enforce in the future. Vendors SHOULD add form fields for upcoming requirements as soon as they are announced, so collection managers have sufficient time to update their descriptions before the next major version rejects them. The migration path for new requirements relies on surfacing warnings early and giving collection managers the means to act on them.
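The distinction between violations and warnings can be sketched as follows. The sketch assumes the SHACL validation report has already been parsed into simple (severity, path) results; the severity IRIs come from the SHACL vocabulary, while Result, FormMessage, FIELD_FOR_PATH, and the message texts are hypothetical.

```python
from dataclasses import dataclass

# Severity IRIs defined by the SHACL vocabulary.
VIOLATION = "http://www.w3.org/ns/shacl#Violation"
WARNING = "http://www.w3.org/ns/shacl#Warning"

# Hypothetical mapping from property paths to form fields and plain-language text.
FIELD_FOR_PATH = {
    "https://schema.org/license": ("license", "Choose a license for this dataset."),
    "https://schema.org/description": ("description", "Add a description of this dataset."),
}


@dataclass(frozen=True)
class Result:
    """Hypothetical, simplified view of one SHACL validation result."""
    severity: str
    path: str


@dataclass(frozen=True)
class FormMessage:
    field: str      # form field the message is attached to
    text: str       # plain-language explanation
    blocking: bool  # True for violations, which prevent publication


def to_form_messages(results: list[Result]) -> list[FormMessage]:
    """Translate validation results into messages attached to form fields."""
    messages = []
    for result in results:
        # Unknown paths fall back to a general, form-level message.
        form_field, text = FIELD_FOR_PATH.get(
            result.path, ("general", f"Check the value of {result.path}.")
        )
        messages.append(FormMessage(form_field, text, blocking=result.severity == VIOLATION))
    return messages


def may_publish(messages: list[FormMessage]) -> bool:
    """Only violations block publication; warnings are shown but tolerated."""
    return not any(message.blocking for message in messages)
```

In this sketch a description with only warnings can still be published, while the warnings remain visible so that collection managers can act on them before they become violations.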

3.1.6. Show help texts

Collection management systems SHOULD show help texts for all input fields to collection managers.

NDE will provide help texts that vendors may use freely.

3.1.7. Provide default values

Collection management systems SHOULD provide sensible default values for form fields where applicable.

For example: a default license such as CC0, or the publisher name and identifier pre-filled from the organization’s account settings.
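A minimal sketch of how such defaults might be applied, assuming form values and account settings are plain dictionaries. The function name with_defaults and the keys used here are hypothetical.

```python
# URI of the Creative Commons CC0 1.0 public domain dedication.
CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"


def with_defaults(form_values: dict, organization: dict) -> dict:
    """Fill in defaults without overwriting what the collection manager entered."""
    defaults = {
        "license": CC0,
        "publisher_name": organization["name"],
        "publisher_id": organization["identifier"],
    }
    # Empty strings and None count as "not entered", so the default applies.
    entered = {key: value for key, value in form_values.items() if value not in (None, "")}
    return {**defaults, **entered}
```

Values entered by the collection manager always take precedence, so a default is a suggestion rather than a constraint.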

3.1.8. Auto-fill system-determined fields

Collection management systems MUST automatically fill fields whose values the system can determine, without requiring input from collection managers.

For example:

3.1.9. Publish dataset descriptions

Collection management systems MUST publish valid dataset descriptions to the registration URL. Changes to dataset descriptions MUST be validated and, if valid, immediately propagated to the registration URL. If invalid, the validation results MUST be reported to collection managers.

Publishing happens continuously, regardless of whether the registration URL has been registered with the Dataset Register.
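The validate-then-propagate flow can be sketched as a single function. The callables validate, write, and report are hypothetical stand-ins for the system's own validation, publication, and user-feedback mechanisms.

```python
from typing import Callable


def save_description(
    description: dict,
    validate: Callable[[dict], list[str]],
    write: Callable[[dict], None],
    report: Callable[[list[str]], None],
) -> bool:
    """Publish a changed description only if it validates.

    Returns True when the description was written to the registration URL,
    False when problems were reported to the collection manager instead.
    """
    problems = validate(description)
    if problems:
        # Invalid: nothing is published; the previous version stays in place.
        report(problems)
        return False
    # Valid: propagate immediately, whether or not the URL is registered yet.
    write(description)
    return True
```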

3.1.10. Register with the Dataset Register

Collection managers MUST have control over when to register the registration URL with the Dataset Register directly from the collection management system. A clear indication MUST be given to collection managers that the dataset description is registered in the Dataset Register.

Registration can also be done through a catalog of dataset descriptions submitted via a single registration URL. The Dataset Register will automatically re-crawl the registration URL to pick up changes.
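As an illustration, registration could amount to building a single HTTP request. The endpoint path, the JSON-LD payload shape, and the content type are assumptions that must be verified against the Dataset Register API documentation; the sketch only constructs the request and does not send it.

```python
import json
from urllib.request import Request

# Assumption: endpoint path and payload shape are not taken from this document;
# check them against the Dataset Register API documentation before use.
REGISTER_ENDPOINT = "https://datasetregister.netwerkdigitaalerfgoed.nl/api/datasets"


def build_registration_request(registration_url: str, endpoint: str = REGISTER_ENDPOINT) -> Request:
    """Build (but do not send) a request that submits a registration URL."""
    body = json.dumps({"@id": registration_url}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/ld+json"},
    )
```

Sending the request (for example with urllib.request.urlopen) would be triggered by an explicit action of the collection manager, and the response status could then drive the indication that the dataset description is registered.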

3.1.11. Remove datasets

Collection managers MUST be able to remove datasets from the Dataset Register in the collection management system. When removing or unpublishing a dataset, the dataset description MUST be removed from the registration URL.
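For a registration URL that serves a catalog, removal could look like the following sketch. It assumes the catalog is held as a JSON-LD-like dictionary with a "dataset" list, which is an illustrative structure rather than a prescribed one.

```python
def remove_dataset(catalog: dict, dataset_uri: str) -> dict:
    """Return a copy of the catalog without the dataset identified by dataset_uri."""
    remaining = [
        dataset for dataset in catalog.get("dataset", [])
        if dataset.get("@id") != dataset_uri
    ]
    # The original catalog is left untouched; the caller publishes the copy.
    return {**catalog, "dataset": remaining}
```

Once the updated catalog is served at the registration URL, the next crawl by the Dataset Register no longer finds the removed description.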

3.1.12. Dataset Register detail page

Collection management systems MAY show a link to the Dataset Register detail page for each dataset. If shown, the link MUST have the format https://datasetregister.netwerkdigitaalerfgoed.nl/datasets/<dataset URI>.

This allows collection managers to quickly view and validate the dataset description as published in the Dataset Register, as well as insights generated by the Dataset Knowledge Graph.

3.2. Distributions

3.2.1. Auto-add distributions

Collection management systems MUST automatically add distributions to dataset descriptions, including all required distribution attributes. All recommended distribution attributes SHOULD be included. The schema:contentUrl MUST be reachable and MUST return the declared schema:encodingFormat as Content-Type.

While collection managers own the dataset information, distributions (such as a SPARQL endpoint or RDF download) are provided by the collection management system’s infrastructure and should therefore be added automatically to the dataset description.
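The reachability and Content-Type requirement could be checked along these lines. This is a sketch under assumptions: the distribution is a dictionary with contentUrl and encodingFormat keys, the server answers HEAD requests, and media type parameters such as charset are ignored when comparing.

```python
from urllib.request import Request, urlopen


def media_type(value: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalize case."""
    return value.split(";", 1)[0].strip().lower()


def content_type_matches(declared: str, header: str) -> bool:
    """Compare the declared encoding format with a Content-Type header value."""
    return media_type(declared) == media_type(header)


def check_distribution(distribution: dict, timeout: float = 10.0) -> bool:
    """True if contentUrl is reachable and returns the declared media type."""
    try:
        request = Request(distribution["contentUrl"], method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            header = response.headers.get("Content-Type", "")
            return content_type_matches(distribution["encodingFormat"], header)
    except (OSError, ValueError):
        # Covers HTTP errors, network failures, timeouts, and malformed URLs.
        return False
```

Some endpoints, SPARQL endpoints in particular, may respond differently to HEAD than to GET or may require a query, so a production check would likely need per-distribution-type handling.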

3.2.2. Keep distributions up to date

Updates made by collection managers to heritage object records MUST be reflected in the dataset’s distributions as soon as possible.

For example, changes are reflected immediately in a SPARQL endpoint, or included in the next run of a periodic data dump.

4. Changes

4.1. Version 0.1.1 (2026-04-13)

4.2. Version 0.1.0 (2026-04-12)

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[NDE-CMS]
Gertjan Filarski; Enno Meijers. Requirements for Collection Management Systems. URL: https://docs.nde.nl/requirements-collection-management-systems/
[NDE-DATASETS]
David de Boer; Bob Coret. NDE Requirements for Datasets. Living Specification. URL: https://docs.nde.nl/requirements-datasets/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[NDE-NETWORK-OF-TERMS-IMPLEMENTATIONS]
David de Boer. Requirements for Network of Terms Implementations. URL: https://docs.nde.nl/requirements-network-of-terms-implementations/