Data model

This page is for users of the Dataset Register: anyone querying the SPARQL endpoint, fetching dataset descriptions in RDF, or building applications on top of the register. It describes the consumer-facing, published data model: the RDF as it appears in the register after fetching, validating, mapping, and storing the providers’ input.

Note: If you are a publisher (a data platform submitting dataset descriptions), you need the input format and validation rules instead; see the Requirements for Datasets.

The register stores descriptions in DCAT, aligned with DCAT-AP-NL 3.0. Schema.org submissions are converted to DCAT at ingest, so consumers see the DCAT form regardless of how the data was originally submitted. The Schema.org ↔ DCAT alignment mostly follows the W3C DCAT 3 Alignment with Schema.org appendix.

Cardinalities reflect the data as stored, including auto-derived and auto-default values.

Property-column tags signal the source vocabulary:

  • untagged — profiled by DCAT-AP-NL 3.0 (the default).
  • DCAT — defined in DCAT 3.0 but not profiled by DCAT-AP-NL.
  • DC — plain Dublin Core (dct:) passthrough; not profiled by DCAT-AP-NL, DCAT-AP, or DCAT 3.0.
  • DCAT-AP-NL: Distribution only — DCAT-AP-NL profiles the property only on Distribution; the dataset-level usage is a register convenience.

NAL stands for Named Authority List, the EU Publications Office’s term for the controlled vocabularies it maintains (Languages, Frequency, Access Rights, File Type, etc.).

Dataset

The dcat:Dataset, dcat:Distribution, and foaf:Agent shapes describe the public dataset description as stored.

dcat:Dataset

When a dataset’s RDF description is fetched and validated, it is stored as a dcat:Dataset in its own graph. The URL of the graph corresponds to the dataset’s IRI.
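Because the graph URL equals the dataset IRI, a consumer can ask for a single dataset description by naming that graph. The sketch below only builds the query text; the function name and the example IRI are illustrative assumptions, and sending the query to the register's SPARQL endpoint is left to whichever HTTP client you use.

```python
def dataset_graph_query(dataset_iri: str) -> str:
    """Build a SPARQL query returning every triple in the dataset's own graph."""
    # Refuse characters that cannot appear inside a SPARQL IRI reference,
    # so the IRI cannot break out of the angle brackets.
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ch.isspace() for ch in dataset_iri):
        raise ValueError(f"not a safe IRI: {dataset_iri!r}")
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  GRAPH <{dataset_iri}> {{ ?s ?p ?o }}\n"
        "}"
    )


# Hypothetical dataset IRI, used only for illustration.
print(dataset_graph_query("https://example.org/datasets/1"))
```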

| DCAT term | Data type / notes | Cardinality |
|---|---|---|
| dct:title | rdf:langString | 1..n (one per language) |
| dct:identifier | Auto-derived from the dataset IRI | 1..1 |
| dct:description | rdf:langString | 1..n (one per language) |
| dct:license [DCAT-AP-NL: Distribution only] | IRI or literal in v1; v2.0: IRI required. Inherited by distributions that don’t specify their own. If the dataset has no IRI license, the register denormalises one IRI license from its distributions onto the dataset for query convenience. A license must exist on the dataset or on every distribution; see DistributionLicenseRequiredShape. | 0..1 |
| dct:accessRights | EU Access Rights NAL IRI; defaults to PUBLIC | 1..1 |
| dcat:theme | IRI from a controlled vocabulary; the EU Data Theme NAL value data-theme/EDUC is auto-assigned | 1..n |
| dcat:contactPoint | vcard:Kind with vcard:fn and vcard:hasEmail (mailto: IRI) | 0..1; v2.0: 1..1 |
| dct:language | EU Language Authority IRI | 0..n |
| dcat:keyword | rdf:langString | 0..n |
| dcat:landingPage | IRI | 0..n |
| dct:source | IRI | 0..n |
| dct:created [DC] | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dct:issued | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dct:modified | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dcat:version | xsd:string | 0..1 |
| dct:creator | foaf:Organization or foaf:Person | 0..n; v2.0: 1..n |
| dct:publisher | foaf:Organization or foaf:Person | 0..1; v2.0: 1..1 |
| dct:spatial | IRI (e.g. GeoNames). DCAT-AP-NL also allows dct:Location with dcat:bbox / dcat:centroid / dcat:geometry, but the register stores IRI references only. | 0..n |
| dct:temporal | dct:PeriodOfTime blank node with dcat:startDate and/or dcat:endDate | 0..n |
| dct:isPartOf [DC] | IRI or literal in v1; v2.0: HTTPS IRI required | 0..n |
| dct:hasPart [DCAT] | IRI | 0..n |
| dct:isReferencedBy | IRI | 0..n |
| dct:accrualPeriodicity | EU Frequency NAL IRI | 0..1 |
| dcat:distribution | dcat:Distribution (see below) | 0..n |
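The license inheritance described for dct:license can be sketched in a few lines. This is an illustration only, not the register's implementation: the dictionary layout (`license`, `distributions`) and both function names are assumptions made for the example, and treating a dataset without distributions as passing the rule is likewise an assumption.

```python
from typing import Optional


def effective_license(dataset: dict, distribution: dict) -> Optional[str]:
    """Return the distribution's own license, else the one inherited from the dataset."""
    return distribution.get("license") or dataset.get("license")


def license_rule_satisfied(dataset: dict) -> bool:
    """True if a license exists on the dataset or on every distribution."""
    if dataset.get("license"):
        return True
    # Assumption: with no distributions at all, all() is vacuously true.
    return all(d.get("license") for d in dataset.get("distributions", []))


# Hypothetical in-memory description, for illustration only.
dataset = {
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "distributions": [{"license": None}],
}
print(effective_license(dataset, dataset["distributions"][0]))
```

In this example the distribution has no license of its own, so the dataset-level license applies to it.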

dcat:Distribution

The objects of a dataset’s dcat:distribution property are of type dcat:Distribution.

| DCAT term | Data type / notes | Cardinality |
|---|---|---|
| dcat:accessURL | IRI; v2.0: HTTPS IRI | 1..1 |
| dcat:mediaType | IANA media type IRI. Required for download distributions; APIs use dct:conformsTo instead. Any compression suffix is split off into dcat:compressFormat. | 0..n; v2.0: 0..1 |
| dcat:compressFormat | IANA media type IRI; added when e.g. +gzip is stripped from dcat:mediaType | 0..1 |
| dct:conformsTo | Protocol IRI (e.g. <https://www.w3.org/TR/sparql11-protocol/> for SPARQL endpoints) | 0..1 |
| dct:issued | xsd:date or xsd:dateTime | 0..1 |
| dct:modified | xsd:date or xsd:dateTime | 0..1 |
| dct:title | rdf:langString | 0..n |
| dct:description | rdf:langString | 0..n |
| dct:language | EU Language Authority IRI | 0..1 |
| dct:license | IRI or literal in v1; v2.0: IRI required. Inherited from the dataset if not specified. The register requires a license to exist on the distribution or the dataset via DistributionLicenseRequiredShape. | 0..1 |
| dcat:byteSize | xsd:integer (bytes) | 0..1 |
| foaf:page | IRI to a documentation page (SPARQL UI, download landing page, etc.); v2.0: HTTPS required | 0..n |
| odrl:hasPolicy | ODRL policy associated with the distribution | 0..n |
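To make the dcat:mediaType / dcat:compressFormat split concrete, here is a minimal sketch of how a compression suffix could be separated from a media type IRI. The set of recognised suffixes and the function name are assumptions for illustration; the register's actual list of suffixes is not documented on this page.

```python
from typing import Optional, Tuple

IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"

# Assumed mapping from compression suffix to its own media type.
COMPRESSION_SUFFIXES = {
    "gzip": "application/gzip",
    "zip": "application/zip",
}


def split_compression(media_type_iri: str) -> Tuple[str, Optional[str]]:
    """Return (media type IRI without compression suffix, compress format IRI or None)."""
    base, plus, suffix = media_type_iri.rpartition("+")
    # Only strip known compression suffixes; "+json" or "+xml" stay untouched.
    if plus and suffix in COMPRESSION_SUFFIXES:
        return base, IANA_MEDIA_TYPES + COMPRESSION_SUFFIXES[suffix]
    return media_type_iri, None


media_type, compress_format = split_compression(
    IANA_MEDIA_TYPES + "application/n-triples+gzip"
)
print(media_type)
print(compress_format)
```

Structured-syntax suffixes that are not compression formats, such as the `+json` in `application/ld+json`, are deliberately left in place.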

foaf:Agent

The objects of both the dct:creator and dct:publisher dataset properties are foaf:Agent instances — concretely either foaf:Organization or foaf:Person. The publisher carries additional properties beyond those available on the creator.

| Property | Data type / notes | Cardinality |
|---|---|---|
| foaf:name | rdf:langString: the organization or person name | 1..n (one per language) |
| foaf:nick | Alternate name (publisher only) | 0..n |
| dct:identifier | Identifier (publisher only) | 0..1 |
| foaf:mbox | Email address as a literal (publisher only) | 0..1 |
| owl:sameAs | Equivalent entity IRI (publisher only) | 0..n |

Registration

The shapes below describe how the register tracks registrations themselves – not the public dataset description that consumers query.

schema:EntryPoint

Any URL registered by clients is added as a schema:EntryPoint to the Registrations graph.

Datasets are fetched from this URL on registration and when the crawler runs.

| Property | Description |
|---|---|
| schema:additionalType | Computed registration status; one of the three status IRIs listed below this table. |
| schema:datePosted | UTC datetime when the URL was registered. |
| schema:dateRead | UTC datetime when the URL was last read by the application. The crawler updates this value when fetching descriptions. |
| schema:status | The HTTP status code last encountered when fetching the URL. |
| schema:validUntil | If the URL has become invalid, the UTC datetime at which it did so. |
| schema:about | The schema:Datasets found at this URL. A registration URL may describe a single dataset (one entry) or a catalog of multiple datasets (multiple entries). The crawler updates this value when fetching descriptions. |

Values of schema:additionalType:

  • <https://data.netwerkdigitaalerfgoed.nl/registry/valid>: fetched and passed SHACL validation
  • <https://data.netwerkdigitaalerfgoed.nl/registry/invalid>: fetched but failed SHACL validation; see schema:validUntil
  • <https://data.netwerkdigitaalerfgoed.nl/registry/gone>: could not be fetched as a dataset description. Covers HTTP error responses (≥ 300) and non-HTTP failures: parse errors, unrecognised content types, and URLs that returned 200 but contained no dataset triples.
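The three registration statuses can be read as a small decision procedure. The sketch below mirrors the descriptions above; the function signature and its parameters are hypothetical (the register's internal code is not shown on this page), and using `None` for a failure that produced no HTTP response is an assumption.

```python
from typing import Optional

REGISTRY = "https://data.netwerkdigitaalerfgoed.nl/registry/"


def registration_status(
    http_status: Optional[int],  # None: no HTTP response at all (assumption)
    parsed_ok: bool,             # body had a recognised content type and parsed
    dataset_count: int,          # number of dataset descriptions found
    shacl_valid: bool,           # outcome of SHACL validation
) -> str:
    """Map the outcome of one fetch to a registration status IRI."""
    # "gone": HTTP status >= 300, or a non-HTTP failure such as a parse
    # error or a 200 response without any dataset triples.
    if http_status is None or http_status >= 300:
        return REGISTRY + "gone"
    if not parsed_ok or dataset_count == 0:
        return REGISTRY + "gone"
    # The description was fetched; validation decides between the other two.
    return REGISTRY + ("valid" if shacl_valid else "invalid")


print(registration_status(200, True, 1, True))
print(registration_status(404, False, 0, False))
```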

schema:Dataset

Each dataset that is found at the schema:EntryPoint registration URL gets added as a schema:Dataset to the Registrations graph.

| Property | Description |
|---|---|
| schema:dateRead | UTC datetime when the dataset was last read by the application. |
| schema:subjectOf | From which registration URL the dataset was read. |

schema:Rating

A separate named graph keeps a schema:Rating instance for each dataset description, indicating how complete the description is. Reach it from a dataset via the schema:contentRating property.

| Property | Description |
|---|---|
| schema:bestRating | The highest possible rating. |
| schema:worstRating | The lowest possible rating. |
| schema:ratingValue | Rating for the dataset description. |
| schema:ratingExplanation | Explanation for the rating: which properties are missing? |
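As an illustration of following schema:contentRating from a dataset to its rating, the sketch below builds a SPARQL query that lists the lowest-rated descriptions first. Two things are assumptions rather than documented facts: the `http://schema.org/` namespace IRI, and that the endpoint's default graph spans the named graphs involved (otherwise the patterns need GRAPH clauses). The function name is hypothetical.

```python
# Assumed namespace; switch to https://schema.org/ if the register uses that form.
SCHEMA = "http://schema.org/"


def lowest_rated_query(limit: int = 10) -> str:
    """Build a SPARQL query listing dataset ratings, lowest rating first."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"""PREFIX schema: <{SCHEMA}>
SELECT ?dataset ?value ?best ?explanation WHERE {{
  ?dataset schema:contentRating ?rating .
  ?rating schema:ratingValue ?value .
  OPTIONAL {{ ?rating schema:bestRating ?best }}
  OPTIONAL {{ ?rating schema:ratingExplanation ?explanation }}
}}
ORDER BY ASC(?value)
LIMIT {int(limit)}"""


print(lowest_rated_query(5))
```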

Allow list

A registration URL must be on a domain that is allowed before it can be added to the Register. The allow list lives in the https://data.netwerkdigitaalerfgoed.nl/registry/allowed_domain_names RDF graph. Each entry is a blank node with a single property:

| Property | Description |
|---|---|
| https://data.netwerkdigitaalerfgoed.nl/allowed_domain_names/def/domain_name | Literal: either a registrable domain (example.com) or a specific subdomain (sub.example.com). A registrable domain implicitly covers all its subdomains. |

To modify the allow list, use the REST API (POST /allowed-domains); the SPARQL endpoint is read-only.
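The matching rule for allow-list entries (an entry for a registrable domain also covers its subdomains) can be sketched as below. This is not the register's code: the function name is hypothetical, and for simplicity the sketch lets every entry cover the hosts beneath it, which is an assumption for entries that name a specific subdomain.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Check whether the URL's host equals, or is a subdomain of, an allowed entry."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False
    for entry in allowed_domains:
        domain = entry.lower().rstrip(".")
        # Match on a label boundary so "notexample.com" does not match "example.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


allow_list = ["example.com", "sub.example.org"]  # illustrative entries
print(is_allowed("https://data.example.com/catalog.ttl", allow_list))
print(is_allowed("https://notexample.com/catalog.ttl", allow_list))
```

Comparing on label boundaries rather than with a plain suffix test is the key design choice here, since a bare `endswith("example.com")` would also accept unrelated domains.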