
Data model

This page is for users of the Dataset Register: anyone querying the SPARQL endpoint, fetching dataset descriptions in RDF, or building applications on top of the register. It describes the consumer-facing, published data model: the RDF as it appears in the register after fetching, validating, mapping, and storing the providers’ input.

Note: If you are a publisher (a data platform submitting dataset descriptions), you need the input format and validation rules instead; see the Requirements for Datasets.

The register stores descriptions in DCAT, aligned with DCAT-AP-NL 3.0. Schema.org submissions are converted to DCAT at ingest, so consumers see the DCAT form regardless of how the data was originally submitted. The Schema.org ↔ DCAT alignment mostly follows the W3C DCAT 3 Alignment with Schema.org appendix.

Cardinalities reflect the data as stored, including auto-derived and auto-default values.

Property-column tags signal the source vocabulary:

  • untagged — profiled by DCAT-AP-NL 3.0 (the default).
  • DCAT — defined in DCAT 3.0 but not profiled by DCAT-AP-NL.
  • DC — plain Dublin Core (dct:) passthrough; not profiled by DCAT-AP-NL, DCAT-AP, or DCAT 3.0.
  • DCAT-AP-NL: Distribution only — DCAT-AP-NL profiles the property only on Distribution; the dataset-level usage is a register convenience.

NAL stands for Named Authority List, the EU Publications Office’s term for the controlled vocabularies it maintains (Languages, Frequency, Access Rights, File Type, etc.).

Dataset

The dcat:Dataset, dcat:Distribution, and foaf:Agent shapes describe the public dataset description as stored.

dcat:Dataset

When a dataset’s RDF description is fetched and validated, it is stored as a dcat:Dataset in its own graph. The URL of the graph corresponds to the dataset’s IRI.
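Because each dataset lives in a graph named after its own IRI, a consumer can read one description by scoping a query to that graph. The sketch below only builds the query and a GET URL; the endpoint URL and the function names are assumptions, not taken from the register's documentation.

```python
from urllib.parse import urlencode

# Assumption: the public SPARQL endpoint URL; check it before relying on it.
SPARQL_ENDPOINT = "https://datasetregister.netwerkdigitaalerfgoed.nl/sparql"


def dataset_graph_query(dataset_iri: str) -> str:
    """Build a query that returns every triple in the dataset's own graph."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in dataset_iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a usable IRI: {dataset_iri!r}")
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  GRAPH <{dataset_iri}> {{ ?s ?p ?o }}\n"
        "}"
    )


def query_url(dataset_iri: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": dataset_graph_query(dataset_iri)})
```

Calling `query_url("https://example.org/dataset/1")` yields a URL that can be fetched with any HTTP client, with an `Accept` header for the desired SPARQL results format.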

| DCAT term | Data type / notes | Cardinality |
| --- | --- | --- |
| dct:title | rdf:langString | 1..n (one per language) |
| dct:identifier | Auto-derived from the dataset IRI | 1..1 |
| dct:description | rdf:langString | 1..n (one per language) |
| dct:license [DCAT-AP-NL: Distribution only] | IRI or literal in v1; v2.0: IRI required. Inherited by distributions that don’t specify their own. If the dataset has no IRI license, the register denormalises one IRI license from its distributions onto the dataset for query convenience. A license must exist on the dataset or on every distribution; see DistributionLicenseRequiredShape. | 0..1 |
| dct:accessRights | EU Access Rights NAL IRI; defaults to PUBLIC | 1..1 |
| dcat:theme | IRI from a controlled vocabulary; the EU Data Theme NAL value data-theme/EDUC is auto-assigned | 1..n |
| dcat:contactPoint | vcard:Kind with vcard:fn and vcard:hasEmail (mailto: IRI) | 0..1; v2.0: 1..1 |
| dct:language | EU Language Authority IRI | 0..n |
| dcat:keyword | rdf:langString | 0..n |
| dcat:landingPage | IRI | 0..n |
| dct:source | IRI | 0..n |
| dct:created [DC] | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dct:issued | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dct:modified | xsd:date or xsd:dateTime; the lexical form may not be ISO 8601 in v1. v2.0: ISO 8601 value required | 0..1 |
| dcat:version | xsd:string | 0..1 |
| dct:creator | foaf:Organization or foaf:Person | 0..n; v2.0: 1..n |
| dct:publisher | foaf:Organization or foaf:Person | 0..1; v2.0: 1..1 |
| dct:spatial | IRI (e.g. GeoNames). DCAT-AP-NL also allows dct:Location with dcat:bbox / dcat:centroid / dcat:geometry, but the register stores IRI references only. | 0..n |
| dct:temporal | dct:PeriodOfTime blank node with dcat:startDate and/or dcat:endDate | 0..n |
| dct:isPartOf [DC] | IRI or literal in v1; v2.0: HTTPS IRI required | 0..n |
| dct:hasPart [DCAT] | IRI | 0..n |
| dct:isReferencedBy | IRI | 0..n |
| dct:accrualPeriodicity | EU Frequency NAL IRI | 0..1 |
| dcat:distribution | dcat:Distribution (see below) | 0..n |
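The license rules in the dct:license row can be read as two small lookups: one for the dataset-level value and one for the value that applies to a distribution. The following is a minimal sketch of that reading, not the register's implementation. The function names are hypothetical, the IRI check is deliberately rough, and picking the first IRI license among the distributions is an assumption (the text only says "one IRI license").

```python
from typing import Optional, Sequence


def is_iri(value: Optional[str]) -> bool:
    """Rough check: treat http(s) strings as IRIs, anything else as a literal."""
    return bool(value) and value.startswith(("http://", "https://"))


def dataset_level_license(
    own: Optional[str], distribution_licenses: Sequence[Optional[str]]
) -> Optional[str]:
    """License a consumer would find on the dataset itself."""
    if is_iri(own):
        return own
    # No IRI license on the dataset: fall back to one from its distributions.
    # Assumption: the first IRI license found is the one that gets copied.
    for candidate in distribution_licenses:
        if is_iri(candidate):
            return candidate
    return own  # may be a v1 literal, or None


def effective_distribution_license(
    own: Optional[str], dataset_license: Optional[str]
) -> Optional[str]:
    """A distribution without its own license inherits the dataset's."""
    return own if own else dataset_license
```

Under this reading a consumer never needs to inspect sibling distributions: the dataset-level value is already the fallback.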

dcat:Distribution

Each object of a dataset’s dcat:distribution property is a dcat:Distribution.

| DCAT term | Data type / notes | Cardinality |
| --- | --- | --- |
| dcat:accessURL | IRI; v2.0: HTTPS IRI | 1..1 |
| dcat:mediaType | IANA media type IRI. Required for download distributions; APIs use dct:conformsTo instead. Any compression suffix is split off into dcat:compressFormat. | 0..n; v2.0: 0..1 |
| dcat:compressFormat | IANA media type IRI; added when e.g. +gzip is stripped from dcat:mediaType | 0..1 |
| dct:conformsTo | Protocol IRI (e.g. <https://www.w3.org/TR/sparql11-protocol/> for SPARQL endpoints) | 0..1 |
| dct:issued | xsd:date or xsd:dateTime | 0..1 |
| dct:modified | xsd:date or xsd:dateTime | 0..1 |
| dct:title | rdf:langString | 0..n |
| dct:description | rdf:langString | 0..n |
| dct:language | EU Language Authority IRI | 0..1 |
| dct:license | IRI or literal in v1; v2.0: IRI required. Inherited from the dataset if not specified. The register requires a license to exist on the distribution or the dataset via DistributionLicenseRequiredShape. | 0..1 |
| dcat:byteSize | xsd:integer (bytes) | 0..1 |
| foaf:page | IRI to a documentation page (SPARQL UI, download landing page, etc.); v2.0: HTTPS required | 0..n |
| odrl:hasPolicy | ODRL policy associated with the distribution | 0..n |
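The split between dcat:mediaType and dcat:compressFormat can be illustrated as follows. This is a sketch under assumptions: the IANA base IRI, the suffix table (only +gzip is mentioned above), and the function name are illustrative and may differ from what the register actually does.

```python
from typing import Optional, Tuple

# Assumption: media types are expressed as IRIs under this IANA base.
IANA = "https://www.iana.org/assignments/media-types/"

# Assumption: only "+gzip" is named in the text; other suffixes are unknown.
COMPRESSION_SUFFIXES = {"gzip": "application/gzip"}


def split_media_type(media_type_iri: str) -> Tuple[str, Optional[str]]:
    """Return (dcat:mediaType, dcat:compressFormat or None)."""
    base, plus, suffix = media_type_iri.rpartition("+")
    if plus and suffix in COMPRESSION_SUFFIXES:
        return base, IANA + COMPRESSION_SUFFIXES[suffix]
    # Structured suffixes such as "+json" are part of the media type itself.
    return media_type_iri, None
```

For example, `split_media_type(IANA + "application/n-triples+gzip")` separates the N-Triples media type from the gzip compression format, while `application/ld+json` is left untouched.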

foaf:Agent

The objects of both the dct:creator and dct:publisher dataset properties are foaf:Agent instances — concretely either foaf:Organization or foaf:Person. The publisher carries additional properties beyond those available on the creator.

| Property | Data type / notes | Cardinality |
| --- | --- | --- |
| foaf:name | rdf:langString; the organization or person name | 1..n (one per language) |
| foaf:nick | Alternate name (publisher only) | 0..n |
| dct:identifier | Identifier (publisher only) | 0..1 |
| foaf:mbox | Email address as a literal (publisher only) | 0..1 |
| owl:sameAs | Equivalent entity IRI (publisher only) | 0..n |

Registration

The shapes below describe how the register tracks registrations themselves – not the public dataset description that consumers query.

schema:EntryPoint

Any URL registered by clients is added as a schema:EntryPoint to the Registrations graph.

Datasets are fetched from this URL on registration and when the crawler runs.

| Property | Description |
| --- | --- |
| schema:additionalType | Computed registration status, one of the three IRIs in the rows below. |
| | <https://data.netwerkdigitaalerfgoed.nl/registry/valid>: fetched and passed SHACL validation. |
| | <https://data.netwerkdigitaalerfgoed.nl/registry/invalid>: fetched but failed SHACL validation; see schema:validUntil. |
| | <https://data.netwerkdigitaalerfgoed.nl/registry/gone>: could not be fetched as a dataset description. Covers HTTP error responses (≥ 300) and non-HTTP failures: parse errors, unrecognised content types, and URLs that returned 200 but contained no dataset triples. |
| schema:datePosted | UTC datetime when the URL was registered. |
| schema:dateRead | UTC datetime when the URL was last read by the application. The crawler updates this value when fetching descriptions. |
| schema:status | The HTTP status code last encountered when fetching the URL. |
| schema:validUntil | If the URL has become invalid, the UTC datetime at which it did so. |
| schema:about | The schema:Datasets found at this URL. A registration URL may describe a single dataset (one entry) or a catalog of multiple datasets (multiple entries). The crawler updates this value when fetching descriptions. |
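The three schema:additionalType values follow from the fetch and validation outcome. A minimal sketch of that decision, with a hypothetical function name and parameters (the register's own code may be organised differently):

```python
from typing import Optional

REGISTRY = "https://data.netwerkdigitaalerfgoed.nl/registry/"


def registration_status(
    http_status: Optional[int], has_dataset_triples: bool, shacl_conforms: bool
) -> str:
    """Map a fetch result to the registration status IRI.

    http_status is None for non-HTTP failures (hypothetical convention).
    has_dataset_triples is False for parse errors, unrecognised content
    types, and 200 responses without any dataset triples.
    """
    if http_status is None or http_status >= 300 or not has_dataset_triples:
        return REGISTRY + "gone"
    return REGISTRY + ("valid" if shacl_conforms else "invalid")
```

Note that a 200 response alone is not enough for `valid` or `invalid`: the body must also contain a dataset description.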

schema:Dataset

Each dataset that is found at the schema:EntryPoint registration URL gets added as a schema:Dataset to the Registrations graph.

| Property | Description |
| --- | --- |
| schema:dateRead | UTC datetime when the dataset was last read by the application. |
| schema:subjectOf | The registration URL from which the dataset was read. |

schema:Rating

A separate named graph keeps a schema:Rating instance for each dataset description, indicating how complete the description is. Reach it from a dataset via the schema:contentRating property.

| Property | Description |
| --- | --- |
| schema:bestRating | The highest possible rating. |
| schema:worstRating | The lowest possible rating. |
| schema:ratingValue | Rating for the dataset description. |
| schema:ratingExplanation | Explanation for the rating: which properties are missing? |

Distribution health

For every distribution URL referenced by a registered dataset, the crawler periodically issues a probe (an HTTP HEAD/GET or a SPARQL ASK, depending on the distribution type) and records the outcome in a dedicated named graph:

https://datasetregister.netwerkdigitaalerfgoed.nl/sparql/distribution-health

Distribution health is enrichment data produced by the register, not metadata supplied by publishers. Keeping it in its own named graph – parallel to the dataset and registration graphs – makes that origin explicit: consumers can opt in or out of it cleanly, and the register can re-probe, prune, or reset the data without touching the published DCAT description.

Vocabulary prefix: nde-probe: <https://def.nde.nl/probe#>.

nde-probe:DistributionHealthRecord

Each probed URL appears as an nde-probe:DistributionHealthRecord whose IRI is the distribution URL itself.

| Property | Data type / notes | Cardinality |
| --- | --- | --- |
| nde-probe:lastProbedAt | xsd:dateTime; UTC timestamp of the most recent probe attempt. | 1..1 |
| nde-probe:lastOutcome | Outcome IRI of the last probe; absent when the last probe succeeded. One of the IRIs listed under Probe outcomes below. | 0..1 |
| nde-probe:lastSuccessAt | xsd:dateTime; UTC timestamp of the most recent successful probe, if any. | 0..1 |
| nde-probe:firstFailureAt | xsd:dateTime; UTC timestamp at which the current failure streak began. Cleared on the next success. | 0..1 |
| nde-probe:consecutiveFailures | xsd:integer; length of the current failure streak. Reset to 0 on the next success. | 1..1 |
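The way these five properties evolve from probe to probe can be sketched as a small state update. The class and function names below are hypothetical; only the field semantics come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HealthRecord:
    last_probed_at: datetime
    consecutive_failures: int = 0
    last_outcome: Optional[str] = None        # absent after a success
    last_success_at: Optional[datetime] = None
    first_failure_at: Optional[datetime] = None


def apply_probe(
    record: HealthRecord, probed_at: datetime, outcome: Optional[str] = None
) -> HealthRecord:
    """Return the record after one probe; outcome=None means success."""
    if outcome is None:
        # Success clears the failure streak and its start time.
        return HealthRecord(last_probed_at=probed_at, last_success_at=probed_at)
    return HealthRecord(
        last_probed_at=probed_at,
        consecutive_failures=record.consecutive_failures + 1,
        last_outcome=outcome,
        last_success_at=record.last_success_at,
        # Keep the start of an ongoing streak; set it on the first failure.
        first_failure_at=record.first_failure_at or probed_at,
    )
```

A consumer can therefore treat a record with no nde-probe:lastOutcome as currently healthy, and use nde-probe:firstFailureAt to see how long a failing distribution has been down.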

Probe outcomes

When a probe fails, nde-probe:lastOutcome is one of:

| Outcome IRI | Meaning |
| --- | --- |
| nde-probe:NetworkError | Connection refused, DNS failure, TLS error, timeout, or any other non-HTTP transport failure. |
| nde-probe:NotFound | HTTP 404 or 410. |
| nde-probe:ServerError | HTTP 5xx. |
| nde-probe:AuthRequired | HTTP 401 or 403. |
| nde-probe:RateLimited | HTTP 429. |
| nde-probe:ContentTypeMismatch | Response was reachable but served the wrong content type. For a data dump, its Content-Type did not match the declared dcat:mediaType / dct:format / schema:encodingFormat. For a SPARQL endpoint, the response was not a SPARQL results media type; most often it was an HTML page, meaning the access URL points to a SPARQL query web UI rather than the SPARQL protocol endpoint itself. The fix is to put the SPARQL protocol endpoint in dcat:accessURL (schema:contentUrl) and declare the query UI on foaf:page (schema:documentation) instead. |
| nde-probe:ContentTypeMissing | Response had no Content-Type header at all. |
| nde-probe:EmptyBody | Response body was empty for a distribution that should have returned data. |
| nde-probe:SparqlProbeFailed | The distribution declares a SPARQL endpoint (dct:conformsTo <https://www.w3.org/TR/sparql11-protocol/>) but the probe ASK query did not return a valid SPARQL result. |
| nde-probe:RdfParseFailed | The body was returned but could not be parsed as RDF. |
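The HTTP-status rows of the table translate directly into a lookup. The sketch below covers only those rows; the function name is hypothetical, and returning None for any status the table does not mention is an assumption.

```python
from typing import Optional

# Prefix as given above: nde-probe: <https://def.nde.nl/probe#>
PROBE = "https://def.nde.nl/probe#"


def outcome_for_status(status: int) -> Optional[str]:
    """Map an HTTP status code to an outcome IRI, per the table above."""
    if status in (404, 410):
        return PROBE + "NotFound"
    if status in (401, 403):
        return PROBE + "AuthRequired"
    if status == 429:
        return PROBE + "RateLimited"
    if 500 <= status <= 599:
        return PROBE + "ServerError"
    # Statuses not listed in the table are left undecided here.
    return None
```

The remaining outcomes (content type, empty body, SPARQL and RDF parse failures) depend on the response headers and body rather than the status code, so they are not covered by this lookup.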

Effect on validation results

Probe failures also surface in the SHACL validation report as sh:ValidationResult nodes. See Validation: how probe failures appear in the report for the constraint components, the extra properties they carry, and how strict each caller (registration, validation, crawler) is.

Allow list

A registration URL must be on a domain that is allowed before it can be added to the Register. The allow list lives in the https://data.netwerkdigitaalerfgoed.nl/registry/allowed_domain_names RDF graph. Each entry is a blank node with a single property:

| Property | Description |
| --- | --- |
| https://data.netwerkdigitaalerfgoed.nl/allowed_domain_names/def/domain_name | Literal: either a registrable domain (example.com) or a specific subdomain (sub.example.com). A registrable domain implicitly covers all its subdomains. |

To modify the allow list, use the REST API (POST /allowed-domains); the SPARQL endpoint is read-only.
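The matching rule for allow-list entries can be sketched as a host suffix check. The function name is hypothetical. Treating a subdomain entry as also covering hosts beneath it is an assumption; the text above only states this for registrable domains.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Check whether a registration URL's host is covered by the allow list."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False
    for entry in allowed_domains:
        domain = entry.lower().rstrip(".")
        # Exact match, or a subdomain of the entry. The leading dot prevents
        # "notexample.com" from matching "example.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

With an allow list of `["example.com"]`, both `https://example.com/data` and `https://data.example.com/catalog` pass, while `https://notexample.com/` does not.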