Stack

Draft – under review

This chapter proposes engineering choices, naming conventions, and operational patterns for the NDE network that NDE has not yet endorsed. Feedback is welcome via the issue tracker.

Introduction

The NDE Stack is the working name given here to the ecosystem of NDE-compatible components and the operational patterns that compose them, grounded in the network’s shared standards. The Stack’s goal is to help software developers in the NDE network build shared functionality once rather than reinventing it in each project. Most of what appears here is new: proposed components and patterns yet to be built. The rest is existing software given a role in the Stack.

The Stack operationalises the architecture sketched in Van data naar dienst: visie op de ontwikkeling van verbonden erfgoeddiensten (NDE, November 2025), covering the Data and Presentation Layers of a Service Platform. The Stack also includes the Publication Layer and connects to source-side data management. As more of the network’s functionality is named, the Stack grows. Where the report leaves gaps, the Stack fills them.

These chapters are also meant to create a shared language and understanding across the network: a common vocabulary for the components and patterns that builders, operators, and decision-makers can use when discussing what the Stack does and how it fits together. Naming the parts is the prerequisite for talking about them coherently across teams and organisations.

Scope

This is: a bridge from the report’s architectural vocabulary to running code. It names concrete components (existing and proposed), default operational patterns, and the foundational technologies the Stack depends on.

This isn’t: a restatement of the report. The report describes what should happen at each step of a Service Platform; this guidance proposes how, at the engineering level, with named patterns and packages.

Goes beyond the report where useful: some practical engineering concerns are not in the report, such as semantic search, snapshot-CDC (change data capture) deletion handling, per-source outage resilience, and declarative standards-backed pipeline configuration. The Stack picks these up as natural extensions of the report’s framework and flags them in context where they appear, so a reader can tell report-grounded content from Stack-direction extensions.

Primary audience: builders of Service Platforms and network services within the NDE network.

Status of contents: many components are proposals (@lde/* or @ndes/* packages that do not exist yet). The function-mapping table marks them as such. Patterns labelled “Proposed” have been discussed but not endorsed.

Taxonomy

The Stack uses a small vocabulary consistently.

| Term | Definition | Examples |
| --- | --- | --- |
| Component | Software the Stack provides | @lde/* packages, @ndes/* packages |
| Pattern | Operational mechanic | Blue/green Rebuild, SCHEMA-AP-NDE-first, Ports & Adapters |
| Service | A running instance of a Component, deployed with a specific configuration | A Service Platform running the search projection with its own SHACL and search configuration |
| Network service | A Service operated by NDE, network-wide, addressable as a single canonical endpoint | Dataset Register, Network of Terms, Dataset Knowledge Graph |
| Standard | A network commitment the Stack adopts | SCHEMA-AP-NDE, LDES, IIIF, DCAT-AP 3.0 / Schema.org Dataset |
| Foundational technology | Upstream open-source dependency outside NDE governance | QLever, Typesense, nginx, Fastify, Mercurius |

Components

The Stack provides the components below. They live in the Service Platform side chapter; this catalog is a quick index. The Pipeline chapter shows how the pipeline components compose.

| Component | Layer | Brief |
| --- | --- | --- |
| Search Pipeline (proposed) | Data | Builds a search index of records from selected datasets |
| Knowledge Graph Pipeline | Data | Builds a queryable knowledge graph from selected datasets |
| Search APIs (proposed) | Data | Search and filter API that Presentation Layers consume |
| Knowledge Graph APIs | Data | Query interfaces for the Stack’s knowledge graphs: DKG (operational), Term Backlink Graph (proposed), Knowledge Graph voor Termen (proposed), and any self-operated KG a Service Platform builds |
| Change Stream Producer (proposed) | Data | Publishes a Service Platform’s data changes as a feed other systems can subscribe to |
| Heritage UI Components (future) | Presentation | Reusable display components |

Foundational technologies

The Stack is built on top of mature open-source infrastructure that lives outside NDE/LDE governance. These are dependencies, not Stack components; release cycles, roadmaps, and breaking changes follow the upstream projects. The Stack picks opinionated defaults and treats them as exchangeable for any conformant alternative: substitutes plug in behind ports defined by the Ports & Adapters pattern, so a swap is a configuration concern rather than a code rewrite.
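To make the port idea concrete, here is a minimal TypeScript sketch of a search-engine port with configuration-driven adapter selection. All names (`SearchEnginePort`, `InMemoryAdapter`, `createSearchEngine`, the `engine` configuration key) are hypothetical and do not describe an existing @lde/* package. The in-memory adapter is a stand-in; a real adapter would wrap the client of the engine in question.

```typescript
// Hypothetical document shape; a real search projection would be richer.
interface SearchDocument {
  id: string;
  title: string;
}

// The port: the only surface the pipeline knows about a search engine.
interface SearchEnginePort {
  readonly engine: string;
  index(collection: string, docs: SearchDocument[]): void;
  search(collection: string, term: string): SearchDocument[];
}

// Stand-in adapter that keeps documents in memory. A real adapter would
// translate these two calls into requests to Typesense, Elasticsearch, etc.
class InMemoryAdapter implements SearchEnginePort {
  readonly engine: string;
  private readonly collections = new Map<string, SearchDocument[]>();

  constructor(engine: string) {
    this.engine = engine;
  }

  index(collection: string, docs: SearchDocument[]): void {
    this.collections.set(collection, [...docs]);
  }

  search(collection: string, term: string): SearchDocument[] {
    const needle = term.toLowerCase();
    return (this.collections.get(collection) ?? []).filter((doc) =>
      doc.title.toLowerCase().includes(needle),
    );
  }
}

type EngineName = 'typesense' | 'elasticsearch';

// Adapter registry: swapping engines means changing one configuration value.
const adapters: Record<EngineName, () => SearchEnginePort> = {
  typesense: () => new InMemoryAdapter('typesense'),
  elasticsearch: () => new InMemoryAdapter('elasticsearch'),
};

function createSearchEngine(config: { engine: EngineName }): SearchEnginePort {
  return adapters[config.engine]();
}

// Pipeline code depends on the port only, never on a concrete engine.
function rebuildAndCount(
  engine: SearchEnginePort,
  docs: SearchDocument[],
  term: string,
): number {
  engine.index('objects', docs);
  return engine.search('objects', term).length;
}
```

In this sketch `rebuildAndCount` never names an engine, so moving from the default to a substitute only touches the value passed to `createSearchEngine`.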

Defaults favour operational lightness: solutions that are simple to run on LDE’s shared infrastructure, on national infrastructure, and by individual service providers on their own hardware, so the same Stack stays realistic to operate at every scale. That criterion is why the default search engine is Typesense rather than the heavier Elasticsearch.

| Concern | Stack default | Realistic substitutes |
| --- | --- | --- |
| RDF triplestore / SPARQL engine | QLever – read-only-after-load, fast bulk-load, fits blue/green rebuild | Oxigraph, GraphDB, Jena Fuseki |
| Search engine | Typesense – used by the search pipeline | Elasticsearch, OpenSearch (each behind their own adapter at the search pipeline’s engine boundary) |
| Reverse proxy | nginx – default for proxy-level blue/green | HAProxy (runtime API for zero-reload switching), Caddy (admin-API hot reload), Envoy |
| Web / API runtime | Fastify – one runtime for both API styles: REST via @fastify/swagger for the OpenAPI surface (used by the Dataset Register API today), GraphQL via Mercurius (proposed @lde/graphql-server) | REST: any OpenAPI-capable framework. GraphQL: Apollo Server, Yoga, any GraphQL.js-based runtime |

Substrates

Three distinct bodies of source data (substrates) underlie the Stack. Each pipeline rides on exactly one. They differ in source, scale, cadence, and consumers; the layer pages refer back to them by letter.

| Substrate | What’s crawled | Source(s) | Scale / cadence | Feeds |
| --- | --- | --- | --- | --- |
| A. Dataset descriptions | DCAT-AP metadata about datasets (titles, publishers, distributions, licenses, subjects) | Publishers’ DCAT-AP descriptions, harvested by the NDE Datasetregister | Small, metadata-only. Refresh frequently (daily) | Enumeration for B: which datasets exist and where their distributions live. Catalog Search Pipeline over the descriptions themselves (the Dataset Register browser), enriched with DKG facets |
| B. Metadata records | Metadata records inside each distribution (instances of CreativeWork, Person, Place, …) | Per-dataset SPARQL endpoint or RDF dump | Large, per-dataset. Refresh per source on last modified date | Object Search Pipeline (records inside distributions); Dataset Knowledge Graph; Term Backlink Graph (data-model-agnostic vocab walk) |
| C. Terms | Terms and the relations between them, across terminology sources | Terminology sources, aggregated by the NDE Network of Terms | Medium, vocabulary-scoped. Refresh on vocab updates | The report’s “Knowledge Graph voor Termen” (function 5) |
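The per-source refresh rule for substrate B ("refresh per source on last modified date") could be sketched as follows. This is an illustration under assumptions, not a specification: the `DistributionDescription` shape, its field names, and the choice to re-crawl when no modification date is known are all hypothetical and not taken from the Dataset Register API.

```typescript
// Hypothetical slice of a dataset description (substrate A) that a crawl
// scheduler would need. Field names are illustrative only.
interface DistributionDescription {
  datasetIri: string;
  accessUrl: string;
  dateModified: string | null; // ISO 8601 timestamp, when the publisher supplies one
}

// Decide whether a distribution's records (substrate B) need a fresh crawl.
// lastCrawled maps an accessUrl to the ISO 8601 time of the previous crawl.
function needsRecrawl(
  distribution: DistributionDescription,
  lastCrawled: Map<string, string>,
): boolean {
  const previous = lastCrawled.get(distribution.accessUrl);
  if (previous === undefined) return true; // never crawled before
  // Assumption: without a modification date freshness cannot be shown, so crawl.
  if (distribution.dateModified === null) return true;
  return Date.parse(distribution.dateModified) > Date.parse(previous);
}

// Enumerate from A, then keep only the distributions whose B contents may have changed.
function selectForCrawl(
  distributions: DistributionDescription[],
  lastCrawled: Map<string, string>,
): DistributionDescription[] {
  return distributions.filter((d) => needsRecrawl(d, lastCrawled));
}
```

The effect is that an unchanged source costs one metadata comparison rather than a full crawl, which matters because substrate B is the large one.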

Observations that fall out of this separation:

  • A enumerates B, not C. B’s pipelines read the Register to learn which datasets exist and where their distributions live. C is enumerated separately, from the Network of Terms catalogue of terminology sources. Part of that catalogue may migrate into the Register over time, but external sources like GeoNames, AAT and Wikidata stay outside it, so C keeps its own enumeration.
  • Object search over B is the norm; only the Register indexes A. A Service Platform’s Search Pipeline reads the Register (substrate A) only to enumerate which datasets to crawl; the records it ingests are the objects inside their distributions (substrate B) – each CreativeWork, Person, or Place. The dataset descriptions themselves never enter that index. The one exception is the Dataset Register’s own browser: being the catalog, it indexes substrate A directly – the dataset descriptions are its records – enriched with DKG facets. The pipeline and its stages are the same either way; only the substrate and record grain differ.
  • B carries multiple projections. The same crawl feeds three structurally different sinks: an AP-aware search-document projection (Search Pipeline), a VoID (Vocabulary of Interlinked Datasets) statistical summary (Dataset Knowledge Graph), and a data-model-agnostic term-backlink projection (Term Backlink Graph). All three are enumerated from A but compute over B’s contents: what differs is the projection, not the substrate.
  • Change cadences differ. B changes most often: records are added, updated, and removed at source as collections grow. A changes less often: dataset metadata is updated more rarely than the records it describes. C is most stable: vocabularies move slowly. Stack components inherit the change cadence of the substrate they ride on.
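The "multiple projections" observation can be illustrated with a small TypeScript sketch in which one crawled set of triples feeds three differently shaped outputs. Everything here is a simplification under assumptions: the `Triple` shape, the function names, and the prefix-based term detection are hypothetical, the summary is only in the spirit of VoID rather than actual VoID output, and a real search projection would follow the application profile instead of reading a single property.

```typescript
// One crawled record set (substrate B), reduced to plain triples for this sketch.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const SCHEMA_NAME = 'https://schema.org/name';

// Search-document projection: one document per schema:name statement.
function toSearchDocuments(triples: Triple[]): { id: string; name: string }[] {
  return triples
    .filter((t) => t.predicate === SCHEMA_NAME)
    .map((t) => ({ id: t.subject, name: t.object }));
}

// Statistical summary: counts only, no record content.
function toSummary(triples: Triple[]): { triples: number; distinctSubjects: number } {
  return {
    triples: triples.length,
    distinctSubjects: new Set(triples.map((t) => t.subject)).size,
  };
}

// Term-backlink projection: ignores the predicate (data-model-agnostic) and
// records which subjects point at an IRI from a known terminology source.
function toTermBacklinks(
  triples: Triple[],
  termPrefixes: string[],
): Map<string, string[]> {
  const backlinks = new Map<string, string[]>();
  for (const t of triples) {
    if (!termPrefixes.some((prefix) => t.object.startsWith(prefix))) continue;
    const subjects = backlinks.get(t.object) ?? [];
    subjects.push(t.subject);
    backlinks.set(t.object, subjects);
  }
  return backlinks;
}

// One crawl, three sinks: the substrate is shared, only the projection differs.
function projectCrawl(datasetIri: string, triples: Triple[], termPrefixes: string[]) {
  return {
    datasetIri, // comes from the enumeration over substrate A
    documents: toSearchDocuments(triples),
    summary: toSummary(triples),
    backlinks: toTermBacklinks(triples, termPrefixes),
  };
}
```

Because all three functions take the same `triples` input, a source only needs to be crawled once per refresh, whichever sinks consume it.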