This documents the nTangle architecture and corresponding code-generation when targeting a Microsoft SQL Server database.
At its core nTangle is a database-driven code-generation solution, in that it leverages an existing source database to infer its underlying schema (tables and columns). A YAML (or JSON) configuration file is required to define the CDC requirements, being the root and child tables, and their relationships, representing the entity (aggregate root). The nTangle code-gen tool connects to the database, and using the YAML configuration generates the required database (source and sidecar), and .NET publishing runtime artefacts. This publishing runtime then uses the database's CDC capabilities to detect changes to the underlying tables and trigger the creation of the related entity events that are then published.
The core components are:
- Database - the existing tables (source database), plus generated nTangle runtime artefacts (including outbox) in separate sidecar database;
- Config - the nTangle code-gen configuration;
- Tooling - the nTangle code generator and resulting generated publisher .NET runtime;
- Events - the resulting published entity event (depicted as a CloudEvent).
This official documentation describes the Microsoft SQL Server CDC-capabilities.
In addition, this article provides an excellent overview of the Microsoft SQL Server CDC-capabilities and walks through the process of setting up and using to aid in the fundamental understanding of this key dependent capability.
Note: the SQL Server Agent service must be running for CDC to function correctly.
The introduction of CDC into a database may have an impact on the database performance, and as such it is important to consider this impact; see performance considerations.
Within the nTangle configuration the CdcEnable
property can be used to enable/disable the automatic generation of the CDC-related sys.sp_cdc_enable_table
TSQL code. This allows the developer and/or database administrator to manually enable CDC on the required tables external to nTangle code generation. By default, the auto CdcEnable
property is set to false
.
Given the above, it is generally recommended that CDC enablement is explicity managed by the developer and/or database administrator. Additionally, as schema changes are made to the underlying tables the CDC configuration may need to be changed accordingly; this does not happen automatically within SQL Server. See this article for more information.
Note that nTangle does not require CDC to capture changes to all columns to function, only the primary and joining columns are required. Capturing only this subset will aid in minimizing impact on performance and be more flexible to ongoing schema changes.
Microsoft SQL Server provides two change tracking capabilities, namely CDC and Change Tracking. Change Tracking differs in that it only captures the fact that rows in a table were changed, but doesn't capture the data that was changed. Given the multiple related table hierarchy triggering that nTangle enables the joining columns as well as the primary key columns are required to be captured; this is not possible with Change Tracking as it only captures the primary key columns; therefore, making it unsuitable for walking back up the join hierarchy to determine the primary entity (aggregate root) that was changed. And largely impossible where a child table row has been physically deleted.
As of version 3.0.0
the preferred (recommended and default) approach is to use a sidecar database to manage the nTangle runtime artefacts. This is to limit changes to the source database beyond the requirement for CDC itself.
Usage of a sidecar database will also limit impact (load and data) on the source database by minimizing access to the required CDC and related data selection only. Otherwise, the required runtime orchestration will leverage the sidecar database only.
Note that there are no cross database dependencies; as such, the sidecar database can be hosted separately, be on a different version, etc. as required. The .NET orchestrator logic will require access to both databases to function.
The following represents the high-level conceptual run-time architecture for a solution leveraging nTangle.
The SQL Server databases are as follows:
- Source - the existing database (tables and columns) that are being captured on change (CDC).
- Sidecar - the nTangle-required runtime artefacts (tables and stored procedures) that are used to orchestrate the CDC-based event publishing.
The key .NET components are as follows.
- Orchestrator - one per entity (aggregate root); change tracking and event publisher orchestrator (execution phases depicted).
- Hosted service - one per entity (aggregate root); timer-based hosted service responsible for executing the orchestrator (optional).
- Outbox dequeue publisher - event outbox dequeue and publishing (execution phases depicted).
- Outbox dequeue hosted service - timer-based hosted service responsible for executing the outbox dequeue publisher.
The XxxOrchestrator
is responsible for the primary orchestration of the CDC-related change tracking and event publishing - this is essentially the "secret sauce" behind nTangle. This is achieved by executing (orchestrating) the following phases in the order specifed.
- Change detection - detects changes to the underlying source tables and packages within a batch;
- Consolidation - consolidates changes to minimize redundant event publishing and upserted batch configuration;
- Identifier mapping - assign global identifier mappings (where applicable);
- Versioning - version events to minimize publishing with same content;
- Publishing - publish / send unique events to selected destination;
- Completion - complete batch and record latest version hashes.
The XxxOrchestrator
inherits from
EntitySidecarOrchestrator
, which in turn inherits from EntitySidecarOrchestratorBase
and EntityOrchestratorCore
, to enable the standardized orchestration processing.
An orchestrator is generated per entity (the aggregate root which has a naming convention of XxxCdc
). The entity is essentially the .NET representation of the configured root and child table hierarchy, and selected columns as properties.
A batch check is performed against the sidecar database using the spXxxBatchExecute
stored procedure to determine if there is already an incomplete Batch and attempt to reprocess; otherwise, determine current CDC position from last complete Batch to continue. The batch tracking is persisted in the XxxBatchTracking
table.
The orchestrator will the select (detect) the changes as enabled by the SQL Server CDC capabilities in the source database. This is achieved by invoking the XxxExecuteBatch
SQL statement and updating the corresponding result sets into the .NET entity equivalents. The following steps are performed.
Step | Description |
---|---|
LSN check | Get minimum and maximum LSNs for each table. If the minimum is less than previous Batch minimum then there is a CDC data loss scenario and some changes will be lost as a result; this will error unless option to continue with data loss is selected. |
Root CDC | Gets all CDC changes (create, update and delete) for the table (see fn_cdc_get_all_changes_ ) up to the maximum query size, and stores the result into a temporary #changes table. |
Child CDC | Gets all CDC changes (create, update and delete) for each child table up to the maximum query size joining against the parent table(s) to ensure existence. Appends the result into the temporary #changes table where distinct (as per root table primary key). |
Batch select | Select the resulting batch LSN values so that can be persisted within the sidecar database by the orchestrator. |
Root query | Select result set using the temporary #changes table to left outer join to the root table (latest). |
Child query | Select result set using the temporary #changes for each child table using inner joins to ensure data is selected for only what currently exists within the database (latest). |
The orchestrator will consolidate the selected data returned from the change detection phase. The following consolidations are performed for each distinct primary key from the root table (being the aggregated entity), in order specified. The result of the consolidation phase is that there should be zero or more records (one per primary key from the root table).
Step | Description |
---|---|
Delete wins | Where there is a create and/or, one or more updates, followed by a delete, then only the delete will be considered for publishing. Can not include intermediary create/update events as the data is no longer available (physically deleted). |
Create wins | Where there is a create, followed by one or more updates, then only the create will be considered for publishing. |
Where a delete is referenced above, this relates to both physical and logical deletes equally (as per configuration).
Once consolidated the orchestrator will manage the batch tracking, either update (where previous was incomplete) or insert (where new) the batch data into the XxxBatchTracking using the spXxxBatchTracking
stored procedure. This stored procedure will also query for the latest version hash and identifier mapping (where configured) data for entities in the batch; as needed by the following phases.
The orhestrator (where identifier mapping is configured) will assign new global identifiers generated using IIdentifierGenerator<T>
to each of the selected columns where no value was previously selected during the consolidation phase.
The spIdentifierMappingCreate
stored procedure is then invoked to persist any new mappings into the IdentifierMapping
table; where mappings are already assigned, then the previously assigned value will be returned for use and the newly allocated value discarded. This may occur where concurrent access is being performed against the same related identifier.
The orchestrator will version each record by JSON serializing the data (removing any properties that should be excluded) and then SHA256 hashed. This is compared to the existing version (from the VersionTracking
table) selected during the consolidation phase. Where the version has not changed for the entity then no publish will occur; avoids sending same event content more than once.
The orchestrator will instantiate an EventData
per entity item, then invoke the IEventPublisher.Publish
passing all events to be published.
The default is to use the generated EventOutboxEnqueue
which enqueues all events using stored procedure spEventOutboxEnqueue
into the EventOutbox
and EventOutboxData
tables. This can be configured to use an alternate IEventPublisher
where required.
There are multiple advantages of using the event outbox, a) a log of published events will be maintained, b) performance of orchestrator may be improved, c) dependency of external destination removed from orchestrator, and d) multiple sends will have the same event identifier which is useful in duplicate detection scenarios guaranteeing at least once delivery.
The orchestrator will complete the batch and update the latest version tracking hashes. This phase is managed by the spXxxBatchComplete
stored procedure. The following stored procedure steps are all performed within a database transactional context.
Step | Description |
---|---|
Batch complete | Update the batch as complete within the XxxBatchTracking table, including updating the completed date/time. |
Version tracking | Create or update (merge) the latest versions for each of the records published into the VersionTracking table. |
The XxxHostedService
is responsible for hosting the Orchestrator at runtime. This is a synchronized timer-based hosted service which will continue to execute on a configured interval until the process host is stopped.
An IServiceSynchronizer
is required by the XxxHostedService
to ensure that only a single serivce per root table is running at any given time. This is primarily needed to ensure that the explicit order of changes in honored; otherwise, events for the root table would be emitted out of sequence (and/or duplicated).
The XxxHostedService
is not explicitly required and can be replaced with any other host implementation, i.e. timer-based trigger Azure Function that directly invokes the XxxOrchestrator
. As above, this host will need to manage the sychronized singleton behavior; see BlobLeaseSynchronizer
.
The EventOutboxHostedService
is responsible for managing the dequeue of events from the database and then sending. This is achieved by executing the following phases in the order specifed.
- Dequeue - dequeue one or more events from the outbox;
- Sending - send events to selected destination.
The phases are encapsulated within a database transaction to ensure that the publishing completes successfully before the dequeue is committed. As a result this will ensure guaranteed delivery, but may result in messages being sent more than once. The event receiver may be required to perform duplicate detection on the events where duplicates can not be tolerated.
The EventOutboxHostedService
property EventOutboxDequeueFactory
must be set to instantiate an instance of the EventOutboxDequeueBase
, being the generated EventOutboxDequeue
. This manages the final IEventSender
to send the events to the final destination.
The events will be dequeued (up to the maximum dequeue size) from the database using stored procedure spEventOutboxDequeue
; this will dequeue the events from the underlying EventOutbox
and EventOutboxData
tables.
The EventOutboxDequeue
manages the final IEventSender
to send the events to the final destination.
To leverage the likes of Azure Service Bus consider the CoreEx ServiceBusSender
.
The EventOutboxHostedService
is responsible for hosting the Outbox dequeue publisher at runtime. This is a synchronized timer-based service which will continue to execute on a configured interval until the process host is stopped.
In addition to the primary sidecar behavior, the XxxOrchestrator
can be explicitly executed (ExecuteExplicitAsync
) for a specified set of primary keys bypassing CDC and batch tracking.
This is useful for scenarios:
- Where a specific set of records need to be reprocessed or where the CDC data has been lost;
- As a one-off operation to perform an initial load of data to be published;
- Alternate triggering where nTangle CDC is not permissable (i.e. using Debezium, etc) to get the best of both worlds, being an external trigger and leveraging the entity-based publishing.
The hosting of the XxxOrchestrator
to enable explicit execution is the responsibility of the developer; i.e. nothing is generated to support.
The code-generator will leverage the ntangle.yaml configuration to generate the requisite source database, sidecar database, and .NET artefacts.
Where Xxx
is referenced in the artefact name this is replaced with the name of the entity (root aggregate). Also, the artefact name represents the default, there are opportunities within the ntangle.yaml
to change the behavior of these where applicable.
Finally, features such as event outbox and identity mapping are configurable; where not leveraged their respective artefacts will not be generated.
The AppName.Database
project generated artefacts are as follows.
Type | Artefact | Description |
---|---|---|
Script | CdcEnable.post.deploy |
Turns CDC on for the selected tables (where configured). |
The AppName.SidecarDb
project generated artefacts are as follows.
Type | Artefact | Description |
---|---|---|
Schema | NTangle |
Creates the database schema. |
Stored procedure | spXxxBatchComplete |
Performs the batch completion (per entity). |
Stored procedure | spXxxBatchExecute |
Performs the batch execution (per entity). |
Stored procedure | spXxxBatchReset |
Performs the batch LSN reset (per entity). |
Stored procedure | spXxxBatchTracking |
Performs the batch tracking (per entity). |
Stored procedure | spEventOutboxDequeue |
Performs the event outbox dequeue. |
Stored procedure | spEventOutboxEnqueue |
Performs the event outbox enqueue. |
Stored procedure | spIdentifierMappingCreate |
Performs the identifier mapping management. |
Table | XxxBatchTracking |
Batch tracking (per entity). |
Schema | Outbox |
Creates the event outbox database schema. |
Table | EventOutbox |
Creates the event outbox table. |
Table | EventOutboxData |
Creates the event outbox data table. |
Table | IdentifierMapping |
Creates the identifier mapping table. |
Table | VersionTracking |
Created the version (hash) tracking table. |
The AppName.Publisher
project .NET generated artefacts are as follows.
Namespace | Artefact | Description |
---|---|---|
-- | ServiceCollectionExtensions |
IServiceCollection extension methods for Dependency Injection (DI) set up. |
Data | XxxOrchestrator |
Change tracking and event publisher orchestrator (per entity). |
Data | EventOutboxDequeue |
Event outbox dequeue. |
Data | EventOutboxEnqueue |
Event outbox enqueue. |
Entities | XxxCdc |
Entity (aggregate root) representation of database table(s) and relationships (per entity). |
Resources | XxxExecuteBatch.sql |
TSQL statement (source database) to execute the batch using CDC (per entity). |
Resources | XxxExecuteExplicit.sql |
TSQL statement (source database) to execute explicitly without Batch and CDC (per entity). |
Services | XxxHostedService |
Timer-based host for the XxxCdcOrchestrator . |