
Data Integration

GCHQDeveloper42 edited this page May 18, 2021 · 1 revision

Why Care About Data Integration?

It is useful to consider what we mean when we use the word "integrate" in association with data. A dictionary definition of the verb (e.g. lexico.com) relates it to the "combination of one (thing) with another to form a whole". This will resonate later with the conceptual model that allows us to achieve integration, but for now we can consider "integration" more generally to be the activity of bringing two or more things together to serve a combined purpose. We start with this term because it is often used without qualification in project proposals and corporate visions; but what does it take to integrate things in general? If we can generalise, then we should be able to reduce the waste that occurs when a more limited approach needs rework or isn't up to the task that was originally (or subsequently) set. This waste is a common experience when systems are intended to exchange data (i.e. interoperate), even when the exchange is intended to be a nailed-down point-to-point interface.

A generalised approach to integrating data, whatever the requirement, could provide a path for inter-system, data-based interactions to operate in a combined way that results in a greater "whole"; conceptually 'as one'. It can therefore be inferred that the solution to this problem is more fundamental than the data formatting and syntactic aspects of data exchange between systems. The solution requires precision of representation, whatever the format, so that there is almost* no ambiguity between records within any connected system that is "integrated" in this generalised way.

The study of representation in data can be both abstract and complex (see links here for further reading), but we as humans take much of it for granted in our day-to-day interactions. However, we shouldn't kid ourselves that it is easy for humans to communicate information to each other meaningfully and unambiguously. In fact, we have developed many ways of coping with ambiguity and we still fall foul of it frequently. One thing we can recognise in humans is that, when semantic precision is required, we ensure that the language employed and the definitions of the key terms are agreed in advance and are of the quality required to minimise ambiguity. When we look to computers to handle, as data, representations of things in which we have a common interest, we need to achieve even greater semantic precision, at least within the systems of interest. This is because computers based on logic don't have the capability to address ambiguity in the way that we do. (This statement is specifically about the representational aspects of data quality and is not about what additional processing can, or could, be done to improve that quality algorithmically.)

Lesson in Data Quality: If you want your data to be used consistently as your system evolves, including integrating with other information systems, then you have to consider how to achieve this as early as possible; ideally at the start of a project, programme or initiative.

It is this lesson that was the primary inspiration for taking seriously the data model that is used in the Magma Core code base. The design of the data model, and the architecture of Magma Core that follows from it, enable a rigorous approach to the analysis of what the data is about (what it is intended to represent). They also enable a straightforward mapping between the analysis and the design of a Magma Core based information system, which can ensure consistency in implementation. Addressing data quality involves more than just addressing the consistency challenge. However, historically, the subject of consistency in data quality has not included inter- and intra-system representational consistency, relying primarily on definitional aspects. The Magma Core code allows a comprehensive approach to the consistency challenge, from which many of the other properties of data quality become easier to address.

Next: A Path to Data Integration


* We are only human and we shouldn't fall into the trap of thinking that we can completely remove ambiguity of meaning in data!
