\chapter{Conclusion}\label{chapter:conclusion}
This chapter concludes the thesis with a discussion of the extent to which the research question and objectives (\autoref{sec:intro:RQ}) have been addressed through the presented work. The chapter also presents the resulting contributions, previously summarised in \autoref{sec:intro:contributions}, and compares them with similar approaches in the state of the art (SotA) identified in \autoref{chapter:sota}. The chapter concludes with potential avenues for further work arising from the research presented within the thesis.
% RESEARCH OBJECTIVES
\section{Fulfilment of Research Objectives}\label{sec:conclusion-RO}
The research question guiding the work presented in this thesis, defined in \autoref{sec:intro:RQ}, is - ``\textit{To what extent can information regarding activities associated with processing of personal data and consent be represented, queried, and validated using Semantic Web technologies for GDPR compliance?}''.
Five research objectives were identified which guided the work towards answering the research question.
This section discusses the extent of their fulfilment based on work presented in previous chapters of the thesis.
% $RO1$: Identify the subset of GDPR relevant for activities associated with personal data and consent regarding ex-ante and ex-post compliance.
% $RO2$: Identify information required to represent activities associated with personal data and consent towards investigations of GDPR compliance.
\subsubsection*{Fulfilment of \textit{RO1} and \textit{RO2}}
The first two research objectives ($RO1$ and $RO2$) were concerned with identifying the subset of the GDPR concerning activities associated with personal data and consent, and the information required to evaluate compliance with it. This was fulfilled by the work presented in \autoref{chapter:information}.
The first research objective ($RO1$) required identification of a subset of GDPR relevant to activities associated with personal data and consent in ex-ante and ex-post phases of compliance.
% This was achieved through the analysis of literature and resources associated with GDPR compliance, including the text of the legislation, and opinions, guides, reports produced by authoritative bodies such as data protection commissions, Article 29 Working Party (A29WP), and the European Data Protection Board (EDPB).
The second research objective ($RO2$) was to identify information required to represent activities associated with personal data and consent for GDPR compliance based on the identified clauses of the GDPR from $RO1$.
To facilitate this, an information model was developed to explore the entities and their relationships with respect to information exchange guided by GDPR compliance requirements (\autoref{sec:info:model}). The model provided an analysis of GDPR in the form of requirements and processes associated with information for compliance, and was used to categorise the requirements of information as Provenance, Agreements, Consent, Certification, and Compliance. These categories were then used to analyse and identify the nature and source of information required for compliance, and its relationship with entities and stakeholders defined by the GDPR.
The information requirements were expressed in the form of questions, termed `compliance questions' (\autoref{sec:info:compliance-questions}), which provided structure for identifying information required to answer them for evaluating compliance.
Authoritative sources used in the information gathering process for $RO1$ and $RO2$ included the offices of European Data Protection Commissioners, reports and opinions produced by the Article 29 Working Party regarding the interpretation of the GDPR, information about case law pertaining to the interpretation of the GDPR, and documents published by institutions providing legal services.
As the motivation for this work was to use semantic web technologies to represent this information, the `competency questions' methodology was adopted to enable the formulation of an ontology from the identified information \cite{noy_ontology_2001}.
This was done by interpreting the compliance questions as `competency questions' and identifying concepts and relationships about activities associated with personal data and consent to answer the questions (\autoref{sec:info:compliance-questions}).
From these compliance questions, a set of constraints was identified which the information needed to satisfy in order to be valid for use in evaluating compliance, along with the assumptions which always hold true (\autoref{sec:info:constraints}). These constraints and assumptions were utilised in \autoref{chapter:testing} for the development of an approach for validating information, as required to fulfil research objectives $RO4$ and $RO5$.
% RO3: Create OWL2 ontologies for expressing information about:
% a) concepts and text of the GDPR
% b) activities associated with personal data
% c) activities associated with consent
\subsubsection*{Fulfilment of \textit{RO3}}
The third research objective ($RO3$) was to create ontologies to represent information identified in $RO2$ about activities associated with personal data and consent in ex-ante and ex-post phases for GDPR compliance.
$RO3$ was divided into three sub-objectives, which involved the creation of ontologies for representing: (a) concepts and text of the GDPR, (b) activities associated with personal data and consent, and (c) information regarding consent.
The objective was fulfilled through the work presented in \autoref{chapter:vocabularies}, consisting of the GDPRtEXT, GDPRov, and GConsent ontologies for each of the respective sub-objectives.
The ontologies were developed following well established methodologies \cite{noy_ontology_2001,suarez-figueroa_neon_2012,de_nicola_lightweight_2016} and best practices advocated by the semantic web community, as summarised in \autoref{sec:intro:ontology-engineering}.
They were evaluated based on their ability to represent information required to answer the competency questions \cite{noy_ontology_2001} from $RO2$, as well as against common pitfalls in design using the OOPS! tool \cite{poveda-villalon_oops!_2014}.
The ontologies were documented using the WIDOCO tool \cite{garijo_widoco_2017}, used persistent identifiers provided by W3ID for their namespaces, and were published with DOIs under an open license in the public Zenodo repository.
Each ontology was compared against the state of the art to identify the extent of information representation and novelty of its concepts and approach.
The first sub-objective ($RO3(a)$) was fulfilled with the GDPRtEXT ontology (\autoref{sec:voc:GDPRtEXT}), which enabled unambiguous and machine-readable linking of information to concepts and text of the GDPR. GDPRtEXT provided an OWL2 ontology to represent the structured text of the GDPR as individual Recitals, Chapters, Sections, Articles, Points, Sub-Points, and Citations, by extending the European Legislation Identifier (ELI) ontology \cite{ELI_2012}. ELI is the authoritative ontology used by the EU Publications Office to define metadata for all published documents. The extension mechanism used by GDPRtEXT maintains formal compatibility with ELI. Using GDPRtEXT, the text of the GDPR was re-defined as linked data in machine-readable representations by assigning a unique identifier to individual resources, which made it possible to define machine-readable links to specific clauses of the GDPR. In addition, a thesaurus of terms and concepts defined or referenced by the GDPR was provided using SKOS. GDPRtEXT thus fulfils $RO3(a)$ by providing a mechanism for associating information with concepts and text of the GDPR.
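
To illustrate what a unique identifier for each part of the text makes possible, the following Python sketch derives a deterministic identifier for an Article, a paragraph, or a point, and navigates one level up the hierarchy. The base IRI and the \texttt{article6-1-a} fragment pattern are hypothetical placeholders chosen for illustration, not the IRI scheme published by GDPRtEXT.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base IRI for illustration; GDPRtEXT publishes its own namespace.
BASE = "https://example.org/gdpr#"


@dataclass(frozen=True)
class Clause:
    article: int
    paragraph: Optional[int] = None
    point: Optional[str] = None

    def fragment(self) -> str:
        # Build the fragment from the most general to the most specific part,
        # so every clause maps to a single deterministic identifier.
        parts = [f"article{self.article}"]
        if self.paragraph is not None:
            parts.append(str(self.paragraph))
        if self.point is not None:
            if self.paragraph is None:
                raise ValueError("a point requires a paragraph")
            parts.append(self.point)
        return "-".join(parts)

    def iri(self) -> str:
        return BASE + self.fragment()

    def parent(self) -> Optional["Clause"]:
        # Walk one level up the Article > paragraph > point hierarchy.
        if self.point is not None:
            return Clause(self.article, self.paragraph)
        if self.paragraph is not None:
            return Clause(self.article)
        return None


consent_basis = Clause(6, 1, "a")  # Art. 6(1)(a): consent as a legal basis
print(consent_basis.iri())           # https://example.org/gdpr#article6-1-a
print(consent_basis.parent().iri())  # https://example.org/gdpr#article6-1
```

Because the identifier is derived from the position of the clause, any piece of information can point at exactly one part of the text, and links can be generalised to the enclosing paragraph or Article when needed.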
The second sub-objective ($RO3(b)$) was fulfilled by the GDPRov ontology (\autoref{sec:voc:GDPRov}), which enables the representation of activities associated with personal data and consent in ex-ante and ex-post phases. GDPRov extends the PROV-O \cite{lebo_prov-o_2013} and P-Plan \cite{garijo_p-plan_2014} ontologies with terms and relationships relevant for the GDPR, where PROV-O is the W3C standard for representing provenance information, and P-Plan is its extension for defining abstract models as plans which are then instantiated into activities having provenance. Using these, GDPRov represents a model or plan of how processes are supposed to interact with personal data and consent (ex-ante phase), such as for collection, use, storage, and sharing. The model or plan can then be used as the template for activities to be carried out, whose provenance (ex-post phase) is linked to the model. Apart from providing terms for addressing personal data and consent, GDPRov also enables the representation of other activities defined by the GDPR, such as the handling of rights and data breaches, which can similarly be depicted using a model or plan.
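
The link between a plan (ex-ante) and its executions (ex-post) can be sketched in plain Python as below. The \texttt{corresponds\_to} field loosely mirrors the P-Plan property \texttt{p-plan:correspondsToStep}, but the data structures, step names, and log entries are hypothetical simplifications rather than the GDPRov vocabulary. The check flags executed activities with no planned step and planned steps that were never executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    # An executed (ex-post) activity recorded in a provenance log.
    name: str
    corresponds_to: str  # planned step it instantiates, in the spirit of p-plan:correspondsToStep


def compare_plan_and_log(plan_steps, log):
    """Return (unplanned activities, unexecuted steps) for one run of a plan."""
    planned = set(plan_steps)
    executed = {a.corresponds_to for a in log}
    unplanned = sorted(a.name for a in log if a.corresponds_to not in planned)
    unexecuted = sorted(planned - executed)
    return unplanned, unexecuted


# Hypothetical ex-ante plan for personal data collected with consent.
plan = ["CollectConsent", "CollectData", "StoreData", "UseData"]

# Hypothetical ex-post provenance log: the data was shared, which the plan never foresaw.
log = [
    Activity("consent-2019-05-01", "CollectConsent"),
    Activity("signup-form-42", "CollectData"),
    Activity("db-write-42", "StoreData"),
    Activity("export-to-partner", "ShareData"),
]

unplanned, unexecuted = compare_plan_and_log(plan, log)
print(unplanned)   # ['export-to-partner']
print(unexecuted)  # ['UseData']
```

An unplanned activity such as the data export above is exactly the kind of deviation between planning and implementation that linking provenance to its model is meant to surface.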
The third research sub-objective ($RO3(c)$) was fulfilled by the GConsent ontology (\autoref{sec:voc:GConsent}), which enables the representation of information associated with consent. GConsent expands upon the use of consent as an abstract entity in GDPRov by representing the contextual information associated with actors, state, relationships, and provenance of consent as required for compliance. In particular, it provides representations for associating purpose, processing, personal data, data subject, third parties, and delegates with a specific instance of consent. It also provides representations for contextual information such as the medium through which consent was given, its timestamp, and location. GConsent also provides the novel notion of `states', which reflect the status of consent for compliance and provide an indication of its use, such as `requested', `explicitly given', or `invalidated', and which are categorised based on whether they can be used as a valid legal basis for processing. GConsent thus provides a comprehensive ontology for the representation of information associated with consent for GDPR compliance and fulfils $RO3(c)$.
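
The categorisation of states by whether they can serve as a legal basis can be sketched as follows. Only the `requested', `explicitly given', and `invalidated' states are named in the text above; the `withdrawn' state and all Python names are assumptions added for illustration and are not GConsent's actual terms.

```python
from enum import Enum


class ConsentState(Enum):
    # Hypothetical names; the boolean marks whether the state can be
    # relied upon as a valid legal basis for processing.
    REQUESTED = ("requested", False)
    GIVEN_EXPLICITLY = ("explicitly given", True)
    WITHDRAWN = ("withdrawn", False)
    INVALIDATED = ("invalidated", False)

    def __init__(self, label: str, valid_legal_basis: bool):
        self.label = label
        self.valid_legal_basis = valid_legal_basis


def may_process(state: ConsentState) -> bool:
    """Processing based on consent is permitted only in a state that is a valid legal basis."""
    return state.valid_legal_basis


print([s.label for s in ConsentState if s.valid_legal_basis])  # ['explicitly given']
print(may_process(ConsentState.REQUESTED))  # False
```

Attaching the validity flag to the state itself keeps the decision of whether processing may proceed to a single lookup, rather than a re-evaluation of the full consent record.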
\subsubsection*{Fulfilment of \textit{RO4} and \textit{RO5}}
The fourth research objective ($RO4$) was to create SPARQL queries that retrieve information about activities associated with personal data and consent for GDPR compliance. These SPARQL queries were formulated as semantic representations of the compliance questions identified in $RO2$, and utilised the ontologies created in $RO3$ to define concepts and relationships pertaining to the GDPR. The SPARQL queries demonstrate the linking of retrieved information with relevant concepts and parts of the GDPR, as well as the creation of knowledge graphs for use in compliance processes. This work was presented in \autoref{chapter:testing}.
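
At its core, a SPARQL query such as ``which personal data is collected, and for which purpose?'' is a basic graph pattern matched against triples. The stdlib-only Python sketch below implements that matching step over an in-memory list of triples as a stand-in for a SPARQL engine; the \texttt{ex:} terms are hypothetical placeholders rather than GDPRov or GConsent terms, and in practice such queries would be written in SPARQL and executed by an RDF store.

```python
def is_var(term: str) -> bool:
    # Terms starting with "?" are variables, as in SPARQL.
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(bgp, triples):
    """Evaluate a basic graph pattern by joining the solutions of each triple pattern."""
    solutions = [{}]
    for pattern in bgp:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data: two collection activities, only one with a declared purpose.
graph = [
    ("ex:collect1", "rdf:type", "ex:DataCollection"),
    ("ex:collect1", "ex:collects", "ex:EmailAddress"),
    ("ex:collect1", "ex:hasPurpose", "ex:Newsletter"),
    ("ex:collect2", "rdf:type", "ex:DataCollection"),
    ("ex:collect2", "ex:collects", "ex:IPAddress"),
]

# Roughly: SELECT ?data ?purpose WHERE { ?a a ex:DataCollection ;
#            ex:collects ?data ; ex:hasPurpose ?purpose . }
results = query(
    [("?a", "rdf:type", "ex:DataCollection"),
     ("?a", "ex:collects", "?data"),
     ("?a", "ex:hasPurpose", "?purpose")],
    graph,
)
print([(r["?data"], r["?purpose"]) for r in results])  # [('ex:EmailAddress', 'ex:Newsletter')]

# A compliance check in the spirit of FILTER NOT EXISTS: collections with no purpose.
all_collections = {r["?a"] for r in query([("?a", "rdf:type", "ex:DataCollection")], graph)}
with_purpose = {r["?a"] for r in query([("?a", "ex:hasPurpose", "?p")], graph)}
print(sorted(all_collections - with_purpose))  # ['ex:collect2']
```

The second query shows how a compliance question can be phrased as the absence of an expected pattern, which is how missing information (here, an undeclared purpose) becomes retrievable.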
The fifth and final research objective ($RO5$) was the creation of a framework utilising SHACL to validate information regarding activities associated with personal data and consent, and linking the results to relevant concepts and clauses of GDPR. This was fulfilled by the work presented in \autoref{chapter:testing}.
The framework utilised SHACL to validate information based on the constraints and assumptions identified in $RO2$ and presented in \autoref{chapter:information}.
The validation tests utilised the ontologies developed in $RO3$ to define the concepts and relationships, and to annotate the tests and their results with links to relevant clauses of the GDPR.
The framework also utilised SPARQL queries generated in $RO4$ to retrieve and validate information by using SHACL-SPARQL.
The framework was demonstrated and evaluated through a use-case derived from the consent mechanism on a real-world website, where the information associated with consent was validated for ex-ante and ex-post phases of compliance.
Information about the consent mechanism was represented using the developed ontologies (GDPRov and GConsent), with SHACL used to represent ex-ante and ex-post validation tests.
The ex-post approach validated individual instances of consent from the provenance log of given consent, while the ex-ante approach validated the template used to provide information and choices when requesting consent.
A third approach combined the ex-ante and ex-post approaches: requirements common to all instances were validated once on the consent template in the ex-ante phase, and the results were persisted for reuse in the ex-post phase, where only the requirements unique to each instance of given consent in the provenance log were validated.
The combined approach was shown to be more efficient in terms of reducing the number of validations as compared to individually validating ex-ante and ex-post requirements.
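
The reduction in the number of validations can be illustrated with a simple count. Under the simplifying assumption that a template carries $T$ requirements common to $N$ consent instances, each of which has $U$ further requirements unique to it, validating every instance in full takes $N(T+U)$ checks, whereas validating the template once and reusing its result takes $T + NU$ checks. The Python sketch below caches the template result with \texttt{functools.lru\_cache} to make this saving visible; the records, field names, and checks are hypothetical, and this is not the SHACL implementation from \autoref{chapter:testing}.

```python
from functools import lru_cache

# Hypothetical template and consent records; field names are illustrative only.
TEMPLATES = {
    "tpl-1": {"purpose": "newsletter", "controller": "ACME Ltd", "withdraw_info": True},
}
CONSENTS = [
    {"id": f"c{i}", "template": "tpl-1", "timestamp": "2019-05-01T10:00Z", "given_by": f"user{i}"}
    for i in range(1000)
]
T, U = 3, 2  # number of common (template) and unique (instance) checks below

checks_run = 0


def check(condition: bool) -> bool:
    """Record that one validation was performed and return its outcome."""
    global checks_run
    checks_run += 1
    return condition


@lru_cache(maxsize=None)
def validate_template(template_id: str) -> bool:
    # Ex-ante: the T requirements common to every consent requested with this template.
    t = TEMPLATES[template_id]
    results = [check(bool(t["purpose"])), check(bool(t["controller"])), check(t["withdraw_info"] is True)]
    return all(results)


def validate_instance(consent: dict) -> bool:
    # Ex-post: only the U requirements unique to this instance of given consent.
    results = [check(bool(consent["timestamp"])), check(bool(consent["given_by"]))]
    # The template result is computed on first use and then served from the cache.
    template_ok = validate_template(consent["template"])
    return all(results) and template_ok


all_valid = all([validate_instance(c) for c in CONSENTS])
print(all_valid, checks_run, len(CONSENTS) * (T + U))  # True 2003 5000
```

With 1000 instances, the cached approach performs $3 + 2 \times 1000 = 2003$ checks instead of $1000 \times 5 = 5000$, and the gap grows with every additional instance sharing the template.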
The SHACL tests defined for validation were annotated with an additional property that linked them with the relevant clauses and concepts of the GDPR using GDPRtEXT. This property was used to associate the validation test and result with the GDPR, and provided the basis for querying information regarding validation against GDPR clauses and concepts.
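
The effect of annotating each test with the clause it implements can be sketched without a SHACL engine: each hypothetical test below carries a \texttt{gdpr\_clause} label that is copied into its results, so that failures can be grouped by clause. In the framework, this link is an additional RDF property on the SHACL shape pointing to a GDPRtEXT resource; the names, records, and clause labels here are placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ValidationTest:
    name: str
    gdpr_clause: str  # stands in for a link to a GDPRtEXT resource
    check: Callable[[dict], bool]


@dataclass(frozen=True)
class Result:
    test: str
    focus: str
    conforms: bool
    gdpr_clause: str  # copied from the test so that results stay linked to the GDPR


TESTS = [
    ValidationTest("has-purpose", "GDPR Art. 6(1)(a)", lambda c: bool(c.get("purpose"))),
    ValidationTest("withdrawal-info", "GDPR Art. 7(3)", lambda c: c.get("withdraw_info") is True),
]


def validate(records):
    """Run every test against every record, keeping the clause annotation on each result."""
    return [Result(t.name, r["id"], t.check(r), t.gdpr_clause) for r in records for t in TESTS]


def failures_by_clause(results):
    """Group non-conforming records by the GDPR clause of the failing test."""
    grouped = defaultdict(list)
    for res in results:
        if not res.conforms:
            grouped[res.gdpr_clause].append(res.focus)
    return dict(grouped)


records = [
    {"id": "consent-1", "purpose": "newsletter", "withdraw_info": True},
    {"id": "consent-2", "purpose": "", "withdraw_info": False},
]
print(failures_by_clause(validate(records)))
# {'GDPR Art. 6(1)(a)': ['consent-2'], 'GDPR Art. 7(3)': ['consent-2']}
```

Keeping the clause on the result, rather than only on the test, is what allows validation reports to be queried per clause after the fact.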
The framework thus demonstrated the validation of information regarding activities associated with personal data and consent for GDPR compliance, and the linking of validation results with relevant clauses and concepts of the GDPR using GDPRtEXT, fulfilling $RO5$.
\section{Extent of Semantic Web Technologies in Addressing the Research Question}
The research question guiding this thesis focuses on the representation, querying, and validation of information as the basis upon which GDPR compliance is evaluated.
More specifically, it concerns activities associated with the processing of personal data and consent, which, while an important part of the GDPR, represent only a subset of its requirements.
To investigate the `extent' aspect of the research question, the research objectives were formulated to correspond with information representation ($RO3$), querying ($RO4$), and validation ($RO5$).
In addition, the use of linked data principles enables information in general, including queries, validations, and their results, to be associated with clauses of the GDPR.
This supports the argument that semantic web technologies enable the management of information with respect to its representation, querying, validation, and association with the GDPR in the process of compliance.
While the above is sufficient to cover the scope of the thesis, the analysis of the state of the art shows that semantic web technologies have also been demonstrated to be capable in other areas of GDPR compliance (and of legal compliance in general).
Existing approaches have demonstrated the use of semantic web technologies in representing information regarding compliance as deontic logic, norms, requirements, and other logic-based formalisms, which are used to represent the requirements of the GDPR and are evaluated as a measure of compliance itself (see SotA in \autoref{chapter:sota}).
These demonstrate that there is more than one way to represent and evaluate information towards GDPR compliance using semantic web technologies.
At the same time, there is a lack of information representation and approaches utilising semantic web technologies for addressing the larger scope of information associated with GDPR compliance as discussed in \autoref{sec:info:model} regarding data governance, data processing agreements, and documentation of information.
The existing approaches - including the contributions of this thesis - provide the necessary building blocks for addressing representation of information required to evaluate compliance, expressing information about compliance itself, querying information, validating it for correctness, evaluating information for sufficiency to an expressed obligation or requirement, compiling reports or records, recording provenance of activities, and generating documentation for compliance.
Semantic web is notable in providing a consistent, coherent, interoperable, and modular set of technologies for carrying out the above activities, which makes their use in the development of legal compliance solutions particularly attractive due to the complexity of the domain and the need to expand or specialise applications for specific use-cases.
Therefore, from the perspective of information and knowledge modelling, semantic web provides a foundational set of technologies for carrying out the activities associated with GDPR compliance.
Future work within this domain largely consists of extending or revising existing approaches in an application-oriented manner, with a few examples mentioned in \autoref{sec:conclusion-future-work}.
To end the discussion with an analogy - \textit{Though all roads lead to Rome, we aren't there yet.}
\section{Contributions}\label{sec:conclusion-contributions}
This section provides a summary of contributions arising from the research presented in this thesis, which were initially summarised in \autoref{sec:intro:contributions}.
The thesis yielded two major contributions - using semantic web to enable linking of information with concepts and text of GDPR, and ontologies for representing information about activities associated with personal data and consent for GDPR compliance. The thesis also yielded minor contributions in the form of an information model for interoperability between entities associated with the GDPR, and a framework for querying and validating information for compliance using semantic web technologies.
The impact and extent of the contributions in terms of publications were listed in \autoref{sec:intro:publications}, which included 17 publications related to the work presented in this thesis.
The impact and relevance of the work presented in this thesis also includes participation in the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) and its deliverable - the Data Privacy Vocabulary (DPV), as elaborated in \autoref{sec:intro:dpvcg}.
\subsection*{Major Contributions}
\subsubsection*{GDPR as a Linked Data Resource}
The first major contribution, represented by GDPRtEXT, enables association of information with the concepts and text of GDPR using linked data principles. It provides machine-readable unique identifiers for each specific part (Chapter, Article, clauses etc.) of the GDPR by representing its text in RDF using an extension of the European Legislation Identifier (ELI) ontology. It also provides a SKOS vocabulary of concepts and terms defined or represented within the text of the GDPR. The usefulness of GDPRtEXT has been demonstrated in its use to define the source of terms in the ontologies presented in this thesis, as well as in linking information related to compliance with the relevant concepts and clauses of the GDPR.
GDPRtEXT advances the state of the art in its provision of unambiguous and machine-readable representations of concepts and text of GDPR (see \autoref{sec:voc:gdprtext:evaluation}).
It is currently the only ontology addressing GDPR that extends ELI, and the only open ontology for concepts associated with the GDPR \cite{leone_taking_2019}.
GDPRtEXT is also the only approach providing a glossary of terms associated with GDPR compliance.
Its use and extension of ELI has had an impact on the development plans of the ontology by the EU Publications Office by demonstrating the use of granular representation of legal clauses and the necessity of linking terms with their occurrences and definitions within the text.
It also had an impact on the creation of the DPV by providing a vocabulary of concepts linked to their definition and use in the GDPR.
GDPRtEXT has received 19 citations to date (excluding self-citations), and has been referenced by approaches in the SotA in context of modelling GDPR concepts.
GDPRtEXT is available under an open license (CC-by-4.0) along with its documentation at
\url{https://w3id.org/GDPRtEXT/}, and has been incorporated into Ireland's open data portal as a dataset with a 5-star rating for satisfying linked data principles.
\subsubsection*{Ontologies for representing activities associated with personal data and consent}
The second major contribution is the GDPRov and GConsent ontologies, which together enable representation of information about activities associated with personal data and consent relevant for investigation of GDPR compliance.
GDPRov extends the existing ontologies of PROV-O and P-Plan with concepts and relationships specific to the GDPR in order to represent the provenance of personal data and consent at ex-ante and ex-post stages. While ex-post representations commonly take the form of provenance logs, the ex-ante representations act as a model, plan, or template of intended activities for the evaluation of compliance. Furthermore, provenance logs (ex-post) can be linked to their models (ex-ante) to represent the relationship between the planning and implementation of processes within an organisation. GDPRov also enables the representation of other activities associated with the GDPR, such as the handling of rights and data breaches.
Compared to the SotA (see \autoref{sec:voc:gdprov:evaluation}), GDPRov provides the most comprehensive representation of concepts and relationships for activities associated with GDPR.
It is also the only ontology to provide ex-ante and ex-post concepts within the same representation.
To date, the publication of GDPRov has received 18 citations (excluding self-citations). It has been used in an approach to model data flow diagrams (DFDs) for analyses of compliance \cite{debruyneOntologyRepresentingAnnotating2019}.
GDPRov has been released under an open license (CC-by-4.0) and is available along with its documentation at \url{https://w3id.org/GDPRov/}.
GConsent expands upon the abstract representation of consent in GDPRov to provide more detailed information regarding the entities and contextual information relevant for the management of consent. It also provides the concept of `consent states', which reflect the use of consent as a valid legal basis and are useful in the representation and management of consent in information systems. To date, GConsent is the most comprehensive vocabulary regarding consent associated with the GDPR (see \autoref{sec:voc:gconsent:evaluation}).
GConsent had a direct impact on the representation of consent in the DPV by providing the concepts and competency questions associated with consent based on GDPR requirements.
GConsent has been released under an open license (CC-by-4.0) and is available along with its documentation at \url{https://w3id.org/GConsent/}.
Together, the three ontologies (GDPRtEXT, GDPRov, and GConsent) enable the representation of activities associated with personal data and consent for GDPR compliance, and the linking of information represented using them with the clauses of the GDPR.
This enables the use of metadata to annotate legal documents, and automation in the management of information by utilising aspects of querying and validation in the governance process.
\subsection*{Minor Contributions}
The minor contributions of this thesis are an information model of entities and their relationships defined by the GDPR, and a framework utilising semantic web technologies for validating and evaluating information for GDPR compliance. The minor contributions complement the major contributions described above by providing a theoretical basis in the form of an information model, and by demonstrating the feasibility and usability of the developed ontologies through an application for validating information for compliance.
The first minor contribution is the information model presented in \autoref{sec:info:model}, which provides an analysis of the information exchanged between entities and its interoperability based on the requirements of the GDPR.
It categorises the information requirements as provenance, agreements, consent, certification, and compliance, and explores existing standards for representing these in an interoperable form for GDPR compliance.
The information model advances the state of the art by being the first systematic analysis of information flows and interoperability associated with the entities and stakeholders within the context of GDPR compliance.
The model serves to identify and evaluate potential applications of technology in addressing requirements, and motivates the argument for using semantic web as a suitable representation based on the notion of semantic interoperability.
The second minor contribution, presented in \autoref{chapter:testing}, is the use of semantic web technologies to query information for GDPR compliance, where SPARQL and the developed ontologies (GDPRtEXT, GDPRov, and GConsent) represent the compliance questions as queries executed over data represented using those ontologies.
The approach provides assistance with the investigation of compliance by providing an automated way to query required information. This was demonstrated through the use of SPARQL to represent questions from templates provided by Ireland's Data Protection Commission for assisting organisations with their GDPR compliance.
Compared to the SotA, the approach is novel in its use of authoritative sources for compliance questions, and the linking of information with GDPR using GDPRtEXT.
The third minor contribution, as presented in \autoref{chapter:testing}, is a framework that utilises SHACL to validate information for GDPR compliance and link the results with relevant clauses of the GDPR using GDPRtEXT.
The framework enables the creation of machine-readable metadata associated with the GDPR, which in turn makes it possible to automate the generation of documentation regarding assessment of compliance.
The evaluation of the approach, conducted on a consent mechanism from a real-world website, demonstrates its use in validating both ex-ante and ex-post phases.
In addition, it shows the advantage of combining the ex-ante and ex-post phases to create a more efficient compliance mechanism, by moving the common validation tests to the ex-ante phase and validating only the constraints unique to each instance in the ex-post phase.
The framework advances the SotA through its novel use of SHACL for GDPR compliance, the combination of ex-ante and ex-post phases of validation, and the linking of validation results with the clauses of the GDPR to create machine-readable documentation for compliance.
\subsection*{Contributions to the DPVCG}
The ontologies presented in this thesis - namely GDPRtEXT, GDPRov, and GConsent - were used as an input by the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) in its analysis of existing work towards creating a standardised common vocabulary.
In addition, the author of the thesis was an active contributing member towards the development of the Data Privacy Vocabulary (DPV), and served as the editor for its specification.
The DPV provides an ontology associated with personal data processing and legal compliance, including GDPR, and represents a community consensus regarding its definitions, usage, and representation.
It is available and documented at \url{http://w3.org/ns/dpv}.
Comparing the DPV with the ontologies presented in this thesis, the DPV provides a high-level abstraction, whereas the ontologies in this thesis (GDPRov and GConsent) provide a more comprehensive and detailed model for the representation of information, making them complementary in usage with the DPV.
% In addition, the DPV does not overlap with the GDPRtEXT with regards to linking of information with the concepts and clauses of the GDPR, thereby making GDPRtEXT a novel contribution within the SotA.
\section{Opportunities for Further Work}\label{sec:conclusion-future-work}
Due to the novelty of the GDPR and the increased interest in its compliance, there are several opportunities where the work presented in this thesis can be further developed and applied, categorised into the following three areas.
\subsection*{Align Approaches for Regulatory Compliance}
% \subsubsection{Semantic analysis of ontologies targeting GDPR}
% - Consolidate all vocabularies to create a cohesive domain ontology aligned to GDPR
Differences in domain ontologies offer varying perspectives on the modelling of relationships and concepts within the same domain. In the case of GDPR, these ontologies can be compared using the commonality of concepts and aims. For example, `consent' is represented in the ontologies GDPRtEXT, GDPRov, GConsent, SPECIAL \cite{kirrane_scalable_2018}, and PrOnto \cite{palmirani_pronto_2018} - where each representation is based on the same concept of consent, and yet differs in its modelling of the relationships associated with consent. A comparison of ontologies based on semantics of concepts is useful to establish compatibility in their usage and approaches, and to evaluate their usefulness for a given use-case.
The state of the art, presented in \autoref{chapter:sota}, describes existing work outlining such an analysis \cite{leone_taking_2019} and its use in a tool \cite{leone_legal_2018} to compare approaches in the legal domain. It also presents approaches involving application of deontic logic to address regulatory compliance for GDPR, where the text is interpreted using ODRL \cite{agarwal_legislative_2018}, and PrOnto \cite{palmirani_pronto_2018} which models deontic operations and uses LKIF \cite{hoekstra_lkif_2007} to model actions and roles.
This can be further expanded to align existing approaches and ontologies through semantics of concepts provided by GDPRtEXT.
% , and to establish a library of design patterns for use of concepts for GDPR compliance.
\subsection*{Expand Scope of Ontologies}
\subsubsection*{Incorporate future updates to ELI into GDPRtEXT}
GDPRtEXT addresses the aim of linking to specific parts of the GDPR by extending the ELI ontology. The EU Publications Office, as the official developers and maintainers of ELI, are currently working on updating the ELI ontology to enable such linking for all published documents. Their work will provide authoritative URIs for all aspects of a legal document, and will also enable identification of definitions. Once published, the updated ELI ontology will make the GDPRtEXT extension redundant. However, GDPRtEXT will still have uses as a SKOS vocabulary of concepts that is used by ontologies such as GDPRov and GConsent to define the source of their concepts and relationships. By updating GDPRtEXT to use the updated ELI ontology, the interpretation of GDPR as a linked data resource can be provided using the authoritative URIs for use with the provided SKOS vocabulary.
\subsubsection*{Create vocabulary for expressing GDPR Compliance}
The vocabularies associated with GDPR, including those presented in the state of the art in \autoref{chapter:sota} and as contributions of the thesis in \autoref{chapter:vocabularies}, address compliance by associating information with its requirements. This establishes the opportunity to create a vocabulary that represents compliance itself by describing the state of information in fulfilling requirements. Such a vocabulary would be of use to supervisory authorities as well as controllers and processors in generating documentation demonstrating the compliance of information as well as the degree to which it was fulfilled or achieved.
% \subsubsection{Enable use of mappings to align existing information systems}
% - Use the ontologies and queries to create an abstraction over existing information system so that existing systems can use the linked queries using SPARQL and mappings
\subsubsection*{Expand GConsent to capture real-world interactions on the web}
The aim of GConsent, as presented in this thesis, is to represent information about consent. While it is a comprehensive and detailed ontology compared to the state of the art, it is currently not sufficient to express the nuances and complexities of real-world interactions, such as those found in the consent mechanisms of websites. More specifically, it lacks a way to describe the intricate relationships between organisations, including third parties, and the combined collection and dissemination of consent that takes place through online real-time bidding. This can be remedied by incorporating legal opinions on online consent as they are published.
Currently, GConsent is also being used in conjunction with the DPV to create an updated Consent Receipt \cite{lizar_consent_2017} based on requirements of consent under GDPR.
\subsection*{Generate Assistive Systems for Compliance}
%% what steps would an organisation need to take to use this research
The research presented in this thesis provides a technological base for modelling, querying, and validating information associated with GDPR compliance. For organisations to make use of this research, they need to express their use-cases and internal activities using the developed ontologies. This presents an opportunity to develop tools and assistive technologies that help organisations with the information gathering and documentation associated with GDPR compliance. These can be commercial products, similar to existing ones, or collaborative community efforts that take advantage of the interoperability provided by the semantic web. Against this background, the following are three opportunities where this research can be applied.
\subsubsection*{Incorporate GDPRov and GConsent in the SPECIAL compliance checker}
The compliance checker developed by the SPECIAL project \cite{kirrane_scalable_2018} uses a semantic reasoner \cite{bonatti_fast_2018} with a controlled vocabulary, expressed in OWL2, that covers personal data, processing, purpose, storage, and recipients. It may be possible to use the SPECIAL compliance checker to check the compliance of information defined using GDPRov and GConsent, either by modifying the checker to target these vocabularies or by aligning the SPECIAL vocabularies with GDPRov and GConsent. This would enable the work presented in this thesis to take advantage of the large-scale analysis and transparent logging mechanisms provided by the SPECIAL architecture. Such an approach would be evaluated through analysis of its scalability and performance to ascertain the extent of its benefit.
\subsubsection*{Privacy Policy annotation and automatic generation}
A privacy policy fulfils the legal requirement for disseminating information concerning the processing of personal data. Existing approaches for annotating privacy policies \cite{harkous_polisis_2018} take into account neither the semantics of the associated information nor the effects of GDPR on the privacy policy as a document. The case for a privacy policy dataset annotated specifically for GDPR \cite{galle_case_2019} rests on using concepts relevant to the legislation in the annotation process. This can be achieved by using GDPRtEXT as a vocabulary of GDPR concepts. In addition, workflows represented using GDPRov provide the information necessary to generate a partial privacy policy, and can be used to automate the generation of text by converting processing workflows into natural language. An early exploration of annotating and personalising privacy policies was presented in \cite{pandit_personalised_2018}.
Providing privacy policies with machine-readable metadata would assist the automated extraction of information about processing activities, and would help supervisory authorities and data subjects evaluate an organisation's practices.
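As a minimal sketch of converting processing workflows into natural language, the following Python code renders each step of a workflow as one sentence of a partial privacy policy. It assumes a simplified in-memory representation in place of a GDPRov RDF graph: the class \texttt{ProcessingStep}, its fields, the functions \texttt{describe\_step} and \texttt{generate\_policy}, and the sentence template are all hypothetical and are not terms or tools defined by GDPRov.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    """Hypothetical, simplified stand-in for a GDPRov process step.

    The field names are illustrative assumptions, not GDPRov terms.
    """
    activity: str               # e.g. "collect", "use", "store"
    personal_data: List[str]    # categories of personal data involved
    purpose: str                # purpose of the processing
    legal_basis: str            # legal basis relied upon
    recipients: List[str] = field(default_factory=list)


def join_items(items: List[str]) -> str:
    """Join items as natural language: 'a', 'a and b', 'a, b, and c'."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def describe_step(step: ProcessingStep) -> str:
    """Render one processing step as a single privacy policy sentence."""
    sentence = (
        f"We {step.activity} your {join_items(step.personal_data)} "
        f"for {step.purpose} on the basis of {step.legal_basis}"
    )
    if step.recipients:
        sentence += f", and disclose this data to {join_items(step.recipients)}"
    return sentence + "."


def generate_policy(steps: List[ProcessingStep]) -> str:
    """Concatenate the sentences for all steps into a partial policy text."""
    return "\n".join(describe_step(s) for s in steps)


if __name__ == "__main__":
    workflow = [
        ProcessingStep("collect", ["email address", "name"],
                       "account creation", "performance of a contract"),
        ProcessingStep("use", ["email address"], "sending newsletters",
                       "consent", recipients=["an email service provider"]),
    ]
    print(generate_policy(workflow))
```

A template-based rendering of this kind only covers the information present in the workflow, which is why the resulting text would be a partial rather than a complete privacy policy.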
\subsubsection*{Design patterns for GDPR compliance}
While there are detailed ontologies for representing information associated with compliance, their specific usage depends on the use-case to which they are applied. To facilitate adoption and usage, a library of design patterns can be created, where each pattern is concerned with representing information associated with compliance for a specific concept or clause of the GDPR. For example, a design pattern could represent the periodic collection of GPS data from smartphones, linked with the applicable clauses of GDPR as well as the requirements or constraints it must fulfil in order to be compliant. Such design patterns can be used as the basis for assistive tools that generate and assess information for compliance.
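The following Python code is a minimal sketch of how such a design pattern might be encoded and used by an assistive tool, assuming a pattern is simply a named list of requirements, each linked to a GDPR clause and to an attribute that an instance of the pattern must document. The classes \texttt{Requirement} and \texttt{DesignPattern}, the attribute names, and the choice of clauses are hypothetical and illustrative only; they are not a legal analysis of GPS data collection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Requirement:
    """A constraint an instance of the pattern must satisfy (hypothetical)."""
    clause: str       # GDPR clause the requirement is linked to (illustrative)
    description: str  # human-readable statement of the requirement
    attribute: str    # attribute of the instance that must be documented


@dataclass
class DesignPattern:
    """A hypothetical compliance design pattern for one processing scenario."""
    name: str
    requirements: List[Requirement]

    def unmet(self, instance: Dict[str, str]) -> List[Requirement]:
        """Return the requirements whose attribute is missing or empty."""
        return [r for r in self.requirements if not instance.get(r.attribute)]


periodic_gps = DesignPattern(
    name="Periodic collection of GPS data from smartphones",
    requirements=[
        Requirement("Article 6(1)", "A legal basis is specified",
                    "legal_basis"),
        Requirement("Article 13(1)(c)", "The purpose is provided to the data subject",
                    "purpose"),
        Requirement("Article 13(2)(a)", "The storage period is stated",
                    "storage_period"),
    ],
)


if __name__ == "__main__":
    # Documentation of a hypothetical app that omits the storage period.
    app = {"legal_basis": "consent", "purpose": "route navigation"}
    for requirement in periodic_gps.unmet(app):
        print(f"{requirement.clause}: {requirement.description}")
```

Here the check is a simple presence test on documented attributes; in practice the same role could be played by SHACL shapes or SPARQL queries over information expressed using GDPRov and GConsent.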
% \subsubsection{Tool to validate and assess compliance documentation}
% Create a set of constraints that validate model of the system and assist organisations in ensuring their processes are documented. e.g. process to handle data breach exists
\section{Final Remarks}\label{sec:conclusion-final-remarks}
GDPR is the subject of scrutiny due to its impending interpretation by supervisory authorities and courts of law, and the possibility of incurring large fines. Consequently, there is significant interest in approaches associated with its compliance, particularly those that involve technological means, as they promise algorithmic solutions that can be automated.
Technological solutions for addressing compliance are dependent on the underlying information model, and have a range of approaches to choose from, as is evident in the state of the art regarding regulatory compliance. However, it can be argued that the law ultimately deals only with legal documentation, where information is invariably linked with specific clauses of the law.
% The work presented in this thesis is a step towards enabling technological solutions that assist in the linking of information for compliance in an interoperable and machine-readable form through semantic web.
Within this context, the work presented in this thesis is useful for all stakeholders involved (controllers, processors, supervisory authorities, and data subjects) by enabling the creation of tools and services that assist in the representation, querying, and validation of information. In particular, the thesis establishes the advantages of using semantic web technologies and provides an argument for their adoption in the regulatory compliance domain. Where use-cases and contexts differ, stakeholders now have the technological means to establish common patterns and tools beneficial to the larger community.
With an increased need for, and focus on, the intersection of technology and privacy, approaches based on the semantic web can foster transparency and accountability by providing an open medium for knowledge interaction among all stakeholders. It is therefore the author's hope that this thesis and the work presented therein benefit society in meeting the expectations demanded by privacy laws such as GDPR, as well as those arising from social obligations.