\section{Applications}
\label{sec:applications}
The results of this thesis can be applied wherever data is created and used
intellectually, including the design of automatic methods of data processing.
In particular, the identified categories, paradigms, and patterns can help to
better understand existing data and to improve data models and their creation.
Ideally, the results will foster a general understanding of methods to describe
and structure data, independent of specific technologies and trends, such as
programming languages, software architectures, and storage systems. Two
emerging domains of application are described below: data archaeology
(section~\ref{sec:dataarchaeology}) and data literacy
(section~\ref{sec:datalit}).
\subsection{Data archaeology}
\label{sec:dataarchaeology}
The domain of \Term{data archaeology} is the recovery of digital data in unknown
or obsolete formats. This activity is closely related to \term{data recovery},
which focuses on the recovery of data from damaged media and \term{file system}s.
Data archaeology includes all methods of interpretation that follow after data
recovery. Just as archaeology exposes layers and artifacts by excavation and
remote sensing, data archaeology can use many methods to uncover structures in
data. The existing discipline most closely related to data archaeology is
\term{digital forensics}. Digital forensics has a more specific scope, and its
application to
more complex and heterogeneous methods of data structuring, such as databases, is
in an early stage of development \cite{Olivier2009}.

The term data archaeology first appeared in 1992 in the \term{Global
Oceanographic Data Archaeology and Rescue Project}. The goal of this project
was to collect, digitize, and consolidate historical data on temperature,
chlorophyll, and plankton of the oceans \cite{GODAR2007}. To prevent the need
for data archaeology, \Term{digital preservation} or \Term{long-term
preservation} has been established as an important field in library and
information science and archival science. Digital preservation is a set of
activities aimed at ensuring access to digital materials over time
\cite{Caplan2008}. This includes the creation of descriptive metadata,
protection from change, and ensuring that a given digital publication can be
read in its original form. Two strategies are followed to manage the variety
and change of digital formats: emulation of obsolete software needed to read
the data, and conversion of data to newer formats and systems. Both strategies
are complex and require constant attention. Moreover, one can only describe,
emulate, and migrate what is currently known, whereas from a historical view,
relevant
aspects may emerge only after years and decades.

That said, data archaeology as the retrospective analysis of incompletely
defined data will gain importance. The paradigms and patterns found in this
thesis will help \term{intellectual data analysis}, which is needed to underpin
and interpret algorithmic data analysis. Algorithmic data analysis with
\term{data mining}, \term{knowledge discovery}, and related applied sciences
provides useful tools to discover detailed views on data, but these tools cannot
reveal its meaning as part of social practice. For this reason, it is important
to locate data archaeology in the (digital) humanities,\footnote{See
\textcite{Svensson2010} for a discussion of the scope and definition of
\term{digital humanities}.} as meaningful data is always a product of human
action. It can therefore only be studied by taking into account the cultural
context of its
creation and usage.

As \Person[Steve]{Hoberman} points out in the third edition of \textcite[p.
63]{Kent2012}, data archaeology is also an act of \term{reverse-engineering}:
``Just as an archaeologist must try to find out what this piece of clay that was
buried under the sand for thousand of years was used for, so must we try to
figure out what these [data] fields were used for when no or little
documentation or knowledgeable people resources exist.'' The data categories,
paradigms, and patterns identified in this thesis can help to detect the
intended shape and purpose of such buried data elements.
\subsection{Data literacy}
\label{sec:datalit}
The term \Term{data literacy} has gained popularity in recent years to describe
the increasing need for reading and writing data, especially among researchers.
The focus of data literacy is similar to the needs of ``data science'' and
``data journalism'' \cite{Bradshaw2011}, which mainly include capabilities to
aggregate, filter, and visualize large sets of data with statistical methods of
data analysis. Definitions of data literacy refer to the knowledge of ``how to
obtain and manipulate data'' \cite{Schield2004} and how to ``understand, use,
and manage science data'' \cite{Qin2010}.\footnote{\textcite{Qin2010} refer to
\emph{scientific} or \emph{science} data literacy as the ability of
``collecting, processing, managing, evaluating, and using data for scientific
inquiry'', but they neither distinguish it from general data literacy nor
provide a definition of data.} \textcite{Carlson2011} refer to data literacy as the
capability of ``understanding what data mean, including how to read graphs and
charts appropriately, draw correct conclusions from data, and recognize when
data are being used in misleading or inappropriate ways.'' These definitions
and the majority of data literacy literature and curricula focus on numerical
data, management of scientific data sets \cite{Haendel2012}, common data
processing software, file formats, and preservation. Despite the importance of
these aspects of data, there is a lack of theory in current data literacy. In
particular, current data literacy mostly ignores the semiotic nature of data
and the conception of data as communications which are not measured or observed
but created \cite{BallsunStanton2012}. Instead, the domain is committed to the
notions of data as hard numbers or data as observations and emphasizes
\term{statistical literacy} to aggregate and filter large sets of data. This
thesis, with its focus on data as communications, can provide both a theoretical
foundation for data literacy and guidelines to better appraise practical methods
of data structuring and description, which are already a subject of current data
literacy.