\chapter{Data Management}
\label{cha:dms}
\Authors{Carolin Helbig, Uwe-Jens G\"orke, Mathias Nest, Daniel P\"otschke, Amir S. Sattari, Patrick Schmidt, Bernhard Vowinckel, Keita Yoshioka, Olaf Kolditz}
Data management comprises the development and use of architectures, guidelines, practices and procedures for managing data accurately throughout the entire data lifecycle of an institutional unit or a research project. Data are defined here as units of information, such as numbers, alphabetic characters and symbols, that are formatted in a particular way and can be processed by a computer. The data in the project are provided by various actors, which can be GeomInt partners, their legal representatives, employees, and external partners. GeomInt data are provided via the GeomInt data management portal (DMP).
In the GeomInt project, the partners work in very close cooperation. Project-owned and connected infrastructures are used synergetically (as illustrated in Fig. \ref{fig:geomint-dms}). In addition to the rock mechanics laboratories of the partners CAU, IfG and TUBAF, whose equipment is in part unique, data from ongoing experiments in the underground laboratories are accessed. An essential element of the GeomInt project is the simulation and software development structure, also shown in Fig. \ref{fig:geomint-dms}. The development of the simulation platform OpenGeoSys is coordinated by the UFZ, and the GeomInt partners BGR, CAU, IfG and TUBAF are involved in its further development; the simulation and development infrastructures located at the UFZ, including version management, are therefore available to these partners.
%\begin{figure}{l}{\textwidth}
\begin{figure}
\includegraphics[width=\textwidth]{figures/geomint-dms_en2.png}
\caption{Network diagram to illustrate synergies and dependencies in the GeomInt network in connection with the infrastructure elements and numerical methods in the project}
\label{fig:geomint-dms}
\end{figure}
The collaborative work requires data management structures and guidelines. Therefore, the first step was to draw up a document comprising a user agreement and a data management plan, which together form the basis for data management in the project.
% The synergetic use of project-owned and connected infrastructures is illustrated in Fig. \ref{fig:geomint-dms}. In addition to the rock mechanics laboratories of the partners CAU, IfG and TUBAF with partly unique equipment (e.g. large shear apparatus of TUBAF, crack detection by means of sound waves by CAU and TUBAF, especially triax cells for percolation experiments at high temperatures at IfG), data from ongoing experiments in the above-mentioned underground laboratories are accessed. An essential element of the GeomInt project is the simulation and software development structures illustrated in Fig. \ref{fig:geomint-dms}. With regard to the use of the simulation platform OpenGeoSys, the development of which is coordinated by the UFZ and in whose further development the GeomInt partners BGR, CAU, IfG and TUBAF are involved, the simulation and development infrastructures located at the UFZ, including version management, is available to these partners.
% The immediate project results include specific data from laboratory and in-situ experiments, software components and data sets from numerical simulations (i.e. model and result files). An estimation of the extent of the data generated in GeomInt is variable; therefore, the data management concept must be flexible. This uncertainty is mainly due to the fact that the evaluation of test and calculation results may lead to a change in test and calculation planning and may even lead to additional experiments or simulation calculations. As a measure for the expected orders of magnitude, however, it is possible, for example, to estimate the size of the data set (input and output data) for the three-dimensional simulation of a transient coupled process in relation to an in-situ experiment (small field scale) with a gigabyte amount in the low two-digit range. A total data volume in the terrabyte range therefore must be manageable.
% The availability of experimental and numerical data generated in the project, including existing metadata, is realised for the project partners in an internal area of the Geomint homepage. The Helmholtz Centre for Environmental Research - UFZ is responsible for the project data. The UFZ has many years of experience in data management regarding the cooperative development of open source software (OpenGeoSys) as well as the acquisition, storage and processing of data from experiments on different scales, exploration and monitoring campaigns, numerical simulations and scientific 3D visualisations. The UFZ has sufficient capacities and modern data management systems for data storage, which are available as a central data infrastructure for the planned research network. Specifically, data sets are managed by means of an ORACLE database. Access is via a web portal, where each data record must be provided with metadata before uploading. The metadata standard used is compatible with the INSPIRE Directive 2007/2/EC and also regulates the rights for access, use and transfer of the data. A tape system is also available for the long-term storage of very large amounts of data. For the provision of exploration and monitoring data, geo-services mentioned in the GDI-DE are used as far as possible (e.g. Sensor Observation Service or Web Map Service). Since such services for complex modelling and simulation data do not exist so far, the provision is done via a data research portal, where data can be found by means of stored metadata.
% As software components are part of non-commercial, scientific program platforms and are open source products (e.g. OpenGeoSys), they are hosted by the responsible partner via established source code hosting services (e.g. GitHub) is made publicly available. A possible public access to project data, which goes beyond the status quo as described in technical publications, as well as the handling of the data after the end of the project is regulated in the cooperation agreement or in the cooperation contract between the project partners. The handling of data obtained from the in-situ experiments in the underground laboratories through synergies with other projects is also regulated separately (access authorisation for these data, storage location, publication, handling of the data after the end of the project). Such an approach is necessary because specific parts of these data can be used for the scientific purposes of GeomInt, but they are generated in other projects with partly other partners.
\section{User agreement and data management plan}
The GeomInt project partners agreed to set up a user agreement that specifies data structures including metadata, data formats, access authorization for data, the possible publication of data, and the handling of the data after the end of the project and outside the project. A first version of this user agreement was created six months after the start of the project.
The user agreement includes guidelines and definitions for the following aspects:
\begin{list}{-}{\leftmargin=1em \itemindent=0em \itemsep=0.1em}
\item Which data will be generated in the project and has to be managed?
\item How will data be provided and exchanged?
\item What are the rights of use for the partners and for third parties?
\item How are data to be cited?
\item How is compliance with the user agreement supervised?
\end{list}
As part of the user agreement, a data management plan was developed. This is a formal document that describes how project data are managed during the research period and after completion of the project. The goal of a data management plan is to consider the aspects of data management (metadata creation, data preservation and analysis) before the start of the project. The following points are discussed in the GeomInt data management plan:
\begin{enumerate}
\item Generation and management methods (data infrastructure, external data, data integration, data formats, quality control, user groups, data processing stages, versioning, documentation and metadata, geocoding)
\item Data Legal Management
\item Data exchange and provision, citation rules
\item Short-term storage and data management (storages, data transfer, backup, security)
\item Long-term storage (characteristics, metadata and documentation, responsibility)
\item Resources (organizational roles and responsibilities for data management)
\end{enumerate}
\section{GeomInt data}
The project results include specific data from laboratory and in-situ experiments, software components, and data sets from numerical simulations (i.e. model and result files). The volume of data generated in GeomInt could not be estimated before the project, so the data management concept had to be flexible. This uncertainty arose mainly because the evaluation of test and calculation results may change the test and calculation planning and may even lead to additional experiments or simulation runs.
Experimental and numerical data generated in the project, including existing metadata, are made available in an internal area of the GeomInt homepage. The UFZ is responsible for the project data and has many years of experience in data management regarding the cooperative development of open source software (OpenGeoSys) as well as the acquisition, storage and processing of data from experiments on different scales, exploration and monitoring campaigns, numerical simulations and scientific 3D visualizations.
The UFZ has sufficient capacity and modern data management systems for data storage, which serve as a central data infrastructure for the research network. Specifically, data sets are managed in an Oracle database. Access is via a web portal, where each data record must be provided with metadata before it is uploaded. The metadata standard used is compatible with the INSPIRE Directive 2007/2/EC and also regulates the rights for access, use and transfer of the data. A tape system is also available for the long-term storage of very large amounts of data. For the provision of exploration and monitoring data, geo-services listed in the GDI-DE are used as far as possible. Since no such services exist yet for complex modelling and simulation data, these data are provided via a data research portal, where they can be found by means of stored metadata.
As the software components are part of non-commercial, scientific program platforms and are open source products (e.g. OpenGeoSys), they are hosted by the responsible partner via established source code hosting services (e.g. GitHub) and are publicly available. Possible public access to project data beyond the status quo described in technical publications, as well as the handling of the data after the end of the project, are regulated in the cooperation agreement or in the cooperation contract between the project partners.
The handling of data obtained from the in-situ experiments in the underground laboratories through synergies with other projects is regulated separately (access authorization for these data, storage location, publication, handling of the data after the end of the project). Such an approach is necessary because specific parts of these data can be used for the scientific purposes of GeomInt, although they are generated in other projects, in some cases with other partners.
\section{GeomInt DMP}
In this section, exemplary data sets of every project partner are described. A table of these data sets, including descriptions and links, is available only to project partners on the website (Fig. \ref{fig:geomint-dms-web}). Some data sets can be found on the UFZ data investigation portal \url{https://www.ufz.de/drp/}. These data sets are uploaded to the data management portal at UFZ (DMP@UFZ).
\begin{figure}[!ht]
\includegraphics[width=\textwidth]{figures/geomint-web-01.png}
\includegraphics[width=\textwidth]{figures/geomint-dms-01.png}
\caption{GeomInt DMS Portal \url{https://www.ufz.de/geomint/index.php?de=46799}}
\label{fig:geomint-dms-web}
\end{figure}
The GeomInt data management system (DMS) is organised in three sections (Fig. \ref{fig:geomint-dms-ove}):
\begin{list}{-}{\leftmargin=1em \itemindent=0em \itemsep=0em}
\item Experimental data (Fig. \ref{fig:geomint-dms-ove}, left)
\item Simulation data (Fig. \ref{fig:geomint-dms-ove}, right)
\item Data connected to underground research laboratories (URLs) (Fig. \ref{fig:geomint-dms-ove}, bottom right)
\end{list}
\begin{figure}[!ht]
\includegraphics[width=\textwidth]{figures/geomint-dms-ove.png}
\caption{GeomInt DMS Portal: Data areas for experimental, simulation and URL-related information}
\label{fig:geomint-dms-ove}
\end{figure}
Table \ref{tab:dms-mex} summarizes the MEX-related data from experiments and simulations.
A selection is described in the following sections.
\begin{table}[!ht]
\footnotesize
\centering
\caption{MEX Data Management}
\label{tab:dms-mex}
\begin{tabular}{|C{0.7cm}|L{3.7cm}|C{0.7cm}|C{0.7cm}|C{0.7cm}|C{0.7cm}|C{0.7cm}|C{0.7cm}|}
\hline
\rowcolor{cyan!50}
MEX & TOP & EXP & \multicolumn{5}{c|}{MOD} \\
\hline
\rowcolor{cyan!50}
WP & & & LEM & DEM & FEM & HDF & FFS \\
\hline \hline
%-------------------
0-1a & Bending fracture test & \cellcolor{lightgray} LIT & \cellcolor{lightgray} \checkmark & \cellcolor{lightgray} \checkmark & \cellcolor{lightgray} \checkmark & & \\
\hline
0-1b & Bending fracture test (aniso) & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
0-2 & Humidity controlled bending & \multicolumn{6}{c|}{Concept} \\
\hline \hline
%-------------------
1-1a & Swelling of clay & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
1-1b & Swelling of clay & \cellcolor{lightgray} & & \cellcolor{lightgray} & & & \\
\hline
1-2 & Shrinkage of clay & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
1-3 & Desiccation of clay & \cellcolor{lightgray} & & & & & \\
\hline
1-4 & CD/LP experiment & & & & \cellcolor{lightgray} & & \\
\hline \hline
%-------------------
2-1a & Pressure driven percolation & & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
2-1b & Pressure driven percolation & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
2-2 & Healing / closure & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & \\
\hline
2-3 & Compressible fluids & \cellcolor{lightgray} & \cellcolor{lightgray} & \cellcolor{lightgray} & & & \\
\hline
2-4 & URL Springen & \cellcolor{lightgray} & & \cellcolor{lightgray} & & & \\
\hline \hline
%-------------------
3-1 & CNL test & \cellcolor{lightgray} & & & & & \cellcolor{lightgray} \\
\hline
3-2 & CNS test & \cellcolor{lightgray} & & & & & \cellcolor{lightgray} \\
\hline
3-3 & Cyclic loading & & & & & \cellcolor{lightgray} & \\
\hline \hline
%-------------------
\end{tabular}
\end{table}
\normalsize
The following codes (and related input files) are used (see Chapter \ref{cha:codes} for detailed code introductions).
\subsection*{Software Codes}
\begin{list}{-}{\leftmargin=1em \itemindent=0em \itemsep=0em}
\item LEM: In-house developed MATLAB code available in executable P-file format (not uploaded yet)
\item DEM: Commercial code by Itasca Ltd.
\item SPH: In-house code
\item OGS (OpenGeoSys): \url{https://www.opengeosys.org/releases/}
\item HDF: In-house code
\end{list}
\subsection*{Input files}
\begin{list}{-}{\leftmargin=1em \itemindent=0em \itemsep=0em}
\item LEM: Available in .txt format (not uploaded yet)
\item DEM: Itasca Ltd. (user's manuals)
\item SPH: In-house documentation
\item OGS (Benchmarks): \url{https://www.opengeosys.org/docs/benchmarks/}
\item HDF: In-house documentation
\end{list}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex01a}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex01b}
\clearpage
%------------------------------------------------------------------------
%%\input{06-dms-mex11a}
%%\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex11b}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex12}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex14}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex21a}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex21b}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex22}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex23}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex24}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex31}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex32}
\clearpage
%------------------------------------------------------------------------
\input{06-dms-mex33}
\clearpage
%------------------------------------------------------------------------
\begin{comment}
%-------------------
\begin{table}[!ht]
\footnotesize
\centering
\caption{MEX 0-1a: Data Management}
\label{tab:dms-mex0-1}
\begin{tabular}{|L{0.5cm}|L{1cm}|L{2cm}|L{3cm}|L{1cm}|}
\hline
%..................
\rowcolor{cyan!50}
& Methods & Codes/Reference & Files & Analysis \\ \hline
& EXP & \cite{} & Files & Analysis \\ \hline
& LEM & Codes & Files & Analysis \\ \hline
& DEM & UDEC & Files & Analysis \\ \hline
& FEM & OGS-6 & Benchmark collection\footnote{\url{https://www.opengeosys.org/docs/benchmarks/phase-field/phasefield/}} & Analysis \\ \hline
%..................
\end{tabular}
\end{table}
\normalsize
%-------------------
\end{comment}