adding lido-xml
RodionLisch authored Nov 28, 2024
1 parent ba3b9a4 commit d793777
Showing 1 changed file with 134 additions and 13 deletions.
147 changes: 134 additions & 13 deletions nfdinspector_tutorials/LIDO_tutorial.ipynb
@@ -7,7 +7,7 @@
"## LIDOInspector Tutorial\n",
"\n",
"### Authors\n",
"Rodion Lischnewski, Andreas Ketelaer\n",
"Rodion Lischnewski\n",
"\n",
"### Introduction\n",
"NFDInspector is designed to facilitate the inspection of formal quality issues pertaining to research data. It is currently compatible with the LIDO and EAD metadata standards. The project has been funded by the “4Memory Incubator Funds” of the NFDI4Memory consortium and is being developed and maintained by the Montanhistorisches Dokumentationszentrum (montan.dok) of the Deutsches Bergbau-Museum Bochum.\n",
@@ -22,32 +22,62 @@
"+ __[lxml](https://lxml.de/)__\n",
"+ __[json](https://docs.python.org/3/library/json.html)__\n",
"\n",
"NFDInspector uses the following libraries:\n",
"+ os\n",
"+ json\n",
"+ csv\n",
"+ re\n",
"+ datetime\n",
"+ lxml\n",
"\n",
"### Learning goals\n",
"* [Installation](#installation)\n",
"* [Read metadata records from different sources](#read-metadata-records-from-different-sources)\n",
"* Customize inspection configuration\n",
"* [Customize inspection configuration](#configuration)\n",
"* Carry out inspection\n",
"* Process or output the results\n",
"\n",
"For information on how to install the NFDInspector please refer to the installation guide.\n",
"\n",
"* Process or output the results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n"
"The NFDInspector package includes modules for inspecting the LIDO-XML and EAD-XML formats by default. To install NFDInspector using pip on macOS or Linux, run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install nfdinspector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To install with pip under Windows, run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"source": [
"!py -m pip install nfdinspector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import and initialize:\n",
"You can import the ``LIDOInspector()``, a class from the ``nfdinspector.lido_inspector`` module.\n",
"while initiliazing the LIDOInspector, you can specify the language for the error messages. Though currently only ``'en'`` and ``'de'`` are available. In our case we will stick to english output.\n"
"When initializing the LIDOInspector, you can specify the language of the error messages. Currently only ``'en'`` and ``'de'`` are available. In this tutorial we will stick to English output."
]
},
{
@@ -57,6 +87,7 @@
"outputs": [],
"source": [
"from nfdinspector.lido_inspector import LIDOInspector\n",
"\n",
"lido_inspector = LIDOInspector(error_lang='en')"
]
},
@@ -66,14 +97,16 @@
"source": [
"### Read metadata records from different sources\n",
"\n",
"The easiest way to ingest metadata is from a standalone ``.xml`` file. Multiple file can be ingested and inspected in succession. Several meta data records can also come combined in one xml file."
"The easiest way to ingest metadata is from a standalone ``.xml`` file. An XML file can also contain more than one LIDO object. The NFDInspector can distinguish between the LIDO objects in one file."
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Read metadata from single source"
"lido_inspector.read_lido_file('file_path')"
]
},
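Before handing a file to the inspector, it can help to check how many LIDO objects it contains. The following is a minimal sketch using only the standard library, not NFDInspector itself. It assumes the common layout in which several ``lido:lido`` records are grouped under a ``lido:lidoWrap`` root element; the record IDs are made-up placeholders and the records are deliberately incomplete, so they are not schema-valid LIDO.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the LIDO schema.
LIDO_NS = "http://www.lido-schema.org"

# Illustrative fragment only: two stub records with placeholder IDs.
lido_xml_string = f"""<lido:lidoWrap xmlns:lido="{LIDO_NS}">
  <lido:lido><lido:lidoRecID lido:type="local">example-0001</lido:lidoRecID></lido:lido>
  <lido:lido><lido:lidoRecID lido:type="local">example-0002</lido:lidoRecID></lido:lido>
</lido:lidoWrap>"""

root = ET.fromstring(lido_xml_string)

# Each direct lido:lido child of the wrapper is one LIDO object.
records = root.findall(f"{{{LIDO_NS}}}lido")
rec_ids = [rec.findtext(f"{{{LIDO_NS}}}lidoRecID") for rec in records]

print(len(records), rec_ids)
```

How NFDInspector itself separates the objects internally is not shown here; this only illustrates what "more than one LIDO object in one file" looks like.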
{
@@ -87,7 +120,95 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configuration"
"You can read several files by specifying the folder path containing the xml files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lido_inspector.read_lido_files('files_path')"
]
},
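To see which files a folder would contribute, you can list its XML files first. This is a small sketch with ``pathlib``; the temporary folder and the file names are placeholders created only for the demonstration. Whether ``read_lido_files`` filters by file extension or descends into subfolders is not stated in this tutorial, so treat the ``*.xml`` pattern as an assumption.

```python
import tempfile
from pathlib import Path

# Placeholder folder with two XML files and one unrelated file.
files_path = Path(tempfile.mkdtemp())
for name in ("a.xml", "b.xml", "notes.txt"):
    (files_path / name).write_text("<placeholder/>", encoding="utf-8")

# Collect only the top-level .xml files, sorted for a stable order.
xml_files = sorted(p.name for p in files_path.glob("*.xml"))
print(xml_files)
```

With a real data folder you would skip the setup loop and point ``files_path`` at your own directory.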
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can pass LIDO-XML directly as a string to the inspector by using the ``.read_lido`` method.\n",
"This is useful if you are integrating the NFDInspector functionality into a larger workflow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lido_inspector.read_lido(lido_xml_string)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configuration\n",
"NFDInspector offers the ability to customise the inspection to your specific needs. If no special configuration is specified, the built-in configuration will be used. \n",
"Customisation is usually done via a configuration file. Configurations can be exported and imported in JSON format.\n",
"The default configuration is not fixed and may change with package updates. To view the current default configuration, read the ``configuration`` property of the LIDOInspector. It returns the configuration as a dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lido_config = lido_inspector.configuration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following script will store the configuration in JSON format in the ``config_path`` location."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"config_path = 'lido_config_schema.json'\n",
"with open(config_path, 'w') as outfile:\n",
"    json.dump(lido_config, outfile, indent=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``config_file()`` method imports a configuration from a JSON file with compatible syntax."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lido_inspector.config_file(config_path)"
]
},
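The export and import steps amount to a JSON round trip, which can be checked independently of the inspector. The sketch below uses a stand-in dictionary, because the real configuration keys are not listed in this section; ``hypothetical_check`` and its contents are invented for illustration and are not part of the NFDInspector configuration.

```python
import json
import os
import tempfile

# Stand-in for the dictionary returned by the configuration property (hypothetical keys).
example_config = {"hypothetical_check": {"inspect": True}}

# Write to a temporary location so no existing file is overwritten.
config_path = os.path.join(tempfile.mkdtemp(), "lido_config_schema.json")
with open(config_path, "w", encoding="utf-8") as outfile:
    json.dump(example_config, outfile, indent=4)

# Read the file back and confirm that nothing was lost.
with open(config_path, encoding="utf-8") as infile:
    loaded = json.load(infile)

print(loaded == example_config)
```

A typical workflow would edit the exported JSON file by hand between the two steps and then load the edited file into the inspector.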
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will take a closer look at the configuration options later on. For now, let's take a look at the basic functions."
]
}
],