From 67c15318933c402e2794f3e29c580d808e49344d Mon Sep 17 00:00:00 2001 From: RodionLisch Date: Fri, 13 Dec 2024 17:00:51 +0100 Subject: [PATCH] renaming Tutorial and adding EAD parts --- nfdinspector_tutorials/LIDO_tutorial.ipynb | 276 ----------- nfdinspector_tutorials/NFDI_tutorial.ipynb | 547 +++++++++++++++++++++ 2 files changed, 547 insertions(+), 276 deletions(-) delete mode 100644 nfdinspector_tutorials/LIDO_tutorial.ipynb create mode 100644 nfdinspector_tutorials/NFDI_tutorial.ipynb diff --git a/nfdinspector_tutorials/LIDO_tutorial.ipynb b/nfdinspector_tutorials/LIDO_tutorial.ipynb deleted file mode 100644 index 08b5b45..0000000 --- a/nfdinspector_tutorials/LIDO_tutorial.ipynb +++ /dev/null @@ -1,276 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## LIDOInspector Tutorial\n", - "\n", - "### Authors\n", - "Rodion Lischnewski\n", - "\n", - "### Introduction\n", - "NFDInspector is designed to facilitate the inspection of formal quality issues pertaining to research data. It is currently compatible with the LIDO and EAD metadata standards. The project has been funded by the “4Memory Incubator Funds” of the NFDI4Memory consortium and is being developed and maintained by the Montanhistorisches Dokumentationszentrum (montan.dok) of the Deutsches Bergbau-Museum Bochum.\n", - "\n", - "### Target Group\n", - "This tutorial is aimed at new users of the NFDInspector. The aim is to make it easier to get started using the tool based on a use case presented here. A basic understanding of Python programming is required to use the tool independently. Due to the open source architecture, the tool can be integrated into other programs, tools and applications.\n", - "\n", - "### Requirements\n", - "It is assumed that Python version 3.10 is installed on the machine. Further the lxml library is required to use the NFDInspector package.\n", - "+ __[lxml](https://lxml.de/)__\n", - "It is \n", - "+ __[pandas](https://pandas.pydata.org/)__\n", - "\n", - "\n", - "### Learning goals\n", - "* [Installation](#installation)\n", - "* [Read metadata records from different sources](#read-metadata-records-from-different-sources)\n", - "* [Customize inspection configuration](#configuration)\n", - "* [Carry out inspection](#inspection)\n", - "* [Process or output the results](#file-output)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Installation\n", - "The __[NFDInspector](https://github.com/montan-code/nfdinspector)__ package includes modules for the inspection of LIDO-XML and EAD-XML formats as standard. To install NFDInspector using pip on macOS or Linux, run:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "python3 -m pip install nfdinspector" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To install with pip under Windows, run:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "py -m pip install nfdinspector" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Import and initialize:\n", - "You can import the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__, a class from the __[``nfdinspector.lido_inspector``](https://montan-code.github.io/nfdinspector/nfdinspector.html#module-nfdinspector.lido_inspector)__ module.\n", - "While initiliazing the LIDOInspector, you can specify the language for the error messages. Though currently only ``'en'`` and ``'de'`` are available. In our case we will stick to english output." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "from nfdinspector.lido_inspector import LIDOInspector\n", - "\n", - "lido_inspector = LIDOInspector(error_lang='en')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Read metadata records from files or strings\n", - "\n", - "The easiest way to ingest metadata is from a standalone ``.xml`` file. XML files can also contain more than one LIDO-object. The NFDInspector can destinguish between LIDO-objects in one file and store them separately in a list, which is the property __[``lido_objects``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.lido_objects)__ of the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ class as." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "file_path = '../nfdinspector_tutorials/LIDO_xml/23310.xml'\n", - "lido_inspector.read_lido_file(file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can read several files by specifying the folder path containing the ``.xml`` files." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "lido_inspector.read_lido_files('../nfdinspector_tutorials/LIDO_xml')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can parse LIDO-XML directly from a string and forward it to the NFDInspector by using the method __[``.read_lido()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido)__.\n", - "This is useful if you are implementing the NFDInspector functionality into a larger workflow." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lido_inspector.read_lido(lido_xml_string)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configuration\n", - "NFDInspector offers the ability to customise the inspection to your specific needs. If no special configuration is specified, the built-in configuration will be used. \n", - "Customisation is usually done via a configuration file. Configurations can be exported and imported in JSON format.\n", - "It is possible that the default configuration will change with upcoming updates in the future. To view the current default configuration, it can be retrieved from __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ by calling the __[``.configuration``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.configuration)__ property." - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "lido_config = lido_inspector.configuration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you like to look at the current configuration, the following script will store the configuration in JSON format in the ``config_path.json`` location." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "config_path = 'lido_config.json'\n", - "with open(config_path, 'w') as outfile:\n", - " json.dump(lido_config, outfile, indent=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The __[``.config_file()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.config_file)__ method can be used to import configuration files from JSON files with compatible syntax." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "lido_inspector.config_file(config_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will take a closer look at the configuration options later on. For now, let's take a look at the basic functions.\n", - "\n", - "### Inspection\n", - "First we read the LIDO-XML files by passing the path to the __[``.read_lido_files()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido_files)__ method. The data is then inspected based on the settings in the [configuration](#configuration). This is done by calling the __[``.inspect()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.inspect)__ method.
\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "lido_inspector.inspect()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The results of the inspections are stored as a list of dictionaries under __[``LIDOInspector.inspections``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.inspections)__ and can be accessed from there." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(lido_inspector.inspections)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### File output\n", - "\n", - "Using __[``.to_json()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_json)__ and __[``.to_csv()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_csv)__ it is possible to output the inspection results in the appropriate formats. The indentation level of the JSON file can be specified with the ``indent`` parameter, the file path for the output file can be specified respectively in the ``to_json(\"filepath\", indent=4)`` method. A ``delimiter`` for the csv format output can likewise be specified in the method as a string ``.to_csv(\"filepath\", delimiter=';')``" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lido_inspector.to_json('../nfdinspector_tutorials/results/results.json', indent=4)\n", - "lido_inspector.to_csv('../nfdinspector_tutorials/results/results.csv', delimiter=';')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For further informations please refer to the __[NFDInspector Documentation](https://montan-code.github.io/nfdinspector/)__" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/nfdinspector_tutorials/NFDI_tutorial.ipynb b/nfdinspector_tutorials/NFDI_tutorial.ipynb new file mode 100644 index 0000000..582082e --- /dev/null +++ b/nfdinspector_tutorials/NFDI_tutorial.ipynb @@ -0,0 +1,547 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NFDInspector Tutorial\n", + "\n", + "### Authors\n", + "Rodion Lischnewski\n", + "\n", + "### Introduction\n", + "NFDInspector is designed to facilitate the inspection of formal quality issues pertaining to research data. It is currently compatible with the [LIDO](#lido) and EAD metadata standards. The project has been funded by the “4Memory Incubator Funds” of the NFDI4Memory consortium and is being developed and maintained by the Montanhistorisches Dokumentationszentrum (montan.dok) of the Deutsches Bergbau-Museum Bochum.\n", + "\n", + "### Target Group\n", + "This tutorial is aimed at new users of the NFDInspector. The aim is to make it easier to get started using the tool based on a use case presented here. A basic understanding of Python programming is required to use the tool independently. Due to the open source architecture, the tool can be integrated into other programs, tools and applications.\n", + "\n", + "### Requirements\n", + "It is assumed that Python version 3.10 is installed on the machine. Further the lxml library is required to use the NFDInspector package.\n", + "+ __[lxml](https://lxml.de/)__\n", + "It is \n", + "+ __[pandas](https://pandas.pydata.org/)__\n", + "\n", + "\n", + "### Learning goals\n", + "* [Installation](#installation)\n", + "* [Read metadata records from different sources](#read-lido-metadata-records-from-files-or-strings)\n", + "* [Customize inspection configuration](#configuration)\n", + "* [Carry out inspection](#inspection)\n", + "* [Process or output the results](#file-output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installation\n", + "The __[NFDInspector](https://github.com/montan-code/nfdinspector)__ package includes modules for the inspection of LIDO-XML and EAD-XML formats as standard. To install NFDInspector using pip on macOS or Linux, run:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "python3 -m pip install nfdinspector" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To install with pip under Windows, run:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "py -m pip install nfdinspector" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## LIDO\n", + "LIDO is an XML schema intended for delivering metadata, for use in a variety of online services, from an organization’s collections database to portals of aggregated resources, as well as exposing, sharing and connecting data on the web. Its strength lies in its ability to support the typical range of descriptive information about objects of material culture. It can be used for all kinds of object, e.g., art, cultural, technology and natural science and supports multilingual portal environments.\n", + "\n", + "The NFDinspector offers a module whose functionality is specially adapted to LIDO data records. The following is a basic introduction to the use of the module. Further possible applications are explored in greater depth in a realistic use case." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import and initialize LIDOInspector:\n", + "You can import the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__, a class from the __[``nfdinspector.lido_inspector``](https://montan-code.github.io/nfdinspector/nfdinspector.html#module-nfdinspector.lido_inspector)__ module.\n", + "While initiliazing the LIDOInspector, you can specify the language for the error messages. Though currently only ``'en'`` and ``'de'`` are available. In our case we will stick to english output." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from nfdinspector.lido_inspector import LIDOInspector\n", + "\n", + "lido_inspector = LIDOInspector(error_lang='en')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Read LIDO metadata records from files or strings\n", + "\n", + "The easiest way to ingest metadata is from a standalone ``.xml`` file. XML files can also contain more than one LIDO-object. The NFDInspector can destinguish between LIDO-objects in one file and store them separately in a list, which is the property __[``lido_objects``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.lido_objects)__ of the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ class as." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "file_path = '../nfdinspector_tutorials/LIDO_xml/23310.xml'\n", + "lido_inspector.read_lido_file(file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can read several files by specifying the folder path containing the ``.xml`` files." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "lido_inspector.read_lido_files('../nfdinspector_tutorials/LIDO_xml')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can parse LIDO-XML directly from a string and forward it to the NFDInspector by using the method __[``.read_lido()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido)__.\n", + "This is useful if you are implementing the NFDInspector functionality into a larger workflow." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lido_inspector.read_lido(lido_xml_string)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configuration\n", + "NFDInspector offers the ability to customise the inspection to your specific needs. If no special configuration is specified, the built-in configuration will be used. \n", + "Customisation is usually done via a configuration file. Configurations can be exported and imported in JSON format.\n", + "It is possible that the default configuration will change with upcoming updates in the future. To view the current default configuration, it can be retrieved from __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ by calling the __[``.configuration``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.configuration)__ property." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "lido_config = lido_inspector.configuration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you like to look at the current configuration, the following script will store the configuration in JSON format in the ``config_path.json`` location." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "config_path = 'lido_config.json'\n", + "with open(config_path, 'w') as outfile:\n", + " json.dump(lido_config, outfile, indent=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The __[``.config_file()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.config_file)__ method can be used to import configuration files from JSON files with compatible syntax." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "lido_inspector.config_file(config_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will take a closer look at the configuration options later on. For now, let's take a look at the basic functions.\n", + "\n", + "### Inspection\n", + "First we read the LIDO-XML files by passing the path to the __[``.read_lido_files()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido_files)__ method. The data is then inspected based on the settings in the [configuration](#configuration). This is done by calling the __[``.inspect()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.inspect)__ method.
\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "lido_inspector.inspect()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The results of the inspections are stored as a list of dictionaries under __[``LIDOInspector.inspections``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.inspections)__ and can be accessed from there." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(lido_inspector.inspections)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### File output\n", + "\n", + "Using __[``.to_json()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_json)__ and __[``.to_csv()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_csv)__ it is possible to output the inspection results in the appropriate formats. The indentation level of the JSON file can be specified with the ``indent`` parameter, the file path for the output file can be specified respectively in the ``to_json(\"filepath\", indent=4)`` method. A ``delimiter`` for the csv format output can likewise be specified in the method as a string ``.to_csv(\"filepath\", delimiter=';')``" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lido_inspector.to_json('../nfdinspector_tutorials/results/results.json', indent=4)\n", + "lido_inspector.to_csv('../nfdinspector_tutorials/results/results.csv', delimiter=';')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## EAD\n", + "\n", + "__[Encoded Archival Description (EAD)](https://www.loc.gov/ead/)__ is a standardized XML format used to represent and share archival descriptions. It enables hierarchical and structured documentation of collections, series, and individual archival items, including metadata such as titles, dates, descriptions, and related materials. EAD facilitates interoperability between archives and makes finding aids machine-readable, enhancing accessibility and searchability for both researchers and archivists.\n", + "\n", + "Globally adopted by archives, libraries, museums, and historical societies, EAD provides a hierarchical framework for organizing collection metadata. This framework enables the discovery of geographically remote primary sources by standardizing the encoding of archival descriptions. A typical EAD finding aid includes metadata about the collection and its control information. It enhances the interoperability of archival data, preserves its structure, and supports long-term accessibility.\n", + "\n", + "EAD’s hierarchical and machine-readable format is essential for organizing and accessing archival materials, particularly in digital libraries like the __[Deutsche Digitale Bibliothek](https://www.deutsche-digitale-bibliothek.de/)__. By bridging geographic and institutional gaps, EAD promotes the accessibility and usability of primary sources worldwide." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import and initialize:\n", + "You can import the __[``EADInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector)__, a class from the __[``nfdinspector.ead_inspector``](https://montan-code.github.io/nfdinspector/nfdinspector.html#module-nfdinspector.ead_inspector)__ module.\n", + "While initiliazing the EADInspector, you can specify the language for the error messages. Though currently only ``'en'`` and ``'de'`` are available. In our case we will stick to english output." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "from nfdinspector.ead_inspector import EADInspector\n", + "\n", + "ead_inspector = EADInspector(error_lang='en')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Read EAD metadata records from files or strings\n", + "\n", + "Unlike the LIDO data, EAD data contains multiple entities and their hierarchical relationships. Files are therefore examined individually. It is not possible to read multiple files using the path of a folder. To read a EAD-XML file use the __[``.read_ead_file()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.read_ead_file)__ method or the __[``.read_ead()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.read_ead)__ method for reading a XML String." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "ead_inspector.read_ead_file('../nfdinspector_tutorials/EAD_xml/BBA-217.xml')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### EAD Inspector configuration\n", + "\n", + "NFDInspector offers the ability to customise the inspection to your specific needs. If no special configuration is specified, the built-in configuration will be used. \n", + "Customisation is usually done via a configuration file. Configurations can be exported and imported in JSON format.\n", + "It is possible that the default configuration will change with upcoming updates in the future. To view the current default configuration, it can be retrieved from __[``EADInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector)__ by calling the __[``.configuration``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.configuration)__ property. It will return a dictionary." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'unitid': {'pattern': ''},\n", + " 'unittitle': {'collection': {'inspect': True, 'min_word_num': 2},\n", + " 'class': {'inspect': True, 'min_word_num': 1},\n", + " 'series': {'inspect': True, 'min_word_num': 1},\n", + " 'file': {'inspect': True, 'min_word_num': 2},\n", + " 'item': {'inspect': True, 'min_word_num': 2},\n", + " '_': {'inspect': True, 'min_word_num': 2}},\n", + " 'unitdate': {'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'abstract': {'collection': {'inspect': True, 'min_word_num': 10},\n", + " 'class': {'inspect': False, 'min_word_num': 10},\n", + " 'series': {'inspect': False, 'min_word_num': 10},\n", + " 'file': {'inspect': True, 'min_word_num': 10},\n", + " 'item': {'inspect': True, 'min_word_num': 10},\n", + " '_': {'inspect': True, 'min_word_num': 10}},\n", + " 'genreform': {'normal': ['Urkunden',\n", + " 'Siegel',\n", + " 'Amtsbücher, Register und Grundbücher',\n", + " 'Akten',\n", + " 'Karten und Pläne',\n", + " 'Plakate und Flugblätter',\n", + " 'Drucksachen',\n", + " 'Bilder',\n", + " 'Handschriften',\n", + " 'Audio-Visuelle Medien',\n", + " 'Datenbanken',\n", + " 'Sonstiges'],\n", + " 'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'dimensions': {'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'extent': {'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'scopecontent': {'collection': {'inspect': True, 'min_word_num': 100},\n", + " 'class': {'inspect': False, 'min_word_num': 10},\n", + " 'series': {'inspect': False, 'min_word_num': 10},\n", + " 'file': {'inspect': True, 'min_word_num': 10},\n", + " 'item': {'inspect': False, 'min_word_num': 10},\n", + " '_': {'inspect': True, 'min_word_num': 10}},\n", + " 'origination': {'collection': {'inspect': True, 'ref': True},\n", + " 'class': {'inspect': False, 'ref': True},\n", + " 'series': {'inspect': False, 'ref': True},\n", + " 'file': {'inspect': True, 'ref': True},\n", + " 'item': {'inspect': True, 'ref': True},\n", + " '_': {'inspect': True, 'ref': True},\n", + " 'patterns': {'label': '', 'ref': ''}},\n", + " 'materialspec': {'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'language': {'collection': {'inspect': True},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': True},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'digital_archival_object': {'collection': {'inspect': False},\n", + " 'class': {'inspect': False},\n", + " 'series': {'inspect': False},\n", + " 'file': {'inspect': False},\n", + " 'item': {'inspect': True},\n", + " '_': {'inspect': True}},\n", + " 'index': {'collection': {'inspect': True, 'ref': True, 'min_num': 5},\n", + " 'class': {'inspect': False, 'ref': True, 'min_num': 3},\n", + " 'series': {'inspect': False, 'ref': True, 'min_num': 3},\n", + " 'file': {'inspect': True, 'ref': True, 'min_num': 3},\n", + " 'item': {'inspect': True, 'ref': True, 'min_num': 3},\n", + " '_': {'inspect': True, 'ref': True, 'min_num': 3},\n", + " 'patterns': {'label': '', 'ref': ''}},\n", + " 'userestrict': {'collection': {'inspect': True, 'ref': True},\n", + " 'class': {'inspect': False, 'ref': True},\n", + " 'series': {'inspect': False, 'ref': True},\n", + " 'file': {'inspect': True, 'ref': True},\n", + " 'item': {'inspect': True, 'ref': True},\n", + " '_': {'inspect': True, 'ref': True}}}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ead_inspector.configuration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the settings for the EADInspector are specialized on different levels (collection, class, series, file, item). Note that depending on whether it is an archive tectonics or a finding aid, these levels have different meanings. Independent settings for the test parameters can be made for each level. As with the LIDOInspector, configuration files can be saved in JSON format and read in again." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# to export the configuration, you can use the following script\n", + "import json\n", + "with open('../nfdinspector_tutorials/ead_config.json', 'w') as outfile:\n", + " json.dump(ead_inspector.configuration, outfile, indent=4)\n", + "\n", + "# to import a configuration from a JSON file use the following method\n", + "ead_inspector.config_file('../nfdinspector_tutorials/ead_config.json')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can either change the settings in the exported configuration file and then import them again, or change the settings directly in the EADInspector. To do this, you can alter the default configuration using the __[``configure()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.configure)__ method or __[``configure_level()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.configure_level)__ to change a specific level in the configurations or __[``configure_setting()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.configure_setting)__ to change a specific setting in the configurations." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "ead_inspector.configure({\n", + " \"unittitle\": {\n", + " \"collection\": {\"inspect\": True, \"min_word_num\": 2},\n", + " \"class\": {\"inspect\": True, \"min_word_num\": 1},\n", + " \"series\": {\"inspect\": True, \"min_word_num\": 1},\n", + " \"file\": {\"inspect\": True, \"min_word_num\": 2},\n", + " \"item\": {\"inspect\": True, \"min_word_num\": 2},\n", + " \"_\": {\"inspect\": True, \"min_word_num\": 2},\n", + " }\n", + "})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As with the LIDO Inspector, it is possible to define patterns for certain fields. If they do not match, an error message is returned. For example, you can specify a pattern for the ``unitid`` field on the file level. In this case, the pattern must be a sequence of only digits:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "ead_inspector.configure({\n", + " \"unitid\": {\n", + " \"file\": {\n", + " \"pattern\": \"^\\\\d$\",\n", + " }\n", + " }\n", + "})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Inspection\n", + "\n", + "Once all the necessary settings have been made and the file to be inspected has been read in, the inspection can be called up using the __[``inspect()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.inspect)__ method. The data to be inspected is stored as a list in __[``cs``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.ead_inspector.EADInspector.cs)__." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "ead_inspector.inspect()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For further informations please refer to the __[NFDInspector Documentation](https://montan-code.github.io/nfdinspector/)__" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}