diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-csw.md b/docs/manual/docs/user-guide/harvesting/harvesting-csw.md index 614687eb4716..dc94a777d4a0 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-csw.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-csw.md @@ -4,16 +4,38 @@ This harvester will connect to a remote CSW server and retrieve metadata records ## Adding a CSW harvester -The figure above shows the options available: - -- **Site** - Options about the remote site. - - *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the CSW harvester. - - *Service URL* - The URL of the capabilities document of the CSW server to be harvested. eg. . This document is used to discover the location of the services to call to query and retrieve metadata. - - *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results. - - *Use account* - Account credentials for basic HTTP authentication on the CSW server. -- **Search criteria** - Using the Add button, you can add several search criteria. You can query only the fields recognised by the CSW protocol. -- **Options** - Scheduling options. -- **Options** - Specific harvesting options for this harvester. - - *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped. +To create a CSW harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `CSW`: + +![](img/add-csw-harvester.png) + +Providing the following information: + +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). A sample expression is shown after the connection section below. + +- **Configure connection to OGC CSW 2.0.2** + - *Service URL*: The URL of the capabilities document of the CSW server to be harvested. eg. . This document is used to discover the location of the services to call to query and retrieve metadata. + - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the CSW server should be provided. + - *Search filter*: (Optional) Define the search criteria below to restrict the records to harvest. + - *Search options*: + - *Sort by*: Defines the sort order used to retrieve the results. Sorting by 'identifier:A' means sorting by UUID in ascending alphabetical order. Any CSW queryable can be used in combination with A (ascending) or D (descending) to set the ordering. + - *Output Schema*: The metadata standard in which to request the metadata records from the CSW server. + - *Distributed search*: Enables distributed search on the remote server (if the remote server supports it). When this option is enabled, the remote catalog cascades the search to the federated CSW servers that it has configured.
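+
+A sample Quartz cron expression for the **Schedule** option above (the fields are seconds, minutes, hours, day-of-month, month, and day-of-week), running the harvester every day at 01:00:
+
+```text
+0 0 1 * * ?
+```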
+ +- **Configure response processing for CSW** + - *Action on UUID collision*: When a harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor,...), should this record be skipped (default), overridden, or assigned a new UUID? + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. + - *Check for duplicate resources based on the resource identifier*: If checked, ignores metadata with a resource identifier (`gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/*/gmd:code/gco:CharacterString`) that is assigned to another metadata record in the catalog. It only applies to records in ISO19139 or ISO profiles. + - *XPath filter*: (Optional) When a record is retrieved from the remote server, an XPath expression is evaluated to accept or discard the record; see the example below. + - *XSL transformation to apply*: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork. + - *Batch edits*: (Optional) Allows updating harvested records using XPath syntax; edits can add, replace, or delete elements. + - *Category*: (Optional) A GeoNetwork category to assign to each metadata record.
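+
+For example, an *XPath filter* along these lines (a sketch: the element path assumes ISO19139 records and the exact evaluation context may vary) accepts only records whose title contains 'Ocean' and discards the rest:
+
+```text
+/gmd:MD_Metadata[contains(gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString, 'Ocean')]
+```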
+ + - **Privileges** - Assign privileges to harvested metadata. -- **Categories** diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md b/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md index 5e0b6b3ab54a..900deeafc4cd 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md @@ -4,21 +4,35 @@ This harvester will harvest metadata as XML files from a filesystem available on ## Adding a Local File System harvester -The figure above shows the options available: - -- **Site** - Options about the remote site. - - *Name* - This is a short description of the filesystem harvester. It will be shown in the harvesting main page as the name for this instance of the Local Filesystem harvester. - - *Directory* - The path name of the directory containing the metadata (as XML files) to be harvested. - - *Recurse* - If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found. - - *Keep local if deleted at source* - If checked then metadata records that have already been harvested will be kept even if they have been deleted from the *Directory* specified. - - *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results. -- **Options** - Scheduling options. -- **Harvested Content** - Options that are applied to harvested content. - - *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format. - - *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped. -- **Privileges** - Assign privileges to harvested metadata. -- **Categories** +To create a Local File System harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `Directory`: + +![](img/add-filesystem-harvester.png) + +Providing the following information: -!!! Notes +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. - - in order to be successfully harvested, metadata records retrieved from the file system must match a metadata schema in the local GeoNetwork instance +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to Directory** + - *Directory*: The path name of the directory containing the metadata (as XML files) to be harvested. The directory must be accessible by GeoNetwork. + - *Also search in subfolders*: If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found. + - *Script to run before harvesting* + - *Type of record* + +- **Configure response processing for filesystem** + - *Action on UUID collision*: When a harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor,...), should this record be skipped (default), overridden, or assigned a new UUID? + - *Update catalog record only if file was updated* + - *Keep local even if deleted at source*: If checked then metadata records that have already been harvested will be kept even if they have been deleted from the *Directory* specified. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. + - *XSL transformation to apply*: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork. + - *Batch edits*: (Optional) Allows updating harvested records using XPath syntax; edits can add, replace, or delete elements. + - *Category*: (Optional) A GeoNetwork category to assign to each metadata record. + +- **Privileges** - Assign privileges to harvested metadata. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-geonetwork.md b/docs/manual/docs/user-guide/harvesting/harvesting-geonetwork.md index b8e463e4533e..14edb6ccd03c 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-geonetwork.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-geonetwork.md @@ -11,11 +11,11 @@ To create a GeoNetwork 2.1-3.X harvester go to `Admin console` > `Harvesting` an Providing the following information: - **Identification** - - *Node name and logo*: A unique name for the harvester and optionally a logo to assign to the harvester. + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. - *User*: User who owns the harvested records. -- **Schedule**: Scheduling options to execute the harvester. 
If disabled, the harvester should be executed manually from the harvesters page. If enabled a schedule expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). - **Configure connection to GeoNetwork (from 2.1 to 3.x)** - *Catalog URL*: @@ -35,6 +35,9 @@ Providing the following information: It could be composed of parameter which will be sent to XSL transformation using the following syntax: `anonymizer?protocol=MYLOCALNETWORK:FILEPATH&email=gis@organisation.org&thesaurus=MYORGONLYTHEASURUS` - - *Validate records before import*: If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. - **Privileges** - Assign privileges to harvested metadata. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-geoportal.md b/docs/manual/docs/user-guide/harvesting/harvesting-geoportal.md index e8887286ea32..ec16a07b9aef 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-geoportal.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-geoportal.md @@ -4,24 +4,38 @@ This harvester will connect to a remote GeoPortal version 9.3.x or 10.x server a ## Adding a GeoPortal REST harvester -The figure above shows the options available: - -- **Site** - Options about the remote site. - - *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the GeoPortal REST harvester. - - *Base URL* - The base URL of the GeoPortal server to be harvested. eg. . The harvester will add the additional path required to access the REST services on the GeoPortal server. - - *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results. -- **Search criteria** - Using the Add button, you can add several search criteria. You can query any field on the GeoPortal server using the Lucene query syntax described at . -- **Options** - Scheduling options. -- **Harvested Content** - Options that are applied to harvested content. - - *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format. See notes section below for typical usage. - - *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped. +To create a GeoPortal REST harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `GeoPortal REST`: + +![](img/add-geoportalrest-harvester.png) + +Providing the following information: + +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. 
Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to GeoPortal REST** + - *URL*: The base URL of the GeoPortal server to be harvested. eg. . The harvester will add the additional path required to access the REST services on the GeoPortal server. + - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the server should be provided. + - *Search filter*: (Optional) You can query any field on the GeoPortal server using the Lucene query syntax described at . See the example below.
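+
+For example, a *Search filter* in Lucene query syntax (a sketch; the `title` field name is an assumption and depends on the fields your GeoPortal server indexes):
+
+```text
+title:hydrology AND floods
+```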
+ +- **Configure response processing for geoPREST** + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. + - *XSL transformation to apply*: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork. + - **Privileges** - Assign privileges to harvested metadata. -- **Categories** + !!! Notes - - this harvester uses two REST services from the GeoPortal API: + - This harvester uses two REST services from the GeoPortal API: - `rest/find/document` with searchText parameter to return an RSS listing of metadata records that meet the search criteria (maximum 100000) - `rest/document` with id parameter from each result returned in the RSS listing - - this harvester has been tested with GeoPortal 9.3.x and 10.x. It can be used in preference to the CSW harvester if there are issues with the handling of the OGC standards etc. - - typically ISO19115 metadata produced by the Geoportal software will not have a 'gmd' prefix for the namespace `http://www.isotc211.org/2005/gmd`. GeoNetwork XSLTs will not have any trouble understanding this metadata but will not be able to map titles and codelists in the viewer/editor. + - This harvester has been tested with GeoPortal 9.3.x and 10.x. It can be used in preference to the CSW harvester if there are issues with the handling of the OGC standards etc. + - Typically ISO19115 metadata produced by the Geoportal software will not have a 'gmd' prefix for the namespace `http://www.isotc211.org/2005/gmd`. GeoNetwork XSLTs will not have any trouble understanding this metadata but will not be able to map titles and codelists in the viewer/editor. To fix this problem, please select the ``Add-gmd-prefix`` XSLT in the *XSL transformation to apply* option of the **Configure response processing for geoPREST** section described earlier. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-oaipmh.md b/docs/manual/docs/user-guide/harvesting/harvesting-oaipmh.md index cf0463636343..a85e16e24672 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-oaipmh.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-oaipmh.md @@ -1,36 +1,54 @@ # OAIPMH Harvesting {#oaipmh_harvester} -This is a harvesting protocol that is widely used among libraries. GeoNetwork implements version 2.0 of the protocol. +This is a harvesting protocol that is widely used among libraries. GeoNetwork implements version 2.0 of the protocol. An OAI-PMH server implements a harvesting protocol that GeoNetwork, acting as a client, can use to harvest metadata. ## Adding an OAI-PMH harvester -An OAI-PMH server implements a harvesting protocol that GeoNetwork, acting as a client, can use to harvest metadata. +To create an OAI-PMH harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `OAI/PMH`: -Configuration options: +![](img/add-oaipmh-harvester.png) -- **Site** - Options describing the remote site. - - *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the OAIPMH harvester. - - *URL* - The URL of the OAI-PMH server from which metadata will be harvested. - - *Icon* - An icon to assign to harvested metadata. The icon will be used when showing search results. - - *Use account* - Account credentials for basic HTTP authentication on the OAIPMH server. -- **Search criteria** - This allows you to select metadata records for harvest based on certain criteria: - - *From* - You can provide a start date here. Any metadata whose last change date is equal to or greater than this date will be harvested. To add or edit a value for this field you need to use the icon alongside the text box. This field is optional so if you don't provide a start date the constraint is dropped. Use the icon to clear the field. - - *Until* - Functions in the same way as the *From* parameter but adds an end constraint to the last change date search. Any metadata whose last change data is less than or equal to this data will be harvested. - - *Set* - An OAI-PMH server classifies metadata into sets (like categories in GeoNetwork). You can request all metadata records that belong to a set (and any of its subsets) by specifying the name of that set here. - - *Prefix* - 'Prefix' means metadata format. The oai_dc prefix must be supported by all OAI-PMH compliant servers. - - You can use the Add button to add more than one Search Criteria set. Search Criteria sets can be removed by clicking on the small cross at the top left of the set. +Providing the following information: -!!! note +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. - the 'OAI provider sets' drop down next to the *Set* text box and the 'OAI provider prefixes' drop down next to the *Prefix* textbox are initially blank. 
After specifying the connection URL, you can press the **Retrieve Info** button, which will connect to the remote OAI-PMH server, retrieve all supported sets and prefixes and fill the drop downs with these values. Selecting a value from either of these drop downs will fill the appropriate text box with the selected value. +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). +- **Configure connection to OAI/PMH** + - *URL*: The URL of the OAI-PMH server from which metadata will be harvested. + - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the OAIPMH server should be provided. + - *Search filter*: (Optional) Define the search criteria below to restrict the records to harvest; see the sample request below. + - *From*: You can provide a start date here. Any metadata whose last change date is equal to or greater than this date will be harvested. To add or edit a value for this field you need to use the icon alongside the text box. This field is optional so if you don't provide a start date the constraint is dropped. Use the icon to clear the field. + - *Until*: Functions in the same way as the *From* parameter but adds an end constraint to the last change date search. Any metadata whose last change date is less than or equal to this date will be harvested. + - *Set*: An OAI-PMH server classifies metadata into sets (like categories in GeoNetwork). You can request all metadata records that belong to a set (and any of its subsets) by specifying the name of that set here. + - *Prefix*: 'Prefix' means metadata format. The oai_dc prefix must be supported by all OAI-PMH compliant servers. + + !!! note + + The 'OAI provider sets' drop down next to the *Set* text box and the 'OAI provider prefixes' drop down next to the *Prefix* textbox are initially blank. After specifying the connection URL, you can press the **Retrieve Info** button, which will connect to the remote OAI-PMH server, retrieve all supported sets and prefixes and fill the drop downs with these values. Selecting a value from either of these drop downs will fill the appropriate text box with the selected value.
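+
+These criteria map onto standard OAI-PMH `ListRecords` parameters, so the requests the harvester issues look something like the following (the host and set name are hypothetical):
+
+```text
+https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2020-01-01&until=2020-12-31&set=geology
+```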
+ +- **Configure response processing for oaipmh** + - *Action on UUID collision*: When a harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor,...), should this record be skipped (default), overridden, or assigned a new UUID? + - *XSL transformation to apply*: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. + - *Category*: (Optional) A GeoNetwork category to assign to each metadata record. + +- **Privileges** - Assign privileges to harvested metadata. -- **Options** - Scheduling Options. -- **Privileges** -- **Categories** !!! Notes - - if you request the oai_dc output format, GeoNetwork will convert it to Dublin Core format. - - when you edit a previously created OAIPMH harvester instance, both the *set* and *prefix* drop down lists will be empty. You have to press the retrieve info button again to connect to the remote server and retrieve set and prefix information. - - the id of the remote server must be a UUID. If not, metadata can be harvested but during hierarchical propagation id clashes could corrupt harvested metadata. + - If you request the oai_dc output format, GeoNetwork will convert it to Dublin Core format. + - When you edit a previously created OAIPMH harvester instance, both the *set* and *prefix* drop down lists will be empty. You have to press the retrieve info button again to connect to the remote server and retrieve set and prefix information. + - The id of the remote server must be a UUID. If not, metadata can be harvested but during hierarchical propagation id clashes could corrupt harvested metadata. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-ogcwxs.md b/docs/manual/docs/user-guide/harvesting/harvesting-ogcwxs.md index 52c88c134d48..70f45cf75d63 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-ogcwxs.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-ogcwxs.md @@ -11,27 +11,46 @@ An OGC service implements a GetCapabilities operation that GeoNetwork, acting as ## Adding an OGC Service Harvester -Configuration options: - -- **Site** - - *Name* - The name of the catalogue and will be one of the search criteria. - - *Type* - The type of OGC service indicates if the harvester has to query for a specific kind of service. Supported type are WMS (1.0.0, 1.1.1, 1.3.0), WFS (1.0.0 and 1.1.0), WCS (1.0.0), WPS (0.4.0 and 1.0.0), CSW (2.0.2) and SOS (1.0.0). - - *Service URL* - The service URL is the URL of the service to contact (without parameters like "REQUEST=GetCapabilities", "VERSION=", \...). It has to be a valid URL like . - - *Metadata language* - Required field that will define the language of the metadata. It should be the language used by the OGC web service administrator. - - *ISO topic category* - Used to populate the topic category element in the metadata. It is recommended to choose one as the topic category is mandatory for the ISO19115/19139 standard if the hierarchical level is "datasets". - - *Type of import* - By default, the harvester produces one service metadata record. Check boxes in this group determine the other metadata that will be produced. - - *Create metadata for layer elements using GetCapabilities information*: Checking this option means that the harvester will loop over datasets served by the service as described in the GetCapabilities document. - - *Create metadata for layer elements using MetadataURL attributes*: Checkthis option means that the harvester will generate metadata from an XML document referenced in the MetadataUrl attribute of the dataset in the GetCapabilities document. If the document referred to by this attribute is not valid (eg. unknown schema, bad XML format), the GetCapabilities document is used as per the previous option. 
- - *Create thumbnails for WMS layers*: If harvesting from an OGC WMS, then checking this options means that thumbnails will be created during harvesting. - - *Target schema* - The metadata schema of the dataset metadata records that will be created by this harvester. - - *Icon* - The default icon displayed as attribution logo for metadata created by this harvester. -- **Options** - Scheduling Options. -- **Privileges** -- **Category for service** - Metadata for the harvested service is assigned to the category selected in this option (eg. "interactive resources"). -- **Category for datasets** - Metadata for the harvested datasets is assigned to the category selected in this option (eg. "datasets"). +To create an OGC Service harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `OGC Web Services`: + +![](img/add-ogcwebservices-harvester.png) + +Providing the following information: + +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to OGC Web Services** + - *Service URL*: The service URL is the URL of the service to contact (without parameters like "REQUEST=GetCapabilities", "VERSION=", \...). It has to be a valid URL like . A sample request is shown below. + - *Service type* - The type of OGC service indicates if the harvester has to query for a specific kind of service. Supported types are WMS (1.0.0, 1.1.1, 1.3.0), WFS (1.0.0 and 1.1.0), WCS (1.0.0), WPS (0.4.0 and 1.0.0), CSW (2.0.2) and SOS (1.0.0). + - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the server should be provided.
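+
+For a WMS endpoint, for example, the request the harvester builds from the *Service URL* looks like this (the host is hypothetical; the harvester appends the OGC KVP parameters itself):
+
+```text
+https://example.org/ows?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities
+```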
+ +- **Configure response processing for ogcwxs** + - *Build service metadata record from a template*: + - *Category for service metadata*: (Optional) Metadata for the harvested service is assigned to the category selected in this option (eg. "interactive resources"). + - *Create record for each layer only using GetCapabilities information*: Checking this option means that the harvester will loop over datasets served by the service as described in the GetCapabilities document. + - *Import record for each layer using MetadataURL attributes*: Checking this option means that the harvester will generate metadata from an XML document referenced in the MetadataUrl attribute of the dataset in the GetCapabilities document. If the document referred to by this attribute is not valid (eg. unknown schema, bad XML format), the GetCapabilities document is used as per the previous option. + - *Build dataset metadata records from a template* + - *Create thumbnail*: If checked, when harvesting from an OGC Web Map Service (WMS) that supports WGS84 projection, thumbnails for the layers metadata will be created during harvesting. + - *Category for datasets*: Metadata for the harvested datasets is assigned to the category selected in this option (eg. "datasets"). + + - *ISO category*: (Optional) Used to populate the topic category element in the metadata. It is recommended to choose one as the topic category is mandatory for the ISO19115/19139 standard if the hierarchical level is "datasets". + - *Metadata language*: Required field that will define the language of the metadata. It should be the language used by the OGC web service administrator. + - *Output schema*: The metadata schema of the dataset metadata records that will be created by this harvester. The selected schema must provide the XSLT used by the harvester to convert the GetCapabilities document into metadata records of that schema. If in doubt, use the default value `iso19139`. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. + - *XSL transformation to apply*: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork. + + +- **Privileges** - Assign privileges to harvested metadata. + !!! Notes - - every time the harvester runs, it will remove previously harvested records and create new records. GeoNetwork will generate the uuid for all metadata (both service and datasets). The exception to this rule is dataset metadata created using the MetadataUrl tag is in the GetCapabilities document, in that case, the uuid of the remote XML document is used instead - - thumbnails can only be generated when harvesting an OGC Web Map Service (WMS). The WMS should support the WGS84 projection - - the chosen *Target schema* must have the support XSLTs which are used by the harvester to convert the GetCapabilities statement to metadata records from that schema. If in doubt, use iso19139. + - Every time the harvester runs, it will remove previously harvested records and create new records. GeoNetwork will generate the uuid for all metadata (both service and datasets). The exception to this rule is dataset metadata created using the MetadataUrl tag in the GetCapabilities document; in that case, the uuid of the remote XML document is used instead. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-sde.md b/docs/manual/docs/user-guide/harvesting/harvesting-sde.md index 7f4f99cb913d..32cdd4df7805 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-sde.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-sde.md @@ -1,55 +1,60 @@ # Harvesting an ARCSDE Node {#sde_harvester} -This is a harvesting protocol for metadata stored in an ArcSDE installation. +This is a harvesting protocol for metadata stored in an ArcSDE installation. The harvester identifies the ESRI metadata format (ESRI ISO or ESRI FGDC) and applies the required XSLTs to transform the metadata to ISO19139. ## Adding an ArcSDE harvester -The harvester identifies the ESRI metadata format: ESRI ISO, ESRI FGDC to apply the required xslts to transform metadata to ISO19139. Configuration options: +To create an ArcSDE harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `ArcSDE`: + +![](img/add-arcsde-harvester.png) + +Providing the following information: - **Identification** - - *Name* - This is a short description of the node. It will be shown in the harvesting main page. - - *Group* - User admin of this group and catalog administrator can manage this node. - - *Harvester user* - User that owns the harvested metadata. -- **Schedule** - Schedule configuration to execute the harvester. 
- -- **Configuration for protocol ArcSDE** - - *Server* - ArcSde server IP address or name. - - *Port* - ArcSde service port (typically 5151) or ArcSde database port, depending on the connection type selected, see below the *Connection type* section. - - *Database name* - ArcSDE instance name (typically esri_sde). - - *ArcSde version* - ArcSde version to harvest. The data model used by ArcSde is different depending on the ArcSde version. + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to Database** + - *Server*: ArcSDE server IP address or name. + - *Port*: ArcSDE service port (typically 5151) or ArcSDE database port, depending on the connection type selected, see below the *Connection type* section. + - *Database name*: ArcSDE instance name (typically esri_sde). + - *ArcSDE version*: ArcSDE version to harvest. The data model used by ArcSDE is different depending on the ArcSDE version. + - *Connection type* - - *ArcSde service* - Uses the ArcSde service to retrieve the metadata. + - *ArcSDE service*: Uses the ArcSDE service to retrieve the metadata. !!! note - Additional installation steps are required to use the ArcSDE harvester because it needs proprietary ESRI Java api jars to be installed. - - ArcSDE Java API libraries need to be installed by the user in GeoNetwork (folder INSTALL_DIR_GEONETWORK/WEB-INF/lib), as these are proprietary libraries not distributed with GeoNetwork. - - The following jars are required: - - - jpe_sdk.jar - - jsde_sdk.jar - - dummy-api-XXX.jar must be removed from INSTALL_DIR/web/geonetwork/WEB-INF/lib + Additional installation steps are required to use the ArcSDE harvester because it needs proprietary ESRI Java API JARs to be installed. + ArcSDE Java API libraries need to be installed by the user in GeoNetwork (folder `INSTALL_DIR_GEONETWORK/WEB-INF/lib`), as these are proprietary libraries not distributed with GeoNetwork. - - *Database direct connection* - Uses a database connection (JDBC) to retrieve the metadata. With + The following jars are required: - !!! note + - jpe_sdk.jar + - jsde_sdk.jar - Database direct connection requires to copy JDBC drivers in INSTALL_DIR_GEONETWORK/WEB-INF/lib. + `dummy-api-XXX.jar` must be removed from `INSTALL_DIR/web/geonetwork/WEB-INF/lib`. + - *Database direct connection*: Uses a database connection (JDBC) to retrieve the metadata. + + !!! note + + Database direct connection requires copying the JDBC drivers into `INSTALL_DIR_GEONETWORK/WEB-INF/lib`. !!! note Postgres JDBC drivers are distributed with GeoNetwork, but not for Oracle or SqlServer. - - *Database type* - ArcSde database type: Oracle, Postgres, SqlServer. Only available if connection type is configured to *Database direct connection*. - - *Username* - Username to connect to ArcSDE server. - - *Password* - Password of the ArcSDE user. 
-- **Advanced options for protocol arcsde** - - *Validate records before import* - Defines the criteria to reject metadata that is invalid according to XSD and schematron rules. + - *Database type* - ArcSDE database type: Oracle, Postgres, SqlServer. Only available if connection type is configured to *Database direct connection*. + - *Remote authentication*: Credentials to connect to the ArcSDE server. + +- **Configure response processing for arcsde** + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). - Accept all metadata without validation. - Accept metadata that are XSD valid. - Accept metadata that are XSD and schematron valid. + - **Privileges** - Assign privileges to harvested metadata. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-simpleurl.md b/docs/manual/docs/user-guide/harvesting/harvesting-simpleurl.md index 775b4a9d1a93..b4449864c1c9 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-simpleurl.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-simpleurl.md @@ -4,47 +4,69 @@ This harvester connects to a remote server via a simple URL to retrieve metadata ## Adding a simple URL harvester -- **Site** - Options about the remote site. +To create a Simple URL harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `Simple URL`: - - *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the harvester. - - *Service URL* - The URL of the server to be harvested. This can include pagination params like `?start=0&rows=20` - - *loopElement* - Propery/element containing a list of the record entries. (Indicated as an absolute path from the document root.) eg. `/datasets` - - *numberOfRecordPath* : Property indicating the total count of record entries. (Indicated as an absolute path from the document root.) eg. `/nhits` - - *recordIdPath* : Property containing the record id. eg. `datasetid` - - *pageFromParam* : Property indicating the first record item on the current "page" eg. `start` - - *pageSizeParam* : Property indicating the number of records containned in the current "page" eg. `rows` - - *toISOConversion* : Name of the conversion schema to use, which must be available as XSL on the GN instance. eg. `OPENDATASOFT-to-ISO19115-3-2018` +![](img/add-simpleurl-harvester.png) + +Providing the following information: + +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to Simple URL** + - *URL* - The URL of the server to be harvested. This can include pagination params like `?start=0&rows=20` + - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the server should be provided. + - *Element to loop on*: Property/element containing a list of the record entries. 
(Indicated as an absolute path from the document root.) eg. `/datasets` + - *Element for the UUID of each record* : Property containing the record id. eg. `datasetid` + - *Pagination parameters*: (Optional). + - *Element for the number of records to collect*: Property indicating the total count of record entries. (Indicated as an absolute path from the document root.) eg. `/nhits` + - *From URL parameter*: Property indicating the first record item on the current "page" eg. `start` + - *Size URL parameter*: Property indicating the number of records contained in the current "page" eg. `rows` + - *XSL transformation to apply*: Name of the conversion schema to use, which must be available as XSL on the GeoNetwork instance. eg. `OPENDATASOFT-to-ISO19115-3-2018` !!! note GN looks for schemas by name in . These schemas might internally include schemas from other locations like . To indicate the `fromJsonOpenDataSoft` schema for example, from the latter location directly in the admin UI the following syntax can be used: `schema:iso19115-3.2018:convert/fromJsonOpenDataSoft`. + - *Batch edits*: (Optional) Allows updating harvested records using XPath syntax; edits can add, replace, or delete elements. + - *Category*: (Optional) A GeoNetwork category to assign to each metadata record. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. - **Sample configuration for opendatasoft** +- **Privileges** - Assign privileges to harvested metadata. - - *loopElement* - `/datasets` - - *numberOfRecordPath* : `/nhits` - - *recordIdPath* : `datasetid` - - *pageFromParam* : `start` - - *pageSizeParam* : `rows` - - *toISOConversion* : `OPENDATASOFT-to-ISO19115-3-2018` - **Sample configuration for ESRI** +## Sample configurations - - *loopElement* - `/dataset` - - *numberOfRecordPath* : `/result/count` - - *recordIdPath* : `landingPage` - - *pageFromParam* : `start` - - *pageSizeParam* : `rows` - - *toISOConversion* : `ESRIDCAT-to-ISO19115-3-2018` +### Sample configuration for opendatasoft - **Sample configuration for DKAN** +- *Element to loop on* - `/datasets` +- *Element for the number of records to collect* : `/nhits` +- *Element for the UUID of each record* : `datasetid` +- *From URL parameter* : `start` +- *Size URL parameter* : `rows` +- *XSL transformation to apply* : `OPENDATASOFT-to-ISO19115-3-2018`
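+
+For reference, an abridged (and hypothetical) OpenDataSoft-style response showing what these paths point at: `/nhits` holds the total count, `/datasets` is the element to loop on, and `datasetid` identifies each record:
+
+```json
+{
+  "nhits": 2,
+  "datasets": [
+    { "datasetid": "air-quality-2020", "metas": { "title": "Air quality 2020" } },
+    { "datasetid": "noise-levels-2020", "metas": { "title": "Noise levels 2020" } }
+  ]
+}
+```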
- - *loopElement* - `/result/0` - - *numberOfRecordPath* : `/result/count` - - *recordIdPath* : `id` - - *pageFromParam* : `start` - - *pageSizeParam* : `rows` - - *toISOConversion* : `DKAN-to-ISO19115-3-2018` +### Sample configuration for ESRI -- **Privileges** - Assign privileges to harvested metadata. +- *Element to loop on* - `/dataset` +- *Element for the number of records to collect* : `/result/count` +- *Element for the UUID of each record* : `landingPage` +- *From URL parameter* : `start` +- *Size URL parameter* : `rows` +- *XSL transformation to apply* : `ESRIDCAT-to-ISO19115-3-2018` + +### Sample configuration for DKAN + +- *Element to loop on* - `/result/0` +- *Element for the number of records to collect* : `/result/count` +- *Element for the UUID of each record* : `id` +- *From URL parameter* : `start` +- *Size URL parameter* : `rows` +- *XSL transformation to apply* : `DKAN-to-ISO19115-3-2018` diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-thredds.md b/docs/manual/docs/user-guide/harvesting/harvesting-thredds.md index 2c988d58e340..bb4716c75089 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-thredds.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-thredds.md @@ -4,35 +4,33 @@ THREDDS catalogs describe inventories of datasets. They are organised in a hiera ## Adding a THREDDS Catalog Harvester -The available options are: - -- **Site** - - *Name* - This is a short description of the THREDDS catalog. It will be shown in the harvesting main page as the name of this THREDDS harvester instance. - - *Catalog URL* - The remote URL of the THREDDS Catalog from which metadata will be harvested. This must be the xml version of the catalog (i.e. ending with .xml). The harvester will crawl through all datasets and services defined in this catalog creating metadata for them as specified by the options described further below. - - *Metadata language* - Use this option to specify the language of the metadata to be harvested. - - *ISO topic category* - Use this option to specify the ISO topic category of service metadata. - - *Create ISO19119 metadata for all services in catalog* - Select this option to generate iso19119 metadata for services defined in the THREDDS catalog (eg. OpenDAP, OGC WCS, ftp) and for the THREDDS catalog itself. - - *Create metadata for Collection datasets* - Select this option to generate metadata for each collection dataset (THREDDS dataset containing other datasets). Creation of metadata can be customised using options that are displayed when this option is selected as described further below. - - *Create metadata for Atomic datasets* - Select this option to generate metadata for each atomic dataset (THREDDS dataset not containing other datasets -- for example cataloguing a netCDF dataset). Creation of metadata can be customised using options that are displayed when this option is selected as described further below. - - *Ignore harvesting attribute* - Select this option to harvest metadata for selected datasets regardless of the harvest attribute for the dataset in the THREDDS catalog. If this option is not selected, metadata will only be created for datasets that have a harvest attribute set to true. - - *Extract DIF metadata elements and create ISO metadata* - Select this option to generate ISO metadata for datasets in the THREDDS catalog that have DIF metadata elements. When this option is selected a list of schemas is shown that have a DIFToISO.xsl stylesheet available (see for example `GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/DIFToISO.xsl`). Metadata is generated by reading the DIF metadata items in the THREDDS into a DIF format metadata record and then converting that DIF record to ISO using the DIFToISO stylesheet. 
- - *Extract Unidata dataset discovery metadata using fragments* - Select this option when the metadata in your THREDDS or netCDF/ncml datasets follows Unidata dataset discovery conventions (see ). You will need to write your own stylesheets to extract this metadata as fragments and define a template to combine with the fragments. When this option is selected the following additional options will be shown: - - *Select schema for output metadata records* - choose the ISO metadata schema or profile for the harvested metadata records. Note: only the schemas that have THREDDS fragment stylesheets will be displayed in the list (see the next option for the location of these stylesheets). - - *Stylesheet to create metadata fragments* - Select a stylesheet to use to convert metadata for the dataset (THREDDS metadata and netCDF ncml where applicable) into metadata fragments. These stylesheets can be found in the directory convert/ThreddsToFragments in the schema directory eg. for iso19139 this would be `GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/ThreddsToFragments`. - - *Create subtemplates for fragments and XLink them into template* - Select this option to create a subtemplate (=metadata fragment stored in GeoNetwork catalog) for each metadata fragment generated. - - *Template to combine with fragments* - Select a template that will be filled in with the metadata fragments generated for each dataset. The generated metadata fragments are used to replace referenced elements in the templates with an xlink to a subtemplate if the *Create subtemplates* option is checked. If *Create subtemplates* is not checked, then the fragments are simply copied into the template metadata record. - - For Atomic Datasets , one additional option is provided *Harvest new or modified datasets only*. If this option is checked only datasets that have been modified or didn't exist when the harvester was last run will be harvested. - - *Create Thumbnails* - Select this option to create thumbnails for WMS layers in referenced WMS services - - *Icon* - An icon to assign to harvested metadata. The icon will be used when showing search results. -- **Options** - Scheduling Options. -- **Privileges** -- **Category for Service** - Select the category to assign to the ISO19119 service records for the THREDDS services. -- **Category for Datasets** - Select the category to assign the generated metadata records (and any subtemplates) to. - -At the bottom of the page there are the following buttons: - -- **Back** - Go back to the main harvesting page. The harvesting definition is not added. -- **Save** - Saves this harvester definition creating a new harvesting instance. After the save operation has completed, the main harvesting page will be displayed. +To create a THREDDS Catalog harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `Thredds Catalog`: + +![](img/add-threddscatalog-harvester.png) + +Providing the following information: + +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. + +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. 
If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). + +- **Configure connection to Thredds catalog** + - *Service URL*: The remote URL of the THREDDS Catalog from which metadata will be harvested. This must be the xml version of the catalog (i.e. ending with .xml). The harvester will crawl through all datasets and services defined in this catalog creating metadata for them as specified by the options described further below. + +- **Configure response processing for thredds** + - *Language*: Use this option to specify the language of the metadata to be harvested. + - *ISO19115 Topic category for output metadata records*: Use this option to specify the ISO topic category of service metadata. + - *Create ISO19119 metadata for all services in the thredds catalog*: Select this option to generate iso19119 metadata for services defined in the THREDDS catalog (eg. OpenDAP, OGC WCS, ftp) and for the THREDDS catalog itself. + - *Select schema for output metadata records*: The metadata standard to create the metadata. It should be a valid metadata schema installed in GeoNetwork, by default `iso19139`. + - *Dataset title*: (Optional) Title for the dataset. Default is the catalog URL. + - *Dataset abstract*: (Optional) Abstract for the dataset. Default is 'Thredds Dataset'. + - *Geonetwork category to assign to service metadata records* - Select the category to assign to the ISO19119 service records for the THREDDS services. + - *Geonetwork category to assign to dataset metadata records* - Select the category to assign the generated metadata records (and any subtemplates) to. + +- **Privileges** - Assign privileges to harvested metadata. ## More about harvesting THREDDS DIF metadata elements with the THREDDS Harvester diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-webdav.md b/docs/manual/docs/user-guide/harvesting/harvesting-webdav.md index 209d5e13c610..cdd6b12434ac 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-webdav.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-webdav.md @@ -11,11 +11,11 @@ To create a WebDAV harvester go to `Admin console` > `Harvesting` and select `Ha Providing the following information: - **Identification** - - *Node name and logo*: A unique name for the harvester and optionally a logo to assign to the harvester. + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. - *User*: User who owns the harvested records. -- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester should be executed manually from the harvesters page. If enabled a schedule expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). +- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)). - **Configure connection to WebDAV / WAF** - *URL*: The remote URL from which metadata will be harvested. 
Each file found that has the extension `.xml` is assumed to be a metadata record. @@ -29,7 +29,10 @@ Providing the following information: It could be composed of parameter which will be sent to XSL transformation using the following syntax: `anonymizer?protocol=MYLOCALNETWORK:FILEPATH&email=gis@organisation.org&thesaurus=MYORGONLYTHEASURUS` - - *Validate records before import*: If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped. + - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron). + - Accept all metadata without validation. + - Accept metadata that are XSD valid. + - Accept metadata that are XSD and schematron valid. - *Category*: (Optional) A GeoNetwork category to assign to each metadata record. - **Privileges** - Assign privileges to harvested metadata. diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md b/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md index 16abfa13bb74..c198e5f59669 100644 --- a/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md +++ b/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md @@ -2,26 +2,43 @@ Metadata can be present in the tables of a relational databases, which are commonly used by many organisations. Putting an OGC Web Feature Service (WFS) over a relational database will allow metadata to be extracted via standard query mechanisms. This harvesting type allows the user to specify a GetFeature query and map information from the features to fragments of metadata that can be linked or copied into a template to create metadata records. +An OGC web feature service (WFS) implements a GetFeature query operation that returns data in the form of features (usually rows from related tables in a relational database). GeoNetwork, acting as a client, can read the GetFeature response and apply a user-supplied XSLT stylesheet to produce metadata fragments that can be linked or copied into a user-supplied template to build metadata records. + ## Adding an OGC WFS GetFeature Harvester -An OGC web feature service (WFS) implements a GetFeature query operation that returns data in the form of features (usually rows from related tables in a relational database). GeoNetwork, acting as a client, can read the GetFeature response and apply a user-supplied XSLT stylesheet to produce metadata fragments that can be linked or copied into a user-supplied template to build metadata records. +To create an OGC WFS GetFeature harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `OGC WFS GetFeature`: + +![](img/add-wfsgetfeature-harvester.png) -The available options are: +Providing the following information: -- **Site** - - *Name* - This is a short description of the harvester. It will be shown in the harvesting main page as the name for this WFS GetFeature harvester. - - *Service URL* - The bare URL of the WFS service (no OGC params required) - - *Metadata language* - The language that will be used in the metadata records created by the harvester +- **Identification** + - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester. + - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester. + - *User*: User who owns the harvested records. 
 
 - **Configure connection to WebDAV / WAF**
     - *URL*: The remote URL from which metadata will be harvested. Each file found that has the extension `.xml` is assumed to be a metadata record.
@@ -29,7 +29,10 @@ Providing the following information:
       It can be composed of parameters which will be sent to the XSL transformation using the following syntax: `anonymizer?protocol=MYLOCALNETWORK:FILEPATH&email=gis@organisation.org&thesaurus=MYORGONLYTHEASURUS`
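+
+      A custom transformation invoked this way might declare matching top-level parameters. The following sketch assumes the stylesheet and parameter names from the example above:
+
+      ```xml
+      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
+        <!-- Values supplied after the "?" in the harvester configuration -->
+        <xsl:param name="protocol"/>
+        <xsl:param name="email"/>
+        <xsl:param name="thesaurus"/>
+
+        <!-- Identity template: copy each record through, adjusting elements as needed -->
+        <xsl:template match="@*|node()">
+          <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
+        </xsl:template>
+      </xsl:stylesheet>
+      ```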
 
-    - *Validate records before import*: If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
+    - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron).
+        - Accept all metadata without validation.
+        - Accept metadata that are XSD valid.
+        - Accept metadata that are XSD and schematron valid.
     - *Category*: (Optional) A GeoNetwork category to assign to each metadata record.
 
 - **Privileges** - Assign privileges to harvested metadata.
 
diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md b/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md
index 16abfa13bb74..c198e5f59669 100644
--- a/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md
+++ b/docs/manual/docs/user-guide/harvesting/harvesting-wfs-features.md
@@ -2,26 +2,43 @@
 Metadata can be present in the tables of relational databases, which are commonly used by many organisations. Putting an OGC Web Feature Service (WFS) over a relational database will allow metadata to be extracted via standard query mechanisms. This harvesting type allows the user to specify a GetFeature query and map information from the features to fragments of metadata that can be linked or copied into a template to create metadata records.
 
+An OGC web feature service (WFS) implements a GetFeature query operation that returns data in the form of features (usually rows from related tables in a relational database). GeoNetwork, acting as a client, can read the GetFeature response and apply a user-supplied XSLT stylesheet to produce metadata fragments that can be linked or copied into a user-supplied template to build metadata records.
+
 ## Adding an OGC WFS GetFeature Harvester
 
-An OGC web feature service (WFS) implements a GetFeature query operation that returns data in the form of features (usually rows from related tables in a relational database). GeoNetwork, acting as a client, can read the GetFeature response and apply a user-supplied XSLT stylesheet to produce metadata fragments that can be linked or copied into a user-supplied template to build metadata records.
+To create an OGC WFS GetFeature harvester go to `Admin console` > `Harvesting` and select `Harvest from` > `OGC WFS GetFeature`:
+
+![](img/add-wfsgetfeature-harvester.png)
 
-The available options are:
+Providing the following information:
 
-- **Site**
-    - *Name* - This is a short description of the harvester. It will be shown in the harvesting main page as the name for this WFS GetFeature harvester.
-    - *Service URL* - The bare URL of the WFS service (no OGC params required)
-    - *Metadata language* - The language that will be used in the metadata records created by the harvester
+- **Identification**
+    - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester.
+    - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester.
+    - *User*: User who owns the harvested records.
+
+- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)).
+
+- **Configure connection to OGC WFS GetFeature**
+    - *Service URL*: The bare URL of the WFS service (no OGC params required).
+    - *Remote authentication*: If checked, the credentials for basic HTTP authentication on the WFS server should be provided.
     - *OGC WFS GetFeature Query* - The OGC WFS GetFeature query used to extract features from the WFS.
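+
+      For example, a minimal GetFeature query might look like the following (the feature type `ms:metadata_table` and its namespace are illustrative, not part of GeoNetwork):
+
+      ```xml
+      <wfs:GetFeature service="WFS" version="1.1.0"
+                      xmlns:wfs="http://www.opengis.net/wfs"
+                      xmlns:ms="http://www.example.org/metadata">
+        <!-- One Query element per feature type to extract -->
+        <wfs:Query typeName="ms:metadata_table"/>
+      </wfs:GetFeature>
+      ```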
-    - *Schema for output metadata records* - choose the metadata schema or profile for the harvested metadata records. Note: only the schemas that have WFS fragment stylesheets will be displayed in the list (see the next option for the location of these stylesheets).
-    - *Stylesheet to create fragments* - User-supplied stylesheet that transforms the GetFeature response to a metadata fragments document (see below for the format of that document). Stylesheets exist in the WFSToFragments directory which is in the convert directory of the selected output schema. eg. for the iso19139 schema, this directory is `GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/WFSToFragments`.
-    - *Save large response to disk* - Check this box if you expect the WFS GetFeature response to be large (eg. greater than 10MB). If checked, the GetFeature response will be saved to disk in a temporary file. Each feature will then be extracted from the temporary file and used to create the fragments and metadata records. If not checked, the response will be held in RAM.
-    - *Create subtemplates* - Check this box if you want the harvested metadata fragments to be saved as subtemplates in the metadata catalog and xlink'd into the metadata template (see next option). If not checked, the fragments will be copied into the metadata template.
-    - *Template to use to build metadata using fragments* - Choose the metadata template that will be combined with the harvested metadata fragments to create metadata records. This is a standard GeoNetwork metadata template record.
-    - *Category for records built with linked fragments* - Choose the metadata template that will be combined with the harvested metadata fragments to create metadata records. This is a standard GeoNetwork metadata template record.
-- **Options**
-- **Privileges**
-- **Category for subtemplates** - When fragments are saved to GeoNetwork as subtemplates they will be assigned to the category selected here.
+
+- **Configure response processing for wfsfeatures**
+    - *Language*: The language that will be used in the metadata records created by the harvester.
+    - *Metadata standard*: The metadata standard to create the metadata. It should be a valid metadata schema installed in GeoNetwork, by default `iso19139`.
+    - *Save large response to disk*: Check this box if you expect the WFS GetFeature response to be large (eg. greater than 10MB). If checked, the GetFeature response will be saved to disk in a temporary file. Each feature will then be extracted from the temporary file and used to create the fragments and metadata records. If not checked, the response will be held in RAM.
+    - *Stylesheet to create fragments*: User-supplied stylesheet that transforms the GetFeature response to a metadata fragments document (see below for the format of that document). Stylesheets exist in the WFSToFragments directory, which is in the convert directory of the selected output schema. eg. for the iso19139 schema, this directory is `GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/WFSToFragments`.
+    - *Create subtemplates*: Check this box if you want the harvested metadata fragments to be saved as subtemplates in the metadata catalog and xlink'd into the metadata template (see next option). If not checked, the fragments will be copied into the metadata template.
+    - *Select template to combine with fragments*: Choose the metadata template that will be combined with the harvested metadata fragments to create metadata records. This is a standard GeoNetwork metadata template record.
+    - *Category for directory entries*: (Optional) When fragments are saved to GeoNetwork as subtemplates they will be assigned to the category selected here.
+    - *Validate records before import*: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron).
+        - Accept all metadata without validation.
+        - Accept metadata that are XSD valid.
+        - Accept metadata that are XSD and schematron valid.
+
+- **Privileges** - Assign privileges to harvested metadata.
+
 ## More about turning the GetFeature Response into metadata fragments
 
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-arcsde-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-arcsde-harvester.png
new file mode 100644
index 000000000000..258c163bfdac
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-arcsde-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-csw-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-csw-harvester.png
new file mode 100644
index 000000000000..e6e484359b92
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-csw-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-filesystem-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-filesystem-harvester.png
new file mode 100644
index 000000000000..0e0f0d66bfdc
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-filesystem-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-geoportalrest-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-geoportalrest-harvester.png
new file mode 100644
index 000000000000..31d60f997e76
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-geoportalrest-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-oaipmh-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-oaipmh-harvester.png
new file mode 100644
index 000000000000..a6ad14e6a544
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-oaipmh-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-ogcwebservices-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-ogcwebservices-harvester.png
new file mode 100644
index 000000000000..2734781c718e
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-ogcwebservices-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-simpleurl-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-simpleurl-harvester.png
new file mode 100644
index 000000000000..6f7af0255a95
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-simpleurl-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-threddscatalog-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-threddscatalog-harvester.png
new file mode 100644
index 000000000000..a326a4b7c790
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-threddscatalog-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/img/add-wfsgetfeature-harvester.png b/docs/manual/docs/user-guide/harvesting/img/add-wfsgetfeature-harvester.png
new file mode 100644
index 000000000000..bd3646bc0cf3
Binary files /dev/null and b/docs/manual/docs/user-guide/harvesting/img/add-wfsgetfeature-harvester.png differ
diff --git a/docs/manual/docs/user-guide/harvesting/index.md b/docs/manual/docs/user-guide/harvesting/index.md
index c4d7884c32fc..abea85ff38c6 100644
--- a/docs/manual/docs/user-guide/harvesting/index.md
+++ b/docs/manual/docs/user-guide/harvesting/index.md
@@ -136,43 +136,43 @@ The script will add the certificate to the JVM keystore, if you run it as follows:
 ## Harvesting page
 
-To access the harvesting main page you have to be logged in with a profile `Administrator` or `UserAdmin`. From the `Admin console` menu, select the `Harvesting`. The harvesting page will then be displayed.
+To access the harvesting main page you have to be logged in with a profile `Administrator` or `UserAdmin`. From the `Admin console` menu, select the option `Harvesting`.
 
-The page shows a list of the currently defined harvesters with information about the harvesters statuses:
+The page shows a list of the currently defined harvesters with information about the status of the harvesters:
 
 ![](img/harvesters.png)
 
-For each harvester is displayed the following information:
+The following information is shown for each harvester:
 
-- **Last run**: Date when the harvester was executed last time.
-- **Total**: This is the total number of metadata found remotely. Metadata with the same id are considered as one.
-- **Updated**: Number of metadata that are present locally but that needed to be updated because their last change date was different from the remote one.
-- **Unchanged**: Local metadata left unchanged. Their remote last change date did not change.
+- **Last run**: Date on which the harvester was last run.
+- **Total**: The total number of metadata records found remotely. Records with the same id are considered as one.
+- **Updated**: Number of metadata records that are present locally but needed to be updated because their last modification date was different from the remote one.
+- **Unchanged**: Number of local metadata records that have not been modified; their remote last modification date has not changed.
 
-At the bottom of the list of harvesters there are the following buttons:
+At the bottom of the harvester list there are the following buttons:
 
-1. *Harvest from*: Allows to select the type of harvester to create.
+1. *Harvest from*: Allows you to select the type of harvester to create.
 2. *Clone*: Creates a new harvester, using the information of an existing harvester.
 3. *Refresh*: Refreshes the list of harvesters.
 
 ### Adding new harvesters
 
-To add a new harvester click in the `Harvest from` button. A dropdown list is then shown with all the available harvester protocols.
+To add a new harvester, click on the `Harvest from` button. A drop-down list with all available harvesting protocols will appear.
 
 ![](img/add-harvester.png)
 
-You can choose the type of harvest you intend to perform. The supported harvesters and details of what to do next are in the following sections:
+You can choose the type of harvesting you want to do. Supported harvesters and details on what to do next can be found in the following sections.
 
 ### Harvester History {#harvest_history}
 
-Each time a harvester is run, it generates a log file of what was harvested and/or what went wrong (eg. exception report). To view the harvester history, select a harvester in the harvesters list and select the tab `Harvester history` in the harvester page:
+Each time a harvester is run, a log file of what was harvested and/or what went wrong (e.g., an exception report) is generated. To view the harvester history, select a harvester in the harvester list and select the `Harvester history` tab on the harvester page:
 
 ![](img/harvester-history.png)
 
-Once the harvest history has been displayed it is possible to download the log file of the harvester execution and delete the harvester history.
+Once the harvester history is displayed, it is possible to download the log file of the harvester run and delete the harvester history.
 
 ### Harvester records
 
-When a harvester is executed, you can view the list of metadata harvested and some statistics about the metadata. Select a harvester in the harvesters list and select the tab `Metadata records` in the harvester page:
+When a harvester is executed, you can see the list of harvested metadata and some statistics about the metadata. Select a harvester in the list of harvesters and select the `Metadata records` tab on the harvester page:
 
 ![](img/harvester-statistics.png)