Database harvester #8247

josegar74 · 2024-07-08T08:33:10Z

A harvester to retrieve metadata from a database table using JDBC, currently supporting Postgres and Oracle connections.

Configuration is described in https://github.com/GeoCat/core-geonetwork/blob/7b65eecc5b77c35620c505655d7e75d6c2046f85/docs/manual/docs/user-guide/harvesting/harvesting-database.md

Checklist

Funded by: Rijkswaterstaat

cmangeat

Thank you!

cmangeat · 2024-10-17T12:28:48Z

...esters/src/main/java/org/fao/geonet/kernel/harvest/harvester/database/DatabaseHarvester.java

+        harvesterSettingsManager.add("id:" + siteId, "server", params.getServer());
+        harvesterSettingsManager.add("id:" + siteId, "port", params.getPort());
+        harvesterSettingsManager.add("id:" + siteId, "username", params.getUsername());
+        harvesterSettingsManager.add("id:" + siteId, "password", params.getPassword());


Hello, are we sure this is encrypted in the database?

The passwords in the settings tables are encrypted using Jasypt.

cmangeat · 2024-10-17T12:33:16Z

...src/main/java/org/fao/geonet/kernel/harvest/harvester/database/DatabaseHarvesterAligner.java

+        //--- retrieve harvested uuids for given harvesting node
+        localCateg = new CategoryMapper(context);
+        localGroups = new GroupMapper(context);
+        localUuids = new UUIDMapper(context.getBean(IMetadataUtils.class), params.getUuid());


Is it possible to use the one defined on line 92?

Done in 0d74819

cmangeat · 2024-10-17T12:35:12Z

...src/main/java/org/fao/geonet/kernel/harvest/harvester/database/DatabaseHarvesterAligner.java

+
+    private void deleteLocalMetadataNotInDatabase(List<Integer> idsForHarvestingResult) throws Exception {
+        Set<Integer> idsResultHs = Sets.newHashSet(idsForHarvestingResult);
+        List<Integer> existingMetadata = context.getBean(MetadataRepository.class).findIdsBy(MetadataSpecs.hasHarvesterUuid(params.getUuid()));


Could the MetadataRepository bean be made a class member?

Done in 0d74819
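
A minimal sketch of the refactoring suggested above: the bean is resolved once and kept as a class member instead of being looked up on every call. `FakeContext`, `Repository`, and `Aligner` are hypothetical stand-ins, not the GeoNetwork classes; the actual change is the one in commit 0d74819.

```java
import java.util.HashMap;
import java.util.Map;

class BeanAsMemberSketch {

    /** Hypothetical stand-in for a context that resolves beans by type and counts lookups. */
    static class FakeContext {
        private final Map<Class<?>, Object> beans = new HashMap<>();
        int lookups = 0;

        <T> void register(Class<T> type, T bean) {
            beans.put(type, bean);
        }

        <T> T getBean(Class<T> type) {
            lookups++;
            return type.cast(beans.get(type));
        }
    }

    /** Hypothetical stand-in for a repository bean. */
    interface Repository {
        int count();
    }

    /** Aligner-like class that resolves the bean once and keeps it as a member. */
    static class Aligner {
        private final Repository repository;

        Aligner(FakeContext context) {
            // Single lookup at construction time.
            this.repository = context.getBean(Repository.class);
        }

        int countTwice() {
            // Both calls reuse the member; no further context lookups happen here.
            return repository.count() + repository.count();
        }
    }

    public static void main(String[] args) {
        FakeContext context = new FakeContext();
        context.register(Repository.class, () -> 21);
        Aligner aligner = new Aligner(context);
        System.out.println(aligner.countTwice()); // 42
        System.out.println(context.lookups);      // 1
    }
}
```

Keeping the bean as a member also makes the dependency visible in one place, which tends to simplify testing.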

cmangeat · 2024-10-17T12:47:13Z

...rc/main/java/org/fao/geonet/kernel/harvest/harvester/database/DatabaseMetadataRetriever.java

+            sqlQuery = String.format("SELECT %s FROM %s", columnName, metadataTable);
+        }
+
+        getJdbcTemplate().query(sqlQuery, param, rs -> {


Can you please explain how DatabaseMetadataRetriever is related to ArcSDEJdbcConnection?

DatabaseMetadataRetriever is indeed quite similar to ArcSDEJdbcConnection.

Probably at some point we should remove the ArcSDE harvester as the API connection mode is only useful for ArcSDE 9 and below, which is probably not used much nowadays.

The direct connection mode of the ArcSDE harvester is analogous to the database harvester; the only advantage is that in the ArcSDE harvester the user does not have to provide the table and field configuration.

+1 for deprecating ArcSDE

Sure, but this will not be part of this pull request. That's for another pull request.

cmangeat · 2024-10-17T12:50:23Z

...a/org/fao/geonet/kernel/harvest/harvester/database/DatabaseMetadataRetrieverFactoryTest.java

+
+import static org.junit.Assert.*;
+
+public class DatabaseMetadataRetrieverFactoryTest {


+1 for testing!

sebr72 · 2024-10-17T13:03:22Z

docs/manual/docs/user-guide/harvesting/harvesting-database.md

+
+-   **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester should be executed manually from the harvesters page. If enabled a schedule expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)).
+
+-   **Configure connection to Database**


I was having a look at all the code and I could not work out how the connection to the DB is closed. Since a factory is used, a new instance of DatabaseMetadataRetriever is created each time, and its database connection is encapsulated. Could you clarify how the disconnection is done? (Are we expected to have a special database auto-disconnect setup, or is it missing?)

I may be wrong, as I am not an expert on this, but getJdbcTemplate().query seems to handle that:

https://github.com/spring-projects/spring-framework/blob/0da8dee289e4cbe353d4bd6cc0935cd62ab27901/spring-jdbc/src/main/java/org/springframework/jdbc/core/JdbcTemplate.java#L627-L634

It uses this code:

https://github.com/spring-projects/spring-framework/blob/0da8dee289e4cbe353d4bd6cc0935cd62ab27901/spring-jdbc/src/main/java/org/springframework/jdbc/core/JdbcTemplate.java#L605-L606

That closes the statement and releases the underlying connection.

Isn't that closing the ResultSet rather than the connection? (A ResultSet allows fetching a batch of rows at a time, as there might be too many for a single fetch.) Usually you open a connection to a DB, send many requests, and then close it. I don't know of a case where a single query (especially a SELECT query) is expected to close the DB connection. Do we need a third opinion, or a test to validate it?

As indicated, afaik the Spring JdbcTemplate manages this internally. It's not like using JDBC directly.
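
To illustrate the pattern under discussion, here is a simplified sketch of a template-style `query` method that acquires a connection, runs the statement, and releases everything before returning. It is not Spring's code: `TemplateQuerySketch`, `ResultHandler`, and the proxy-based fake JDBC objects are assumptions made so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class TemplateQuerySketch {

    /** Callback that consumes the result set (hypothetical, for illustration only). */
    interface ResultHandler<T> {
        T handle(ResultSet rs) throws SQLException;
    }

    /** close() calls recorded by the fake JDBC objects below. */
    static final List<String> events = new ArrayList<>();

    /** Acquires a connection per call and releases all resources before returning. */
    static <T> T query(Supplier<Connection> connections, String sql, ResultHandler<T> handler)
            throws SQLException {
        // try-with-resources closes in reverse order: result set, statement, connection.
        try (Connection con = connections.get();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return handler.handle(rs);
        }
    }

    /** Builds a fake JDBC object that records close() and returns canned values. */
    static <T> T fake(Class<T> type, String name) {
        Object proxy = Proxy.newProxyInstance(
            TemplateQuerySketch.class.getClassLoader(),
            new Class<?>[] {type},
            (self, method, args) -> {
                switch (method.getName()) {
                    case "close":
                        events.add(name + " closed");
                        return null;
                    case "createStatement":
                        return fake(Statement.class, "statement");
                    case "executeQuery":
                        return fake(ResultSet.class, "resultSet");
                    case "next":
                        return false; // the fake result set is empty
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
        return type.cast(proxy);
    }

    public static void main(String[] args) throws SQLException {
        Boolean hasRow = query(() -> fake(Connection.class, "connection"), "SELECT 1", rs -> rs.next());
        System.out.println(hasRow); // false
        System.out.println(events); // [resultSet closed, statement closed, connection closed]
    }
}
```

With a pooled `DataSource`, closing the connection typically returns it to the pool rather than closing it physically, so the caller never holds a long-lived connection either way.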


sebr72 · 2024-10-17T15:57:17Z

docs/manual/docs/user-guide/harvesting/harvesting-database.md

+    -   *Batch edits*: (Optional) Allows to update harvested records, using XPATH syntax. It can be used to add, replace or delete element.
+    -   *Translate metadata content*: (Optional) Allows to translate metadata elements. It requires a translation service provider configured in the System settings.
+
+-   **Privileges** - Assign privileges to harvested metadata.


Maybe you could add a brief description of the structure of the DB to be harvested, to avoid having to read the code?

The documentation already has some entries for this:

https://github.com/geonetwork/core-geonetwork/pull/8247/files#diff-b88936cc370d94d9d97a131a54e816da6de85236d379f5470a2b1c769835c8abR24-R25

https://github.com/geonetwork/core-geonetwork/pull/8247/files#diff-b88936cc370d94d9d97a131a54e816da6de85236d379f5470a2b1c769835c8abR29-R31

Please check whether it requires improvements or examples.

Yes, this part is good. I was referring to the supported column types, which are not mentioned in the documentation; I came across them in DatabaseMetadataRetriever.

Updated the documentation: 32c6dce


josegar74 added the backport 4.2.x label Jul 8, 2024

josegar74 added this to the 4.4.6 milestone Jul 8, 2024

josegar74 requested review from juanluisrp and fxprunayre July 8, 2024 08:33

josegar74 added changelog Documentation Documentation writing & improvements labels Jul 8, 2024

josegar74 force-pushed the 44-jdbc-harvester branch from 7b65eec to 249edde Compare July 8, 2024 08:49

Database harvester

6b7fb4d

josegar74 force-pushed the 44-jdbc-harvester branch from 249edde to 6b7fb4d Compare July 8, 2024 09:14

fxprunayre modified the milestones: 4.4.6, 4.4.7 Oct 15, 2024

cmangeat reviewed Oct 17, 2024


sebr72 reviewed Oct 17, 2024


Database harvester - Define MetadataRepository as class member and use IMetadataUtils class member instead of context.getBean

0d74819

josegar74 force-pushed the 44-jdbc-harvester branch from eab298f to 0d74819 Compare October 17, 2024 15:53

sebr72 reviewed Oct 17, 2024


josegar74 added 2 commits October 17, 2024 20:40

Merge remote-tracking branch 'upstream/main' into 44-jdbc-harvester

ce0f334

Database harvester - update documentation to document the field types supported

32c6dce

