Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database harvester #8247

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

josegar74
Copy link
Member

A harvester to retrieve metadata from a database table using JDBC, currently supporting Postgres and Oracle connections.

Configuration is described in https://github.com/GeoCat/core-geonetwork/blob/7b65eecc5b77c35620c505655d7e75d6c2046f85/docs/manual/docs/user-guide/harvesting/harvesting-database.md

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by: Rijkswaterstaat

@josegar74 josegar74 added this to the 4.4.6 milestone Jul 8, 2024
@josegar74 josegar74 added changelog Documentation Documentation writing & improvements labels Jul 8, 2024
Copy link
Collaborator

@cmangeat cmangeat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you !

harvesterSettingsManager.add("id:" + siteId, "server", params.getServer());
harvesterSettingsManager.add("id:" + siteId, "port", params.getPort());
harvesterSettingsManager.add("id:" + siteId, "username", params.getUsername());
harvesterSettingsManager.add("id:" + siteId, "password", params.getPassword());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, are we sure this is encrypted in datatase ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The passwords in the seetings and settings table are encrypted using Jasypt

//--- retrieve harvested uuids for given harvesting node
localCateg = new CategoryMapper(context);
localGroups = new GroupMapper(context);
localUuids = new UUIDMapper(context.getBean(IMetadataUtils.class), params.getUuid());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use the one defined line 92 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0d74819


private void deleteLocalMetadataNotInDatabase(List<Integer> idsForHarvestingResult) throws Exception {
Set<Integer> idsResultHs = Sets.newHashSet(idsForHarvestingResult);
List<Integer> existingMetadata = context.getBean(MetadataRepository.class).findIdsBy(MetadataSpecs.hasHarvesterUuid(params.getUuid()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a class member be made from MetadataRepository bean ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0d74819

sqlQuery = String.format("SELECT %s FROM %s", columnName, metadataTable);
}

getJdbcTemplate().query(sqlQuery, param, rs -> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can please say how is DatabaseMetadataRetriever related to ArcSDEJdbcConnection ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatabaseMetadataRetriever is quite similar indeed to ArcSDEJdbcConnection.

Probably at some point we should remove the ArcSDE harvester as the API connection mode is only useful for ArcSDE 9 and below, which is probably not used much nowadays.

The direct connection mode of the ArcSDE harvester is analogous to the database harvester, the only advantage is that in the ArcSDE harvester the user does not have to provide table and field configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for deprecating ArcSDE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but this will not be part of this pull request. That's for another pull request.


import static org.junit.Assert.*;

public class DatabaseMetadataRetrieverFactoryTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for testing !


- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester should be executed manually from the harvesters page. If enabled a schedule expression using cron syntax should be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)).

- **Configure connection to Database**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having a look at all the code and I could not workout how the connection to the db is closed. Since a Factory is used to create a new instance of the DatabaseMetadataRetriever is created each time and its database connection is encapsulated. Could clarify how the disconnection is done ? (Are we expected to have special database auto disconnect setup ? or is it missing ?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not ResultSet connection closing? (it allows to fetch a bunch of rows at a time, as there might be too many for one fetch). Usually you open a connection to a DB. Then you send a lot of requests. Then you close it. I don't know of a case where a single query (especially select queries) are expected to close the db connection. We need a third opinion or some test that will validate it ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As indicated, afaik the Spring JdbcTemplate manages this internally. It's not like using JDBC directly.

…e IMetadataUtils class member instead of context.getBean
- *Batch edits*: (Optional) Allows to update harvested records, using XPATH syntax. It can be used to add, replace or delete element.
- *Translate metadata content*: (Optional) Allows to translate metadata elements. It requires a translation service provider configured in the System settings.

- **Privileges** - Assign privileges to harvested metadata.
Copy link
Contributor

@sebr72 sebr72 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could add a brief description of the structure of the DB to be harvested, in order to avoid having to read the code ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this part is good. I was referring to the part which lists the column types supported which are not mentionned and I came accross them in DatabaseMetadataRetriever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the documentation: 32c6dce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 4.2.x changelog Documentation Documentation writing & improvements
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

4 participants