Improvements in Glossaries module #1557

dlpzx · 2024-09-17T14:34:13Z

Here is a list of improvements that can be currently be taken in the Glossaries module:

Implement server-side permissions for glossary operations
This should be the first priority. For glossaries there are no permissions decorating the service. Which means that for example if a team owns glossary1, another team could delete glossary1. This is avoided in the UI but not programmatically.
Ensure business logic is in GlossariesService and not in db.repositories
The service layer in the glossaries module offloads all the business logic to the GlossariesRepository. Instead we should execute the business logic in the service and use the repository exclusively for db operations.
Clearly split Glossaries and Catalog. At the moment in the backend there is a catalog module which is in fact handling all APIs for Glossaries which are not necessary the same. In the frontend we separate among Glossaries and Catalog.
Remove duplicated gql fields for Glossaries/Categories/Terms
Categories, Terms and Glossaries share most of their gql types and input_types. We should reduce the lines of code and re-use a single input_type and single type for them. This is also applicable to the frontend query definition.
Simplify API calls and make them more straight-forward
Secondly, the way the API calls are designed to get information from a glossary can be simplified. We currently define 3 APIs in the frontend:

getGlossary
getGlossaryTree - which in reality is getGlossary with a bunch of extra fields
listAssociations - which in reality is getGlossaryTree - which in reality is getGlossary

getGlossary is doing too many things. We should break it down by responsibility:

getGlossary - input: glossary.nodeUri, output: glossary db metadata + resolve_stats
NEW `getGlossaryTree - input: glossary.nodeUri, output: tree object of categories, terms with their parentUri
NEW listAssociations - input: glossary.nodeUri, output: paginated list of associations

Remove duplicated code
Glossaries API calls return a great number of unused fields. First of all we should remove unused fields from both backend and frontend to improve performance. I am mostly referring to the Glossary object.

The text was updated successfully, but these errors were encountered:

### Feature or Bugfix - Feature:Testing ### Detail Implement tests for Glossaries/Catalog as part of #1220 ⚠️ To test glossary associations, we need resources that use the glossary terms (associate a term to a resource). Possible resources include: s3_datasets, s3_tables, s3_folders, dashboards, redshift_datasets and redshift_tables. All of which are part of other modules that might be enabled or disabled. In this PR we assume that the s3_datasets module is enabled! If it was not enabled, then the glossary tests would fail! During the implementation several enhancement ideas came up and are collected in #1557 Tested in local with connection to real AWS deployment: ![image](https://github.com/user-attachments/assets/d759cc3d-7e44-4057-b16b-0fc94edd2290) ### Relates - #1220 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx added type: enhancement Feature enhacement priority: medium effort: medium labels Sep 18, 2024

dlpzx mentioned this issue Sep 18, 2024

Add Glossaries integration tests #1556

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements in Glossaries module #1557

Improvements in Glossaries module #1557

dlpzx commented Sep 17, 2024 •

edited

Loading

Improvements in Glossaries module #1557

Improvements in Glossaries module #1557

Comments

dlpzx commented Sep 17, 2024 • edited Loading

dlpzx commented Sep 17, 2024 •

edited

Loading