Implementation of controlled vocabularies in a separate module #205

Open: wants to merge 51 commits into main
51 commits
151ca33
Fix sorting
orviz Jul 3, 2024
546a695
Validate i1-02m according to the serialization format
orviz Jul 3, 2024
d30175c
Use one method for all FAIRsharing requests
orviz Jul 3, 2024
8045e07
Option to search within fairsharing by domain='resource metadata'
orviz Jul 4, 2024
e181ce7
Fix method naem
orviz Jul 4, 2024
905008f
Rely on FAIRsharing cache if not specific query is needed
orviz Jul 4, 2024
152cce1
Change username/pass type from list to string
orviz Jul 4, 2024
feaf496
Use @property decorator to obtain FAIRsharing records
orviz Jul 8, 2024
18472c0
Turn FAIRsharing config from lists to strings
orviz Jul 8, 2024
ea4d4a7
Fix: return results
orviz Jul 9, 2024
09f0897
Improve reporting
orviz Jul 9, 2024
b3ac2b3
Use FAIRsharingAPIUtils class to get metadata standards and formats
orviz Jul 9, 2024
b1f9af9
Fix in search_item
orviz Jul 9, 2024
e5901d6
Return empty content when 'local' is not defined
orviz Jul 9, 2024
d7b5266
New proposal for tracking validation checks (e.g. IANA media types) u…
orviz Jul 12, 2024
1fa6c4d
Implement VocabularyBase
orviz Jul 17, 2024
0871bdd
Add vocabulary classes in a individual file
orviz Jul 19, 2024
825e93e
Use Vocabulary class
orviz Jul 19, 2024
5ebbb9d
Fix: module file name
orviz Jul 19, 2024
43bb7d0
Fix imports
orviz Aug 14, 2024
576a3dd
Fix: use logger
orviz Aug 14, 2024
13c1f7f
Move call to get_metadata() to the end of __init__ method
orviz Aug 19, 2024
f2f71dc
Increase logging
orviz Aug 19, 2024
00ab2a8
Return check data in raw format
orviz Aug 19, 2024
6307025
Turn VocabularyConnection.collect as classmethod
orviz Aug 19, 2024
d636180
Define common set of properties for VocabularyConnection class
orviz Aug 19, 2024
a04fcd8
Fix: missing imports
orviz Aug 19, 2024
8f469f0
IANA media types gathering from local cache/file
orviz Aug 19, 2024
2d15372
Merge branch 'main' into feature/i1_02m
orviz Aug 19, 2024
eeb5848
Disable by default remote checking for gathering IANA media types
orviz Aug 19, 2024
f25429f
Docstring for _remote_collect(), including expected return
orviz Aug 20, 2024
314215b
Use logger
orviz Aug 20, 2024
88855bd
FAIRsharing vocabulary implementation
orviz Aug 20, 2024
c34fb8e
Fix style
orviz Aug 20, 2024
8a63c7d
Config parameters for 'vocabularies:fairsharing'
orviz Aug 20, 2024
af7feb9
Fix: do not track logs through 'evaluator_logs' property
orviz Aug 20, 2024
83d8790
Use Vocabulary.get_fairsharing() for retrieving FAIRsharing content
orviz Aug 20, 2024
a51e7e0
Add static content of FAIRsharing registry
orviz Aug 20, 2024
dac3286
Change static path to IANA media types info
orviz Aug 20, 2024
4c76902
Fix: remove HTML headers from JSON
orviz Aug 20, 2024
326b25b
Fix: 'True' and 'False' values are required in config.ini so to be su…
orviz Aug 20, 2024
49314f3
Get _config_items in FAIRsharingRegistry.collect()
orviz Aug 20, 2024
aec1950
Get _config_items in IANAMediaTypes.collect()
orviz Aug 20, 2024
7a48e4f
Implement _local_collect for FAIRsharing
orviz Aug 20, 2024
6a24368
Fix: variable name
orviz Aug 20, 2024
cf1e2fd
Set _config_items as class property
orviz Aug 20, 2024
96f5738
Implement XML parsing from string
orviz Aug 20, 2024
1d7291a
Reuse logger from plugins
orviz Aug 21, 2024
281343f
Improve logging
orviz Aug 21, 2024
e380a06
Pass config to IANAMediaTypes
orviz Sep 16, 2024
0aca6bb
Pass config to FAIRSharing vocabulary
orviz Sep 16, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -7,7 +7,7 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- - id: check-added-large-files
+ # - id: check-added-large-files
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 23.12.0
hooks:
2 changes: 1 addition & 1 deletion api/rda.py
@@ -1346,7 +1346,7 @@ def rda_all(body, eva):
try:
with open(api_config, "r") as f:
documents = yaml.full_load(f)
- logging.debug("API configuration successfully loaded: %s" % api_config)
+ logger.debug("API configuration successfully loaded: %s" % api_config)
except Exception as e:
message = "Could not find API config file: %s" % api_config
logger.error(message)
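This hunk swaps the module-level `logging.debug()` call, which always goes through the root logger, for the `logger` object the surrounding code already uses (`logger.error(message)` a few lines below). A minimal sketch of the practical difference, assuming a named module logger: each record carries the name of the logger that emitted it, so levels, filters and handlers can target a single module. The name `api.rda` is hypothetical; how `logger` is created in `api/rda.py` is not shown in this diff.

```python
import logging

# Hypothetical module-level logger; the real name used in api/rda.py may differ.
logger = logging.getLogger("api.rda")


class ListHandler(logging.Handler):
    """Keeps emitted records in memory so their origin can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)  # records from "api.rda" propagate up to the root handlers
root.setLevel(logging.DEBUG)

logging.debug("via root logger")  # record.name == "root"
logger.debug("via module logger")  # record.name == "api.rda"

names = [record.name for record in handler.records]
print(names)
```

With a named logger, something like `logging.getLogger("api.rda").setLevel(logging.INFO)` could silence this module's debug output without touching the rest of the application.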
73 changes: 3 additions & 70 deletions api/utils.py
@@ -1,5 +1,6 @@
import json
import logging
import os
import re
import sys
import urllib
@@ -12,6 +13,8 @@
import requests
from bs4 import BeautifulSoup

from fair import app_dirname, load_config

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)


@@ -805,76 +808,6 @@ def make_http_request(url, request_type="GET", verify=False):
      return payload


- def get_fairsharing_metadata(offline=True, username="", password="", path=""):
-     if offline == True:
-         f = open(path)
-         fairlist = json.load(f)
-         f.close()
-
-     else:
-         url = "https://api.fairsharing.org/users/sign_in"
-         payload = {"user": {"login": username, "password": password}}
-         headers = {"Accept": "application/json", "Content-Type": "application/json"}
-
-         response = requests.request(
-             "POST", url, headers=headers, data=json.dumps(payload)
-         )
-
-         # Get the JWT from the response.text to use in the next part.
-         data = response.json()
-         jwt = data["jwt"]
-
-         url = "https://api.fairsharing.org/search/fairsharing_records?page[size]=2500&fairsharing_registry=standard&user_defined_tags=metadata standardization"
-
-         headers = {
-             "Accept": "application/json",
-             "Content-Type": "application/json",
-             "Authorization": "Bearer {0}".format(jwt),
-         }
-
-         response = requests.request("POST", url, headers=headers)
-         fairlist = response.json()
-         user = open(path, "w")
-         json.dump(fairlist, user)
-         user.close()
-     return fairlist
-
-
- def get_fairsharing_formats(offline=True, username="", password="", path=""):
-     if offline == True:
-         f = open(path)
-         fairlist = json.load(f)
-         f.close()
-
-     else:
-         url = "https://api.fairsharing.org/users/sign_in"
-         payload = {"user": {"login": username, "password": password}}
-         headers = {"Accept": "application/json", "Content-Type": "application/json"}
-
-         response = requests.request(
-             "POST", url, headers=headers, data=json.dumps(payload)
-         )
-
-         # Get the JWT from the response.text to use in the next part.
-         data = response.json()
-         jwt = data["jwt"]
-
-         url = "https://api.fairsharing.org/search/fairsharing_records?page[size]=2500&user_defined_tags=Geospatial data"
-
-         headers = {
-             "Accept": "application/json",
-             "Content-Type": "application/json",
-             "Authorization": "Bearer {0}".format(jwt),
-         }
-
-         response = requests.request("POST", url, headers=headers)
-         fairlist = response.json()
-         user = open(path, "w")
-         json.dump(fairlist, user)
-         user.close()
-     return fairlist
-
-
  def check_fairsharing_abbreviation(fairlist, abreviation):
      for standard in fairlist["data"]:
          if abreviation == standard["attributes"]["abbreviation"]:
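The two removed helpers differ only in their search query: the sign-in call, the `Bearer` header and the `page[size]=2500` paging are duplicated. A minimal sketch of a single shared request builder, in the spirit of the "Use one method for all FAIRsharing requests" commit. The function names `auth_headers` and `search_url` are hypothetical; only the endpoint, the header layout and the query parameters come from the removed code. The sketch stops at building the URL and headers, so it runs offline.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.fairsharing.org/search/fairsharing_records"


def auth_headers(jwt):
    """JSON headers plus the bearer token returned by the sign-in endpoint."""
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer {0}".format(jwt),
    }


def search_url(page_size=2500, **filters):
    """Build the search URL with paging and filter parameters.

    safe="[]" keeps the literal 'page[size]' key used by the removed helpers;
    spaces in values are encoded as '+'.
    """
    query = {"page[size]": page_size, **filters}
    return "%s?%s" % (SEARCH_URL, urlencode(query, safe="[]"))


# The two queries that were hard-coded in the removed helpers.
metadata_url = search_url(
    fairsharing_registry="standard",
    user_defined_tags="metadata standardization",
)
formats_url = search_url(user_defined_tags="Geospatial data")
print(metadata_url)
print(formats_url)
```

Unlike the literal space in the removed URLs, `urlencode` sends `metadata+standardization`, which a standard query-string parser normally decodes back to a space. Each caller would then POST to the returned URL with `auth_headers(jwt)`, as the removed helpers did.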
223 changes: 223 additions & 0 deletions api/vocabulary.py
@@ -0,0 +1,223 @@
import ast
import json
import logging
import os
import sys

import requests

from fair import app_dirname, load_config

logger = logging.getLogger("plugin.py")


class VocabularyConnection:
    def __init__(self, **config_items):
        self.vocabulary_name = config_items.get("vocabulary_name", "")
        self.enable_remote_check = ast.literal_eval(
            config_items.get("enable_remote_check", "True")
        )
        self.remote_path = config_items.get("remote_path", "")
        self.remote_username = config_items.get("remote_username", "")
        self.remote_password = config_items.get("remote_password", "")
        self.local_path = config_items.get("local_path", "")
        self.local_path_full = ""

    def _get_token(self):
        raise NotImplementedError

    def _login(self):
        raise NotImplementedError

    def _remote_collect(self):
        """Perform the remote call to the vocabulary registry.

        Return a tuple (error_on_request, content), where 'error_on_request' is a
        boolean and 'content' is the vocabulary content gathered from the
        registry when the request succeeds.
        """
        raise NotImplementedError

    def _local_collect(self):
        raise NotImplementedError

    def collect(self, search_item=None, perform_login=False):
        content = []
        # Get content from remote endpoint
        error_on_request = False
        if self.enable_remote_check:
            logger.debug(
                "Accessing vocabulary '%s' remotely through %s"
                % (self.name, self.remote_path)
            )
            error_on_request, content = self._remote_collect()
        # Get content from local cache
        if not self.enable_remote_check or error_on_request:
            logger.debug(
                "Accessing vocabulary '%s' from local cache: %s"
                % (self.name, self.local_path)
            )
            self.local_path_full = os.path.join(app_dirname, self.local_path)
            logger.debug("Full path to local cache: %s" % self.local_path_full)
            content = self._local_collect()

        return content


class IANAMediaTypes(VocabularyConnection):
    def __init__(self, config):
        self.name = "IANA Media Types"
        self._config_items = dict(config.items("vocabularies:iana_media_types"))

    def _parse_xml(self, from_file=False, from_string=""):
        property_key_xml = self._config_items.get(
            "property_key_xml", "{http://www.iana.org/assignments}file"
        )
        logger.debug(
            "Using XML property key '%s' to gather the list of media types"
            % property_key_xml
        )

        import xml.etree.ElementTree as ET

        if from_file:
            tree = ET.parse(self.local_path_full)
            root = tree.getroot()
        elif from_string:
            root = ET.fromstring(from_string)
        else:
            logger.error("Could not get IANA Media Types from %s" % self.remote_path)
            return []

        media_types_list = [
            media_type.text for media_type in root.iter(property_key_xml)
        ]
        logger.debug("Found %s items for IANA media types" % len(media_types_list))

        return media_types_list

    def _remote_collect(self):
        error_on_request = False
        content = []
        headers = {"Content-Type": "application/xml"}
        response = requests.request("GET", self.remote_path, headers=headers)
        if response.ok:
            # Return the parsed list (as _local_collect() does), not the raw XML
            content = self._parse_xml(from_string=response.text)
            if not content:
                error_on_request = True
        else:
            error_on_request = True

        return error_on_request, content

    def _local_collect(self):
        return self._parse_xml(from_file=True)

    def collect(self):
        super().__init__(**self._config_items)
        content = super().collect()

        return content


class FAIRsharingRegistry(VocabularyConnection):
    def __init__(self, config):
        self.name = "FAIRsharing registry"
        self._config_items = dict(config.items("vocabularies:fairsharing"))

    def _login(self):
        url_api_login = "https://api.fairsharing.org/users/sign_in"
        payload = {
            "user": {"login": self.remote_username, "password": self.remote_password}
        }
        login_headers = {
            "Accept": "application/json",
            "Content-Type": "application/json",
        }
        response = requests.request(
            "POST", url_api_login, headers=login_headers, data=json.dumps(payload)
        )
        # Get the JWT from the response to authorize the search request
        headers = {}
        if response.ok:
            data = response.json()
            token = data["jwt"]
            # Do not write the token itself to the logs
            logger.debug("Got token from FAIRsharing API")
            headers = {
                "Accept": "application/json",
                "Content-Type": "application/json",
                "Authorization": "Bearer {0}".format(token),
            }
        else:
            logger.warning(
                "Could not get token from FAIRsharing API: %s" % response.text
            )

        return headers

    def _remote_collect(self):
        error_on_request = False
        content = []
        if not (self.remote_username and self.remote_password):
            logger.error(
                "Could not get required 'username' and 'password' properties for "
                "accessing FAIRsharing registry API: falling back to local cache"
            )
            # Flag the error so that collect() falls back to the local cache
            error_on_request = True
        else:
            headers = self._login()
            logger.debug("Got headers from sign-in process: %s" % sorted(headers))
            response = requests.request("POST", self.remote_path, headers=headers)
            if response.ok:
                content = response.json().get("data", [])
                if content:
                    logger.debug(
                        "Successfully returned %s items from search query: %s"
                        % (len(content), self.remote_path)
                    )
                else:
                    error_on_request = True
            else:
                logger.warning(
                    "Failed to obtain records from endpoint: %s" % response.text
                )
                error_on_request = True

        return error_on_request, content

    def _local_collect(self):
        # local_path_full is resolved against app_dirname in collect()
        with open(self.local_path_full, "r") as f:
            content = json.load(f).get("data", [])
        logger.debug(
            "Successfully loaded %s records from local cache: %s"
            % (len(content), self.local_path_full)
        )

        return content

    def collect(self, search_topic):
        # Set specific query parameters for remote requests
        remote_path = self._config_items.get("remote_path", "")
        if not remote_path:
            logger.warning(
                "Could not get FAIRsharing API endpoint from configuration (check 'remote_path' property)"
            )
        else:
            remote_path_with_query = "%s?page[size]=2500&q=%s" % (
                remote_path,
                search_topic,
            )
            self._config_items["remote_path"] = remote_path_with_query
            logger.debug(
                "Request URL to FAIRsharing API with search topic '%s': %s"
                % (search_topic, self._config_items["remote_path"])
            )
        super().__init__(**self._config_items)
        content = super().collect()

        return content


class Vocabulary:
    def __init__(self, config):
        self.config = config

    def get_iana_media_types(self):
        vocabulary = IANAMediaTypes(self.config)
        return vocabulary.collect()

    def get_fairsharing(self, search_topic):
        vocabulary = FAIRsharingRegistry(self.config)
        return vocabulary.collect(search_topic=search_topic)
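`VocabularyConnection.collect()` tries the remote registry first and falls back to the local cache when remote checking is disabled or `_remote_collect()` reports an error through its `(error_on_request, content)` tuple. The sketch below isolates that control flow so it runs without network access or the `fair` package. `VocabularySketch` and `StaticVocabulary` are hypothetical stand-ins, not classes from this PR; the "remote" and "local" sources are plain lists.

```python
import logging

logger = logging.getLogger(__name__)


class VocabularySketch:
    """Remote-first, local-fallback collection, mirroring VocabularyConnection.collect()."""

    def __init__(self, name, enable_remote_check=True):
        self.name = name
        self.enable_remote_check = enable_remote_check

    def _remote_collect(self):
        """Return a tuple (error_on_request, content)."""
        raise NotImplementedError

    def _local_collect(self):
        raise NotImplementedError

    def collect(self):
        content = []
        error_on_request = False
        if self.enable_remote_check:
            error_on_request, content = self._remote_collect()
        # The local cache is used when remote checking is off or the remote call failed
        if not self.enable_remote_check or error_on_request:
            logger.debug("Using local cache for vocabulary '%s'", self.name)
            content = self._local_collect()
        return content


class StaticVocabulary(VocabularySketch):
    """Hypothetical subclass whose 'remote' and 'local' sources are plain lists."""

    def __init__(self, remote, local, remote_fails=False, enable_remote_check=True):
        super().__init__("static", enable_remote_check)
        self.remote = remote
        self.local = local
        self.remote_fails = remote_fails

    def _remote_collect(self):
        if self.remote_fails:
            return True, []
        return False, list(self.remote)

    def _local_collect(self):
        return list(self.local)


remote, local = ["application/json"], ["text/csv"]
ok = StaticVocabulary(remote, local).collect()
failed = StaticVocabulary(remote, local, remote_fails=True).collect()
offline = StaticVocabulary(remote, local, enable_remote_check=False).collect()
print(ok, failed, offline)
```

In both `_remote_collect()` implementations of the PR an empty result also sets `error_on_request`, so a single flag is enough to cover "request failed" and "request returned nothing".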
15 changes: 14 additions & 1 deletion config.ini.template
@@ -5,7 +5,7 @@ doi_url = https://doi.org/
api_config = fair-api.yaml

[local]
- only_local = false
+ only_local = False
repo = digital_csic
logo_url = 'https://ifca.unican.es'
title = FAIR EVA: Evaluator, Validator & Advisor
@@ -19,6 +19,19 @@ epos= 'epos'
example_plugin = Example_Plugin
signposting = Signposting

[vocabularies:iana_media_types]
enable_remote_check = True
property_key_xml = {http://www.iana.org/assignments}file
remote_path = https://www.iana.org/assignments/media-types/media-types.xml
local_path = static/controlled_vocabularies/IANA-media-types.xml

[vocabularies:fairsharing]
enable_remote_check = True
remote_username =
remote_password =
remote_path = https://api.fairsharing.org/search/fairsharing_records
local_path = static/controlled_vocabularies/fairsharing.json

[dspace7]
base_url = http://localhost:8080/server/

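The `property_key_xml` value in `[vocabularies:iana_media_types]` is an ElementTree tag in `{namespace-URI}local-name` form, which is what `IANAMediaTypes._parse_xml()` passes to `root.iter()` to collect the `<file>` elements of the IANA registry. A minimal sketch against a tiny hand-written document; `SAMPLE` is an assumed, heavily trimmed stand-in for `media-types.xml`, not an excerpt of it, and `parse_media_types` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PROPERTY_KEY_XML = "{http://www.iana.org/assignments}file"

# Hypothetical, heavily trimmed sample in the IANA assignments namespace.
SAMPLE = """<registry xmlns="http://www.iana.org/assignments" id="media-types">
  <registry id="application">
    <record>
      <name>json</name>
      <file type="template">application/json</file>
    </record>
    <record>
      <name>xml</name>
      <file type="template">application/xml</file>
    </record>
  </registry>
</registry>"""


def parse_media_types(xml_text, key=PROPERTY_KEY_XML):
    """Return the text of every element whose (namespaced) tag equals `key`."""
    root = ET.fromstring(xml_text)  # for a file: ET.parse(path).getroot()
    return [elem.text for elem in root.iter(key)]


media_types = parse_media_types(SAMPLE)
# A bare "file" tag matches nothing: every element carries the default namespace.
unqualified = parse_media_types(SAMPLE, key="file")
print(media_types, unqualified)
```

This is why the configured key must keep the `{http://www.iana.org/assignments}` prefix; dropping it would silently yield an empty list, which `_remote_collect()` treats as a failed request.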
10 changes: 3 additions & 7 deletions plugins/epos/config.ini
@@ -4,7 +4,7 @@ doi_url = https://doi.org/
api_config = fair-api.yaml
endpoint=https://ics-c.epos-ip.org/development/k8s-epos-deploy/dt-geo/api/v1
[local]
- only_local = false
+ only_local = False
repo = digital_csic
logo_url = 'https://ifca.unican.es'
title = FAIR EVA: Evaluator, Validator & Advisor
@@ -168,12 +168,8 @@ username = ['']
password = ['']

#_path is a variable that stores the path to the file in which the fairsharing-approved metadata standards or formats are stored
-
- metadata_path = ['static/fairsharing_metadata_standards20240214.json']
-
- formats_path = ['static/fairsharing_formats20240226.txt']
-
-
+ metadata_path = static/fairsharing_metadata_standards20240214.json
+ formats_path = static/fairsharing_formats20240226.txt

[internet media types]
#path to internet media files file
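Both configuration files now spell booleans as `False`/`True` and store paths as bare strings instead of list literals. `configparser` returns every value as text; `VocabularyConnection.__init__` converts `enable_remote_check` with `ast.literal_eval`, which accepts only Python literals, so a lowercase `false` raises `ValueError`, and a value written as a list literal is the text of a Python list rather than a usable path. A minimal sketch; the `[demo]` section and its keys are illustrative, only the path string is taken from the diff.

```python
import ast
import configparser

config = configparser.ConfigParser()
config.read_string(
    """
[demo]
enable_remote_check = False
legacy_flag = false
old_path = ['static/fairsharing_metadata_standards20240214.json']
new_path = static/fairsharing_metadata_standards20240214.json
"""
)
section = config["demo"]

# literal_eval turns the Python literal "False" into a bool ...
enable_remote_check = ast.literal_eval(section["enable_remote_check"])

# ... but lowercase "false" is not a Python literal.
try:
    ast.literal_eval(section["legacy_flag"])
    legacy_ok = True
except ValueError:
    legacy_ok = False

# configparser's own getboolean() accepts both spellings.
legacy_bool = section.getboolean("legacy_flag")

# A list-literal value must be decoded before use; a bare string is ready as-is.
old_path = ast.literal_eval(section["old_path"])[0]
new_path = section["new_path"]

print(enable_remote_check, legacy_ok, legacy_bool, old_path == new_path)
```

`getboolean()` accepts `true`/`false`, `yes`/`no`, `on`/`off` and `1`/`0` in any case, so it would be an alternative to `literal_eval` if lowercase values in existing config files had to keep working.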