Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eduid_user_cleaner skatteverket worker #314

Closed
wants to merge 62 commits into from
Closed
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
53d6da9
Init commit.
masv3971 Sep 28, 2022
1eddece
Update hashes.
masv3971 Sep 28, 2022
6fae0d5
First.
masv3971 Sep 30, 2022
4ab0f89
Add first layout.
masv3971 Oct 3, 2022
97cf1dc
Merge
masv3971 Nov 10, 2022
fae6838
Skeleton worker structure.
masv3971 Nov 14, 2022
a9d0d23
Merge masv_amapi-client
masv3971 Nov 18, 2022
38e02f4
Merge branch 'masv_amapi-client' into masv-cleaner-worker
masv3971 Nov 21, 2022
6ab8095
Add navet stuff to worker.
masv3971 Nov 22, 2022
12cff53
user cleaner barebone.
masv3971 Nov 29, 2022
76cb98a
Merge from masv_amapi-client.
masv3971 Nov 29, 2022
91f82cd
Add delay and change quota.
masv3971 Nov 30, 2022
9dd0d10
Rename file.
masv3971 Dec 1, 2022
ce5b3be
Add stats.
masv3971 Dec 5, 2022
4392f41
Add init files.
masv3971 Dec 5, 2022
a72be83
Cleand away unused folder.
masv3971 Dec 5, 2022
503bef0
Add dry run and change log to debug.
masv3971 Dec 6, 2022
486159e
Merge branch 'masv_amapi-client' into masv-cleaner-worker
masv3971 Dec 6, 2022
e62b015
Add teleadress and some spelling.
masv3971 Dec 6, 2022
90b0ae7
Add health check.
masv3971 Dec 6, 2022
949032c
Add words to vscode.
masv3971 Dec 7, 2022
ec76bd5
Merge from masv_amapi-client.
masv3971 Dec 12, 2022
5cc7f84
Add comments and general sleep method for faster kills of the app.
masv3971 Dec 12, 2022
297531a
Add words to dictionary.
masv3971 Dec 12, 2022
393ed71
Better syntax.
masv3971 Dec 12, 2022
e878033
Add milliseconds sleep.
masv3971 Dec 13, 2022
4638578
Merge from main.
masv3971 Dec 14, 2022
fac9091
Merge branch 'main' into masv-cleaner-worker
masv3971 Dec 19, 2022
dce92b8
untangle files: cleaner related files.
masv3971 Dec 23, 2022
bbc85a9
Merge from main.
masv3971 Jan 11, 2023
38ac3a8
Add wait, meta and cache.
masv3971 Jan 26, 2023
e7a02ad
Merge from main.
masv3971 Jan 26, 2023
a1a790e
Change auth url due to change.
masv3971 Jan 29, 2023
942f3a9
Merge branch 'main' into masv-cleaner-worker
masv3971 Jan 29, 2023
86aef2c
Lost method replaced.
masv3971 Jan 30, 2023
c7fa42f
Correct comment.
masv3971 Jan 30, 2023
ace25ce
Remove test cases from app, since they should not be in app testing s…
masv3971 Jan 30, 2023
d7d29b0
Add test and add working fastapi client mock.
masv3971 Jan 31, 2023
2df3f6b
Rename queue_user to cache_user in order to bring ORDER.
masv3971 Jan 31, 2023
f8ab129
Use TUserDbDocument instead of dict.
masv3971 Jan 31, 2023
6f18a76
Add routes for amapi_client mock.
masv3971 Jan 31, 2023
3879339
Since almost nothing load in app, it's more efficient to test in skv.
masv3971 Jan 31, 2023
d21f3b6
Fix typing issue.
masv3971 Jan 31, 2023
87fbf30
Add queue test.
masv3971 Jan 31, 2023
9bc8af6
Change to __future__ annotations.
masv3971 Feb 1, 2023
6062809
Change to use isoformat instead of strfrime.
masv3971 Feb 1, 2023
3b0072a
Rename queue to cache due to legacy.
masv3971 Feb 1, 2023
1a75778
Remove mock from self, since it's not used globaly.
masv3971 Feb 1, 2023
f270aaa
Add stat collection for each completed task.
masv3971 Feb 1, 2023
35897db
Remove dubble fnutts around Queue type, not necessary.
masv3971 Feb 1, 2023
2bf2ec7
Remove trivial method and apply its logic directly.
masv3971 Feb 1, 2023
4ad0778
Optimize db query.
masv3971 Feb 1, 2023
521432c
Use _get_all_docs() istead of _coll.find({}].
masv3971 Feb 1, 2023
ac5a6cf
Use BaseDB drop method.
masv3971 Feb 1, 2023
a94bb52
Use BaseDB method instead of _coll.
masv3971 Feb 1, 2023
e8f0a07
Replace int (unix epoch) with datetime.
masv3971 Feb 1, 2023
4378c29
Remove unused dep.
masv3971 Feb 1, 2023
37b1a71
Merge branch 'main' into masv-cleaner-worker
helylle Oct 3, 2023
67397b9
Merge branch 'main' into masv-cleaner-worker
helylle Oct 9, 2023
44ffed6
move function to common utils
helylle Oct 11, 2023
05b4e09
make reformat
helylle Oct 11, 2023
b36dbeb
make the worker startable
helylle Oct 13, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion .vscode/settings.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,13 @@
"python.envFile": "${workspaceFolder}/.vscode/.env",
"cSpell.words": [
"amapi",
"amdb",
"authlib",
"authns",
"baar",
"backchannel",
"backupdir",
"bson",
"checkpw",
"chpass",
"csrft",
Expand All @@ -32,34 +34,46 @@
"eppn",
"fastapi",
"Freja",
"gnap",
"hashname",
"hashpw",
"httpx",
"hubba",
"IDNIN",
"jsonify",
"jwks",
"klass",
"ladok",
"lundberg",
"masv",
"mischttp",
"nameid",
"navet",
"nins",
"NORDUnet",
"oidc",
"orcid",
"prid",
"proofings",
"proquint",
"PWAUTH",
"pydantic",
"pysaml2",
"reauthn",
"replicaset",
"respx",
"scim",
"SCIMAPI",
"sess",
"SIGTERM",
"skatteverket",
"Svenska",
"svipe",
"svipeid",
"swamid",
"swedenconnect",
"TELE",
"teleadress",
"Testsson",
"timeutil",
"Unmarshal",
Expand All @@ -74,4 +88,4 @@
"zxcvbn"
],
"python.formatting.provider": "black"
}
}
2 changes: 1 addition & 1 deletion requirements/test_requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2044,4 +2044,4 @@ zxcvbn==4.4.28 \
--hash=sha256:151bd816817e645e9064c354b13544f85137ea3320ca3be1fb6873ea75ef7dc1
# via
# -c main.txt
# -r main.in
# -r main.in
6 changes: 3 additions & 3 deletions src/eduid/common/clients/amapi_client/amapi_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,15 @@


class AMAPIClient(GNAPClient):
def __init__(self, amapi_url: str, app, auth_data=GNAPClientAuthData, **kwargs):
super().__init__(auth_data=auth_data, app=app, **kwargs)
def __init__(self, amapi_url: str, auth_data=GNAPClientAuthData, verify_tls: bool = True, **kwargs):
super().__init__(auth_data=auth_data, verify=verify_tls, **kwargs)
self.amapi_url = amapi_url

def _users_base_url(self) -> str:
return urlappend(self.amapi_url, "users")

def _put(self, base_path: str, user: str, endpoint: str, body: Any) -> httpx.Response:
return self.put(urlappend(base_path, f"{user}/{endpoint}"), json=body.json())
return self.put(url=urlappend(base_path, f"{user}/{endpoint}"), content=body.json())

def update_user_email(self, user: str, body: UserUpdateEmailRequest) -> UserUpdateResponse:
ret = self._put(base_path=self._users_base_url(), user=user, endpoint="email", body=body)
Expand Down
25 changes: 20 additions & 5 deletions src/eduid/common/clients/amapi_client/testing.py
Original file line number Diff line number Diff line change
@@ -1,19 +1,34 @@
from typing import Optional
from typing import Mapping, Optional

import respx
from httpx import Response, Request
from eduid.common.models.amapi_user import UserUpdateNameRequest

from eduid.common.clients.gnap_client.testing import MockedSyncAuthAPIMixin
from eduid.userdb.userdb import AmDB


class MockedAMAPIMixin(MockedSyncAuthAPIMixin):
def start_mock_amapi(self, access_token_value: Optional[str] = None):
def start_mock_amapi(self, central_user_db: Optional[AmDB] = None, access_token_value: Optional[str] = None):
self.start_mock_auth_api(access_token_value=access_token_value)

self.mocked_users = respx.mock(base_url="http://localhost", assert_all_called=False)
self.central_user_db = central_user_db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vet inte om mypy eller våra IDEer kommer att vara så imponerade av att vi sätter något på self här.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vscode brydde sig inte. Men tagit bort dom eftersom dom inte används utanför metoden.

self.mocked_users = respx.mock(base_url="http://localhost/amapi", assert_all_called=False)
put_users_name_route = self.mocked_users.put(
url="/users/hubba-bubba/name",
name="put_users_name_request",
)
put_users_name_route.pass_through()
put_users_name_route.mock(side_effect=self._save)
self.mocked_users.start()
self.addCleanup(self.mocked_users.stop) # type: ignore

def _save(self, request: Request) -> Response:
if self.central_user_db is None:
raise ValueError("save user side affect was called but self.amdb is None")
mock_request = UserUpdateNameRequest.parse_raw(request.content)

db_user = self.central_user_db.get_user_by_eppn("hubba-bubba")
db_user.given_name = mock_request.given_name
db_user.surname = mock_request.surname
db_user.display_name = mock_request.display_name
self.central_user_db.save(user=db_user)
return Response(200, text='{"status": "true"}')
5 changes: 3 additions & 2 deletions src/eduid/common/clients/gnap_client/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,8 @@ class GNAPClientException(Exception):


class GNAPClientAuthData(BaseModel):
authn_server_url: str
auth_server_url: str
auth_server_verify: bool = True
key_name: str
client_jwk: JWK
access: list[Union[str, Access]] = Field(default_factory=list)
Expand All @@ -51,7 +52,7 @@ class GNAPBearerTokenMixin(ABC):

@property
def transaction_uri(self) -> str:
return urlappend(self._auth_data.authn_server_url, "transaction")
return urlappend(self._auth_data.auth_server_url, "transaction")

def _create_grant_request_jws(self) -> str:
req = GrantRequest(
Expand Down
3 changes: 1 addition & 2 deletions src/eduid/common/clients/gnap_client/sync_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ def __init__(self, auth_data: GNAPClientAuthData, **kwargs):

super().__init__(**kwargs)

self.verify = kwargs.get("verify", True)
self._auth_data = auth_data

@staticmethod
Expand All @@ -34,7 +33,7 @@ def _request_bearer_token(self) -> GrantResponse:
url=self.transaction_uri,
content=data,
headers={"Content-Type": "application/jose+json"},
verify=self.verify,
verify=self._auth_data.auth_server_verify,
)
resp.raise_for_status()
return GrantResponse.parse_raw(resp.text)
Expand Down
14 changes: 9 additions & 5 deletions src/eduid/common/rpc/msg_relay.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
import logging
from enum import Enum
from typing import Optional
Expand Down Expand Up @@ -121,17 +122,20 @@ def is_deregistered(self) -> bool:
return bool(self.deregistration_information.cause_code or self.deregistration_information.date)


class NavetData(NavetModelConfig):
case_information: CaseInformation = Field(alias="CaseInformation")
person: Person = Field(alias="Person")


# Used to parse data from get_postal_address
class FullPostalAddress(NavetModelConfig):
name: Name = Field(default_factory=Name, alias="Name")
official_address: OfficialAddress = Field(default_factory=OfficialAddress, alias="OfficialAddress")


class NavetData(NavetModelConfig):
case_information: CaseInformation = Field(alias="CaseInformation")
person: Person = Field(alias="Person")

def get_full_postal_address(self) -> FullPostalAddress:
return FullPostalAddress(name=self.person.name, official_address=self.person.postal_addresses)


class MsgRelay:
"""
This is the interface to the RPC task to fetch data from NAVET, and to send SMSs.
Expand Down
36 changes: 36 additions & 0 deletions src/eduid/common/utils.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
__author__ = "lundberg"

from eduid.common.rpc.msg_relay import FullPostalAddress
from eduid.userdb.user import TUserSubclass
import logging

logger = logging.getLogger(__name__)


def urlappend(base: str, path: str) -> str:
"""
Expand Down Expand Up @@ -40,3 +46,33 @@ def removesuffix(s: str, suffix: str) -> str:
return s[: -len(suffix)]
else:
return s[:]


def set_user_names_from_official_address(user: TUserSubclass, user_postal_address: FullPostalAddress) -> TUserSubclass:
"""
:param user: Proofing app private userdb user
:param user_postal_address: user postal address

:returns: User object
"""
user.given_name = user_postal_address.name.given_name
user.surname = user_postal_address.name.surname
if user.given_name is None or user.surname is None:
# please mypy
raise RuntimeError("No given name or surname found in proofing log user postal address")
given_name_marking = user_postal_address.name.given_name_marking
user.display_name = f"{user.given_name} {user.surname}"
if given_name_marking:
_name_index = (int(given_name_marking) // 10) - 1 # ex. "20" -> 1 (second GivenName is real given name)
try:
_given_name = user.given_name.split()[_name_index]
user.display_name = f"{_given_name} {user.surname}"
except IndexError:
# At least occasionally, we've seen GivenName 'Jan-Erik Martin' with GivenNameMarking 30
pass
logger.info("User names set from official address")
logger.debug(
f"{user_postal_address.name} resulted in given_name: {user.given_name}, "
f"surname: {user.surname} and display_name: {user.display_name}"
)
return user
14 changes: 14 additions & 0 deletions src/eduid/userdb/tests/test_user_cleaner.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
from unittest import TestCase

from eduid.userdb.testing import MongoTemporaryInstance
from eduid.userdb.user_cleaner.cachedb import CacheDB
from eduid.userdb.user_cleaner.cache import CacheUser


class TestUserCleanerCache(TestCase):
def setUp(self):
self.tmp_db = MongoTemporaryInstance.get_instance()
self.user_cleaner_meta_db = CacheDB(db_uri=self.tmp_db.uri, collection="skv_cache")

def tearDown(self):
self.user_cleaner_meta_db._drop_whole_collection()
44 changes: 44 additions & 0 deletions src/eduid/userdb/user_cleaner/cache.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
from typing import Any, Mapping, Optional
from pydantic import BaseModel, Field
from datetime import datetime
from eduid.common.misc.timeutil import utc_now

from eduid.userdb.db import TUserDbDocument
from eduid.userdb.user import User


class CacheUser(BaseModel):
eppn: str
created_ts: datetime = Field(default_factory=utc_now)
next_run_ts: Optional[int] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Varför är inte next_run_ts en datetime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jo, för att de är enklare att räkna epoch när det ändå handlar om sekunder.


def to_dict(self) -> TUserDbDocument:
"""
Convert Element to a dict in eduid format, that can be used to reconstruct the
Element later.
"""
data = self.dict(exclude_none=True)

return TUserDbDocument(data)

@classmethod
def from_dict(cls, data: Mapping[str, Any]) -> "CacheUser":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stället för att göra -> "CacheUser" så kan man göra from __future__ import annotations och sen använda CacheUser utan "". Tills vi kommer upp på python 3.11 då man kan köra med -> Self.

"""Convert a dict to a Element object."""
return cls(**data)

def next_run_ts_iso8601(self) -> str:
"""
Convert the next_run_ts to ISO8601 format.
"""
if self.next_run_ts is None:
return ""
return datetime.utcfromtimestamp(self.next_run_ts).strftime("%Y-%m-%d %H:%M:%S")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strftime kan bytas ut mot .isoformat()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixat!


def from_user(self, data: User) -> "CacheUser":
"""
Convert a User object to a CacheUser object.
"""
queue_user = CacheUser(
eppn=data.eppn,
)
return queue_user
73 changes: 73 additions & 0 deletions src/eduid/userdb/user_cleaner/cachedb.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
from abc import ABC
from time import time
import logging
from eduid.userdb.db import BaseDB, TUserDbDocument
from eduid.userdb.user import User
from eduid.userdb.user_cleaner.cache import CacheUser

logger = logging.getLogger(__name__)


class CacheDB(BaseDB):
"""Database class for the user cleaning database cache."""

def __init__(self, db_uri: str, collection: str, db_name: str = "eduid_user_cleaner"):
super().__init__(db_uri, db_name, collection)
indexes = {"unique-eppn": {"key": [("eppn", 1)], "unique": True}}

self.setup_indexes(indexes)

def save(self, queue_user: CacheUser) -> bool:
"""Save a CacheUser object to the database."""
if self.exists(queue_user.eppn):
logger.debug(f"User {queue_user.eppn} already exists in the queue")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"exists in the queue" -> "exists in the cache"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixat, det fanns massa ställen där det stod Queue ist. för cache.

return False
self._coll.insert_one(queue_user.to_dict())
return True

def exists(self, eppn: str) -> bool:
"""Check if a user exists in the cache."""
return self._coll.count_documents({"eppn": eppn}) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

du kan använda db_count på BaseDB med limit=1 för att slippa räkna alla dokument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixat


def get_all(self) -> list[CacheUser]:
"""Get all users from the cache."""
res = self._coll.find({})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

här kan kan du använda BaseDBs _get_all_docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixat

return [CacheUser.from_dict(data=doc) for doc in res]

def count(self) -> int:
"""Count the number of users in the queue."""
return self._coll.count_documents({})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_count finns redan på BaseDB.


def delete(self, eppn: str) -> None:
"""delete one user from the cache."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_document finns redan på BaseDB

self._coll.delete_one({"eppn": eppn})

def delete_all(self) -> None:
"""Delete all users from the cache."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_drop_whole_collection finns redan på BaseDB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_drop_whole_collection stoppade jag i delete_all() så använder appen bara databas methoder från de egna paketet. De kändes bra iaf.

self._coll.delete_many({})

def populate(
self,
am_users: list[User],
periodicity: int,
) -> None:
"""Populate cache database with the user from AMDB."""

cache_size = len(am_users)

periodicity_in_seconds = 22 * 60 * 60 * periodicity # strip 2 hours of each day in order to give us some slack
time_constant = int(periodicity_in_seconds / cache_size)

next_run_ts = int(time()) + (60 * 60) # first process window starts in 1 hour, then the population is done
for am_user in am_users:
next_run_ts += time_constant
queue_user = CacheUser(
eppn=am_user.eppn,
next_run_ts=next_run_ts,
)

self.save(queue_user)

def is_empty(self) -> bool:
"""Check if the cache is empty."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

se kommentar till exists

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixat

return self.count() == 0
Loading