Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate project and vlab #19

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Validate project and vlab #19

wants to merge 10 commits into from

Conversation

BoBer78
Copy link
Contributor

@BoBer78 BoBer78 commented Oct 10, 2024

No description provided.

Comment on lines +26 to +27
vlab_id = Column(String, default=None)
project_id = Column(String, default=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the default=None do anything? AFAIK by default nullable=True

https://docs.sqlalchemy.org/en/20/core/metadata.html#sqlalchemy.schema.Column.params.nullable

But should we allow for null values? I would even go for nullable=False which will force us to always provide it

"vlab_id": request.headers["x-virtual-lab-id"],
"project_id": request.headers["x-project-id"],
}
elif settings.keycloak.validate_token:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best logic to be in "dev" mode

Copy link
Collaborator

@jankrepl jankrepl Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack, I personally don't mind passing the headers manually even when developing. But if it simplifies your life then sure.

Also, the P + V ids should not be leaking anything on their own so I guess it is fine:)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be if not settings.keycloak.validate_token: ? Here we return default vlab and proj if keycloak security is active 😅

Copy link
Contributor

@WonderPG WonderPG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of conceptual questions but very good MR, thank you !

Comment on lines +397 to +404
def thread_to_vp(
user_id: Annotated[str, Depends(get_user_id)],
session: Annotated[Session, Depends(get_session)],
request: Request,
settings: Annotated[Settings, Depends(get_settings)],
) -> dict[str, str]:
"""From the current thread, get the corresponding vlab and project."""
if "x-project-id" in request.headers and "x-virtual-lab-id" in request.headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thing but the name and description of the dependency is a bit misleading since the V + P can come from the headers and not only from the thread.

"project_id": "eff09ea1-be16-47f0-91b6-52a3ea3ee575",
}
else:
thread_id = str(request.url).split("/")[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a conceptual question, do we want to validate_project only before running the agent ? For instance we might need to get the project_id for every endpoint to do accounting at some point (I don't know if we want to track CRUD requests too), the issue is that this wouldn't give the thread_id for endpoints with more path parameters.

For now I would say no actions to take but maybe something to keep in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smaller thing, if we run the one shot endpoint and forget the headers we get an error 500.

Comment on lines +26 to +27
vlab_id = Column(String, default=None)
project_id = Column(String, default=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since V + P should always be associated to a thread and are unique ids, shouldn't they be primary keys ? @jankrepl you're the sql boss here 😄

Copy link
Collaborator

@jankrepl jankrepl Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I think calling the ticket 1-to-1 mapping was wrong.

A given T will uniquely determine the P + V. However, given P + V there can be multiple Ts.
So thread id is the real "identifier" of this table. However, I would just simply force the user to provide V + P (nullable=False) to avoid having "phantom" threads:D. See my other comment.

Comment on lines +416 to +421
query = (
select(Threads)
.where(Threads.user_sub == user_id)
.where(Threads.thread_id == thread_id)
)
result = session.execute(query).all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user_id and thread_id are primary key, so their combination uniquely define an entry in the db. Meaning that:

thread = session.get(Threads, (thread_id, user_id))

should directly get you the expected thread.

Another solution would be to migrate get_object in sql.py into the dependencies to avoid cyclic imports, but I believe there is a reason if it is in a separate file. I don't remember which one, but most likely some cyclic import issues as well.

Comment on lines +435 to +454
async def validate_project(
httpx_client: Annotated[AsyncClient, Depends(get_httpx_client)],
thread_to_vp: Annotated[dict[str, str], Depends(thread_to_vp)],
token: Annotated[str, Depends(get_kg_token)],
settings: Annotated[Settings, Depends(get_settings)],
) -> str:
"""Check user appartenance to vlab and project before running agent."""
response = await httpx_client.get(
f'{settings.virtual_lab.get_project_url}/{thread_to_vp["vlab_id"]}/projects/{thread_to_vp["project_id"]}',
headers={"Authorization": f"Bearer {token}"},
)

# There is a lot of information about the project in this response.
if response.status_code != 200:
raise HTTPException(
status_code=HTTP_401_UNAUTHORIZED,
detail="User does not belong to the project.",
)
else:
return thread_to_vp["project_id"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you telling me that this function might be reused somewhere else once we start playing more with V + P, but in a way we will always have to first get V + P before being able to validate it. I am wondering if it is not better to remove this dependency and put the logic inside of thread_to_vp at the end, since I expect that we will always have to run thread_to_vp to get V + P from now on. i.e. we will never call validate_project without calling thread_to_vp first since it handles headers too.

Also it feels a bit like an anti-pattern to take as input V + P and return P. imo
need V + P -> call thread_to_vp with validation is better than need V + P -> call validate_project -> call thread_to_vp -> validate V + P but feel free to object.

Last thing the tools (at least the one I am doing) will need project_id and vlab_id, not just project_id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that the function validate_project would be used in the create_threads function. So another proposition would be to not make it a dependency, rather just a normal function. Then we call this normal function in thread_to_vp (which is a dependency) and inside the create_thread endpoint. What do you think ? If we don't want to mix dependencies and non dependencies, it could be moved to some utils file if needed.

"vlab_id": request.headers["x-virtual-lab-id"],
"project_id": request.headers["x-project-id"],
}
elif settings.keycloak.validate_token:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be if not settings.keycloak.validate_token: ? Here we return default vlab and proj if keycloak security is active 😅

create_output = app_client.post("/threads/").json()
create_output = app_client.post(
"/threads/?virtual_lab_id=test_vlab&project_id=test_project"
).json()
assert create_output["thread_id"]
assert create_output["user_sub"] == "dev"
assert create_output["title"] == "title"
assert create_output["timestamp"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe

assert create_output["vlab_id"] = "test_vlab"
assert create_output["project_id"] = "test_project"

for completeness

assert create_output["thread_id"]
assert create_output["user_sub"] == "dev"
assert create_output["title"] == "title"
assert create_output["timestamp"]


def test_get_threads(patch_required_env, app_client, db_connection):
@pytest.mark.httpx_mock(can_send_already_matched_responses=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this decorator do ?

}
)
with pytest.raises(HTTPException) as error:
_ = thread_to_vp(user_id, session, bad_request, test_settings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for _ =

status_code=404,
)
with pytest.raises(HTTPException) as error:
_ = await validate_project(httpx_client, test_vp, token, test_settings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants