api: initial implementation of headless API (Bug 1941363) #194

cgsheeh · 2025-01-13T17:07:38Z

Build out the basic functionality of the headless API for Lando.
Using django-ninja we define two API endpoints, to POST automation
jobs and GET job statuses after submission. The API endpoints take
a set of actions defined in the request body which are stored in
the database for processing by a worker. Authentication is handled by
an API key associated with a user profile. A single action, add-commit,
is implemented, which can be used to test adding patches to the repo
as commits.

zzzeid

Couple of very early comments:

* the new api functionality would be better located in the `lando.api` app, not the `lando.main` app. It could also go into a more specific `headless_api` app if we want to separate it from other functionality that was ported or will be ported from the old API.
* I noticed some boilerplate functionality around generating / saving / getting API tokens, as well as some model fields to encrypt / decrypt / store those. Seems like potentially something that should be offloaded to the auth system (and the API framework), and not something that we should manually implement as libraries already exist that do this.
* This PR should be broken up into multiple PRs (e.g., adding API functionality, adding "action" endpoints and worker functionality).

Again very early feedback based on a quick first pass.

cgsheeh · 2025-01-13T18:21:21Z

Thanks for looking it over, the PR is still a draft but I wanted to get something up to show progress.

* the new api functionality would be better located in the `lando.api` app not the `lando.main` app. It could also go into a more specific `headless_api` app if we want to separate it from other functionality that was ported or will be ported from the old API.

I had originally put the new code in lando.api but I ran into some issues with creating new models and referencing the existing models from them. In particular Django didn't like that I was trying to set a foreign key reference to the Repo model across apps. In the interest of staying focused on the headless API I moved everything into lando.main, but I will look into this further.

* I noticed some boilerplate functionality around generating / saving / getting API tokens, as well as some model fields to encrypt / decrypt / store those. Seems like potentially something that should be offloaded to the auth system (and the API framework), and not something that we should manually implement as libraries already exist that do this.

I just re-used the prior art around Phabricator API token storage/retrieval, but I haven't implemented anything for the API token generation. I was expecting to add the token management to the "settings page" where the Phab API tokens are managed.

* This PR should be broken up into multiple PRs (e.g., adding API functionality, adding "action" endpoints and worker functionality).

I will try and split out some of the changes around test fixtures, and perhaps the token generation/storage/etc could go in a separate PR. Moving the API definition out of the PR where the worker/job/etc is defined seems like it will make things harder to review since the API will do nothing but write to a DB, but I'll look into it.

shtrom

Looking good. Lots of half-baked musing in my comments. Feel free to consider or discard as you see fit (:

shtrom · 2025-01-21T05:35:08Z

src/lando/main/tests/test_automation_api.py

+    # TODO test a few more things? formatting?
+
+
+PATCH_NORMAL_1 = r"""


It might be worth grabbing this from the existing conftests (there's a normal_patch fixture that returns the desired patch).

Also wondering if we should have a Git-formatted patch in the mix, too.

shtrom · 2025-01-21T05:41:06Z

src/lando/main/models/automation_job.py

+        AutomationJob, on_delete=models.CASCADE, related_name="actions"
+    )
+
+    action_type = models.CharField()


Would it be better as a models.ChoiceField?

We could use a ChoiceField, but the tradeoff is that we would have to do a DB migration when we add a new action type. Perhaps not a big deal, I left it as-is for the moment. Is there any advantage to the ChoiceField aside from being more explicit? :)

It's semantically clearer, and would help build forms... but we don't really need forms for this anyway, I don't think.

Depending on the implementation, I guess it may be using ENUMs in the underlying DB, which could have a marginal space and lookup advantage, but I don't think that'd be a major selling point.

I see that we can add choices to CharFields, too, e.g. (lando/src/lando/headless_api/models/automation_job.py, lines 24 to 29 in c0f6019):

    # Current status of the job.
    status = models.CharField(
        max_length=32,
        choices=LandingJobStatus,
        default=None,
    )

Maybe that would be enough.

Curiosity: what is the cost of a migration? My initial thought is that adding a new action would require a code change and a deployment anyway, so the migration would tag along with it.

shtrom · 2025-01-21T05:42:16Z

src/lando/main/models/automation_job.py

+class AutomationJob(BaseModel):
+    """An automation job.
+
+    TODO write better docstring


Suggested change

-    TODO write better docstring
+    TODO Write better docstring.

q:

shtrom · 2025-01-21T06:50:08Z

src/lando/api/legacy/workers/automation_worker.py

+        )
+
+    def run_automation_job(self, job: AutomationJob) -> bool:
+        """Run an automation job."""


Suggested change

-        """Run an automation job."""
+        """Run an automation job.
+
+        Returns True if the job is in a permanent state and should not be retried.
+        """

shtrom · 2025-01-21T06:52:53Z

src/lando/api/legacy/workers/automation_worker.py

+                # TODO should we always update to the latest pull_path for a repo?
+                # or perhaps we need to specify some commit SHA?
+                scm.update_repo(repo.pull_path)


At the moment it cleans the repo back to a known state, and pulls the default branch.

Correction: now it creates a new branch in git. This may be a problem we want to come back to?

FWIW, update_repo accepts an optional target_cset in case we want to provide a commit SHA (lando/src/lando/main/scm/git.py, line 272 in ff0dbd3):

    def update_repo(self, pull_path: str, target_cset: Optional[str] = None) -> str:

shtrom · 2025-01-21T06:57:44Z

src/lando/api/legacy/workers/automation_worker.py

+        patch_helper = HgPatchHelper(StringIO(action.content))
+
+        date = patch_helper.get_header("Date")
+        user = patch_helper.get_header("User")
+
+        try:


Wondering if there's a way to modularise the code from the landing worker, as this is essentially the same feature...

Maybe we could add the run() method as suggested in https://github.com/mozilla-conduit/lando/pull/194/files#r1923160292, and then just have the landing_worker also create an AddCommitAction and run it, so we have a single codepath for both.

Not a request for change. Just comments.

zzzeid

A few more comments related to our discussion on tokens.

zzzeid · 2025-01-21T16:53:43Z

src/lando/main/api.py

+class HeadlessAPIAuthentication(HttpBearer):
+    """Authentication class to verify API token."""
+
+    def authenticate(self, request, token: str) -> str:


As we need a user name, I wonder if we'd be better off with Basic Auth, so we have a standard carrier for user+secret, rather than piggy-backing on the User-Agent (or other) header to pass the user identifier.

I agree with this. We can require a user-agent field, but we shouldn't use it as a username. At that point, we basically do have a username / password basic authentication.

However, if we want to continue with this username + token approach (which has the added benefit that a token is easier to manage than a user's password), I think it would make sense to create an api_token table to store these tokens in, with a foreign key to a user profile. This is if we don't want to use any third-party libraries.
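
For reference, a minimal sketch of the Basic Auth carrier discussed above, using only the standard library. The `TOKENS` dict is a hypothetical stand-in for an `api_token` table keyed by user, and `parse_basic_auth` / `authenticate` are illustrative names, not code from this PR. A real implementation would presumably store a hash of each token rather than the token itself.

```python
import base64
import binascii
import hmac
from typing import Optional

# Hypothetical token store standing in for an `api_token` table (user -> token).
TOKENS = {"automation@example.com": "s3cret-token"}


def parse_basic_auth(header: str) -> Optional[tuple[str, str]]:
    """Split an `Authorization: Basic ...` header value into (user, secret)."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, secret = decoded.partition(":")
    if not sep:
        return None
    return user, secret


def authenticate(header: str) -> Optional[str]:
    """Return the user identifier if the credentials match, else None."""
    parsed = parse_basic_auth(header)
    if parsed is None:
        return None
    user, secret = parsed
    expected = TOKENS.get(user)
    if expected is None:
        return None
    # Constant-time comparison to avoid leaking token contents via timing.
    if not hmac.compare_digest(secret.encode(), expected.encode()):
        return None
    return user


header = "Basic " + base64.b64encode(b"automation@example.com:s3cret-token").decode()
```

Here `authenticate(header)` yields the user identifier, while a Bearer header or a wrong secret yields `None`. If I remember correctly, django-ninja also ships an `HttpBasicAuth` class in `ninja.security` that would take care of the header parsing, so only the lookup and comparison would need to live in our code.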

zzzeid · 2025-01-21T16:59:15Z

src/lando/main/api.py

+        # some APIs may have authentication without user management. Our
+        # API tokens always correspond to a specific user, so set that on
+        # the request here.
+        request.user = user


Wondering if we should be using something like django.contrib.auth.login here to properly mark the user as authenticated, i.e., request.user.is_authenticated should be True.

I have yet to look into this but it is on my radar.

zzzeid · 2025-01-21T17:01:57Z

src/lando/api/legacy/workers/automation_worker.py

+            return
+
+        with job_processing(job):
+            job.status = LandingJobStatus.IN_PROGRESS


Is this a copy/paste error?

What do you mean by "this"?

shtrom · 2025-03-06T05:49:21Z

src/lando/headless_api/api.py

+    commit: str
+
+
+Action = Union[AddCommitAction, MergeOntoAction, AddBranchAction, TagAction]


We could make Action a parent class that the others inherit from, with a process @abstractmethod.

This would also allow us to make map_to_pydantic_action more programmatic, by looping over Action.__subclasses__() and mapping klass.action -> klass when building the dict. This means we wouldn't need to update map_to_pydantic_action when adding more actions.
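
A rough sketch of that idea, using plain Python classes rather than the django-ninja `Schema` classes from the PR. The `action` class attribute, the argument-less `process` signature, and `build_action_map` are simplified stand-ins for illustration; `map_to_pydantic_action` in the PR may be shaped differently.

```python
from abc import ABC, abstractmethod


class Action(ABC):
    """Base class for automation actions (hypothetical stand-in for the PR's Schema classes)."""

    # Each concrete subclass sets the string used in the request body.
    action: str

    @abstractmethod
    def process(self) -> bool:
        """Run the action; return True on success."""


class AddCommitAction(Action):
    action = "add-commit"

    def __init__(self, content: str):
        self.content = content

    def process(self) -> bool:
        # The real implementation would apply the patch via the SCM.
        return bool(self.content)


class TagAction(Action):
    action = "tag"

    def __init__(self, name: str):
        self.name = name

    def process(self) -> bool:
        # The real implementation would create the tag via the SCM.
        return bool(self.name)


def build_action_map() -> dict[str, type[Action]]:
    """Map each action name to its class by looping over direct subclasses."""
    return {klass.action: klass for klass in Action.__subclasses__()}


action_map = build_action_map()
add_commit = action_map["add-commit"](content="diff --git a/test.txt b/test.txt")
```

Two caveats: `__subclasses__()` only returns direct subclasses that have already been defined or imported, and in the PR `action` is a `Literal[...]` field on a pydantic model, so reading the name off the class would need adapting.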

shtrom · 2025-03-06T06:02:47Z

src/lando/api/legacy/workers/automation_worker.py

+                # TODO should we always update to the latest pull_path for a repo?
+                # or perhaps we need to specify some commit SHA?
+                scm.update_repo(repo.pull_path)


FWIW, update_repo accepts an optional target_cset in case we want to provide a commit SHA (lando/src/lando/main/scm/git.py, line 272 in ff0dbd3):

    def update_repo(self, pull_path: str, target_cset: Optional[str] = None) -> str:

shtrom · 2025-03-06T06:14:41Z

src/lando/headless_api/api.py

+        user_agent = request.headers.get("User-Agent")
+        if not user_agent:
+            raise APIPermissionDenied("`User-Agent` header is required.")


Do we still need this?

shtrom · 2025-03-06T06:16:52Z

src/lando/headless_api/api.py

+    action: Literal["add-commit"]
+    content: str
+
+    def process(
+        self, job: AutomationJob, repo: Repo, scm: AbstractSCM, index: int
+    ) -> bool:


🤔 I'm now thinking we could just use this in the normal Landing Worker.

Rather than using the HgPatchHelper directly, here, I'd suggest using Revision.new_from_patch (lando/src/lando/main/models/revision.py, lines 87 to 100 in ff0dbd3):

    @classmethod
    def new_from_patch(cls, raw_diff: str, patch_data: dict[str, str]) -> Revision:
        """Construct a new Revision from patch data.

        `patch_data` is expected to contain the following keys:
            - author_name
            - author_email
            - commit_message
            - timestamp (unix timestamp as a string)
        """
        rev = Revision()
        rev.set_patch(raw_diff, patch_data)
        rev.save()
        return rev

so we can then just use the revision object to extract the patch details, like the LandingWorker now does (lando/src/lando/api/legacy/workers/landing_worker.py, lines 256 to 261 in ff0dbd3):

    scm.apply_patch(
        revision.diff,
        revision.commit_message,
        revision.author,
        revision.timestamp,
    )

...

However, Revision.new_from_patch does a save to the DB, which I don't think we want here... We may need to rejig that a little bit if we want to go down that path.
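
One possible way to rejig this, sketched with a plain stand-in class rather than the real Django model: split construction from persistence so callers can opt out of the save. `from_patch` and the `saved` flag are hypothetical names for illustration; only `new_from_patch`, `set_patch`, and `save` appear in the snippet above.

```python
class Revision:
    """Stand-in for the Django model; `save` only records that it was called."""

    def __init__(self):
        self.diff = ""
        self.patch_data = {}
        self.saved = False

    def set_patch(self, raw_diff: str, patch_data: dict[str, str]) -> None:
        self.diff = raw_diff
        self.patch_data = patch_data

    def save(self) -> None:
        # The real model would write to the database here.
        self.saved = True

    @classmethod
    def from_patch(cls, raw_diff: str, patch_data: dict[str, str]) -> "Revision":
        """Build a Revision in memory without touching the database (hypothetical)."""
        rev = cls()
        rev.set_patch(raw_diff, patch_data)
        return rev

    @classmethod
    def new_from_patch(cls, raw_diff: str, patch_data: dict[str, str]) -> "Revision":
        """Keep the existing behaviour: build the Revision, then persist it."""
        rev = cls.from_patch(raw_diff, patch_data)
        rev.save()
        return rev


patch_data = {"author_name": "Test User", "commit_message": "add another file."}
unsaved = Revision.from_patch("diff --git a/test.txt b/test.txt", patch_data)
persisted = Revision.new_from_patch("diff --git a/test.txt b/test.txt", patch_data)
```

With this split, the automation worker could call the non-saving constructor while existing callers of `new_from_patch` keep their current behaviour.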

shtrom · 2025-03-06T06:23:35Z

src/lando/headless_api/api.py

+class MergeOntoAction(Schema):
+    """Merge the current branch into the target commit."""
+
+    action: Literal["merge-onto"]
+    target: str
+    message: str
+
+
+class TagAction(Schema):
+    """Create a new tag with the given name."""
+
+    action: Literal["tag"]
+    name: str
+
+
+class AddBranchAction(Schema):
+    """Create a new branch at the given commit."""
+
+    action: Literal["add-branch"]
+    name: str
+    commit: str


I assume the process implementation is yet to be added?

shtrom · 2025-03-06T06:52:33Z

src/lando/headless_api/tests/test_automation_api.py

+@pytest.mark.django_db
+def test_auth_missing_user_agent(client, headless_user):
+    user, token = headless_user
+
+    # Create a job and actions
+    job = AutomationJob.objects.create(status=LandingJobStatus.SUBMITTED)
+    AutomationAction.objects.create(
+        job_id=job, action_type="add-commit", data={"content": "test"}, order=0
+    )
+
+    # Fetch job status.
+    response = client.get(
+        f"/api/job/{job.id}",
+        headers={
+            "Authorization": f"Bearer {token}",
+        },
+    )
+
+    assert response.status_code == 401, "Missing `User-Agent` should result in 401."
+    assert response.json() == {"details": "`User-Agent` header is required."}


I don't think we want to enforce this anymore.

shtrom · 2025-03-06T07:00:27Z

src/lando/headless_api/tests/test_automation_api.py

+PATCH_NORMAL_1 = r"""
+# HG changeset patch
+# User Test User <[email protected]>
+# Date 0 0
+#      Thu Jan 01 00:00:00 1970 +0000
+# Diff Start Line 7
+add another file.
+diff --git a/test.txt b/test.txt
+--- a/test.txt
+++ b/test.txt
+@@ -1,1 +1,2 @@
+ TEST
+adding another line
+""".strip()


We can use the normal_patch fixture https://github.com/mozilla-conduit/lando/blob/ff0dbd3c06af3830975e3f3e70f6949b9c964938/src/lando/api/tests/conftest.py#L131-L143 to get this one out, rather than duplicating it.

shtrom · 2025-03-06T07:02:02Z

src/lando/headless_api/tests/test_automation_api.py

+
+@pytest.mark.django_db
+def test_automation_job_add_commit_success(
+    hg_server, hg_clone, hg_automation_worker, repo_mc, monkeypatch


Might be worth adding a test for git repos upfront, too.

zzzeid requested changes Jan 13, 2025

shtrom previously requested changes Jan 21, 2025

zzzeid requested changes Jan 21, 2025

shtrom reviewed Jan 22, 2025

shtrom mentioned this pull request Jan 23, 2025: landing_worker: add and use metadata from Revision rather than HgPatchHelper (bug 1936171) #200 (Merged)

cgsheeh force-pushed the headless-api branch from f1433a1 to c0f6019 on March 6, 2025 00:48

cgsheeh marked this pull request as ready for review March 6, 2025 00:48

cgsheeh requested review from shtrom and zzzeid March 6, 2025 00:48

shtrom reviewed Mar 6, 2025