Provenance #101

greenw0lf · 2024-09-26T12:31:21Z

Closes #81

greenw0lf · 2024-09-26T12:58:51Z

Maybe it'd be nice for you @mwigham to review this PR since you also worked with provenance, but for the ASR exporter. I made some choices that I'm not so sure are the best in terms of what to report and in what field of the provenance.

And I know the target branch is service-decouple and not main, but I built on top of that so I didn't want you to have to bother with reviewing those changes as well.

mwigham

Great!
I like it very much that each step returns the provenance in its output.
I also really like the inclusion of config elements as parameters.

Some of your choices in modelling the provenance are interesting (and I mean that in a positive way). Something for later discussion in the team.

As far as I'm concerned this is very good for the PoC. You might want to check with @jblom as to whether all the elements that he is interested in about the worker are included.
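
For readers without the diff at hand, the pattern praised above (each step returning its provenance alongside its output, with config values recorded as parameters) could look roughly like the sketch below. `run_step`, the `parameters` and `input_data` keys, and the `"Example step"` name are hypothetical; only `activity_name`, `output_data`, `steps`, and `processing_time_ms` appear in the snippets quoted in this PR.

```python
import time
from typing import Any


def run_step(input_path: str, config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pipeline step that returns its output plus provenance."""
    start_time = time.time()

    # Placeholder for the real work; here the output is simply the input.
    output_path = input_path

    provenance = {
        "activity_name": "Example step",  # hypothetical name
        "parameters": dict(config),  # hypothetical key: copy of the config used
        "input_data": input_path,  # hypothetical key
        "output_data": output_path,
        "steps": [],
        "processing_time_ms": (time.time() - start_time) * 1000,
    }
    return {"output_path": output_path, "provenance": provenance}
```

A caller can then collect the `provenance` entry of every step into one list and hand it to whatever component reports on the whole run.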

mwigham · 2024-09-27T11:33:06Z

asr.py

    else:
        logger.info(f"Whisper transcript already present in {output_path}")
+        provenance = {
+            "activity_name": "Whisper transcript already exists",


An interesting point for later discussion with the team: do we model this as a different activity, or as a different output of the Whisper speech processing activity? There's something to be said for both.
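
To make the two options concrete, here is a minimal sketch of how each could be represented. The first function mirrors the diff above; the second is purely illustrative, and the `"Whisper speech processing"` activity name and the `skipped` field are assumptions, not something this PR implements.

```python
from typing import Any


def skipped_as_separate_activity(output_path: str) -> dict[str, Any]:
    """Option A: the skip is modelled as its own activity (as in the diff)."""
    return {
        "activity_name": "Whisper transcript already exists",
        "output_data": output_path,
    }


def skipped_as_alternative_output(output_path: str) -> dict[str, Any]:
    """Option B: same activity as a normal run, with a different outcome."""
    return {
        "activity_name": "Whisper speech processing",  # hypothetical name
        "output_data": output_path,
        "skipped": True,  # hypothetical field marking that nothing was recomputed
    }
```

Option A keeps each record self-describing, while option B makes it easier to group all records of one activity regardless of outcome.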

mwigham · 2024-09-27T11:43:44Z

transcode.py

+        end_time = (time.time() - start_time) * 1000
+        provenance["processing_time_ms"] = end_time
+        provenance["output_data"] = input_path
+        provenance["steps"].append("No transcode required, input is audio")


Here again we need to think (in the future) a bit more clearly as a team about how we want to communicate about steps that were skipped (but not failed).
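
One possible direction, sketched under assumptions: give each entry in `steps` an explicit status so that "skipped" and "failed" can be told apart. In the diff above `steps` is a list of plain strings; the `StepStatus` enum, `record_step`, and `skip_transcode` below are hypothetical and only illustrate the idea.

```python
import time
from enum import Enum
from typing import Any


class StepStatus(str, Enum):
    """Hypothetical set of outcomes for a single step."""

    DONE = "done"
    SKIPPED = "skipped"
    FAILED = "failed"


def record_step(
    provenance: dict[str, Any], description: str, status: StepStatus
) -> None:
    """Append a step with an explicit status instead of a bare string."""
    provenance["steps"].append({"description": description, "status": status.value})


def skip_transcode(input_path: str) -> dict[str, Any]:
    """Build the provenance for the 'input is already audio' branch."""
    start_time = time.time()
    provenance: dict[str, Any] = {"steps": []}
    record_step(
        provenance, "No transcode required, input is audio", StepStatus.SKIPPED
    )
    provenance["processing_time_ms"] = (time.time() - start_time) * 1000
    provenance["output_data"] = input_path  # nothing was produced, pass input on
    return provenance
```

With a status per step, a consumer can filter on `"skipped"` without having to parse the free-text description.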

mwigham · 2024-09-27T11:47:55Z

asr.py

@@ -11,50 +25,111 @@

 logger = logging.getLogger(__name__)
 os.environ["HF_HOME"] = model_base_dir  # change dir where model is downloaded
+my_version = pkg_resources.get_distribution(


This is really interesting. How does this get the worker version? We have previously tried to put in a GitHub version, but we had trouble getting that into the Dockerfile.

greenw0lf · 2024-09-30

This approach doesn't put in the GitHub version; it simply obtains the version from the pyproject.toml file. It also wasn't actually working, so I'm replacing it with a different one that reads pyproject.toml directly and outputs what is written in the version field.

jblom · 2024-10-01T13:03:38Z

@greenw0lf @mwigham OK, I've globally checked the code and comments here, and it seems fine to merge it into the "dane as a service branch". From there I'll test the code a bit more thoroughly and then merge with main, so we can try it out later this week.

Add provenance

5def84b

phew

greenw0lf changed the base branch from main to 52-setup-service-decouple-from-dane September 26, 2024 12:38

greenw0lf added 2 commits September 26, 2024 14:44

Merge branch '52-setup-service-decouple-from-dane' into 81-prov

d541866

Update download.py

5b832ea

remove whitespace on empty line

greenw0lf marked this pull request as ready for review September 26, 2024 12:57

greenw0lf requested a review from mwigham September 26, 2024 12:59

greenw0lf self-assigned this Sep 26, 2024

greenw0lf added the enhancement New feature or request label Sep 26, 2024

mwigham approved these changes Sep 27, 2024

greenw0lf added 2 commits September 30, 2024 11:03

Change how version is obtained

cdce764

Previous attempt did not work

black formatting

15a0d12

jblom merged commit a4e44a2 into 52-setup-service-decouple-from-dane Oct 1, 2024
1 check passed

jblom deleted the 81-prov branch October 1, 2024 13:03
