After the jobs are confirmed to work on SILNLP, they are moved to SF/Serval. To ensure that quality has not decreased in the transition (or with successive builds), some updates to Serval/Machine.py are needed.
The overall proposal is that when the final configuration is determined by EITL, SILNLP would perform a run and output a JSON file containing (see the sketch after this list):

- A set of verses for validation
- A description of the training setup
- The BLEU score (or a set of BLEU scores per book/verse, chrF++ scores, etc.)
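A minimal sketch of what that JSON file might look like; the field names and values are illustrative assumptions, not a fixed schema:

```python
import json

# Hypothetical structure for the evaluation file SILNLP would output.
# All field names here are illustrative, not an agreed-on schema.
evaluation = {
    "validation_verses": ["MAT 5:1", "MAT 5:2", "JHN 3:16"],  # held-out verse references
    "training_setup": {
        "source_language": "en",
        "target_language": "xyz",
        "training_steps": 20000,
        "early_stopping": True,
    },
    "scores": {
        "BLEU": 32.4,   # corpus-level score
        "chrF++": 51.7,
        "per_book": {"MAT": {"BLEU": 30.1}, "JHN": {"BLEU": 35.2}},
    },
}

with open("evaluation.json", "w") as f:
    json.dump(evaluation, f, indent=2)
```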
Then, this JSON file would be uploaded to SF. SF would take the validation-split verses and create an engine in Serval to be used for this type of evaluation run. When a new build is made, SF would specify the validation verses explicitly (and early stopping or a fixed number of training steps, if desired). Serval would perform the build and return BLEU and chrF++ scores for the validation verses.
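A sketch of how SF might kick off such an evaluation build against Serval, assuming a `requests`-based client; the endpoint path, request fields, and host are assumptions to illustrate the flow, not Serval's actual API:

```python
import requests

SERVAL_URL = "https://serval.example.org/api/v1"  # placeholder host

def start_evaluation_build(engine_id: str, validation_verses: list[str],
                           max_steps: int, token: str) -> dict:
    """Start a build that holds out the given verses for validation.

    The endpoint path and body fields are hypothetical; a real request
    would need to be checked against the Serval API documentation.
    """
    body = {
        "options": {
            "max_steps": max_steps,                   # fixed step count, or early stopping
            "validation_verses": validation_verses,   # explicit validation split
        }
    }
    resp = requests.post(
        f"{SERVAL_URL}/translation/engines/{engine_id}/builds",
        json=body,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # build info; scores are fetched once the build completes
```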
SF would compare these scores against the known-good set and, if quality has regressed, send an email (or other type of alert) to the EITL team to address the issue.
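The comparison step could be as simple as the following sketch; the tolerance value and the alerting hook are assumptions:

```python
TOLERANCE = 1.0  # assumed acceptable drop (in score points) before alerting

def check_regression(baseline: dict, current: dict,
                     tolerance: float = TOLERANCE) -> list[str]:
    """Compare current scores against the known-good baseline
    (e.g. {"BLEU": 32.4, "chrF++": 51.7}) and return regression messages."""
    problems = []
    for metric, good in baseline.items():
        new = current.get(metric)
        if new is None:
            problems.append(f"{metric}: missing from current build")
        elif new < good - tolerance:
            problems.append(f"{metric}: dropped from {good:.1f} to {new:.1f}")
    return problems

regressions = check_regression({"BLEU": 32.4, "chrF++": 51.7},
                               {"BLEU": 29.8, "chrF++": 51.5})
if regressions:
    # In practice, SF would email/alert the EITL team here.
    print("Quality regression detected:", "; ".join(regressions))
```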
This idea is detailed here.