Skip to content

PreprocessBuildJob is translation-specific, not generic to all engine types #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Enkidu93 opened this issue Feb 5, 2025 · 5 comments
Closed
Assignees

Comments

@Enkidu93
Copy link
Collaborator

Enkidu93 commented Feb 5, 2025

No description provided.

@Enkidu93 Enkidu93 self-assigned this Feb 5, 2025
@Enkidu93 Enkidu93 added this to Serval Feb 5, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Serval Feb 5, 2025
@johnml1135
Copy link
Collaborator

I would argue that there are many commonalities - that we are generally taking corpora and placing them on the S3 bucket for processing by a ClearML job. I believe we can maintain a inheritance relationship between the various preprocess build jobs.

What would be your rationale for splitting it up to be entirely separate entities?

@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Feb 6, 2025

I would argue that there are many commonalities - that we are generally taking corpora and placing them on the S3 bucket for processing by a ClearML job. I believe we can maintain a inheritance relationship between the various preprocess build jobs.

What would be your rationale for splitting it up to be entirely separate entities?

This is a place-holder issue connected to the word alignment PR. We can still keep a PrerprocessBuilldJob, it's just that currently the WriteDataFilesAsync function is overwritten in the word alignment preprocessing job since the generic PreprocessBuildJob still does things specific to pretranslation - for example, writes to the pretranslate.json files. It would be cleaner for that to be moved to a translation-specific job.

If you'd prefer, I can do this as part of the current PR.

@johnml1135
Copy link
Collaborator

Ok - good point. We can wait to do this for the next release.

@BiancaPaw
Copy link

Technical debt.

@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Mar 5, 2025

This was completed here: #506

@Enkidu93 Enkidu93 closed this as completed Mar 5, 2025
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Serval Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

No branches or pull requests

4 participants