Possible feature and bugfix contributions from Microsoft Research team's fork of Metaseq #726
@mattmazzola Sorry for the delay; I've been on PTO. I would be interested in all of the above contributions as they come online (deferring to you on what the best ordering would be)!
Ok! I will talk with the rest of the team and see what we want to do. We are trying to wrap up our current work and transition to another project, so it is not clear how much time we will be able to spend on these contributions. This creates a trade-off between the larger items, which have more impact, and the smaller items, which require less commitment.
These fixes and features from our fork have diverged non-trivially from metaseq main, so it is hard to judge how much work they will take until we see how many merge conflicts there are. The divergence also makes testing difficult or impossible, since our infrastructure used a different set of dependencies running in an Azure Machine Learning environment. The list above was ordered by our estimate of how impactful each PR would be to Metaseq; however, given these difficulties, I was planning to create the PRs in reverse order to increase the likelihood that they merge. I should at least be able to submit PRs to share the ideas, but they may not be directly mergeable.
That makes a lot of sense. Feel free to open PRs in whatever state you have them; they will be a useful starting point for figuring out how to merge and test them and pull them into main over time.
I have created PRs for all of the items on the initial issue list (except for item 4) and referenced this issue.
We are a team at @microsoft Research with a fork of the Metaseq repo that has these additional features:
jsonl_dataset.py#_build_index properly accounts for multi-byte characters.
Questions
We would be happy to answer any questions you have about the above components.
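The `_build_index` fix listed above is not described in detail here. A common cause of this kind of bug is computing line offsets from the length of decoded strings (characters) instead of raw bytes; since `seek()` on a binary file takes byte positions and a UTF-8 character can take up to four bytes, character-based offsets drift once a line contains non-ASCII text. The sketch below assumes that is the issue being fixed. The functions `build_line_index`, `char_count_index`, `read_record`, and `demo` are hypothetical illustrations, not Metaseq's actual API.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def build_line_index(path: str) -> List[int]:
    """Hypothetical helper: byte offset at which each line of a JSONL file starts.

    The file is opened in binary mode, so len(raw_line) is a byte count and
    every offset is a valid argument for seek() on a binary file handle.
    """
    offsets: List[int] = []
    position = 0
    with open(path, "rb") as f:
        for raw_line in f:
            offsets.append(position)
            position += len(raw_line)  # bytes, including the trailing b"\n"
    return offsets


def char_count_index(path: str) -> List[int]:
    """Buggy variant for comparison: sums character counts of decoded lines.

    This matches the byte offsets only while every character is one byte
    (e.g. pure ASCII) and drifts as soon as a line contains multi-byte UTF-8.
    """
    offsets: List[int] = []
    position = 0
    with open(path, "r", encoding="utf-8", newline="") as f:
        for line in f:
            offsets.append(position)
            position += len(line)  # characters, not bytes
    return offsets


def read_record(path: str, offsets: List[int], i: int) -> Dict:
    """Seek to the i-th line by byte offset and parse it as JSON."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        return json.loads(f.readline().decode("utf-8"))


def demo() -> Tuple[List[int], List[int], List[Dict]]:
    """Write a small JSONL file with multi-byte text and index it both ways."""
    records = [{"text": "héllo wörld"}, {"text": "日本語"}, {"text": "plain ascii"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.jsonl")
        # newline="" keeps "\n" untranslated so byte counts are platform-independent
        with open(path, "w", encoding="utf-8", newline="") as f:
            for rec in records:
                f.write(json.dumps(rec, ensure_ascii=False) + "\n")
        byte_index = build_line_index(path)
        char_index = char_count_index(path)
        round_trip = [read_record(path, byte_index, i) for i in range(len(records))]
    return byte_index, char_index, round_trip


if __name__ == "__main__":
    byte_index, char_index, round_trip = demo()
    print("byte offsets:", byte_index)
    print("char offsets:", char_index)
    print("records:", round_trip)
```

The two indexes agree on the first line and diverge afterwards, because `é`, `ö`, and each character of `日本語` take more than one byte in UTF-8; only the byte-based index lets `read_record` land at the start of each line.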
@tupini07