new feature janno-coalesce or janno-join #278
Comments
A suggestion for the syntax:
trident jannocoalesce \
  --targetFile file/that/should/be/filled.janno \
  --sourceFile file/that/should/be/queried.janno \
  --outFile new/completed/file.janno \
  --fillColumns "Country,Latitude,Longitude,..." \
  --overwriteColumns
Here --fillColumns would default to all columns, and --overwriteColumns would default to False.
Very nice! I have some minor comments, but they can be discussed after a first go. I hope to get to it this week.
Just some quick ideas:
These are good observations! Point 1 is a neat idea, and point 2 is indeed what I had in mind. For point 3, I think we should not validate the output. There might be workflows where the user does not actually need a valid .janno file in the end. And for pipelines and automation it's probably better to keep the two steps clearly separate for error reporting.
I'm closing this now because the discussion has moved to the concrete PR implementing the feature: #282
We decided that a new feature is needed in trident to merge janno files. The new command, e.g. named trident janno-coalesce, would take a source package and a target package and match rows on the basis of an ID match. By default this could use the PoseidonIDs in both packages, but it should alternatively allow other janno columns in the first and second file, similar to other join-operation functions, e.g. in the tidyverse. It would then fill any fields that are missing in the target but filled in the source, and perhaps report warnings for conflicting information in the two janno files.
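To make the proposed behaviour concrete, here is a rough Python sketch of the coalesce step. It is not trident's implementation, only an illustration of the idea described above. The function name `janno_coalesce`, its parameters, the `Poseidon_ID` default key, and the treatment of `""` and `"n/a"` as missing values are all assumptions made for this example. Rows are plain dictionaries, as one would get from reading a tab-separated file with `csv.DictReader`.

```python
def janno_coalesce(target_rows, source_rows,
                   target_key="Poseidon_ID", source_key="Poseidon_ID",
                   fill_columns=None, overwrite=False,
                   missing=("", "n/a")):
    """Fill missing fields in target rows from source rows matched by ID.

    Hypothetical helper: names, defaults and the missing-value markers
    are assumptions for illustration, not trident's actual behaviour.
    Returns the completed rows and a list of conflict warnings.
    """
    # Index the source rows by their ID column for direct lookup.
    source_by_id = {row[source_key]: row for row in source_rows}
    warnings = []
    result = []
    for row in target_rows:
        merged = dict(row)  # never mutate the caller's data
        src = source_by_id.get(row[target_key])
        if src is not None:
            # Default: consider every source column except the key columns.
            if fill_columns is None:
                columns = [c for c in src if c not in (source_key, target_key)]
            else:
                columns = fill_columns
            for col in columns:
                new = src.get(col, "")
                if new in missing:
                    continue  # the source has nothing to offer here
                old = merged.get(col, "")
                if old in missing or overwrite:
                    merged[col] = new
                elif old != new:
                    # Both files are filled but disagree: keep target, warn.
                    warnings.append(
                        f"{row[target_key]}: column {col!r} differs "
                        f"(target={old!r}, source={new!r})"
                    )
        result.append(merged)
    return result, warnings


target = [
    {"Poseidon_ID": "A", "Country": "n/a", "Latitude": "10"},
    {"Poseidon_ID": "B", "Country": "Peru", "Latitude": "n/a"},
]
source = [{"Poseidon_ID": "A", "Country": "Chile", "Latitude": "12"}]

rows, warns = janno_coalesce(target, source)
for w in warns:
    print("warning:", w)
```

In this sketch the target value wins on conflict unless `overwrite=True`, which mirrors the `--overwriteColumns` switch suggested in the comments. Passing different `target_key` and `source_key` values corresponds to joining on other janno columns.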