Optional enhancement: Revisit current way of merging private data (via annotations.tsv) #65
Lines 100 to 101 in 9a49047: "Revisit refactoring out, adding columns"
Just linking to the docs for "additional-metadata" here: https://github.com/nextstrain/avian-flu?tab=readme-ov-file#use-additional-metadata-andor-sequences
Right now the WNV phylo interface defines inputs via:

```yaml
# Sequences must be FASTA and metadata must be TSV
# Both files must be zstd compressed
sequences_url: "https://data.nextstrain.org/files/workflows/WNV/sequences.fasta.zst"
metadata_url: "https://data.nextstrain.org/files/workflows/WNV/metadata.tsv.zst"
# Pull in metadata and sequences from the ingest workflow
input_metadata: "data/metadata.tsv"
input_sequences: "data/sequences.fasta"
```

It's not clear to me which one is used where... looking through the code (but not running it) it seems

The avian-flu interface for multiple data inputs, and the one I would like to become the Nextstrain standard, would be a list of dictionaries:

```yaml
inputs:
  - name: <input name>
    metadata: <local path, HTTP[S], S3>
    sequences: <local path, HTTP[S], S3>
```

These sources would all be merged (via

If config overlays are used, then they can use an additional config key with the same structure:

```yaml
additional_inputs:
  - name: <input name>
    metadata: <local path, HTTP[S], S3>
    sequences: <local path, HTTP[S], S3>
```

Which is (hopefully!) self-explanatory. We use this rather than
Context
Optional future work was to revisit how we merge private data. Currently we merge private information during the `ingest` workflow, incorporating it via an `annotations.tsv` file. However, since then, there has been discussion of providing a more consistent pattern for incorporating private user data.

I was personally curious about the `config.additional_inputs` method proposed in nextstrain/avian-flu#106, but I'm open to discussing other methods. I understand if there are more pressing priorities, so I'm just logging the potential future work here.
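For comparison, the current `annotations.tsv` approach amounts to overriding individual metadata fields per record. The sketch below assumes the tab-separated, three-column `id` / `field` / `value` layout with `#` comment lines used by Nextstrain ingest annotation files; `read_annotations`, `apply_annotations` and the default `strain` id column are hypothetical, and the actual ingest step may handle details (such as inline comments) differently.

```python
from __future__ import annotations

from typing import Iterable


def read_annotations(lines: Iterable[str]) -> dict[str, dict[str, str]]:
    """Parse `id<TAB>field<TAB>value` lines into {id: {field: value}}.

    Blank lines and lines starting with '#' are skipped; this sketch does
    not handle inline comments.
    """
    annotations: dict[str, dict[str, str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got {len(parts)}: {line!r}")
        record_id, field, value = parts
        annotations.setdefault(record_id, {})[field] = value
    return annotations


def apply_annotations(
    records: list[dict[str, str]],
    annotations: dict[str, dict[str, str]],
    id_column: str = "strain",
) -> list[dict[str, str]]:
    """Overwrite fields of each metadata record with its annotations, in place."""
    for record in records:
        record.update(annotations.get(record[id_column], {}))
    return records
```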