Allowing public ETL steps to depend on private steps #3192

Open
Marigold opened this issue Aug 26, 2024 · 3 comments

@Marigold
Collaborator

We have a single case where a public dataset (data://garden/covid/latest/combined, and hence our full covid dataset) depends on a private dataset, data-private://garden/covid/latest/sequence.

data://garden/covid/latest/combined:
    - data://garden/covid/latest/testing
    - data://garden/covid/latest/cases_deaths
    - data-private://garden/covid/latest/sequence
    - data://garden/demography/2024-07-15/population

An error is raised when you run the ETL without the --private flag, so running the full ETL with etl run fails with:

ValueError: Public step data://garden/covid/latest/combined depends on private step data-private://garden/covid/latest/sequence. Use --private flag.

This is a bit annoying, as we have to exclude the covid dataset from the nightly builds. It would also be confusing for anyone trying to build it without access to the private data.

Should we exclude steps depending on private steps by default and raise a warning instead of failing?
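To make the proposal concrete, here is a minimal sketch of what "exclude and warn" could look like. It is not the ETL's actual implementation: the function name `exclude_private_dependents`, the dict-of-sets DAG representation, and the downstream grapher step are all assumptions made for illustration. The only things taken from this issue are the step URIs and the data-private:// prefix.

```python
import warnings
from typing import Dict, Set

# Assumption: a step is private if and only if its URI starts with this prefix.
PRIVATE_PREFIX = "data-private://"


def is_private(step: str) -> bool:
    return step.startswith(PRIVATE_PREFIX)


def exclude_private_dependents(dag: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return a copy of `dag` without private steps and without any step
    that depends on one, directly or transitively. Warn instead of failing."""
    # Start from every private step, whether it is a key or only a dependency.
    excluded = {s for s in dag if is_private(s)}
    excluded |= {d for deps in dag.values() for d in deps if is_private(d)}

    # Propagate the exclusion downstream until nothing changes.
    changed = True
    while changed:
        changed = False
        for step, deps in dag.items():
            blockers = deps & excluded
            if step not in excluded and blockers:
                warnings.warn(
                    f"Skipping {step}: depends on excluded step(s) "
                    f"{sorted(blockers)}. Use --private flag to build it."
                )
                excluded.add(step)
                changed = True

    return {s: set(deps) for s, deps in dag.items() if s not in excluded}


dag = {
    "data://garden/covid/latest/combined": {
        "data://garden/covid/latest/testing",
        "data://garden/covid/latest/cases_deaths",
        "data-private://garden/covid/latest/sequence",
        "data://garden/demography/2024-07-15/population",
    },
    "data://garden/covid/latest/testing": set(),
    "data://garden/covid/latest/cases_deaths": set(),
    "data://garden/demography/2024-07-15/population": set(),
    # Hypothetical downstream step, only here to show transitive exclusion.
    "data://grapher/example/latest/downstream": {
        "data://garden/covid/latest/combined",
    },
}

public_dag = exclude_private_dependents(dag)
```

With this approach, a plain etl run would still build testing, cases_deaths and population, and would skip combined (and anything downstream of it) with a warning rather than aborting the whole run.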

@pabloarosado
Contributor

@lucasrodes why is data-private://garden/covid/latest/sequence private? Maybe the solution would be to make it public (given that it's used by a public step).

@lucasrodes
Member

lucasrodes commented Sep 5, 2024

hi @pabloarosado

why is data-private://garden/covid/latest/sequence private?

It must be private, as requested by the data provider (GISAID), which has a very restrictive license.

Maybe the solution would be to make it public (given that it's used by a public step).

That's not possible; we cannot share this data publicly. The data://garden/covid/latest/combined step processes and aggregates a private indicator to compute a ratio, and that's fine as public.

stale bot commented Nov 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 5, 2024
@stale stale bot closed this as completed Nov 16, 2024
@lucasrodes lucasrodes reopened this Feb 3, 2025
@stale stale bot closed this as completed Feb 11, 2025
@lucasrodes lucasrodes reopened this Feb 11, 2025
@lucasrodes lucasrodes removed the wontfix label Feb 11, 2025