
[Feature]: Optimize Pipeline Configuration #96

Open
1 task done
ankeko opened this issue Dec 8, 2023 · 3 comments
Labels
  • ✨ feature: Requests or discussions about new features
  • 📋 needs review: Requires code or design review
  • ♻️ refactoring: Related to refactoring or restructuring code
  • 🧪 testing: Related to testing or test cases

Comments

@ankeko
Collaborator

ankeko commented Dec 8, 2023

✏️ Problem Description

Config management and usage should be improved.

💡 Feature Request

Config Requirements:

  • Default values are available via Dagster scaffolding
  • A warning is raised in the Launchpad if a required value is missing
  • Each op should have its own scope
  • Each config should have its own scope
  • Configs should be inheritable
  • Configs that are used in multiple places and ops should be defined only once
  • Environment variables should be injectable into the configs
  • Dagster configs should be usable for the pipeline
  • Critically review when to use environment variables and when to use Dagster configs
  • As a programmer, I want to be able to jump directly to the classes defined in the configs
  • As a programmer, I want to be able to automatically rename a class or function used in the configs
  • All tests should run with all pipeline configs

🌍 Context

No response

🔍 Additional Information

Have a look at:

  • Dagster resources
  • pydantic
  • Dagster IO manager

👍 Code of Conduct

  • I agree to follow this project's Code of Conduct
ankeko added the ✨ feature, 🧪 testing, 📋 needs review, and ♻️ refactoring labels on Dec 8, 2023
ankeko self-assigned this on Dec 8, 2023
@dstalzjohn
Collaborator

I made a short overview of what is possible in Dagster.

[Image: overview of the configuration options available in Dagster]

For each job, we can provide a ConfigMapping; everything else is determined by Dagster. Thus, every configuration has to start either from the YAML file or from the default schema provided by the ConfigMapping.

Currently, we have implemented only the config_fn; the goal is to also provide a workable schema generation.
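To make the ConfigMapping idea concrete, here is a framework-free sketch (it does not import Dagster). A config_fn receives the values of a small job-level schema and expands them into the nested run config that Dagster expects (`ops`, then the op name, then `config`). The dataclass `TrainJobConfig`, the helper `generate_schema`, and the op names `load_data` and `train_model` are hypothetical. The schema helper hints at the stated goal: deriving the default schema from a typed class instead of writing it by hand. In Dagster itself, such a function would be decorated with `@config_mapping` and passed to `@job(config=...)`.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


@dataclass
class TrainJobConfig:
    # Hypothetical simplified job-level schema, as a user would see it.
    data_location: str
    epochs: int = 10
    batch_size: int = 32


def generate_schema(cls: type) -> dict[str, dict[str, Any]]:
    """Derive a default schema (type name, required flag, default) from a dataclass."""
    schema: dict[str, dict[str, Any]] = {}
    for f in fields(cls):
        has_default = f.default is not MISSING
        schema[f.name] = {
            "type": f.type if isinstance(f.type, str) else f.type.__name__,
            "required": not has_default,
            "default": f.default if has_default else None,
        }
    return schema


def config_fn(values: dict[str, Any]) -> dict[str, Any]:
    """Map the simplified job config onto the nested per-op run config."""
    cfg = TrainJobConfig(**values)  # fails early if a required value is missing
    return {
        "ops": {
            "load_data": {"config": {"data_location": cfg.data_location}},
            "train_model": {"config": {"epochs": cfg.epochs, "batch_size": cfg.batch_size}},
        }
    }


print(generate_schema(TrainJobConfig)["epochs"])
# {'type': 'int', 'required': False, 'default': 10}
```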

@aiakide
Collaborator

aiakide commented Jan 3, 2024

To-dos for the next mob programming session:

  • The current implementation on feature/improve_config_transparency works very well so far. However, configuration creation does not work for some classes, in particular the Keras classes (callbacks, optimizers, ...). These classes have attributes without type hints, so pydantic cannot create a model for them.
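One possible workaround, sketched here as an assumption and not as the approach chosen in the session: read the constructor signature with `inspect.signature`, keep the parameter names and defaults, and guess a type from the default value or fall back to `Any` where no hint exists. The resulting `{name: (type, default)}` mapping, with `...` marking required parameters, matches the field-definition format accepted by pydantic's `create_model`. `UntypedCallback` and `infer_fields` are hypothetical.

```python
import inspect
from typing import Any


class UntypedCallback:
    # Hypothetical stand-in for a third-party (e.g. Keras) class without type hints.
    def __init__(self, monitor="val_loss", patience=0, baseline=None, restore_best_weights=False):
        self.monitor = monitor
        self.patience = patience
        self.baseline = baseline
        self.restore_best_weights = restore_best_weights


def infer_fields(cls: type) -> dict[str, tuple[Any, Any]]:
    """Return {name: (type, default)} per constructor parameter; ... marks required."""
    result: dict[str, tuple[Any, Any]] = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # skip self, *args and **kwargs
        default = ... if param.default is param.empty else param.default
        if param.annotation is not param.empty:
            field_type = param.annotation  # use the hint when there is one
        elif default is not ... and default is not None:
            field_type = type(default)  # guess the type from the default value
        else:
            field_type = Any  # nothing to infer from: accept any value
        result[name] = (field_type, default)
    return result


print(infer_fields(UntypedCallback)["patience"])  # (<class 'int'>, 0)
```

Guessing from defaults is fragile (a default of `0` would type a float parameter as `int`), so an explicit override table per foreign class may still be needed.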

@ankeko
Collaborator Author

ankeko commented Jan 10, 2024

Update:

Done:

  • transferred the configuration from Hydra to pydantic
  • integrated OpConfigs for ops
  • Hydra will no longer be used for YAML configuration
  • implemented the InitConfig class to represent arbitrary classes in a Dagster configuration
    • classes are accessible in the Dagster webservice / UI
    • configuration classes are initialized via Hydra's instantiate function
    • added type checking for configured instances
  • implemented the Configurable class to create an InitConfig from a custom class via its 'create_config' method
  • combined Dagster ops into graphs
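The InitConfig / Configurable idea could look roughly like the following. This is a hypothetical sketch, not the code on the feature branch: it keeps Hydra's `_target_` key convention but resolves the dotted class path with `importlib` instead of calling Hydra's instantiate, and it adds the type check on the created instance. The helper `instantiate` and the example class `LearningRateSchedule` are illustrative names; `Configurable.create_config` only mirrors the names from the update above.

```python
import importlib
from typing import Any, TypeVar

T = TypeVar("T")


def instantiate(config: dict[str, Any], expected_type: type[T]) -> T:
    """Create the object described by an InitConfig-style dict and check its type."""
    params = dict(config)  # copy so the caller's config is not modified
    module_name, _, class_name = params.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    instance = cls(**params)
    if not isinstance(instance, expected_type):
        raise TypeError(f"{type(instance).__name__} is not a {expected_type.__name__}")
    return instance


class Configurable:
    """Mixin that lets a custom class describe itself as an InitConfig-style dict."""

    @classmethod
    def create_config(cls, **params: Any) -> dict[str, Any]:
        # instantiate() resolves only top-level classes: nested __qualname__s contain dots.
        return {"_target_": f"{cls.__module__}.{cls.__qualname__}", **params}


class LearningRateSchedule(Configurable):
    # Hypothetical custom class that opts into config creation.
    def __init__(self, initial_lr: float = 1e-3, decay: float = 0.9):
        self.initial_lr = initial_lr
        self.decay = decay


print(LearningRateSchedule.create_config(initial_lr=0.01)["initial_lr"])  # 0.01
```

Keeping the class path as a plain string is what makes such a config serializable to YAML and editable in the UI; the type check then catches a wrong `_target_` before the object reaches an op.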

Next steps:

  • implement configuration parsing for third-party classes (e.g., TensorFlow classes)
  • test and debug the Dagster webservice / UI
    • continue with niceml/config/examples/clsbinarytrainexample.py
    • test the other train examples afterwards
  • discuss best practices for classes
  • improve InitConfig

3 participants