Releases: dagster-io/dagster
0.7.12
Bugfix
- We now only render the subset of an execution plan that has actually executed, and persist that subset information along with the snapshot.
- @pipeline and @composite_solid now correctly capture doc from the function they decorate.
- Fixed a bug with using solid subsets in the Dagit playground
0.7.11
0.7.11
Bugfix
- Fixed an issue with strict snapshot ID matching when loading historical snapshots, which caused
errors on the Runs page when viewing historical runs. - Fixed an issue where
dagster_celery
had introduced a spurious dependency ondagster_k8s
(#2435) - Fixed an issue where our Airflow, Celery, and Dask integrations required S3 or GCS storage and
prevented use of filesystem storage. Filesystem storage is now also permitted, to enable use of
these integrations with distributed filesystems like NFS (#2436).
0.7.10
New
RepositoryDefinition
now takesschedule_defs
andpartition_set_defs
directly. The loading
scheme for these definitions viarepository.yaml
under thescheduler:
andpartitions:
keys
is deprecated and expected to be removed in 0.8.0.- Mark published modules as python 3.8 compatible.
- The dagster-airflow package supports loading all Airflow DAGs within a directory path, file path,
or Airflow DagBag. - The dagster-airflow package supports loading all 23 DAGs in Airflow example_dags folder and
execution of 17 of them (see:make_dagster_repo_from_airflow_example_dags
). - The dagster-celery CLI tools now allow you to pass additional arguments through to the underlying
celery CLI, e.g., runningdagster-celery worker start -n my-worker -- --uid=42
will pass the
--uid
flag to celery. - It is now possible to create a
PresetDefinition
that has no environment defined. - Added
dagster schedule debug
command to help debug scheduler state. - The
SystemCronScheduler
now verifies that a cron job has been successfully been added to the
crontab when turning a schedule on, and shows an error message if unsuccessful.
Breaking Changes
- A
dagster instance migrate
is required for this release to support the new experimental assets
view. - Runs created prior to 0.7.8 will no longer render their execution plans as DAGs. We are only
rendering execution plans that have been persisted. Logs are still available. Path
is no longer valid in config schemas. Usestr
ordagster.String
instead.- Removed the
@pyspark_solid
decorator - its functionality, which was experimental, is subsumed by
requiring a StepLauncher resource (e.g. emr_pyspark_step_launcher) on the solid.
Dagit
- Merged "re-execute", "single-step re-execute", "resume/retry" buttons into one "re-execute" button
with three dropdown selections on the Run page.
Experimental
- Added new
asset_key
string parameter to Materializations and created a new “Assets” tab in Dagit
to view pipelines and runs associated with these keys. The API and UI of these asset-based are
likely to change, but feedback is welcome and will be used to inform these changes. - Added an
emr_pyspark_step_launcher
that enables launching PySpark solids in EMR. The
"simple_pyspark" example demonstrates how it’s used.
Bugfix
- Fixed an issue when running Jupyter notebooks in a Python 2 kernel through dagstermill with dagster
running in Python 3. - Improved error messages produced when dagstermill spins up an in-notebook context.
- Fixed an issue with retrieving step events from
CompositeSolidResult
objects.
0.7.9
Breaking Changes
- If you are launching runs using
DagsterInstance.launch_run
, this method now takes a run id instead of an instance ofPipelineRun
. Additionally,DagsterInstance.create_run
andDagsterInstance.create_empty_run
have been replaced byDagsterInstance.get_or_create_run
andDagsterInstance.create_run_for_pipeline
. - If you have implemented your own
RunLauncher
, there are two required changes:RunLauncher.launch_run
takes a pipeline run that has already been created. You should remove any calls toinstance.create_run
in this method.- Instead of calling
startPipelineExecution
(defined in thedagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION
) in the run launcher, you should callstartPipelineExecutionForCreatedRun
(defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION` - Refer to the
RemoteDagitRunLauncher
for an example implementation.
New
- Improvements to preset and solid subselection in the playground. An inline preview of the pipeline instead of a modal when doing subselection, and the correct subselection is chosen when selecting a preset.
- Improvements to the log searching. Tokenization and autocompletion for searching messages types and for specific steps.
- You can now view the structure of pipelines from historical runs, even if that pipeline no longer exists in the loaded repository or has changed structure.
- Historical execution plans are now viewable, even if the pipeline has changed structure.
- Added metadata link to raw compute logs for all StepStart events in PipelineRun view and Step view.
- Improved error handling for the scheduler. If a scheduled run has config errors, the errors are persisted to the event log for the run and can be viewed in Dagit.
Bugfix
- No longer manually dispose sqlalchemy engine in dagster-postgres
- Made boto3 dependency in dagster-aws more flexible (#2418)
- Fixed tooltip UI cleanup in partitioned schedule view
Documentation
- Brand new documentation site, available at https://docs.dagster.io
- The tutorial has been restructured to multiple sections, and the examples in intro_tutorial have been rearranged to separate folders to reflect this.
0.7.8
Breaking Changes
- The
execute_pipeline_with_mode
andexecute_pipeline_with_preset
APIs have been dropped in
favor of new top level arguments toexecute_pipeline
,mode
andpreset
. - The use of
RunConfig
to pass options toexecute_pipeline
has been deprecated, andRunConfig
will be removed in 0.8.0. - The
execute_solid_within_pipeline
andexecute_solids_within_pipeline
APIs, intended to support
tests, now take new top level argumentsmode
andpreset
.
New
- The dagster-aws Redshift resource now supports providing an error callback to debug failed
queries. - We now persist serialized execution plans for historical runs. They will render correctly even if
the pipeline structure has changed or if it does not exist in the current loaded repository. - Clicking on a pipeline tag in the
Runs
view will apply that tag as a filter.
Bugfix
- Fixed a bug where telemetry logger would create a log file (but not write any logs) even when
telemetry was disabled.
Experimental
- The dagster-airflow package supports ingesting Airflow dags and running them as dagster pipelines
(see:make_dagster_pipeline_from_airflow_dag
). This is in the early experimentation phase. - Improved the layout of the experimental partition runs table on the
Schedules
detailed view.
Documentation
- Fixed a grammatical error (Thanks @flowersw!)
0.7.7
Breaking Changes
-
The default sqlite and
dagster-postgres
implementations have been altered to extract the
eventstep_key
field as a column, to enable faster per-step queries. You will need to run
dagster instance migrate
to update the schema. You may optionally migrate your historical event
log data to extract thestep_key
using themigrate_event_log_data
function. This will ensure
that your historical event log data will be captured in future step-key based views. This
event_log
data migration can be invoked as follows:from dagster.core.storage.event_log.migration import migrate_event_log_data from dagster import DagsterInstance migrate_event_log_data(instance=DagsterInstance.get())
-
We have made pipeline metadata serializable and persist that along with run information.
While there are no user-facing features to leverage this yet, it does require an instance migration.
dagster instance migrate
. If you have already run the migration for theevent_log
changes
above, you do not need to run it again. Any unforeseen errors related the the newsnapshot_id
in theruns
table or the newsnapshots
table are related to this migration. -
dagster-pandas
ColumnTypeConstraint
has been removed in favor ofColumnDTypeFnConstraint
and
ColumnDTypeInSetConstraint
.
New
- You can now specify that dagstermill output notebooks be yielded as an output from dagstermill
solids, in addition to being materialized. - You may now set the extension on files created using the
FileManager
machinery. - dagster-pandas typed
PandasColumn
constructors now support pandas 1.0 dtypes. - The Dagit Playground has been restructured to make the relationship between Preset, Partition
Sets, Modes, and subsets more clear. All of these buttons have be reconciled and moved to the
left side of the Playground. - Config sections that are required but not filled out in the Dagit playground are now detected
and labeled in orange. - dagster-celery config now support using
env:
to load from environment variables.
Bugfix
- Fixed a bug where selecting a preset in
dagit
would not populate tags specified on the pipeline
definition. - Fixed a bug where metadata attached to a raised
Failure
was not displayed in the error modal in
dagit
. - Fixed an issue where reimporting dagstermill and calling
dagstermill.get_context()
outside of
the parameters cell of a dagstermill notebook could lead to unexpected behavior. - Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using
the Postgres-backed storages.
Experimental
- Added a longitudinal view of runs for on the
Schedule
tab for scheduled, partitioned pipelines.
Includes views of run status, execution time, and materializations across partitions. The UI is
in flux and is currently optimized for daily schedules, but feedback is welcome.
0.7.6
Breaking Changes
default_value
inField
no longer accepts native instances of python enums. Instead
the underlying string representation in the config system must be used.default_value
inField
no longer accepts callables.- The
dagster_aws
imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>
.dagster_aws
providess3
,emr
,redshift
, andcloudwatch
modules. - The
dagster_aws
S3 resource no longer attempts to model the underlying boto3 API, and you can
now just use any boto3 S3 API directly on a S3 resource, e.g.
context.resources.s3.list_objects_v2
. (#2292)
New
- New
Playground
view indagit
showing an interactive config map - Improved storage and UI for showing schedule attempts
- Added the ability to set default values in
InputDefinition
- Added CLI command
dagster pipeline launch
to launch runs using a configuredRunLauncher
- Added ability to specify pipeline run tags using the CLI
- Added a
pdb
utility toSolidExecutionContext
to help with debugging, available within a solid ascontext.pdb
- Added
PresetDefinition.with_additional_config
to allow for config overrides - Added resource name to log messages generated during resource initialization
- Added grouping tags for runs that have been retried / reexecuted.
Bugfix
- Fixed a bug where date range partitions with a specified end date was clipping the last day
- Fixed an issue where some schedule attempts that failed to start would be marked running forever.
- Fixed the
@weekly
partitioned schedule decorator - Fixed timezone inconsistencies between the runs view and the schedules view
- Integers are now accepted as valid values for Float config fields
- Fixed an issue when executing dagstermill solids with config that contained quote characters.
dagstermill
- The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel flag.
dagster-dbt
dbt_solid
now has aNothing
input to allow for sequencing
dagster-k8s
- Added
get_celery_engine_config
to select celery engine, leveraging Celery infrastructure
Documentation
- Improvements to the airline and bay bikes demos
- Improvements to our dask deployment docs (Thanks jswaney!!)
0.7.5
New
-
Added the
IntSource
type, which lets integers be set from environment variables in config. -
You may now set tags on pipeline definitions. These will resolve in the following cases:
- Loading in the playground view in Dagit will pre-populate the tag container.
- Loading partition sets from the preset/config picker will pre-populate the tag container with
the union of pipeline tags and partition tags, with partition tags taking precedence. - Executing from the CLI will generate runs with the pipeline tags.
- Executing programmatically using the
execute_pipeline
api will create a run with the union
of pipeline tags andRunConfig
tags, withRunConfig
tags taking precedence. - Scheduled runs (both launched and executed) will have the union of pipeline tags and the
schedule tags function, with the schedule tags taking precedence.
-
Output materialization configs may now yield multiple Materializations, and the tutorial has
been updated to reflect this. -
We now export the
SolidExecutionContext
in the public API so that users can correctly type hint
solid compute functions.
Dagit
- Pipeline run tags are now preserved when resuming/retrying from Dagit.
- Scheduled run stats are now grouped by partition.
- A "preparing" section has been added to the execution viewer. This shows steps that are in
progress of starting execution. - Markers emitted by the underlying execution engines are now visualized in the Dagit execution
timeline.
Bugfix
- Resume/retry now works as expected in the presence of solids that yield optional outputs.
- Fixed an issue where dagster-celery workers were failing to start in the presence of config
values that wereNone
. - Fixed an issue with attempting to set
threads_per_worker
on Dask distributed clusters.
dagster-postgres
- All postgres config may now be set using environment variables in config.
dagster-aws
- The
s3_resource
now exposes alist_objects_v2
method corresponding to the underlying boto3
API. (Thanks, @basilvetas!) - Added the
redshift_resource
to access Redshift databases.
dagster-k8s
- The
K8sRunLauncher
config now includes theload_kubeconfig
andkubeconfig_file
options.
Documentation
- Fixes and improvements.
Dependencies
- dagster-airflow no longer pins its werkzeug dependency.
Community
-
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.If you'd like to opt in to telemetry, please add the following to
$DAGSTER_HOME/dagster.yaml
:telemetry: enabled: true
-
Thanks to @basilvetas and @hspak for their contributions!
0.7.4
New
- It is now possible to use Postgres to back schedule storage by configuring
dagster_postgres.PostgresScheduleStorage
on the instance. - Added the
execute_pipeline_with_mode
API to allow executing a pipeline in test with a specific
mode without having to specifyRunConfig
. - Experimental support for retries in the Celery executor.
- It is now possible to set run-level priorities for backfills run using the Celery executor by
passing--celery-base-priority
todagster pipeline backfill
. - Added the
@weekly
schedule decorator.
Deprecations
- The
dagster-ge
library has been removed from this release due to drift from the underlying
Great Expectations implementation.
dagster-pandas
PandasColumn
now includes anis_optional
flag, replacing the previous
ColumnExistsConstraint
.- You can now pass the
ignore_missing_values flag
toPandasColumn
in order to apply column
constraints only to the non-missing rows in a column.
dagster-k8s
- The Helm chart now includes provision for an Ingress and for multiple Celery queues.
Documentation
- Improvements and fixes.
0.7.3
New
- It is now possible to configure a dagit instance to disable executing pipeline runs in a local
subprocess. - Resource initialization, teardown, and associated failure states now emit structured events
visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
consolidated and rationalized. - Support Redis queue provider in
dagster-k8s
Helm chart. - Support external postgresql in
dagster-k8s
Helm chart.
Bugfix
- Fixed an issue with inaccurate timings on some resource initializations.
- Fixed an issue that could cause the multiprocess engine to spin forever.
- Fixed an issue with default value resolution when a config value was set using
SourceString
. - Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
- Fixed an issue with where the CLI command
dagster schedule up
would fail in certain scenarios
with theSystemCronScheduler
.
Pandas
- Column constraints can now be configured to permit NaN values.
Dagstermill
- Removed a spurious dependency on sklearn.
Docs
- Improvements and fixes to docs.
- Restored dagster.readthedocs.io.
Experimental
- An initial implementation of solid retries, throwing a
RetryRequested
exception, was added.
This API is experimental and likely to change.
Other
- Renamed property
runtime_type
todagster_type
in definitions. The following are deprecated
and will be removed in a future version.InputDefinition.runtime_type
is deprecated. UseInputDefinition.dagster_type
instead.OutputDefinition.runtime_type
is deprecated. UseOutputDefinition.dagster_type
instead.CompositeSolidDefinition.all_runtime_types
is deprecated. UseCompositeSolidDefinition.all_dagster_types
instead.SolidDefinition.all_runtime_types
is deprecated. UseSolidDefinition.all_dagster_types
instead.PipelineDefinition.has_runtime_type
is deprecated. UsePipelineDefinition.has_dagster_type
instead.PipelineDefinition.runtime_type_named
is deprecated. UsePipelineDefinition.dagster_type_named
instead.PipelineDefinition.all_runtime_types
is deprecated. UsePipelineDefinition.all_dagster_types
instead.