With the new asset_selection parameter on @sensor and SensorDefinition, you can now define a sensor that directly targets a selection of assets, instead of targeting a job.
materialize and materialize_to_memory now accept a raise_on_error argument, which determines whether an error is raised when the run hits an error, or whether a result marked as failed is returned instead.
(experimental) Dagster now supports multi-dimensional asset partitions, through a new MultiPartitionsDefinition object. An optional schema migration enables support for this feature (run via dagster instance migrate). Users who are not using this feature do not need to run the migration.
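To make "multi-dimensional" concrete: each partition is identified by one key from each dimension, so the full set of partitions is the cartesian product of the dimensions' keys. The plain-Python sketch below uses hypothetical "date" and "region" dimensions. It does not call MultiPartitionsDefinition itself, and it does not reflect how Dagster encodes multi-dimensional partition keys.

```python
from itertools import product

# Conceptual sketch only: the partition space of a two-dimensional asset is the
# cartesian product of its dimensions' keys. The dimension names and keys are
# hypothetical, and the dict form below is not Dagster's key representation.
dates = ["2022-11-01", "2022-11-02"]
regions = ["us", "eu", "apac"]

partition_space = [{"date": d, "region": r} for d, r in product(dates, regions)]

print(len(partition_space))  # 2 dates x 3 regions = 6 partitions
```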
You can now launch a run that targets a range of asset partitions, by supplying the "dagster/asset_partition_range_start" and "dagster/asset_partition_range_end" tags.
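A minimal sketch of the tags involved. The helper name partition_range_tags and the example partition keys are hypothetical; only the two tag names come from the entry above.

```python
# Hypothetical helper: builds the two tags described above for a run that
# should target the range of asset partitions starting at `start` and ending
# at `end`.
def partition_range_tags(start: str, end: str) -> dict[str, str]:
    return {
        "dagster/asset_partition_range_start": start,
        "dagster/asset_partition_range_end": end,
    }

tags = partition_range_tags("2022-11-01", "2022-11-07")
# These tags could then be supplied when launching the run, for example via the
# `tags` argument of a RunRequest returned from a sensor or schedule.
print(tags)
```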
[dagit] Asset and op graphs in Dagit now show integration logos, making it easier to identify assets backed by notebooks, dbt, Airbyte, and more.
[dagit] A new --db-pool-recycle CLI flag and dbPoolRecycle Helm option control how long a pooled connection used by Dagit persists before it is recycled. The default of 1 hour is now respected by Postgres (MySQL already had a hard-coded 1-hour setting). Thanks @adam-bloom!
[dagster-airbyte] Introduced the ability to specify output IO managers when using load_assets_from_airbyte_instance and load_assets_from_airbyte_project.
[dagster-dbt] The account_id configuration of the dbt_cloud_resource resource can now be sourced from the environment. Thanks @sowusu-ba!
[dagster-duckdb] Improvements to the DuckDB integration: PySpark DataFrames are now fully supported, “schema” can be specified via IO manager config, and the API documentation now includes more examples.
[dagster-fivetran] Introduced experimental load_assets_from_fivetran_instance helper which automatically pulls assets from a Fivetran instance.
[dagster-k8s] Fixed an issue where setting the securityContext configuration of the Dagit pod in the Helm chart didn’t apply to one of its containers. Thanks @jblawatt!
Fixed a bug that caused the asset_selection parameter of RunRequest to not be respected when used inside a schedule.
Fixed a bug with health checks during delayed Op retries with the k8s_executor and docker_executor.
[dagit] The asset graph now live-updates when assets fail to materialize due to op failures.
[dagit] The "Materialize" button now respects the backfill permission for multi-run materializations.
[dagit] Materializations without metadata are padded correctly in the run logs.
[dagster-aws] Fixed an issue where setting the value of task_definition field in the EcsRunLauncher to an environment variable stopped working.
[dagster-dbt] Added support for exposures in load_assets_from_dbt_manifest. This fixes an error where load_assets_from_dbt_manifest failed to load a dbt manifest containing exposures. Thanks @sowusu-ba!
[dagster-duckdb] In some examples, the duckdb config was incorrectly specified. This has been fixed.
The behavior of the experimental asset reconciliation sensor, which is accessible via build_asset_reconciliation_sensor, has changed to be more focused on reconciliation. It now materializes assets that have never been materialized before and avoids materializing assets that are “Upstream changed”. The build_asset_reconciliation_sensor API no longer accepts the wait_for_in_progress_runs and wait_for_all_upstream arguments.
[dagit] The new Overview and Workspace pages have been enabled for all users, after being gated with a feature flag for the last several releases. These changes include design updates, virtualized tables, and more performant querying.
The top navigation has been updated to improve space allocation, with main nav links moved to the left.
“Overview” is the new Dagit home page and “factory floor” view, where you can find the run timeline, which now offers time-based pagination. The Overview section also contains pages with all of your jobs, schedules, sensors, and backfills. You can filter objects by name, and collapse or expand repository sections.
“Workspace” has been redesigned to offer a better summary of your repositories, and to use the same performant table views, querying, and filtering as in the Overview pages.
@asset and @multi_asset now accept a retry_policy argument. (Thanks @adam-bloom!)
When loading an input that depends on multiple partitions of an upstream asset, the fs_io_manager will now return a dictionary that maps partition keys to the stored values for those partitions. (Thanks @andrewgryan!)
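Here is a sketch of what downstream code receives under this change: a plain dict keyed by upstream partition key. The function summarize, the daily-count values, and the ISO-date keys are all hypothetical illustrations, and the Dagster decorators that would wire this up are omitted.

```python
# Hypothetical downstream logic: the input spanning several upstream partitions
# arrives as a dict mapping each partition key to that partition's stored value.
def summarize(daily_counts: dict[str, int]) -> dict[str, int]:
    # ISO-formatted date keys sort lexicographically, so max() finds the latest.
    latest_key = max(daily_counts)
    return {"total": sum(daily_counts.values()), "latest": daily_counts[latest_key]}

upstream = {"2022-11-01": 10, "2022-11-02": 12, "2022-11-03": 8}
summary = summarize(upstream)
print(summary)  # {'total': 30, 'latest': 8}
```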
JobDefinition.execute_in_process now accepts a run_config argument even when the job is partitioned. If supplied, the run config will be used instead of any config provided by the job’s PartitionedConfig.
The run_request_for_partition method on jobs now accepts a run_config argument. If supplied, the run config will be used instead of any config provided by the job’s PartitionedConfig.
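The precedence described in the two entries above can be modeled as follows: an explicitly supplied run_config wins, and otherwise the job's partitioned config is evaluated for the partition. This is a toy model, not Dagster's implementation. The function names and the config shape are hypothetical, and passing None is treated as "not supplied".

```python
from typing import Callable, Optional

# Toy model of the precedence rule (not Dagster's code): an explicit run_config
# replaces whatever the partitioned config would have produced.
def resolve_run_config(
    partition_key: str,
    partitioned_config_fn: Callable[[str], dict],
    run_config: Optional[dict] = None,
) -> dict:
    if run_config is not None:
        return run_config
    return partitioned_config_fn(partition_key)

# Hypothetical partitioned config: passes the partition key to an op's config.
def config_for(key: str) -> dict:
    return {"ops": {"load": {"config": {"date": key}}}}

cfg_default = resolve_run_config("2022-11-01", config_for)
cfg_override = resolve_run_config("2022-11-01", config_for, run_config={"ops": {}})
print(cfg_default)   # {'ops': {'load': {'config': {'date': '2022-11-01'}}}}
print(cfg_override)  # {'ops': {}}
```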
The new NotebookMetadataValue can be used to report the location of executed Jupyter notebooks, which Dagit can then render.
Resolving asset dependencies within a group now works with multi-assets, as long as all the assets within the multi-asset are in the same group. (Thanks @peay!)
UPathIOManager, a filesystem-agnostic IOManager base class, has been added. (Thanks @danielgafni!)
A threadpool option has been added for the scheduler daemon. This can be enabled via your dagster.yaml file; check out the docs.
The default LocalComputeLogManager will capture compute logs by process instead of by step. This means that for the in_process executor, where all steps are executed in the same process, the captured compute logs for all steps in a run will be captured in the same file.
[dagstermill] Added define_dagstermill_asset which loads a notebook as an asset.
[dagster-airflow] make_dagster_job_from_airflow_dag now supports Airflow 2. There is also a new mock_xcom parameter that mocks all calls made by operators to XCom.
[helm] volume and volumeMount sections have been added for the dagit and daemon sections of the helm chart.
For partitioned asset jobs whose config is a hardcoded dictionary (rather than a PartitionedConfig), previously run_request_for_partition would produce a run with no config. Now, the run has the hardcoded dictionary as its config.
Previously, asset inputs would be resolved to upstream assets in the same group that had the same name, even if the asset input already had a key prefix. Now, asset inputs are only resolved to upstream assets in the same group if the input path only has a single component.
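The new resolution rule can be modeled in a few lines: matching by name within the group happens only for single-component input paths, and prefixed paths are left as they are. This is a toy model rather than Dagster's code. Asset keys are represented as tuples, and the group contents are hypothetical.

```python
# Toy model of the rule above: a single-component input path is matched by name
# against the group's assets; a multi-component (prefixed) path is left as-is.
def resolve_input(
    input_path: list[str], group_asset_keys: set[tuple[str, ...]]
) -> tuple[str, ...]:
    if len(input_path) == 1:
        # Sorted iteration keeps the toy deterministic if names collide.
        for key in sorted(group_asset_keys):
            if key[-1] == input_path[0]:
                return key
    return tuple(input_path)

group = {("sales", "orders"), ("sales", "customers")}
print(resolve_input(["orders"], group))         # ('sales', 'orders')
print(resolve_input(["raw", "orders"], group))  # ('raw', 'orders')
```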
Previously, asset inputs could get resolved to outputs of the same AssetsDefinition, through group-based asset dependency resolution, which would later error because of a circular dependency. This has been fixed.
Previously, the “Partition Status” and “Backfill Status” fields on the Backfill page in dagit were always incomplete and showed missing partitions. This has been fixed to accurately show the status of the backfill runs.
Executors now compress step worker arguments to avoid CLI length limits with large DAGs.
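The general technique behind this fix: serialize the arguments, compress them, and encode the result as ASCII so it fits in one short command-line argument. The sketch below uses the standard library's json, zlib, and base64 modules. The encoding Dagster actually uses may differ, and the payload is hypothetical.

```python
import base64
import json
import zlib

# Serialize, compress, then base64-encode so the result is a single short,
# shell-safe string that can be passed as one CLI argument.
def pack_args(args: dict) -> str:
    raw = json.dumps(args, sort_keys=True).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

# Reverse the steps on the worker side.
def unpack_args(packed: str) -> dict:
    return json.loads(zlib.decompress(base64.b64decode(packed)).decode("utf-8"))

# Hypothetical payload: large DAGs produce long, highly repetitive step lists.
args = {"run_id": "abc123", "step_keys": [f"step_{i}" for i in range(2000)]}
packed = pack_args(args)
assert unpack_args(packed) == args
print(len(json.dumps(args)), len(packed))  # compressed form is much shorter
```

Repetitive payloads like step-key lists compress well, which is why this approach helps with CLI length limits.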
[dagit] When viewing the config dialog for a run with a very long config, scrolling was broken and the “copy” button was not visible. This has been fixed.
[dagster-msteams] Longer messages can now be used in Teams HeroCard - thanks @jayhale
Fixed a bug that broke asset partition mappings when using the key_prefix argument with methods like load_assets_from_modules.
[dagster-dbt] When running dbt Cloud jobs with the dbt_cloud_run_op, the op would emit a failure if the targeted job did not create a run_results.json artifact, even if this was the expected behavior. This has been fixed.
Improved performance by adding database indexes which should speed up the run view as well as a range of asset-based queries. These migrations can be applied by running dagster instance migrate.
An issue that would cause schedule/sensor latency in the daemon during workspace refreshes has been resolved.
[dagit] Shift-clicking Materialize for partitioned assets now shows the asset launchpad, allowing you to launch execution of a partition with config.
[dagster-dbt] Added a display_raw_sql flag to the dbt asset loading functions. If set to False, this will remove the raw sql blobs from the asset descriptions. For large dbt projects, this can significantly reduce the size of the generated workspace snapshots.
[dagit] A “New asset detail pages” feature flag available in Dagit’s settings allows you to preview some upcoming changes to the way historical materializations and partitions are viewed.
Tags can now be provided to an asset reconciliation sensor and will be applied to all RunRequests returned by the sensor.
If you don’t explicitly specify a DagsterType on a graph input, but all the inner inputs that the graph input maps to have the same DagsterType, the graph input’s DagsterType will be set to the DagsterType of the inner inputs.
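The inference rule fits in a small function. This is a toy model, not Dagster's code. Type names are plain strings standing in for DagsterType objects. Returning None to mean "nothing inferred" is an assumption, because the entry does not say what happens when the inner types differ.

```python
from typing import Optional, Sequence

# Toy model of the rule above: an explicitly declared type always wins;
# otherwise, a type is inferred only when every inner input shares it.
def infer_graph_input_type(
    declared: Optional[str], inner_types: Sequence[str]
) -> Optional[str]:
    if declared is not None:
        return declared
    if inner_types and len(set(inner_types)) == 1:
        return inner_types[0]
    return None  # assumption: no inference when inner types disagree

print(infer_graph_input_type(None, ["DataFrame", "DataFrame"]))  # DataFrame
print(infer_graph_input_type(None, ["DataFrame", "Int"]))        # None
```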
[dagster-airbyte] load_assets_from_airbyte_project now caches the project data generated at repo load time so it does not have to be regenerated in subprocesses.
[dagster-airbyte] Output table schema metadata is now generated at asset definition time when using load_assets_from_airbyte_instance or load_assets_from_airbyte_project.
[dagit] The run timeline now groups all jobs by repository. You can collapse or expand each repository in this view by clicking the repository name. This state will be preserved locally. You can also hold Shift while clicking the repository name, and all repository groups will be collapsed or expanded accordingly.
[dagit] In the launchpad view, a “Remove all” button is now available once you have accrued three or more tabs for that job, to make it easier to clear stale configuration tabs from view.
[dagit] When scrolling through the asset catalog, the toolbar is now sticky. This makes it simpler to select multiple assets and materialize them without requiring you to scroll back to the top of the page.
[dagit] A “Materialize” option has been added to the action menu on individual rows in the asset catalog view.
[dagster-aws] The EcsRunLauncher now allows you to pass in a dictionary in the task_definition config field that specifies configuration for the task definition of the launched run, including role ARNs and a list of sidecar containers to include. Previously, the task definition could only be configured by passing in a task definition ARN or by basing the task definition off of the task definition of the ECS task launching the run. See the docs for the full set of available config.
Previously, yielding a SkipReason within a multi-asset sensor (experimental) would raise an error. This has been fixed.
[dagit] Previously, if you had a partitioned asset job and supplied a hardcoded dictionary of config to define_asset_job, you would run into a CheckError when launching the job from Dagit. This has been fixed.
[dagit] When viewing the Runs section of Dagit, the counts displayed in the tabs (e.g. “In progress”, “Queued”, etc.) were not updating on a poll interval. This has been fixed.
AssetMaterialization now has a metadata property, which allows accessing the materialization’s metadata as a dictionary.
DagsterInstance now has a get_latest_materialization_event method, which allows fetching the most recent materialization event for a particular asset key.
RepositoryDefinition.load_asset_value and AssetValueLoader.load_asset_value now work with IO managers whose load_input implementation accesses the op_def and name attributes on the InputContext.
RepositoryDefinition.load_asset_value and AssetValueLoader.load_asset_value now respect the DAGSTER_HOME environment variable.
InMemoryIOManager, the IOManager that backs mem_io_manager, has been added to the public API.
The multi_asset_sensor (experimental) now supports marking individual partitioned materializations as “consumed”. Unconsumed materializations will appear in future calls to partitioned context methods.
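The "consumed" semantics can be modeled with a small state object: materializations marked as consumed are skipped, and every other materialization keeps showing up on later evaluations until it is consumed. This is a toy model and not the multi_asset_sensor API. The class name, its methods, and the partition keys are hypothetical.

```python
from dataclasses import dataclass, field

# Toy model (not Dagster's API): unconsumed partitions are returned again on
# every evaluation; consumed ones are filtered out from then on.
@dataclass
class ToySensorState:
    consumed: set[str] = field(default_factory=set)

    def unconsumed(self, partitions: list[str]) -> list[str]:
        return [p for p in partitions if p not in self.consumed]

    def mark_consumed(self, partition: str) -> None:
        self.consumed.add(partition)

materialized = ["2022-11-01", "2022-11-02"]
state = ToySensorState()
state.mark_consumed("2022-11-01")      # e.g. a run was requested for it
print(state.unconsumed(materialized))  # ['2022-11-02']
```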
The build_multi_asset_sensor_context testing method (experimental) now contains a flag to set the cursor to the newest events in the Dagster instance.
TableSchema now has a static constructor that enables building it from a dictionary of column names to column types.
Added a new CLI command dagster run migrate-repository which lets you migrate the run history for a given job from one repository to another. This is useful to preserve run history for a job when you have renamed a repository, for example.
[dagit] The run timeline view now shows jobs grouped by repository, with each repository section collapsible. This feature was previously gated by a feature flag, and is now turned on for everyone.
[dagster-airbyte] Added an option to specify custom request params to the Airbyte resource, which can be used for auth purposes.
[dagster-airbyte] When loading Airbyte assets from an instance or from YAML, a filter function can be specified to ignore certain connections.
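A filter function is a predicate over connection metadata. The sketch below assumes the usual convention that returning True keeps a connection and returning False ignores it. The ConnectionInfo dataclass is a hypothetical stand-in for the metadata object that dagster-airbyte passes in, whose attributes may differ. The "test_" naming rule is likewise only an example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the connection metadata the filter receives.
@dataclass
class ConnectionInfo:
    name: str

# Assumed convention: True keeps the connection, False ignores it.
def keep_connection(conn: ConnectionInfo) -> bool:
    return not conn.name.startswith("test_")

connections = [ConnectionInfo("prod_orders"), ConnectionInfo("test_orders")]
kept = [c.name for c in connections if keep_connection(c)]
print(kept)  # ['prod_orders']
```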
[dagster-airflow] DagsterCloudOperator and DagsterOperator now support Airflow 2. Previously, installing the library on Airflow 2 would break due to an import error.
[dagster-duckdb] A new integration with DuckDB allows you to store op outputs and assets in an in-process database.
Previously, if retries were exceeded when running with execute_in_process, no error would be raised. Now, a DagsterMaxRetriesExceededError will be raised.
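A toy retry loop shows the new behavior: once the retry budget is exhausted, the failure surfaces as an exception instead of passing silently. This is not Dagster's executor. MaxRetriesExceeded is a hypothetical stand-in for DagsterMaxRetriesExceededError, and the flaky step is made up for illustration.

```python
# Hypothetical stand-in for DagsterMaxRetriesExceededError.
class MaxRetriesExceeded(Exception):
    pass

def run_with_retries(step, max_retries: int):
    attempts = 0
    while True:
        try:
            return step()
        except Exception as exc:
            if attempts >= max_retries:
                # Budget exhausted: raise rather than silently returning.
                raise MaxRetriesExceeded(f"gave up after {attempts + 1} attempts") from exc
            attempts += 1

calls = []
def flaky():
    calls.append(1)
    raise RuntimeError("boom")

try:
    run_with_retries(flaky, max_retries=2)
except MaxRetriesExceeded as err:
    print(err)  # gave up after 3 attempts
```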
[dagster-airbyte] Fixed generating assets for Airbyte normalization tables corresponding with nested union types.
[dagster-dbt] When running assets with load_assets_from_...(..., use_build=True), AssetObservation events would be emitted for each test. These events would have metadata fields which shared names with the fields added to the AssetMaterialization events, causing confusing historical graphs for fields such as Compilation Time. This has been fixed.
[dagster-dbt] The name for the underlying op for load_assets_from_... was generated in a way which was non-deterministic for dbt projects which pulled in external packages, leading to errors when executing across multiple processes. This has been fixed.
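The entry does not say how the name was made deterministic. One common source of this class of bug in Python is deriving a name from an unordered collection: string hashing is randomized per process, so set iteration order can differ between processes. The sketch below shows one standard fix, sorting before hashing. The function name, prefix, and node IDs are hypothetical, and this is not necessarily what dagster-dbt does.

```python
import hashlib

# Sorting the identifiers before hashing makes the derived name independent
# of set iteration order, so every process computes the same name.
def deterministic_op_name(prefix: str, node_ids: set[str]) -> str:
    digest = hashlib.sha256(",".join(sorted(node_ids)).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:8]}"

a = deterministic_op_name("run_dbt", {"model.pkg.a", "model.ext.b"})
b = deterministic_op_name("run_dbt", {"model.ext.b", "model.pkg.a"})
assert a == b
print(a)
```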
Added an example, underneath examples/assets_smoke_test, that shows how to write a smoke test that feeds empty data to all the transformations in a data pipeline.
Added documentation for build_asset_reconciliation_sensor.
Added documentation for monitoring partitioned materializations using the multi_asset_sensor and kicking off subsequent partitioned runs.
[dagster-cloud] Added documentation for running the Dagster Cloud Docker agent with Docker credential helpers.
[dagster-dbt] The class methods of the dbt_cli_resource are now visible in the API docs for the dagster-dbt library.
[dagster-dbt] Added a step-by-step tutorial for using dbt models with Dagster software-defined assets