Dataflows Gen2 data destinations and managed settings | Microsoft Fabric Blog | Microsoft Fabric

We are excited to announce a set of new improvements to data destinations in Dataflows Gen2. Here is an overview of what's new and how to get started.

After you have cleaned and prepared your data with Dataflows Gen2, you want to land it in a destination. This is possible with the data destination capabilities in Dataflows Gen2. With this capability you can pick from different destinations, like Azure SQL Database, Fabric Lakehouse, and many more. Dataflows Gen2 writes your data to the destination, and from there you can use it for further analysis and reporting.

Supported data destinations

  • Azure SQL Database
  • Azure Data Explorer (Kusto)
  • Fabric Lakehouse
  • Fabric Warehouse
  • Fabric KQL Database

Entry points

Every data query in your Dataflow Gen2 can have a data destination; functions and lists are not supported. You can specify the data destination for every query individually, and you can use multiple different destinations within the same dataflow.
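As a minimal sketch of which query results qualify (the query names below are hypothetical, and each stands in for a separate query in the dataflow):

    let
        // A query that evaluates to a table can have a data destination:
        TableQuery = #table({"Id"}, {{1}, {2}}),
        // Queries that evaluate to a function or a list cannot:
        FunctionQuery = (x as number) => x + 1,
        ListQuery = {1, 2, 3}
    in
        TableQuery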

There are three main entry points to specify the data destination:

  1. Through the top ribbon
  2. Through the query settings
  3. Through the diagram view

Connect to the data destination

Connecting to a data destination is similar to connecting to a data source. Connections can be used for both reading and writing your data, provided that you have the right permissions on the data source. Create a new connection or pick an existing one, and then select Next.


Create a new table or pick an existing table

When loading into your data destination, you can either create a new table or pick an existing table.

  • New table

If you choose to create a new table, it is created in your data destination during the Dataflows Gen2 refresh. If the table is later deleted manually from the destination, the dataflow recreates it on the next refresh.

By default, your table name is the same as your query name. Any characters in the table name that are not supported by the destination are automatically adjusted; for example, many destinations do not support spaces or special characters.
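As a purely hypothetical sketch of the kind of adjustment involved (not the product's actual algorithm), stripping unsupported characters from a name could look like this:

    // Hypothetical sketch only; the actual renaming rules are destination-specific.
    let
        QueryName = "Sales Data (2023)",
        SafeName = Text.Select(QueryName, {"a".."z", "A".."Z", "0".."9", "_"})
    in
        SafeName  // "SalesData2023"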


Next, select the destination container. If you choose any of the Fabric data destinations, you can use the navigator to select the Fabric artifact you want to load your data into. For Azure destinations, you can either specify the database during connection creation or select the database from the navigator experience.

  • Existing table

To choose an existing table, use the toggle at the top of the navigator. Then pick both the Fabric artifact/database and the table using the navigator.

When using an existing table, the table is never recreated. If you delete the table manually from the data destination, Dataflows Gen2 will not recreate it on the next refresh.


Managed settings for new tables

When loading into a new table, the automatic settings are on by default. With the automatic settings, Dataflows Gen2 manages the mapping for you, which gives you the following behavior:

  • Update method Replace: Data is replaced at every dataflow refresh. Any data in the destination is removed and replaced with the output data of the dataflow.
  • Managed mapping: Mapping is managed for you. When you change your data or query to add a column or change a data type, the mapping is automatically adjusted when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, allowing for easy schema changes when you republish.
  • Drop and recreate table: To allow for these schema changes, on every dataflow refresh the table is dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

NOTE: Currently, this is only supported with Lakehouse and Azure SQL Database as the data destination.


Manual settings

By turning off the automatic settings toggle, you get full control over how your data is loaded into the data destination. You can change the column mapping by changing the source type or by excluding any column that you do not need in your data destination.

  • Update methods

Most destinations support both Append and Replace as update methods. Fabric KQL databases and Azure Data Explorer do not support Replace as an update method.

Replace: On every dataflow refresh, your data will be dropped from the destination and replaced by the output data of the dataflow.

Append: On every dataflow refresh, the output data from the dataflow will be appended to the existing data in the data destination table.
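Conceptually, the two methods compare as in the sketch below. This is an illustration only, since the actual write is performed by the dataflow engine, not by your query:

    // Conceptual illustration only; the real write happens inside the engine.
    let
        ExistingRows = #table({"Id"}, {{1}, {2}}),  // data already in the destination
        DataflowOutput = #table({"Id"}, {{3}}),     // output of this refresh
        AppendResult = Table.Combine({ExistingRows, DataflowOutput}),  // rows 1, 2, 3
        ReplaceResult = DataflowOutput              // row 3 only
    in
        AppendResult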

  • Schema options on publish
Schema options on publish only apply when the update method is Replace. When appending data, changes to the schema are not possible.

Dynamic schema: When you choose dynamic schema, you allow for schema changes in the data destination when you republish the dataflow. Because you are not using managed mapping, you still need to update the column mapping in the data destination flow when you make changes to your query. When the dataflow is refreshed, the table is dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

Fixed schema: When you choose fixed schema, schema changes are not possible. When the dataflow is refreshed, only the rows in the table are dropped and replaced with the output data of the dataflow. Any relationships or measures on the table stay intact. If you change your query in the dataflow, publishing will fail if it detects that the query schema does not match the data destination schema. Use this setting when you do not plan to change the schema and have relationships or measures added to your destination table.

NOTE: When loading data into the warehouse, only fixed schema is supported.


Supported data types per destination

Supported data types per storage location:

| Data type | Dataflow staging (Lakehouse) | Azure DB (SQL) output | Azure Data Explorer output | Fabric Lakehouse (LH) output | Fabric Warehouse (WH) output |
| --- | --- | --- | --- | --- | --- |
| Action | No | No | No | No | No |
| Any | No | No | No | No | No |
| Binary | No | No | No | No | No |
| Currency | Yes | Yes | Yes | Yes | No |
| DateTimeZone | Yes | Yes | Yes | No | No |
| Duration | No | No | Yes | No | No |
| Function | No | No | No | No | No |
| None | No | No | No | No | No |
| Null | No | No | No | No | No |
| Time | Yes | Yes | No | Yes | Yes |
| Type | No | No | No | No | No |
| Structured (List, Record, Table) | No | No | No | No | No |
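If a column uses a type your destination does not accept, you can convert it in the query before loading. A minimal sketch, assuming hypothetical columns ElapsedTime (duration) and CreatedAt (datetimezone):

    // Minimal sketch with hypothetical column names: convert types a
    // destination may reject (see the table above) into accepted ones.
    let
        Source = #table(
            type table [ElapsedTime = duration, CreatedAt = datetimezone],
            {{#duration(0, 1, 30, 0), #datetimezone(2024, 1, 3, 12, 0, 0, 0, 0)}}
        ),
        Converted = Table.TransformColumns(Source, {
            {"ElapsedTime", Duration.TotalSeconds, type number},   // duration -> seconds
            {"CreatedAt", DateTimeZone.RemoveZone, type datetime}  // drop the zone offset
        })
    in
        Converted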

Advanced topics

  • Using staging before loading to a destination

To enhance the performance of query processing, staging can be used within Dataflows Gen2 to leverage Fabric compute to execute your queries.

When staging is enabled on your queries (the default behavior), your data is first loaded into the staging location, an internal Lakehouse accessible only by dataflows themselves.

Depending on the destination, staging can either help or hurt the performance of loading data into the data destination.

Loading data into the Lakehouse

When you are loading data into the Lakehouse, it is advised to disable staging on the query to avoid loading the data twice: once into staging and once into the data destination. To improve dataflow performance, disable staging for any query that has Lakehouse as its data destination.

To disable staging, right-click the query and clear the Enable staging option. The query turns grey.


Loading data into the Warehouse

When loading data into the Warehouse, staging is required before the write operation to the data destination; this improves performance. Currently, only loading into a warehouse in the same workspace as the dataflow is supported. Ensure staging is enabled for all queries that load into the warehouse.

When staging is disabled and you choose a warehouse as the output destination, you get a warning to enable staging before you can configure the data destination.


If you already have a warehouse as a destination and try to disable staging, a warning is shown where you can either remove the warehouse as the destination or dismiss the staging action.


Nullable issues

In some cases, Power Query detects a nullable column as non-nullable, and the column is then created as non-nullable in the data destination. During refresh, the following error occurs:

E104100 Couldn’t refresh entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Error in replacing table’s content with new data in a version: #{0}., InnerException: We can’t insert null data into a non-nullable column., Underlying error: We can’t insert null data into a non-nullable column. Details: Reason = DataFormat.Error;Message = We can’t insert null data into a non-nullable column.; Message.Format = we can’t insert null data into a non-nullable column.

To force nullable columns, you can try the following steps:

  1. Delete the table from the data destination
  2. Remove the data destination from the dataflow
  3. Go into the dataflow and update the data types by leveraging the following PQ code:

    Table.TransformColumnTypes(#"PREVIOUS STEP", {{"COLUMNNAME1", type nullable text}, {"COLUMNNAME2", type nullable Int64.Type}})

  4. Add the data destination
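For context, here is a minimal sketch of what step 3 could look like inside a full query; the source table and the column names Name and Amount are hypothetical:

    // Minimal sketch, hypothetical source and column names: force both
    // columns to nullable types before re-adding the data destination.
    let
        Source = #table({"Name", "Amount"}, {{"alpha", 1}, {null, 2}}),
        ForcedNullable = Table.TransformColumnTypes(
            Source,
            {{"Name", type nullable text}, {"Amount", type nullable Int64.Type}}
        )
    in
        ForcedNullable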

FAQs

What is Microsoft Fabric?

Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building.

What is the difference between dataflow and dataset in PBI?

Data flows are collections of tables, but they do not form a relational model without relationships between them. On the other hand, a Dataset is a collection of tables with relationships between them and calculated metrics, all of which are prepared and ready to be used for reporting.

What is Microsoft Data Flow?

Dataflows are a self-service, cloud-based, data preparation technology. Dataflows enable customers to ingest, transform, and load data into Microsoft Dataverse environments, Power BI workspaces, or your organization's Azure Data Lake Storage account.

Why do I need Microsoft Fabric?

Microsoft Fabric is being used to solve data warehousing, integration, real-time analytics, data science and machine learning, and other such requirements.

What is the difference between Microsoft Fabric and Azure?

Fabric is built on an open, lake-centric design called OneLake. Meanwhile, Azure Synapse is a PaaS for enterprise data warehousing, integration, and analytics. It was launched as a one-stop shop for all warehousing and analytics workloads.

What is dataflow gen 2?

Dataflows Gen 2 are the new version of Power BI dataflows. There are so many changes relative to the previous version that they are considered a new feature. The main difference is the ability to set a destination for the result of each query in the dataflow.

Is dataflow an ETL?

Dataflow can also run custom ETL solutions since it has: building blocks for Operational Data Store and data warehousing; pipelines for data filtering and enrichment; pipelines to de-identify PII datasets; features to detect anomalies in financial transactions; and log exports to external systems.

When should I use dataflow?

Use Dataflow to create data pipelines that read from one or more sources, transform the data, and write the data to a destination. Typical use cases for Dataflow include the following: Data movement: Ingesting data or replicating data across subsystems.

How to use Dataflows in Fabric?

Create a dataflow
  1. Switch to the Data Factory experience.
  2. Go to your Fabric enabled workspace.
  3. Select Dataflow Gen2 in the create menu.
  4. Ingest the data from the OData source. Select Get data, and then select More. From Choose data source, search for OData, and then select the OData connector (a sketch of the resulting query is shown below).
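As a rough sketch of the kind of query the OData connector generates (assuming the public Northwind sample service; the Orders table is only an example of what you might select in the navigator):

    // Rough sketch assuming the public Northwind OData sample service;
    // the "Orders" table is only an example.
    let
        Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
        Orders = Source{[Name = "Orders", Signature = "table"]}[Data]
    in
        Orders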

What are the three types of Dataflows?

There are three types of Data Flows:
  • Batch Data Flows (Batch processing)
  • Real-time Data Flows (Real-time processing)
  • Single Case Data Flows (Single Case processing)

Is dataflow fully managed?

Google Cloud Dataflow is a fully-managed, scalable data processing service for executing batch, stream, and ETL processing patterns.

Is Microsoft Fabric a competitor to Snowflake?

Two major contenders in the world of cloud data warehousing and big data processing are Microsoft Fabric and Snowflake. Both platforms offer a suite of powerful features designed to handle large volumes of data, but understanding the nuances of each can lead to a more informed decision for your business.

Is Microsoft Fabric free?

Microsoft Fabric is provided free of charge when you sign up for a Microsoft Fabric trial capacity. Your use of the Microsoft Fabric trial capacity includes access to the Fabric product workloads and the resources to create and host Fabric items. The Fabric trial lasts for 60 days.

What is Microsoft Fabric vs Databricks?

Databricks is a cloud-based data processing platform that provides a collaborative environment for data scientists, engineers, and analysts. On the other hand, Fabric is a unified analytics platform that brings together all the data and analytics tools that organizations need.

What is the difference between Fabric and Synapse?

While Fabric offers a broad range of data management and analytics capabilities, Synapse specializes in analytics and data warehousing solutions. Organizations might use Microsoft Fabric for its integrated data platform capabilities while leveraging Synapse for specific analytics and warehousing needs.
