Dataflows Gen2 data destinations and managed settings | Microsoft Fabric Blog | Microsoft Fabric

We are excited to announce a set of new improvements to data destinations in Dataflows Gen2. Here is an overview of what's new and how to get started.

After you have cleaned and prepared your data with Dataflows Gen2, you want to land it in a destination. This is possible with the data destination capabilities in Dataflows Gen2. With this capability you can pick from different destinations, like Azure SQL Database, Fabric Lakehouse, and many more. Dataflows Gen2 writes your data to the destination, and from there you can use it for further analysis and reporting.

Supported data destinations

  • Azure SQL Database
  • Azure Data Explorer (Kusto)
  • Fabric Lakehouse
  • Fabric Warehouse
  • Fabric KQL Database

Entry points

Every data query in your Dataflow Gen2 can have a data destination; functions and lists are not supported. You can specify the data destination for every query individually, and you can use multiple different destinations within the same dataflow.
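As a minimal sketch of which query results qualify (the query names below are hypothetical, and each stands in for a separate query in the dataflow):

    let
        // A query that evaluates to a table can have a data destination:
        TableQuery = #table({"Id"}, {{1}, {2}}),
        // Queries that evaluate to a function or a list cannot:
        FunctionQuery = (x as number) => x + 1,
        ListQuery = {1, 2, 3}
    in
        TableQuery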

There are three main entry points to specify the data destination:

  1. Through the top ribbon
  2. Through the query settings
  3. Through the diagram view

Connect to the data destination

Connecting to a data destination is similar to connecting to a data source. Connections can be used for both reading and writing your data, provided that you have the right permissions on the data source. Create a new connection or pick an existing one, and then select Next.


Create a new table or pick an existing table

When loading into your data destination, you can either create a new table or pick an existing table.

  • New table

If you choose to create a new table, it is created in your data destination during the Dataflows Gen2 refresh. If the table is later deleted manually from the destination, the dataflow recreates it on the next refresh.

By default, your table name is the same as your query name. Any characters in the table name that are not supported by the destination are automatically adjusted; for example, many destinations do not support spaces or special characters.
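As a purely hypothetical sketch of the kind of adjustment involved (not the product's actual algorithm), stripping unsupported characters from a name could look like this:

    // Hypothetical sketch only; the actual renaming rules are destination-specific.
    let
        QueryName = "Sales Data (2023)",
        SafeName = Text.Select(QueryName, {"a".."z", "A".."Z", "0".."9", "_"})
    in
        SafeName  // "SalesData2023"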


Next, select the destination container. If you choose any of the Fabric data destinations, you can use the navigator to select the Fabric artifact you want to load your data into. For Azure destinations, you can either specify the database during connection creation or select the database from the navigator experience.

  • Existing table

To choose an existing table, use the toggle at the top of the navigator. Then pick both the Fabric artifact/database and the table using the navigator.

When using an existing table, the table is never recreated. If you delete the table manually from the data destination, Dataflows Gen2 will not recreate it on the next refresh.


Managed settings for new tables

When loading into a new table, the automatic settings are on by default. With the automatic settings, Dataflows Gen2 manages the mapping for you, which gives you the following behavior:

  • Update method Replace: Data is replaced at every dataflow refresh. Any data in the destination is removed and replaced with the output data of the dataflow.
  • Managed mapping: Mapping is managed for you. When you change your data or query to add a column or change a data type, the mapping is automatically adjusted when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, allowing for easy schema changes when you republish.
  • Drop and recreate table: To allow for these schema changes, on every dataflow refresh the table is dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

NOTE: Currently, this is only supported with Lakehouse and Azure SQL Database as the data destination.


Manual settings

By turning off the automatic settings toggle, you get full control over how your data is loaded into the data destination. You can change the column mapping by changing the source type or by excluding any column that you do not need in your data destination.

  • Update methods

Most destinations support both Append and Replace as update methods. Fabric KQL databases and Azure Data Explorer do not support Replace as an update method.

Replace: On every dataflow refresh, your data will be dropped from the destination and replaced by the output data of the dataflow.

Append: On every dataflow refresh, the output data from the dataflow will be appended to the existing data in the data destination table.
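Conceptually, the two methods compare as in the sketch below. This is an illustration only, since the actual write is performed by the dataflow engine, not by your query:

    // Conceptual illustration only; the real write happens inside the engine.
    let
        ExistingRows = #table({"Id"}, {{1}, {2}}),  // data already in the destination
        DataflowOutput = #table({"Id"}, {{3}}),     // output of this refresh
        AppendResult = Table.Combine({ExistingRows, DataflowOutput}),  // rows 1, 2, 3
        ReplaceResult = DataflowOutput              // row 3 only
    in
        AppendResult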

  • Schema options on publish
Schema options on publish only apply when the update method is Replace. When appending data, changes to the schema are not possible.

Dynamic schema: When you choose dynamic schema, you allow for schema changes in the data destination when you republish the dataflow. Because you are not using managed mapping, you still need to update the column mapping in the data destination flow when you make changes to your query. When the dataflow is refreshed, the table is dropped and recreated. Your dataflow refresh will fail if you have any relationships or measures added to your table.

Fixed schema: When you choose fixed schema, schema changes are not possible. When the dataflow is refreshed, only the rows in the table are dropped and replaced with the output data of the dataflow. Any relationships or measures on the table stay intact. If you change your query in the dataflow, publishing will fail if it detects that the query schema does not match the data destination schema. Use this setting when you do not plan to change the schema and have relationships or measures added to your destination table.

NOTE: When loading data into the warehouse, only fixed schema is supported.


Supported data types per destination

Supported data types per storage location:

| Data type | Dataflow staging (Lakehouse) | Azure DB (SQL) output | Azure Data Explorer output | Fabric Lakehouse (LH) output | Fabric Warehouse (WH) output |
| --- | --- | --- | --- | --- | --- |
| Action | No | No | No | No | No |
| Any | No | No | No | No | No |
| Binary | No | No | No | No | No |
| Currency | Yes | Yes | Yes | Yes | No |
| DateTimeZone | Yes | Yes | Yes | No | No |
| Duration | No | No | Yes | No | No |
| Function | No | No | No | No | No |
| None | No | No | No | No | No |
| Null | No | No | No | No | No |
| Time | Yes | Yes | No | Yes | Yes |
| Type | No | No | No | No | No |
| Structured (List, Record, Table) | No | No | No | No | No |
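If a column uses a type your destination does not accept, you can convert it in the query before loading. A minimal sketch, assuming hypothetical columns ElapsedTime (duration) and CreatedAt (datetimezone):

    // Minimal sketch with hypothetical column names: convert types a
    // destination may reject (see the table above) into accepted ones.
    let
        Source = #table(
            type table [ElapsedTime = duration, CreatedAt = datetimezone],
            {{#duration(0, 1, 30, 0), #datetimezone(2024, 1, 3, 12, 0, 0, 0, 0)}}
        ),
        Converted = Table.TransformColumns(Source, {
            {"ElapsedTime", Duration.TotalSeconds, type number},   // duration -> seconds
            {"CreatedAt", DateTimeZone.RemoveZone, type datetime}  // drop the zone offset
        })
    in
        Converted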

Advanced topics

  • Using staging before loading to a destination

To enhance the performance of query processing, staging can be used within Dataflows Gen2 to leverage Fabric compute to execute your queries.

When staging is enabled on your queries (the default behavior), your data is first loaded into the staging location, an internal Lakehouse accessible only by dataflows themselves.

Depending on the destination, staging can either help or hurt the performance of loading data into the data destination.

Loading data into the Lakehouse

When you are loading data into the Lakehouse, it is advised to disable staging on the query to avoid loading the data twice: once into staging and once into the data destination. To improve dataflow performance, disable staging for any query that has Lakehouse as its data destination.

To disable staging, right-click the query and clear the Enable staging option. The query turns grey.


Loading data into the Warehouse

When loading data into the Warehouse, staging is required before the write operation to the data destination; this improves performance. Currently, only loading into a warehouse in the same workspace as the dataflow is supported. Ensure staging is enabled for all queries that load into the warehouse.

When staging is disabled and you choose a warehouse as the output destination, you get a warning to enable staging before you can configure the data destination.


If you already have a warehouse as a destination and try to disable staging, a warning is shown where you can either remove the warehouse as the destination or dismiss the staging action.


Nullable issues

In some cases, Power Query detects a nullable column as non-nullable, and the column is then created as non-nullable in the data destination. During refresh, the following error occurs:

E104100 Couldn’t refresh entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Error in replacing table’s content with new data in a version: #{0}., InnerException: We can’t insert null data into a non-nullable column., Underlying error: We can’t insert null data into a non-nullable column. Details: Reason = DataFormat.Error;Message = We can’t insert null data into a non-nullable column.; Message.Format = we can’t insert null data into a non-nullable column.

To force nullable columns, you can try the following steps:

  1. Delete the table from the data destination
  2. Remove the data destination from the dataflow
  3. Go into the dataflow and update the data types by leveraging the following PQ code:

    Table.TransformColumnTypes(#"PREVIOUS STEP", {{"COLUMNNAME1", type nullable text}, {"COLUMNNAME2", type nullable Int64.Type}})

  4. Add the data destination
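For context, here is a minimal sketch of what step 3 could look like inside a full query; the source table and the column names Name and Amount are hypothetical:

    // Minimal sketch, hypothetical source and column names: force both
    // columns to nullable types before re-adding the data destination.
    let
        Source = #table({"Name", "Amount"}, {{"alpha", 1}, {null, 2}}),
        ForcedNullable = Table.TransformColumnTypes(
            Source,
            {{"Name", type nullable text}, {"Amount", type nullable Int64.Type}}
        )
    in
        ForcedNullable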

FAQs

What is Microsoft Fabric?

Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building.

What is the difference between dataflow and dataset in PBI?

Data flows are collections of tables, but they do not form a relational model without relationships between them. On the other hand, a Dataset is a collection of tables with relationships between them and calculated metrics, all of which are prepared and ready to be used for reporting.

What is Microsoft Data Flow?

Dataflows are a self-service, cloud-based, data preparation technology. Dataflows enable customers to ingest, transform, and load data into Microsoft Dataverse environments, Power BI workspaces, or your organization's Azure Data Lake Storage account.

Why do I need Microsoft Fabric?

Microsoft Fabric is being used to solve data warehousing, integration, real-time analytics, data science and machine learning, and other such requirements.

What is the difference between Microsoft Fabric and Azure?

Fabric is built on an open, lake-centric design called OneLake. Meanwhile, Azure Synapse is a PaaS for enterprise data warehousing, integration, and analytics. It was launched as a one-stop shop for all warehousing and analytics workloads.

What is dataflow gen 2?

Dataflows Gen 2 are the new version of Power BI dataflows. There are so many changes relative to the previous version that they are considered a new feature. The main difference is the ability to set a destination for the result of each query in the dataflow.

Is dataflow an ETL?

Dataflow can also run custom ETL solutions since it has: building blocks for Operational Data Store and data warehousing; pipelines for data filtering and enrichment; pipelines to de-identify PII datasets; features to detect anomalies in financial transactions; and log exports to external systems.

When should I use dataflow?

Use Dataflow to create data pipelines that read from one or more sources, transform the data, and write the data to a destination. Typical use cases for Dataflow include the following: Data movement: Ingesting data or replicating data across subsystems.

How to use Dataflows in Fabric?

Create a dataflow
  1. Switch to the Data Factory experience.
  2. Go to your Fabric enabled workspace.
  3. Select Dataflow Gen2 in the create menu.
  4. Ingest the data from the OData source. Select Get data, and then select More. From Choose data source, search for OData, and then select the OData connector (a sketch of the resulting query is shown below).
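As a rough sketch of the kind of query the OData connector generates (assuming the public Northwind sample service; the Orders table is only an example of what you might select in the navigator):

    // Rough sketch assuming the public Northwind OData sample service;
    // the "Orders" table is only an example.
    let
        Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
        Orders = Source{[Name = "Orders", Signature = "table"]}[Data]
    in
        Orders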

What are the three types of Dataflows?

There are three types of Data Flows:
  • Batch Data Flows (Batch processing)
  • Real-time Data Flows (Real-time processing)
  • Single Case Data Flows (Single Case processing)

Is dataflow fully managed?

Google Cloud Dataflow is a fully-managed, scalable data processing service for executing batch, stream, and ETL processing patterns.

Is Microsoft Fabric a competitor to Snowflake?

Two major contenders in the world of cloud data warehousing and big data processing are Microsoft Fabric and Snowflake. Both platforms offer a suite of powerful features designed to handle large volumes of data, but understanding the nuances of each can lead to a more informed decision for your business.

Is Microsoft Fabric free?

Microsoft Fabric is provided free of charge when you sign up for a Microsoft Fabric trial capacity. Your use of the Microsoft Fabric trial capacity includes access to the Fabric product workloads and the resources to create and host Fabric items. The Fabric trial lasts for 60 days.

What is Microsoft Fabric vs Databricks?

Databricks is a cloud-based data processing platform that provides a collaborative environment for data scientists, engineers, and analysts. On the other hand, Fabric is a unified analytics platform that brings together all the data and analytics tools that organizations need.

What is the difference between Fabric and Synapse?

While Fabric offers a broad range of data management and analytics capabilities, Synapse specializes in analytics and data warehousing solutions. Organizations might use Microsoft Fabric for its integrated data platform capabilities while leveraging Synapse for specific analytics and warehousing needs.
