Show Table of Contents
Data blending is a method for combining data that supplements a table of data from one data source with columns of data from another data source.
Usually you use joins to perform this kind of data combining, but there are times, depending on factors like the type of data and its granularity, when it's better to use data blending.
For example, suppose you have transactional data stored in Salesforce and quota data stored in an Excel workbook. The data you want to combine is stored in different databases, and the granularity of the data captured in each table is different in the two data sources, so data blending is the best way to combine this data.
Data blending is useful under the following conditions:
You want to combine data from different databases that are not supported by cross-database joins.
Cross-database joins do not support connections to cubes (for example, Oracle Essbase) or to some extract-only connections (for example, Salesforce). In this case, set up individual data sources for the data you want to analyze, and then use data blending to combine the data sources on a single sheet.
Data is at different levels of detail.
Sometimes one data set captures data using greater or lesser granularity than the other data set.
For example, suppose you are analyzing transactional data and quota data. Transactional data might capture all transactions. However, quota data might aggregate transactions at the quarter level. Because the transactional values are captured at different levels of detail in each data set, you should use data blending to combine the data.
Use data blending instead of joins under the following conditions:
Data needs cleaning.
If your tables do not match up with each other correctly after a join, set up data sources for each table, make any necessary customizations (that is, rename columns, change column data types, create groups, use calculations, etc.), and then use data blending to combine the data.
Joins cause duplicate data.
Duplicate data after a join is a symptom of data at different levels of detail. If you notice duplicate data, instead of creating a join, use data blending to blend on a common dimension instead.
You have lots of data.
Typically joins are recommended for combining data from the same database. Joins are handled by the database, which allows joins to leverage some of the database’s native capabilities. However, if you're working with large sets of data, joins can put a strain on the database and significantly affect performance. In this case, data blending might help. Because Tableau handles combining the data after the data is aggregated, there is less data to combine. When there is less data to combine, generally, performance improves.
Note: When you blend on a field with a high level of granularity, for example, date instead of year, queries can be slow.
Your data must meet the following requirements in order for you to use data blending.
Data blending requires a primary data source and at least one secondary data source. When you designate a primary data source, it functions as the main table or main data source. Any subsequent data sources that you use on the sheet are treated as a secondary data source. Only columns from the secondary data source that have corresponding matches in the primary data source appear in the view.
Using the same example from above, you designate the transactional data as the primary data source and the quota data as the secondary data source.
Note: Cube (multidimensional) data sources must be used as the primary data source. Cube data sources cannot be used as a secondary data source.
After designating primary and secondary data sources, you must define the common dimension or dimensions between the two data sources. This common dimension is called the linking field.
Continuing the example from above, when you blend transactional and quota data, the date field might be the linking field between the primary and secondary data sources.
If the date field in the primary and secondary data sources have the same name, Tableau creates the relationship between the two fields and shows a link icon () next to the date field in the secondary data source when the field is in the view.
If the two dimensions don’t have the same name, you can define a relationship that creates the correct mapping between the date fields in the primary and secondary data sources.
Data blending simulates a traditional left join. The main difference between the two is when the join is performed with respect to aggregation.
When you use a left join to combine data, a query is sent to the database where the join is performed. Using a left join returns all rows from the left table and any rows from the right table that has a corresponding row match in the left table. The results of the join are then sent back to and aggregated by Tableau.
For example, suppose you have the following tables. If the common columns are User ID and Patron ID, a left join takes all the data from the left table, as well as all the data from the right table because each row has a corresponding row match in the left table.
When you use data blending to combine data, a query is sent to the database for each data source that is used on the sheet. The results of the queries, including the aggregated data, are sent back to and combined by Tableau. The view uses all rows from the primary data source, the left table, and the aggregated rows from the secondary data source, the right table, based on the dimension of the linking fields. Dimension values are aggregated using the ATTR aggregate function, which means the aggregation returns a single value for all rows in the secondary data source. If there are multiple values for the rows, an asterisk (*) is shown. Measure values are aggregated based on how the field is aggregated in the view.
You can change the linking field or add more linking fields to include different or additional rows of data from the secondary data source in the blend, changing the aggregated values.
For example, suppose you have the following tables. If the linking fields are User ID and Patron ID, blending your data takes all of the data from the left table, and supplements the left table with the data from the right table. In this case, not all values can be a part of the resulting table because of the following:
A row in the left table does not have a corresponding row match in the right table, as indicated by the null value.
There are multiple corresponding values in the rows in the right table, as indicated by the asterisk (*).
Suppose you have the same tables as above, but the secondary data source contains a new field called Fines. Again, if the linking fields are User ID and Patron ID, blending your data takes all of the data from the left table, and supplements it with data from the right table. In this case, you see the same null value and asterisks in the previous example in addition to the following:
Because the Fines field is a measure, you see the row values for the Fines field aggregated before the data in the right table is combined with the data in the left table.
As with the previous example, a row in the left table does not have corresponding row for the Fines field, as indicated by the second null value.
You can use data blending when you have data in separate data sources that you want to analyze together on a single sheet. The following example demonstrates how to blend data from two data sources: an Excel data source and an SQL Server data source.
Connect to a set of data and set up the data source on the data source page. This example uses the Sample - Superstore data source.
Select Data > New data source, connect to the second set of data, and then set up the data source. This example uses a SQL Server data source that contains information about forecasted sales, called Sales Plan.
Click the sheet tab to start building your view.
Drag at least one field from your primary data source into the view to designate it as the primary data source.
In the Data pane, click the data source that you want to designate as the primary data source. In this example, Sample - Superstore is selected.
Drag the fields you want to use from the data source into the view. In this example, a view is created that shows Sales by Segment and Category.
Any fields used in the view from data sources that are not the primary data source or active links automatically designate subsequent data sources as the secondary data source.
In the Data pane, click the data source that you want to designate as the secondary data source. In this example, the Sales Plan data source is selected.
When you complete this step, an orange bar displays down the left side of the Data pane. The orange bar indicates the secondary data source. In addition to the orange bar, broken link icons display next to potential linking fields.
Click a broken link icon () to establish a relationship between the secondary and primary data sources. This is the field that determines the level of detail that Tableau should aggregate to. In this example, Segment is the linking field.
If a broken link icon does not appear next to the field that should be the linking field or no broken links appear, see Step 4: (Optional) Define or edit relationships .
Drag the fields you want to use from the secondary data source into the view. In this example, Sales Plan field is used on the Detail card to change the level of detail of the view.
Tableau detects when a field from the primary data source also exists in a secondary data source, and indicates that the fields are potential linking fields by marking them with a broken link icon in the Data pane. You click a broken link icon to establish a relationship between the primary and secondary data sources, and have Tableau blend data from both data sources on a single sheet.
You must have at least one linked field in order to use data from the secondary data source.
Select Data > Edit Relationships.
In the Relationships dialog box, verify that the primary data source is selected from the Primary data source drop-down list. In this example, the Sample - Superstore data source is selected.
Select the secondary data source in the Secondary data source pane, select Custom in the relationships list, and then click the Add button. In this example, the Sales Plan data source is selected.
In the Add/Edit Field Mapping dialog box, do the following:
Select a field from the primary data source.
Select a field from the secondary data source to establish the linking field or the relationship between the data sources even though the fields do not have the same name.
In this example, a relationship between Segment field in the Superstore – Sample data source is selected and Customer Segment field in the Sales Plan data source is selected. You can map these two fields to create a relationship even though they don't have the same name.
(Optional) Continue to add and remove as many relationships as necessary and then click OK.
The related fields are shown in the secondary data source as potential linking fields.
Click the broken link icon () next to these fields in the data pane to make the relationship between the primary and secondary data sources active. In this case, the broken link icon next to Customer Segment is made into an active link icon ().
If the related field from the primary data source is used in the view, the link becomes active automatically.
When data blending, the relationship matches values based on the member aliases. You can fix fields that don’t match by editing the aliases. For example, when you map a Segment field in the primary data source to the Segment field in the secondary data source, “Small Business” will not map correctly to “S. Business”. You must edit the aliases in one of the data sources. For more information, see Create Aliases to Rename Members in the View.
You can also use a secondary data source to re-alias the field values in a primary data source. For more information, see Alias Field Values Using Data Blending.
There are some data blending limitations around non-additive aggregates, such as COUNTD, MEDIAN, and RAWSQLAGG. For more information, see Troubleshoot Data Blending.
(Back to top)