Plan the Data Source
At the center of Tableau is your data. How successful you are with exploring your data, building views, and answering questions about your data depends on how you set up the Tableau data source.
The data source is the conduit to your data and Tableau—the data source tells Tableau how to interpret and interact with your data. Even though you can set up the data source with just a few clicks, it's helpful to know a few things before you begin that affect how you set up the data source.
A Tableau data source is the link between your data and Tableau. It is essentially the sum of your data, the connection information, and the customizations that you make on top of data to work with it in Tableau. The data source can contain:
Information about where the data is located, such as a file name and path or a network location. Or, details on how to connect to your data, such as database server name and server sign-in information.
The names of any tables in the connection, as well as information about how the tables relate to each other.
A layer of customizations that you make on top of your data but that are not part of the original data itself, such as calculations, groups, and renamed fields. For details, see Organize and Customize Fields in the Data Pane.
After you connect to your data and select the first table to interact with from your database, a basic data source is created for you.
Because your data can be captured and stored in so many different ways, you might need to do some additional set up of the Tableau data source before you begin your analysis.
If your data is stored in one table, there’s typically less set up required for your data source. However, you still might need to prepare the data for analysis. For details, see Tips for Working with Your Data.
When working with multiple tables that contain data that you want, you might need to combine the data. How you combine the data varies based on whether the tables are from the same or different databases (including the same or different Excel workbooks or folders).
When you set up a data source, it’s important to know whether your data is stored in one table or in multiple tables.
Data comes from a single table
If your data comes from one table, you can connect to your data, drag a table onto the canvas to create the data source, and then start building your view. For more information, see Connect to Your Data.
Data comes from multiple tables from the same database
If your data comes from multiple tables in the same database, you can connect to your data and then combine the tables in Tableau. In most cases, when your data is coming from a single database, the data source you set up in Tableau contains a single connection to a database that has all the tables that you need for your analysis.
When your data is stored in multiple tables from the same database, the following methods are available to combine your data depending on the data you're working with:
- Data blending
Data comes from multiple tables from different databases
If your data comes from multiple tables in different databases, you also can connect to your data and then combine the tables in Tableau. But before combining the tables, you must decide between setting up:
- an individual data source for each connection to the database that contains the tables that you need for your analysis, or
- a single data source with multiple independent connections to all the tables you need for your analysis.
When your data is stored in tables from different databases, you can choose from the following methods to combine data:
- Data blending – supports one connection per data source, and one data source per database (or Excel or text file)
- Cross-database join – allows two or more connections per data source
In general, there are three methods for combining data in Tableau: join, blend, or union.
Joining is a method for combining tables related by common fields (that is, common columns). The result of combining data using a join is a virtual table that extends horizontally by adding columns of data.
Tableau supports joins between tables in the same database and between tables in different databases, which is also known as a cross-database join.
For more information about how to join data in Tableau, see Join Your Data.
Data blending is also another method that lets you combine data. When you use data blending to combine your data, you combine data in what is called a primary data source with common fields from one or more secondary data sources.
Data blending is useful when the data sources you’re working with contain data that's at different levels of detail. This means, when one data set captures data using greater or lesser granularity than the other data set. For example, when you want to combine a data source that has daily transactional data with a table that contains quarterly data.
Data blending is also useful when you want to join data but the databases you're using don’t allow joins—for example, data in cube data sources.
As with joins, the result of combining data using data blending is a virtual table that extends horizontally by adding columns of data.
For more information about how data blending works and how to blend your data in Tableau, see Blend Your Data.
Unioning is a method for appending values (that is, rows) to tables. You can union tables if they have the same columns. The result of combining data using a union is a virtual table that has the same columns but extends vertically by adding rows of data.
You can union tables in one of two ways: manually or using wildcard search. For more information about how to union data using one of these methods in Tableau, see Union Your Data.
Whether you are working with multiple tables of data from the same or different databases, you must consider the relationship between the tables you want to combine in conjunction with your analysis goals.
When thinking about the relationship one table has with another, ask yourself questions like “Is one table significantly larger than the other?” Or, “Does one table contain a different level of detail than the other?”
In many cases, it’s important to ask these questions up front, before you set up the data source. But sometimes the problems aren’t highlighted until you’ve set up the data source.
Before you set up the data source
If you haven’t set up the data source, consider the following relationship scenarios and recommendations.
When you need to enrich an existing table (star schema). Use a join to combine data when you need to add more dimensions to a fact table, or a table with the quantitative information, with dimensions from other tables in the same database. This type of relationship between the tables is sometimes referred to as a star schema.
For example, suppose you have the following tables: A, B, C, D, and E.
Table A contains sales data; the quantitative data in this table is the Quantity value. Table B contains customer data, table C contains neighborhood data, table D contains time data, and table E contains product data. All tables are related to the fact table, table A, but none of the dimension tables (B, C, D, and E) are related to each other.
When all tables have a chain of relationships (snowflake schema). Use a join to combine data when table C relates to table A via table B. The relationship between the tables can be described as a snowflake schema.
For example, suppose you have the following tables: A, B, and C.
Table A contains sales data. Table B contains store data. Finally, table C contains geographic data. In this example, table A and table B relate to each by the columns highlighted in pink. Table B and table C relate to each other by the columns highlighted in orange. Tables A and C relate to each other because of their relationship to table B.
When tables have a different level of detail – multi-fact table: Use data blending when the measure values in the tables of data you need to combine are at a different level of detail. Different level of detail means that one data set captures the data using either greater or lesser granularity than the other data set. The relationship between the tables of data can be described as multi-fact tables.
For example, suppose you have the following tables: 1, 2, A, B, C, D, and E. Table 1 contains sales data and table 2 contains purchase data. Table B and C contain retailer and area data. Tables D and E contain supplier and region information. In this case, table 1 and its dimension tables capture transactional data at a different level of detail than table 2 and its dimension tables.
When each table has a large number of records: Cardinality refers to the uniqueness of data contained in a column. Cardinality plays a role when you’re working with tables that have a large number of records. When the tables you want to analyze contain many rows of data, queries can be slow (and performance of the overall data source is affected) so Tableau recommends that you choose a method for combining data based on the cardinality of the related columns between tables.
- When related columns have a lot of repeated data – low cardinality: For example, a table called Student might contain a Gender column that contains only two values: female and male. When related columns between tables have low cardinality, use data blending to combine data.
- When related columns have highly unique data – high cardinality: For example, a table called Policy Holders might contain an ID column that contains a unique value for every subscriber. When related columns between tables have high cardinality, use joins or cross-database joins to combine your data.
When you need to analyze both summary and detail information together: If you need to see both the summary of a calculation and the breakdown of the calculation in the same view, use data blending. For more information, see Blend on Summary Data.
After you set up the data source
If you’ve set up the data source and have already chosen a method for combing your data, consider the following scenarios and alternative recommendations in case your current method doesn't produce the results you expect:
- When you have created multiple connections to the same database: Use data blending if a join doesn’t produce the expected results. Unexpected results from a join can result from rows being counted twice because the data sets are at different levels of detail. Or, you’re trying to create a view that shows both summary and detail information together. An alternative to using blending in this case is to use LOD calculations. For more information, see Level of Detail Expressions and Aggregation.
- When you have created multiple data sources for tables in the same database: Make sure you use a single connection and join tables that belong to the same database. Ensuring that the data source in your workbook uses one connection per database reduces the number of queries sent to the database, which improves the overall performance of the data source, views, and workbook.
- When you need to add more data to an existing data source: Use a join if the data is stored in the same database.
After you understand the relationship your data has to each other, also consider your analysis goals.
When thinking about the data source that you're setting up to use for your analysis, ask yourself questions like, “Does my data need to be published to Tableau Server so that it can be shared?” Or “Does my data, stored in different tables, need to be refreshed in real-time?” Or “Can the different tables of data be updated at different intervals?”
Because the table combining methods that you can use are done at either the data source or workbook level, answers to above questions can also help you determine the best method for combining tables so that you can optimize your efforts when setting up the data sources.
|Tables in the same database||Tables in different databases|
|Join||Union||Data blending||Cross-database join|
Use the data source as a template to reuse within the current workbook.
|Joins can be created for each connection in the data source.||Unions can be created for each connection in the data source.||Blends can be created for each sheet in the workbook.||Cross-database joins can be created for each data source in the workbook.|
Use the data source as a template for reuse in a different workbook.
|Data sources that contain joins can be shared or published for different analysis.||Data sources that contain unions can be shared or published to be used for different analysis.||Because blends are saved to the workbook (and not to the data source), if you need to use a blend, duplicate the sheet that uses the blend to reuse for different analysis.||Data sources that contain cross-database joins can be shared or published.|
Single source of truth
Use the data source for the purpose of sharing the data captured in it as a single source of truth.
|Data sources that contain joins can be used as a single source of truth.||Data sources that contain unions can be used as a single source of truth.||Because blends are comprised of two or more data sources, the workbook (and not the data sources) that contains the blend can be shared as the single source of truth.||Data sources that contain cross-database joins can be used as a single source of truth.|