Connect to Data
To use Tableau Prep Builder to clean and prepare your data, start a new flow by connecting to your data, just like in Tableau Desktop. You can also open an existing flow and pick up where you left off.
You can see and access your most recent flows right on the Start page, so it's easy to find your work in progress. After you connect to your data, use the different options in the Input step to identify the data that you want to work with in your flow.
Tableau Prep Builder supports connections to popular types of data as well as Tableau data extracts (.tde or .hyper). Connectors that are supported in Tableau Desktop might not yet be supported in your version of Tableau Prep Builder. But new connectors are added with each new version, so check the Connections pane to see if your connector is available.
Note: If you open a flow in a version where the connector isn't supported, the flow may open but might have errors or won't run unless the data connections are removed.
To check whether you can connect to your data, open Tableau Prep Builder and click the Add connection button to see if a connector for your data is listed in the left pane under Connect.
Note: Some connectors might require you to download and install a driver before you can connect to your data. See the page on the Tableau website to get driver download links and installation instructions.
Connecting to SAP HANA (version 2019.2.1 and later)
Tableau Prep Builder supports connecting to data using SAP HANA just like Tableau Desktop but with a few differences. Connect to the database using the same procedure you would use in Tableau Desktop. For more information see SAP HANA. After you connect and search for your table, drag the table to the canvas to begin building your flow.
Prompting for variables and parameters when opening a flow isn't supported in Tableau Prep Builder. Instead, in the Input pane, click the Variables and Parameters tab and select the variables and operands you want to use, then select from a list of preset values or enter custom values to query your database and return the values you need.
Note: Starting in Tableau Prep Builder version 2019.2.2 you can use Initial SQL to query your connection and if you have multiple values for a variable, you can select the value you need from a drop-down list.
You can also add additional variables. Click the plus button in the Variables section and select a variable and operand, then enter a custom value.
Note: This connector isn't supported in Tableau Prep Conductor version 2019.1. If you create a flow that uses this connector, you can publish the flow but you won't be able to run the flow using the scheduling features in Tableau Server. Instead you can run the flow using the command line interface to keep your data fresh. For more information about running flows from the command line see Refresh output files from the command line.
If you need to connect to data sources that aren't listed in the Connections pane, you can connect to any data source using the Other Databases (ODBC) connector that supports the SQL standard and implements the ODBC API. Connecting to data using the Other Databases (ODBC) connector works similarly to how you might use it in Tableau Desktop, however there are a few differences:
You can only connect using the DSN (data source name ) option.
To publish and run your flow in Tableau Server, the server must be configured using a matching DSN.
Note: Running flows from the command line (Tableau Prep Builder) that include the Other Databases (ODBC) connector isn't currently supported.
There is a single connection experience for both Windows and MacOS. Prompting for connection attributes for ODBC drivers (Windows) isn't supported.
Only 64-bit drivers are supported by Tableau Prep Builder.
Before you connect
To connect to your data using the Other Databases (ODBC) connector, you must install the database driver and set up and configure your DSN (data source name). To publish and run flows to Tableau Server, the server must also be configured with a matching DSN.
Important: Tableau Prep Builder only supports 64-bit drivers. If you have a 32-bit driver already set up and configured, you may need to uninstall it and then install the 64-bit version if the driver doesn't allow both versions to be installed at the same time.
Create a DSN using either the ODBC Data Source Administrator (64-bit) (Windows) or the using an ODBC Manager utility (MacOS).
If you don't have the utility installed on your Mac, you can download one from (www.odbcmanager.net for example) or you can manually edit the odbc.ini file.
In the ODBC Data Source Administrator (64-bit) (Windows) or the ODBC Manager utility (MacOS), add a new data source then select the driver for the data source then click Finish.
In the ODBC Driver Setup dialog, enter the configuration information such as server name, port, user name and password. Click Test (if your dialog has that option) to verify that your connection is set up correctly, then save your configuration.
Note: Tableau Prep Builder doesn't support prompting for connection attributes so you must set this information when configuring the DNS.
This example shows the configuration dialog for a MySQL Connector.
Connect using Other Databases (ODBC)
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select Other Databases (ODBC).
In the Other Databases (ODBC) dialog, select a DSN from the drop-down list and enter the user name and password. Then click Sign In.
From the Connections pane, select your database from the drop-down list.
You can specify an Initial SQL command when connecting to databases that support it, that will run when a connection is made to the database. For example when connecting to Amazon Redshift, you can enter a SQL statement to apply a filter when connecting to the database just like adding filters in the Input step. The SQL command will apply before data is sampled and loaded into Tableau Prep Builder.
To refresh your data and run the Initial SQL command again do one of the following:
•Change the Initial SQL command and refresh the Input step be re-establishing the connection.
• Run the flow. The initial SQL command is run before processing all of the data.
•Schedule the flow to run on Tableau Server. The initial SQL is run every time that the flow is run, as part of the data loading experience.
- In the Connections pane, select a connector in the list that supports Initial SQL.
- Click the Show Initial SQL link to expand the dialog and enter your SQL statements.
Important: Parameters are not yet supported for Initial SQL in Tableau Prep Builder. If you try and enter the parameters like you would in Tableau Desktop, you will receive errors.
Working with Tableau data extracts
When you connect to a Tableau data extract, Tableau Prep Builder unpackages the extract and hyper expands, using a lot of temp space as it applies your flow operations to the resulting raw data.
This means you may need more RAM and disk space to accommodate a file that size. For example, an extract file with 18 columns and 1.2 million rows that is 360MB (8.5 GB uncompressed) may need up to 32GB RAM, 16-core, and 500GB of disk space available to support the file when it is unzipped.
Start a new flow
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
From the Connections pane, do one of the following:
If you connected to a file, double-click or drag a table to the Flow pane to start your flow.
For single tables, Tableau Prep Builder automatically creates an Input step for you in the Flow pane when you add data to your flow.
If you connected to a database, select a database or schema, and then double-click or drag a table to the Flow pane to start your flow.
Open an existing flow
To open an existing flow, on the Start page do one of the following :
Under Recent Flows, select a flow.
Click Open a Flow to navigate to your flow file and open it.
Click File > New to open a new workspace.
Refresh data in the Input step
If data changes in your input files or tables after you begin working with your flow, you can refresh the Input step to bring in the new data.
File input step types
Do one of the following options:
In the flow pane on the top menu, click the Refresh button to refresh all Input steps. To refresh a single Input step, click the drop-down arrow next to the refresh button and select the Input step from the list.
In the flow pane, right-click the Input step you want to refresh and select Refresh from the menu.
File, database or Tableau extract input step types
Do one of the following:
Try editing the connection.
Note: To maintain performance, Tableau Prep Builder samples large data sets. If your data is sampled, you may or may not see your new data in the profile pane. You can change the settings for how your data is sampled in the Data Sample tab in the Input step, but it may impact performance. For more information about setting your data sample size, see Set your data sample size.
In the Connections pane, right-click or Ctrl-click (MacOS) on the data source and select Edit.
Re-establish your connection by signing into the database or re-selecting the file or Tableau extract.
Remove and re-add the Input step to the flow.
In the flow pane, right-click the Input step you want to refresh and select Remove from the menu.
This will temporarily put your flow in an error state.
Connect to the updated file again.
Drag the table to the flow pane on top of the second step in the flow where you want to add the Input step. Drop it on the Add option to reconnect it to the flow.
When working with Microsoft Excel files, you can use Data Interpreter to detect sub-tables in your data as well as remove extraneous information to help prepare your data for analysis. When you turn on Data Interpreter, it detects these sub-tables and lists them as new tables in the Tables section of the Connections pane.
You can then drag them into the Flow pane. If you are using Tableau Prep Builder version 2018.1.2 or later, you can select the Wildcard union option in the Multiple Files tab to include all found sub-tables in your flow. For more information about using Wildcard union in the Input step see Union files and database tables in the Input step.
If you turn Data Interpreter off, these tables are removed from the Connections pane. If these tables are already used in the flow, this will result in flow errors from the missing data.
Note: Currently, Data Interpreter only detects sub-tables in your Excel spreadsheets and doesn't support specifying the starting row for text files and spreadsheets.
The example below shows the results of using Data Interpreter on an Excel spreadsheet in the Connections pane. Data Interpreter detected two additional sub-tables.
|Before Data Interpreter||After Data Interpreter|
To use Data Interpreter, complete the following steps:
Select Connect to Data then select Microsoft Excel.
Select your file and click Open.
Select the Use Data Interpreter check box.
Drag the new table to the Flow pane to include it in your flow. To remove the old table, right-click the Input step for the old table and select Remove.
When working with multiple files or database tables from a single data source, you can search for files or tables using a wildcard search and then union the data to include all of the file or table data in the Input step. To union files, the files must be in the same parent or child directory.
To union data tables, the tables must be in the same database and the database connection must support wildcard union. The following databases support this type of union:
Microsoft SQL Server
If you add or remove files or tables after you create the union you can refresh the Input step to update your flow with the new or changed data.
Note: Currently, this feature applies only to Excel and .csv (text) files and data tables stored in the specific databases listed above. This option is not available for Tableau data extracts.
Wildcard union for files is available in Tableau Prep Builder version 2018.1.2 and later. Wildcard union for database tables is available in Tableau Prep Builder version 2018.3.1 and later. Editing a flow connection with this type of union in a prior version can result in errors.
If you need to union data from different data sources, you can do that using a Union step. For more information about creating Union steps, see Join or Union Data.
By default, Tableau Prep Builder unions all .csv files in the same directory as the .csv file you connected to or all the sheets in the Excel file you connected to. If you use Data Interpreter to clean Excel files and are using Tableau Prep Builder version 2018.1.2 or later, you can use the wildcard search to union and add any sub-tables that Data Interpreter found.
If you want to change the default union, use the following criteria to find the files or sheets you want to include in the union:
Search in: Select the directory to use to search for files. Select the Include subfolders check box to include files in the sub-directory of the parent folder.
Files: Select whether to include or exclude the files that match the wildcard search criteria.
Matching Pattern (xxx*): Enter a wildcard search pattern to find files that have those characters in the file name. For example, if you enter ord* all files that include the file name are returned. Leave this field blank to include all of the files in the specified directory.
To use wildcard search to union files:
Click the Add connection button and under Connect, click Text File for .csv files or Microsoft Excel for Excel files, and then select a file to open.
In the Input pane, select the Multiple Files tab, and then select Wildcard union.
The example below shows a wildcard union using a matching pattern. The plus sign on the file icon on the Orders_Central Input step in the Flow pane indicates that this step includes a wildcard union. The files in the union are listed under Included files.
Use the search, file and matching pattern options to find the files that you want to union.
Click Apply to union the files.
When you add a new step to the flow, you can see all the files added to the data set in the File Paths field in the Profile pane. This field is added automatically.
Union database tables (version 2018.3.1 and later)
Click the Add connection button and under Connect,connect to a database that supports wildcard union.
Drag a table to the flow pane.
In the Input pane, select the Multiple Tables tab, and then select Wildcard union.
Use search, Tables and Matching Pattern options to find the tables that you want to union.
Only tables that display in the Connections pane in the Tables section can be included in the union. Wildcard search doesn't search across schemas or across the database connection to find tables.
Click Apply to union the table data.
When you add a new step to the flow, you can see all the tables added to the data set in the Tablea Names field in the Profile pane. This field is added automatically.
Merge fields after a union
After you create a union in the input step, you might want to merge fields. You can do this in any subsequent step, except for the Input or Output steps. For more information, see Additional merge field options.
In Tableau Prep Builder version 2019.1.3 and later, when you connect to databases that include tables with relationship data, Tableau Prep Builder can now detect and show which fields in a table are identified as the unique identifier (primary key) and which fields are identified as a related field (foreign key) as well as show the related table names for these fields.
A new column called Linked Keys shows in the Input pane and shows the following relationships if they exist:
Unique identifier. This field uniquely identifies each row in the table. There can be multiple unique identifiers in a table. The values in the fields must be unique and cannot be blank or null.
Related field (also known as a foreign key). This field relates the table to another table in the database. There can be multiple related fields in a table.
Both Unique Identifier and related field. The field is a unique identifier in this table and also relates the table to another table in the database.
You can leverage these relationships to quickly find and add the related tables to your flow or create joins from the Input step. This feature is available for any supported database connector where table relationships are defined.
- Connect to a database (such as Microsoft SQL Server) that contains relationship data for fields, such as unique identifiers or related fields (foreign key).
In the Input pane, click on a field that is marked as a related field or as both a unique identifier and related field.
A dialog opens that shows a list of related tables.
Hover on the table that you want to add or join and click the plus button to add the table to your flow, or click the join button to create a join with the selected table.
If you create a join, Tableau Prep Builder uses the defined field relationship to join the tables and shows you a preview of the join clauses that it will use to create the join.
Alternatively, you can join related tables from the menu in the Flow pane. Hover over a step until the plus icon appears, then select Add Join to see a list of related tables. Tableau Prep Builder creates the join based on the fields that make up the relationship between the two tables.
Note: If your table doesn't have table relationships defined, this option is not available.
For more information about working with joins, see Join your data.
To determine how much of your data set to include in the flow, you can configure your data set. When you connect to your data or drag tables into the Flow pane, an Input step is automatically added to the flow. This is always the first step in your flow. You can right-click the Input step to rename or remove it. If you're connected to an Excel or text file, you can also refresh the data from the Input step.
In the Input step, you can see the metadata profile for your data set. Here you can search for fields, see sample values, and perform actions to reduce the size of your data set, such as selecting the fields to include, selecting the data sample to work with, or applying filters to selected fields or rows.
You can also configure the field properties by changing the data type or field name for fields. For text files, you can also configure the text settings.
Note: Field values that include square brackets are automatically converted to parentheses.
Connect to a custom SQL query
If your database supports using custom SQL, you will see Custom SQL displayed near the bottom of the Connections pane. Double-click Custom SQL to open the Custom SQL tab where you can enter queries to preselect data and use source-specific operations. After the query retrieves the data set, you can select the fields to include, apply filters, or change the data type before adding the data to your flow.
Select fields to include in the flow
The Input pane shows you a list of fields in your data set. You can use the Search field to find fields in the list, and then use the check boxes to select the fields to include or exclude. To include or exclude all fields from the flow, toggle the check box at the top left of the grid.
Configure field properties
When you work with text files, you see a Text Settings tab where you can edit your connection and configure text properties, such as the field separator for text files. You can also edit the file connection in the Connections pane.
When you work with text or Excel files, you can correct data types that have been inferred incorrectly before you even start your flow. Data types can always be changed in subsequent steps in the Profile pane after you start your flow.
Configure text settings in text files
To change the settings used to parse text files, select from the following options:
First line contains header (default): Select this option to use the first row as the field labels.
Generate field names automatically: Select this option if you want Tableau Prep Builder to auto-generate the field headers. The field naming convention follows the same model as Tableau Desktop. For example F1, F2, and so on.
Field Separator: Select a character from the list to use to separate the columns. Select Other to enter a custom character.
Text Qualifier: Select the character that encloses the values in the file.
Character Set: Select the character set that describes the text file encoding.
Locale: Select the locale to use to parse the file. This setting indicates which decimal and thousand separator to use.
Change data types
To change the data type for a field, do the following:
Click the data type for the field.
Select the new data type from the menu.
Change field names
To change the name of a field, in the Field Name column, select the name, and then type the new name in the field. An annotation is added in the field grid and in the flow pane to the left of the Input step. Your changes are also tracked in the Changes pane.
By default, Tableau Prep Builder limits the data included in the flow to a representative sample of your data set to maintain peak performance. The data sample is determined by calculating the optimal number of rows based on the total number of fields in the data set and the data types for those fields. Tableau Prep Builder then retrieves the top number of rows for the calculated amount as quickly as possible.
The resulting data sample may include all the rows you need, or it may not, depending on how the sample was calculated and returned. If you don't see the data that you expect, you can change the data sample settings to run the query again.
Note: If your data is sampled, a Sampled indicator shows in the Profile pane and persists for every step you add. Any changes you make apply to the sample you are working with in the flow. All changes apply to your entire data set when you run the flow.
To change your data sample settings, select an Input step, then on the Data Sample tab select from the following options:
Default sample amount (default): Tableau Prep Builder calculates the total number of rows to return.
Use all data: Retrieve all rows in your data set regardless of size. This can impact performance or cause Tableau Prep Builder to time out.
Note: To maintain performance, a data sample limit of 1 million rows is always applied to Aggregate, Join, Union and Pivot step types regardless of this setting.
Fixed number of rows: Select the number of rows to return from the data set. The recommended number of rows is 1 million or less. Setting the number of rows to more than 1 million can impact performance.
Quick select (default): The database returns the number of rows requested as quickly as possible. This might be the first N number of rows or the rows that the database had cached in memory from a previous query.
Random sample: The database returns the number of rows requested but looks at every row in the data set and returns a representative sample from all of the rows. This option may impact performance when the data is first retrieved.
Apply filters to fields in the Input step
To filter a field, do the following:
In the toolbar click Filter Values.
Enter your filter criteria in the calculation editor.
Additional filtering capabilities are available in other steps in the flow. For more information, see Filter values.