Connect to Data
To use Tableau Prep Builder to clean and prepare your data, start a new flow by connecting to your data, just like in Tableau Desktop. You can also open an existing flow and pick up where you left off.
You can see and access your most recent flows right on the Start page, so it's easy to find your work in progress. After you connect to your data, use the different options in the Input step to identify the data that you want to work with in your flow.
Tableau Prep Builder supports connections to popular types of data as well as Tableau data extracts (.tde or .hyper). New connectors are added with each new version, so check the Connections pane to see if your connector is available.
Note: If you open a flow in a version where the connector isn't supported, the flow may open but might have errors or won't run unless the data connections are removed.
To check whether you can connect to your data, open Tableau Prep Builder and click the Add connection button to see if a connector for your data is listed in the left pane under Connect.
Note: Some connectors might require you to download and install a driver before you can connect to your data. See the page on the Tableau website to get driver download links and installation instructions.
Working with Tableau data extracts
When you connect to a Tableau data extract, Tableau Prep Builder unpackages the extract and hyper expands, using a lot of temp space as it applies your flow operations to the resulting raw data.
This means you may need more RAM and disk space to accommodate a file that size. For example, an extract file with 18 columns and 1.2 million rows that is 360MB (8.5 GB uncompressed) may need up to 32GB RAM, 16-core, and 500GB of disk space available to support the file when it is unzipped.
Start a new flow
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
From the Connections pane, do one of the following:
If you connected to a file, double-click or drag a table to the Flow pane to start your flow.
For single tables, Tableau Prep Builder automatically creates an Input step for you in the Flow pane when you add data to your flow.
If you connected to a database, select a database or schema, and then double-click or drag a table to the Flow pane to start your flow.
Open an existing flow
To open an existing flow, on the Start page do one of the following :
Under Recent Flows, select a flow.
Click Open a Flow to navigate to your flow file and open it.
Click File > New to open a new workspace.
Refresh data in the Input step
If data changes in your input files or tables after you begin working with your flow, you can refresh the Input step to bring in the new data.
Use one of the following options:
In the flow pane on the top menu, click the Refresh button to refresh all Input steps. To refresh a single Input step, click the drop-down arrow next to the refresh button and select the Input step from the list.
In the flow pane, right-click the Input step you want to refresh and select Refresh from the menu.
Remove and re-add the Input step to the flow.
In the flow pane, right-click the Input step you want to refresh and select Remove from the menu.
This will temporarily put your flow in an error state.
Connect to the updated file again.
Drag the table to the flow pane on top of the second step in the flow where you want to add the Input step. Drop it on the Add option to reconnect it to the flow.
When working with Microsoft Excel files, you can use Data Interpreter to detect sub-tables in your data as well as remove extraneous information to help prepare your data for analysis. When you turn on Data Interpreter, it detects these sub-tables and lists them as new tables in the Tables section of the Connections pane.
You can then drag them into the Flow pane. If you are using Tableau Prep Builder version 2018.1.2 or later, you can select the Wildcard union option in the Multiple Files tab to include all found sub-tables in your flow. For more information about using Wildcard union in the Input step see Use wildcard union for files and database tables.
If you turn Data Interpreter off, these tables are removed from the Connections pane. If these tables are already used in the flow, this will result in flow errors from the missing data.
Note: Currently, Data Interpreter only detects sub-tables in your Excel spreadsheets.
The example below shows the results of using Data Interpreter on an Excel spreadsheet in the Connections pane. Data Interpreter detected two additional sub-tables.
|Before Data Interpreter||After Data Interpreter|
To use Data Interpreter, complete the following steps:
Select Connect to Data then select Microsoft Excel.
Select your file and click Open.
Select the Use Data Interpreter check box.
Drag the new table to the Flow pane to include it in your flow. To remove the old table, right-click the Input step for the old table and select Remove.
When working with multiple files or database tables from a single data source, you can search for files or tables using a wildcard search and then union the data to include all of the file or table data in the Input step. To union files, the files must be in the same parent or child directory.
To union data tables, the tables must be in the same database and the database connection must support wildcard union. The following databases support this type of union:
Microsoft SQL Server
If you add or remove files or tables after you create the union you can refresh the Input step to update your flow with the new or changed data.
Note: Currently, this feature applies only to Excel and .csv (text) files and data tables stored in the specific databases listed above.
Wildcard union for files is available in Tableau Prep Builder version 2018.1.2 and later. Wildcard union for database tables is available in Tableau Prep Builder version 2018.3.1 and later. Editing a flow connection with this type of union in a prior version can result in errors.
If you need to union data from different data sources, you can do that using a Union step. For more information about creating Union steps, see Join or Union Data.
By default, Tableau Prep Builder unions all .csv files in the same directory as the .csv file you connected to or all the sheets in the Excel file you connected to. If you use Data Interpreter to clean Excel files and are using Tableau Prep Builder version 2018.1.2 or later, you can use the wildcard search to union and add any sub-tables that Data Interpreter found.
If you want to change the default union, use the following criteria to find the files or sheets you want to include in the union:
Search in: Select the directory to use to search for files. Select the Include subfolders check box to include files in the sub-directory of the parent folder.
Files: Select whether to include or exclude the files that match the wildcard search criteria.
Matching Pattern (xxx*): Enter a wildcard search pattern to find files that have those characters in the file name. For example, if you enter ord* all files that include the file name are returned. Leave this field blank to include all of the files in the specified directory.
To use wildcard search to union files:
Click the Add connection button and under Connect, click Text File for .csv files or Microsoft Excel for Excel files, and then select a file to open.
In the Input pane, select the Multiple Files tab, and then select Wildcard union.
The example below shows a wildcard union using a matching pattern. The plus sign on the file icon on the Orders_Central Input step in the Flow pane indicates that this step includes a wildcard union. The files in the union are listed under Included files.
Use the search, file and matching pattern options to find the files that you want to union.
Click Apply to union the files.
When you add a new step to the flow, you can see all the files added to the data set in the File Paths field in the Profile pane. This field is added automatically.
Union database tables
Click the Add connection button and under Connect,connect to a database that supports wildcard union.
Drag a table to the flow pane.
In the Input pane, select the Multiple Tables tab, and then select Wildcard union.
Use search, Tables and Matching Pattern options to find the tables that you want to union.
Only tables that display in the Connections pane in the Tables section can be included in the union. Wildcard search doesn't search across schemas or across the database connection to find tables.
Click Apply to union the table data.
When you add a new step to the flow, you can see all the tables added to the data set in the Tablea Names field in the Profile pane. This field is added automatically.
Merge fields after a wildcard union
After you create a wildcard union, you might want to merge fields. You can do this in any subsequent step, except for the Input or Output steps. For more information, see Additional merge field options.
To determine how much of your data set to include in the flow, you can configure your data set. When you connect to your data or drag tables into the Flow pane, an Input step is automatically added to the flow. This is always the first step in your flow. You can right-click the Input step to rename or remove it. If you're connected to an Excel or text file, you can also refresh the data from the Input step.
In the Input step, you can see the metadata profile for your data set. Here you can search for fields, see sample values, and perform actions to reduce the size of your data set, such as selecting the fields to include, selecting the data sample to work with, or applying filters to selected fields or rows.
You can also configure the field properties by changing the data type or field name for fields. For text files, you can also configure the text settings.
Note: Field values that include square brackets are automatically converted to parentheses.
Connect to a custom SQL query
If your database supports using custom SQL, you will see Custom SQL displayed near the bottom of the Connections pane. Double-click Custom SQL to open the Custom SQL tab where you can enter queries to preselect data and use source-specific operations. After the query retrieves the data set, you can select the fields to include, apply filters, or change the data type before adding the data to your flow.
Select fields to include in the flow
The Input pane shows you a list of fields in your data set. You can use the Search field to find fields in the list, and then use the check boxes to select the fields to include or exclude. To include or exclude all fields from the flow, toggle the check box at the top left of the grid.
Configure field properties
When you work with text files, you see a Text Settings tab where you can edit your connection and configure text properties, such as the field separator for text files. You can also edit the file connection in the Connections pane.
When you work with text or Excel files, you can correct data types that have been inferred incorrectly before you even start your flow. Data types can always be changed in subsequent steps in the Profile pane after you start your flow.
Configure text settings in text files
To change the settings used to parse text files, select from the following options:
First line contains header (default): Select this option to use the first row as the field labels.
Generate field names automatically: Select this option if you want Tableau Prep Builder to auto-generate the field headers. The field naming convention follows the same model as Tableau Desktop. For example F1, F2, and so on.
Field Separator: Select a character from the list to use to separate the columns. Select Other to enter a custom character.
Text Qualifier: Select the character that encloses the values in the file.
Character Set: Select the character set that describes the text file encoding.
Locale: Select the locale to use to parse the file. This setting indicates which decimal and thousand separator to use.
Change data types
To change the data type for a field, do the following:
Click the data type for the field.
Select the new data type from the menu.
Change field names
To change the name of a field, in the Field Name column, select the name, and then type the new name in the field. An annotation is added in the field grid and in the flow pane to the left of the Input step. Your changes are also tracked in the Changes pane.
By default, Tableau Prep Builder limits the data included in the flow to a representative sample of your data set to maintain peak performance. The data sample is determined by calculating the optimal number of rows based on the total number of fields in the data set and the data types for those fields. Tableau Prep Builder then retrieves the top number of rows for the calculated amount as quickly as possible.
The resulting data sample may include all the rows you need, or it may not, depending on how the sample was calculated and returned. If you don't see the data that you expect, you can change the data sample settings to run the query again.
Note: If your data is sampled, a Sampled indicator shows in the Profile pane and persists for every step you add. Any changes you make apply to the sample you are working with in the flow. All changes apply to your entire data set when you run the flow.
To change your data sample settings, select an Input step, then on the Data Sample tab select from the following options:
Default sample amount (default): Tableau Prep Builder calculates the total number of rows to return.
Use all data: Retrieve all rows in your data set regardless of size. This can impact performance or cause Tableau Prep Builder to time out.
Note: To maintain performance, a data sample limit of 1 million rows is always applied to Aggregate, Join, Union and Pivot step types regardless of this setting.
Fixed number of rows: Select the number of rows to return from the data set. The recommended number of rows is 1 million or less. Setting the number of rows to more than 1 million can impact performance.
Quick select (default): The database returns the number of rows requested as quickly as possible. This might be the first N number of rows or the rows that the database had cached in memory from a previous query.
Random sample: The database returns the number of rows requested but looks at every row in the data set and returns a representative sample from all of the rows. This option may impact performance when the data is first retrieved.
Apply filters to fields in the Input step
To filter a field, do the following:
In the toolbar click Filter Values.
Enter your filter criteria in the calculation editor.
Additional filtering capabilities are available in other steps in the flow. For more information, see Filter values.