Examine and Filter Your Data
Get a good understanding of the composition of your data to better understand changes you need to make, and the effect of the operations you include in the flow.
After you connect to your data, add a table to the flow, and then add a step. You can use the Profile pane to see the current state and structure of your data and spot nulls and outliers.
Number of fields and rows: In the upper-left corner of the Profile pane you can find information that summarizes the number of fields and rows in the data at a particular point in the flow. Tableau Prep Builder rounds to the nearest thousand. In the example below, there are 21 fields and 3ooo rows in the data set.
When you hover over the number of fields and rows, you can see the exact number of rows (in this example, 2848):
Data set size: Work with a subset of your data by specifying the number of rows to include in the Data Sample tab in the Input pane.
Sampled: To enable you to interact directly with your data, Tableau Prep Builder works with a subset of your raw data. The number of rows is determined by the data types and number of fields that are being rendered. String fields take more storage space than integers, so if you have 10 fields of strings in your data set, you might get fewer rows than if you had 10 fields of integers. A Sampled indicator displays next to the size details in the Profile pane to indicate that this is a subset of your data set. For more information about data samples, see "Set your data sample size" in Connect to Data.
Number of unique values: The number next to each field header represents the distinct values that are contained within that field. Tableau Prep Builder rounds to the nearest thousand. In the example below, there are 3,000 distinct values that are represented in the Description field, but if you hover over the number, you can see the exact number of unique values.
Like Tableau Desktop, Tableau Prep Builder interprets the data in your fields when you drag a connection to the Flow pane and automatically assigns a data type to it. Because different databases can handle data in different ways, Tableau Prep Builder’s interpretation might not always be correct.
To change a data type, click the data type icon and select the correct data type from the context menu. You can change string or integer data types to Date or Date & Time, and Tableau Prep Builder will trigger Auto DateParse to change these data types. Like Tableau Desktop, if the change is not successful you will see Null values in the fields instead and you can create a calculation to make the change.
To make it easier to identify field values that aren't valid, you can assign a data role to your field the same way you assign a data type. This tells Tableau Prep Builder what your data values represent so it can automatically validate values and highlight ones that aren't valid for that role.
Data roles tell Tableau Prep Builder what the field values mean or represent. For example if you have field values for geographical data, you can assign a data role of City and Tableau Prep Builder compares the values in the field to a set of known domain values or patterns to identify values that don't match.
Note: Each field is analyzed independently so a City value of "Portland" in State "Washington" in Country "USA" might not be a valid city and state combination, but it won't be identified that way because it is a valid city name.
Tableau Prep Builder supports the following data roles:
Geographic roles (Based on current geographic data and is the same data used by Tableau Desktop)
- Area code (U.S.)
- Congressional District (U.S.)
- NUTS Europe
- Zip code/Postal code
Tip: In Tableau Prep Builder version 2019.1.4 and later, if you assign a geographic role to a field, you can also use that data role to match and group values with the standard value defined by your data role. For more information about grouping values using data roles, see Group similar values by data role .
To set a data role for your field, do the following:
In the Profile pane, Results pane or data grid, click the data type for the field.
Select the data role for the field.
Tableau Prep Builder compares the field's data values to known domain values or patterns (for email or URL) for the data role you select and marks any values that don't match with a red exclamation point.
Click the drop-down arrow for the field and from the Show Values section select an option to show all values or only values that are valid or not valid for the data role.
Use the cleaning options on the More optionsmenu for the field to correct any values that aren't valid. For more information about how to clean your field values see Apply cleaning operations.
By default, Tableau Prep Builder groups numerical, date, and datetime values in a field into buckets. These buckets are also known as bins. The bins ensure that you can see the distribution of values as a whole and quickly identify outliers and null values. The bin size is calculated based on the minimum and maximum values in the field, and null values are always shown at the top of the distribution.
For example, order and ship dates are summarized or "binned" by year. Each bin represents a year from January of the beginning year to January of the following year and labeled accordingly. Because there are sales dates and ship dates that fall in the latter part of 2018 and 2019, a bin is created for the following year for those values.
If a discrete (or categorical) data field contains many rows or has a distribution that is large enough that it can’t be displayed in the field without scrolling, you can see a summarized distribution to the right of the field. You can click and scroll through the distribution to target specific values.
When your data contains numeric or date fields, you can toggle to display the detailed (discrete) version of the values or a summarized (continuous) version of the values. The summarized view shows you the range of values in a field and the frequency with which certain values appear.
This toggle can help you isolate unique values (like the number of “3” records in a field) or the distribution of values (like the sum of all “3” records in a field)
To toggle your view:
In the Profile pane, Results pane or data grid, click the More optionsmenu for a numeric or date field.
In the context menu, select Detail to see the detailed version of the values, or Summary to see the distributed version of the values.
Sort options on a profile card let you sort the bins (the count of values represented by the distribution bars) in ascending or descending order, or the individual field values in alphabetical order.
If you want to rearrange the order of your fields, in the Profile pane, Results pane or Data grid simply select a profile card or field in the data grid and drag it until you see the black target line appear. Then drop it into place. The Profile pane and data grid are synced so the field will appear in the same order in both places.
Tableau Prep Builder makes it easy to find fields and values in your flow data. Trace where a field originated and where it is used throughout the flow in the flow pane, or click individual values in a profile card or in the data grid to highlight related or identical values.
In Tableau Prep Builder version 2018.3.3 and later, you can highlight everywhere a field is used in a flow, even where it originated to help you track down missing values or troubleshoot a flow when you aren't seeing the results you expect.
Click on a field in the Profile pane in a cleaning step or in the Results pane in any other step type and the flow pane will highlight the path where that field is used.
Note: This option is not available for Input or Output step types.
You can use highlighting to find related values across fields. When you click a value in the Profile card in the Profile pane or Results pane, all the related values in the other fields are highlighted in blue. The blue color shows the relationship distribution between the value you selected and the values in the other fields.
For example, to highlight related values, in the Profile pane, click a value in a field. The related values in other fields turn blue and the proportion of the bar highlighted in blue represents the degree of association.
When you select a value in the data grid, all identical values are highlighted too. These highlights help you identify patterns or irregularities in your data.
As you work with your data in your flow you might want to remove unwanted fields. In the Profile pane or the data grid in any cleaning or action step, select one or more fields and right-click or Ctrl-click (MacOS) and select Remove to remove the selected fields, or select Keep Only (version 2019.2.2 and later) to keep only the selected fields and remove all unselected fields.
If you wanted to experiment with your cleaning operations on a field but don't want to change the original data, you can duplicate your fields .
In the profile pane, data grid or results pane, select the field you want to duplicate.
From the More options menu, select Duplicate Field.
A new field is created with the same name and a modifier. For example, "Ship Date -1".
In the Profile pane or Results pane, you can search for fields or values of particular interest to you and use the search results to filter your data.
To search for fields, enter a full or partial search term in the search box on the toolbar.
To search for a value in a field:
Click the Search icon for a field, and enter a value.
To use advanced search options, click the Search options... button.
To use the search results to filter the data, select Keep Only or Exclude.
In the Flow pane, a filter icon appears above affected steps.
An easy way to filter a value is to select it in a profile card, data grid or results card, right-click, and then select Keep Only or Exclude. You can also select Edit Value to edit the value in-line.
You can filter data at any step in the flow. To add a filter, click the More optionsmenu on the profile card, in the data grid or in the results pane.
Note: To apply a filter in the data grid More optionsmenu, click the Hide profile pane button and then click More options for the field you want to filter.
Filters available for each data type
|Data type||Available filters|
|String||Calculation, Wildcard Match, Null Values, Selected Values|
|Number||Calculation, Range of Values, Null Values, Selected Values|
|Date and Date & Time||Calculation, Range of Dates, Relative Date, Null Values, Selected Values|
Use a Calculation filter
When you select Calculation, the Add Filter dialog box opens. Enter the calculation, verify that it's valid, and click Save.
Use a Selected Values filter (version 2019.2.3 and later)
When you select Selected Values, you can pick and choose the values that you want to keep or exclude for a field, even values that aren't in your sample. In the right pane, click the Keep Only or Exclude tab to select your action, then enter search terms to search for values or click Add a value to add values that are in your data set but aren't included in your sample. Click Done to apply your filter.
Note: this filter options isn't available for Aggregation or Pivot step types.
Use a Range of Values filter
When you select Range of Values, you can specify a range or set minimum or maximum values.
Use Range of Dates filter
When you select Range of Dates, you can specify a range of dates or set a minimum or maximum date.
Use a Relative Date filter
When you select Relative Dates, you can filter the date range based on year, quarter, month, week, or day. You can also configure an anchor relative to a specific date, and include null values.
Use a Wildcard Match filter
When you select Wildcard Match, you can filter the field values to keep or exclude only those values that match your criteria. In the filter editor, select the Keep Only or Exclude tab, enter a value to match and then set the criteria to return the values you are looking for.
The filtered results display in the left pane of the filter editor so that you can review and experiment with your results. Once you have the results you want, click Done to apply your change.
Use a Null Values filter
When you select Null Values you can filter the values in the selected field to show only null values or exclude all null values.