Was this page helpful?
Yes No

Clean and Shape Data

Clean and shape data to make it easier to combine with other data or to make it simpler for other people to understand.

In this article

Build your flow

After you connect to the data that you want to include in your flow, begin cleaning and shaping your data by adding new steps to the flow. Hover over a step until the plus icon appears, and then click the icon to display options.

Use these options to build your flow:

  • Add Branch: Split your flow into different branches.

  • Add Step or Insert Step: Add a step to a new flow or insert a step into an existing flow to perform cleaning operations.

  • Add Aggregate: Select the step that includes the data you want to aggregate or group.

  • Add Pivot: Select the step that includes the data you want to pivot.

  • Add Join: Select the step that you want to join data with. As an alternative, you can drag-and-drop to join files. In the following example, we're dragging the Orders_Central Input step and dropping it on Join:

    For more information about creating a join, see Join or Union Data.

  • Add Union: Select the step that you want to union data with. As an alternative, you can drag-and-drop to union files. For more information about creating a union, see Join or Union Data.

  • Add Output: Select this option to save the output to an extract file (.tde or .hyper) or a .csv file, or to publish the output as a data source to a server.

Change the color scheme

Tableau Prep assigns each step in your flow a color by default. This color scheme is applied throughout the flow to help you keep track of your data throughout the flow as you apply cleaning steps, join, union or aggregate the data so you know which files are impacted by your operations.

To select a different color scheme for your steps do the following:

  1. Select one or more steps.

  2. Right-click on a selected step and select Edit Step Color.

  3. Click on a color in the color palette to apply it.

To reset the step color back to the default color, do one the following:

  • Click Undo from the top menu.

  • Cntrl+Z or Command-Shift-Z (Mac).

  • Select the steps you changed, right-click on a selected step and select Edit Step Color, then select Reset Step Color from the bottom of the color palette.

Add a description

As you build your flow and perform various cleaning operations, you might want to add a description to help others who might later look at or work with your flow better understand your steps.

You can add a description to any individual step in your flow directly on the Flow pane. The description can be up to 200 characters long.

When you add a description, a message icon is added underneath the step. Click the icon to show or hide the description text in the Flow pane.

To add a description to a step do the following:

  1. In the Flow pane, select a step.

  2. Do one of the following:

    • Right-click the step and select Add Description from the menu.

    • Double-click in the name field for the step, then click on Add a description.

  3. Type your description in the text box.

  4. Click outside the text box or press Enter to apply your changes. By default, the description displays underneath the step. To hide the description click the message icon.

  5. To edit or delete the description, right-click on the step or description and select Edit Description or Delete Description from the menu.

Remove steps from the flow

At any point in the flow, you can remove steps or the flow lines between steps.

  • To remove a step or flow line, select the step or line you want to remove, right-click the element, and then select Remove.

  • To remove multiple steps or flow lines, do one of the following:

    • Use your mouse to drag and select a whole section of the flow. Then right-click on one of the selected steps and select Remove

    • Press Ctrl+A (Cmd+A on Mac) to select all elements in the flow, or press Ctrl+click (Cmd+Click on Mac) to select specific elements, and then press the Delete key.

(Back to top)


Pivot your data

Sometimes analyzing data from a spreadsheet or crosstab format can be difficult in Tableau. Tableau prefers data to be "tall" instead of "wide", which means that you often have to pivot your data from columns to rows so that Tableau can evaluate it properly.

With Tableau Prep, you can pivot on one or more groups of fields to get the results you want from your data. Simply select the fields that you want to work with, pivot the data from columns to rows, and then interact directly with the results to get your data looking just the way you want it.

Note: Pivoting on multiple groups of fields is not supported in Tableau Prep 2018.1.1. Editing a pivot that includes pivoted columns on multiple field groups in Tableau Prep version 2018.1.1 can result in errors or unexpected results.

You can also use Tableau Prep's smart default naming feature to automatically rename your pivoted fields and values.

To pivot your data:

  1. Connect to your data source.

  2. Drag the table that you want to pivot to the Flow pane.

  3. Click the plus icon, and select Add Pivot from the context menu.

  4. (Optional) Search for fields to pivot.

  5. Select one or more fields from the left pane, and drag them to the Pivot1 Values column in the Pivoted Fields pane.

  6. (Optional) In the Pivoted Fields pane, click the plus icon to add more columns to pivot on, then repeat the previous step to select more fields to pivot.

    Note: You must select the same number of fields that you selected in Step 5. For example if you selected 3 fields to initially pivot on, then each subsequent column that you pivot on must also contain 3 fields.

  7. (Optional) Select the Automatically rename pivoted fields and values check box to enable Tableau Prep to rename the new pivoted fields using common values in the data. If no common values are found, the default name is used.

    Your results appear immediately in both the Pivot Results pane and the data grid.

  8. If you didn't enable the default naming option in the previous step, edit the names of the fields. You can also edit the names of the original fields in this pane to best describe the data.

  9. Rename the new Pivot step to keep track of your changes. For example "Pivot months".

Example: Pivoting on multiple fields

This example shows a spreadsheet for pharmaceutical sales, taxes and totals by month and year.

By pivoting the data you can create rows for each month and year and individual columns for sales, taxes and totals so that Tableau can more easily interpret this data for analysis.

Watch "pivot on multiple field" in action.

(Back to top)


Apply cleaning operations

You clean data by applying cleaning operations such as filtering, adding, renaming, splitting, grouping, or removing fields. In prior releases, cleaning operations could only be performed in the Clean step type. In Tableau Prep version 2018.2.1 and later, you could also perform cleaning operations in the data grid in a cleaning step.

Starting in Tableau Prep version 2018.3.1, you can perform cleaning operations in most step types in your flow. The one exception is the output step.

The following table shows which cleaning operations are available in each step type:

  Input Clean Aggregate Pivot Join Union Output
Filter X X X X X X  
Group and Replace   X X X X X  
Clean   X X X X X  
Split Values   X X X X X  
Rename Field X X X X X X  
Edit Value   X X X X X  
Change Data Type X X X X X X  
Create Calculated Field   X X X X X  
Remove Field X X X X X X  

As you make changes to your data, annotations are added to the corresponding step in the Flow pane and an entry is added in the Changes pane to track your actions. If you make changes in the Input step, the annotation shows to the left of the step in the Flow pane and is also shown in the Input profile in the field list.

The order that you apply your changes matters. Changes made in Aggregate, Pivot, Join and Union step types are performed either before or after those reshaping actions, depending on where the field is when you make the change. Where the change was made is shown in the Changes pane for the step.

For example, if you change a field value in a join clause, the change is performed prior to the join action to give you the corrected results.

The following tables shows where the cleaning action is performed in Aggregate, Pivot, Join and Union step types depending on where the field is in the step.

Note: For joins, if the field is a calculated field that was created using a field from one table, the change is applied before the join. If the field is created with fields from both tables, the change is applied after the join.

Apply cleaning operations in an input step

Only some cleaning operations are available in an Input step. You can make any of the following changes in the Input field list. Your changes are tracked in the Changes pane and annotations are added to the left of the Input step in the Flow pane and in the Input field list.

  • Filter: Click Filter Values in the toolbar then enter your filter criteria in the calculation editor.

  • Rename Field: In the Field Name field, double-click (CTRL+click on Mac) on the field name and enter a new field name.

  • Change Data Type: Click on the data type for the field and select a new data type from the menu.

  • Remove Field: Clear the check box next to the fields that you don't want to include in your flow.

Apply cleaning operations in other steps

All of the cleaning operations are available in the other step types. To apply cleaning operations to fields, use the toolbar options or click More options on the field profile card to open the menu. In Aggregate, Pivot, Join and Union step types, the More options menu is available on the profile cards in the summary panes.

Profile pane toolbar Drop-down menu

In a cleaning step you can also apply cleaning operations in the data grid. Just click the Hide profile pane button to collapse the Profile pane then click More options on a field to open the menu.

Note: Not all cleaning operations are available in the data grid. For example if you want to edit a value in-line, you must do this in the Profile pane.

To apply cleaning operations to a field:

  1. In the Profile pane, data grid, or Summary pane select the field you want to make changes to.

  2. From either the toolbar or More options for the field , select from the following options:

    • Filter or Filter Values: You can also right-click (CTRL + Click on Mac) a field value to keep or exclude values.

    • Group and Replace: Manually select values or use automatic grouping. You can also multi-select values in the Profile card and right-click (CTRL+click on Mac) to group or ungroup values or edit the group value.

    • Clean: Select from a list of quick cleaning operations to apply to all values in the field.

    • Split Values: Select either automatic or custom splits.

      Note: Automatic split and custom split work the same as they do in Tableau Desktop. For more information, see Split a Field into Multiple Fields in the Tableau Desktop and Web Authoring Help.

    • Rename Field

    • Create Calculated Field

    • Remove Field

  3. To edit a value, right-click (CTRL+click on Mac) one or more values and select Edit Value then enter a new value. You can also select Replace with Null to replace the values with a Null value or double-click in a single field to edit it directly. For more information about editing field values see Cleaning (fixing) variations of the same value.

  4. Review the results of these operations in the Profile pane, Summary panes or data grid.

View your changes

The different types of cleaning operations are represented by icons over the steps in your flow. If more than four types of operations are applied to a step, an ellipsis displays over the step. Hover over these icons to view annotations showing applied operations and the order in which they are performed.

You can also select a step and then expand the Changes pane to view the details for each change, edit or remove your changes, or drag changes up or down to change the order in which they're applied.

Cleaning annotation Changes pane

When viewing changes in an Aggregation, Pivot, Join, or Union step, the order that the change is applied shows either before or after the reshaping action. The order of these changes is applied by the system and cannot be changed. You can edit and remove the change.

(Back to top)


Merge fields

If you have fields that contain the same values but are named differently, you can easily merge them to combine them into one by dragging one field on top of the other. When you merge the fields, the field name of the target field persists.

Example:

Wildcard union results in 3 fields with the same values Merge 3 fields into 1

To merge fields, do one of the following:

  • Drag and drop one field onto another. A Drop to merge fields indicator displays.

  • Select multiple fields and right-click within the selection to open the context menu, and then click Merge Fields.

  • Select multiple fields, and then click Merge Fields on the toolbar.

For information about how to fix mismatched fields as a result of a union, see Fix fields that don’t match.

(Back to top)


Cleaning (fixing) variations of the same value

Multiple variations of the same value can prevent you from accurately summarizing your data. You can quickly and easily correct these variations using the following options in Tableau Prep.

Note: Any edits that you make to the values must be compatible with the field data type.

Edit a single value

  1. In the Profile card, click the value you want to edit, and enter the new value. A group icon shows next to the value.

    Alternatively, right-click a value and click Edit Value. The change is recorded in the Changes pane on the left side of the screen.

  2. View the results in the Profile pane, and data grid.

Edit multiple values

You have several options to edit multiple values at once. For example, use quick cleaning operations to remove punctuation for all values in a field, manually group values using multi-select, automatically group values together using fuzzy-match algorithms that find similar values or select multiple values and replace them with Null.

Note: When you map multiple values to a single value, the original field shows a group icon next to the value, showing you which values are grouped together.

Edit multiple values using quick cleaning operations

This option applies only to text fields.

  1. In the Profile pane, Summary pane or data grid, select the field you want to edit.

  2. Click More options, select Clean, and then select one of the following options:

    • Make Uppercase: Change all values to uppercase text.

    • Make Lowercase: Change all values to lowercase text.

    • Remove Letters: Remove all letters and leave only other characters.

    • Remove Numbers: Remove all numbers and leave letters and other characters.

    • Remove Punctuation: Remove all punctuation.

    • Trim Spaces: Remove leading and trailing spaces.

    You can stack operations to apply multiple cleaning operations to the fields. For example first select Clean > Remove Numbers and then select Clean > Remove Punctuation to remove all numbers and punctuation from the field values.

  3. To undo your changes, click the Undo arrow at the top of the Flow pane, or remove the change from the change list.

Group and edit multiple values inline

  1. In the Profile card, select the field you want to edit.

  2. Press Ctrl or Command (on Mac), and select the values that you want to group.

  3. Right-click, and select Group from the context menu. The value in the selection that you right-click becomes the default name for the new group but you can edit this in-line.

  4. To edit the group name, select the grouped field and edit the value or right-click or Ctrl+click (Mac) on the grouped field and select Edit Value from the context menu.

  5. To ungroup the grouped field values, right-click on the grouped field and select Ungroup from the context menu.

Replace one or more values with Null

If you have data rows that you want to include in your analysis but you want to exclude certain field values you can change them to a Null value.

  1. In the Profile card, press Ctrl or Command (Mac), and select the values that you want to change.

  2. Right-click or Ctrl+click (Mac), and select Repace with Null from the menu. The values are changed to Null and the group icon shows next to the value.

Edit multiple values manually using Group and Replace

Use Group and Replace to map the value of a field from one value to another or manually select multiple values to group them. You can even add new values to set up mapping relationships to organize your data.

For example, let’s say you have three values in a field: My Company, My Company Incorporated, and My Company Inc. All these values represent the same company, My Company. You can use Group and Replace to map the values My Company Incorporated and My Company Inc to My Company, so that all three values appear as My Company in the field.

Map multiple values to a single selected field

  1. In the Profile pane or Summary pane, select the field you want to edit.

  2. Click More options and select Group and Replace > Manual Selection from the menu.

  3. In the left pane of the Group and Replace editor, select the field value that you want to use as the grouping value. This value now shows at the top of the right pane.

  4. In the lower section of the right pane in the Group and Replace editor, select the values you want to add to the group.

    To remove values from the group, in the upper section of the right pane in the Group and Replace editor, clear the check box next to the values.

Create a group by selecting multiple values

  1. In the Profile pane or Summary pane, select the field you want to edit.

  2. Click More optionsand select Group and Replace > Manual Selection from the menu.

  3. In the left pane of the Group and Replace editor, select multiple values that you want to group.

  4. In the right pane of the Group and Replace editor, click Group Values.

    A new group is created using the last selected value as the group name. To edit the group name, select the grouped field and edit the value or right-click or Ctrl+click (Mac) on the grouped field and select Edit Value from the menu.

Edit multiple values using Group and Replace with fuzzy match

To search for and automatically group similar values, use one of the fuzzy match algorithms. Field values are grouped under the value that appears most frequently. Review the grouped values and add or remove values in the group as needed.

Choose one of the following options to group values:

  • Pronunciation: Find and group values that sound alike. This option uses the Metaphone 3 algorithm that indexes words by their pronunciation and is most suitable for English words. This type of algorithm is used by many popular spell checkers.

  • Common Characters: Find and group values that have letters or numbers in common. This option uses the ngram fingerprint algorithm that indexes words by their unique characters after removing punctuation, duplicates, and whitespace. This algorithm works for any supported language.

    For example, this algorithm would match names that are represented as "John Smith" and "Smith, John" because they both generate the key "hijmnost". Since this algorithm doesn't consider pronunciation, the value "Tom Jhinois" would have the same key "hijmnost" and would also be included in the group.

  • Spelling: Find and group text values that are spelled alike. This option uses the Levenshtein distance algorithm to compute an edit distance between two text values using a fixed default threshold. It then groups them together when the edit distance is less than the threshold value. This algorithm works for any supported language.

To automatically group and replace values using fuzzy match, do the following:

  1. In the Profile pane or Summary pane, select the field you want to edit.

  2. Click More optionsand select Group and Replace then select one of these options:

    • Pronunciation

    • Common Characters

    • Spelling

    Tableau Prep finds and groups values that match and replaces them with the value that occurs most frequently in the group.

  3. Review the groupings and manually add or remove values or edit them as needed. Then click Done.

Add and identify values that aren't in the data set

If you want to map values in your data set to a new value that doesn't exist, you can add it using Group and Replace. To easily identify any values that are not in the data set, these values are marked with a red dot next to the value name in the Group and Replace editor.

For example in the image below, Wyoming and Nevada aren’t in the data set.

Some reasons why a value might not be in the data set include the following:

  • You just added the new value manually.

  • The value is no longer in the data.

  • The value is in the data, but isn’t in the sampled data set.

To add a new value:

  1. In the Profile pane or Summary pane, select the field you want to edit.

  2. Click More options and select Group and Replace > Manual Selection from the context menu.

  3. In the left pane of the Group and Replace editor, click the plus to add a new value.

  4. Type a new value in the field and press Enter to add it.

  5. In the right pane, select the values that you want to map to the new value.

  6. (Optional) To add additional new values to your mapped value, click the plus button in the right pane in the Group and Replace editor.

(Back to top)


Aggregate and group values

Sometimes you’ll need to adjust the granularity of some data, either to reduce the amount of data produced from the flow, or to align data with other data you might want to join or union together. For example, you might want to aggregate sales data by customer before joining a sales table with a customer table.

If you need to adjust the granularity of your data, use the Add Aggregate option to create a step to aggregate or group data. Whether data is aggregated or grouped depends on the data type (string, number, or date).

  1. In the Flow pane, click the plus icon, and select Add Aggregate. A new aggregation step displays in the Flow pane and the Profile pane updates to show the aggregate and group profile.

  2. To group or aggregate fields, drag them from the left pane to one of the columns in the right pane.

    You can also:

    • Drag and drop fields between the two panes.

    • Search for fields in the list and select only the fields you want to include in your aggregation.

    • Double-click a field to add it to the left or right pane.

    • Change the function of the field to automatically add it to the appropriate pane.

    • Click Add All or Remove All to bulk apply or remove fields.

    • Apply cleaning operations to fields.

    Fields are distributed between the Grouped Fields and Aggregated Fields columns based on their data type. Click the group or aggregation type (for example, AVG or SUM) headings to change the group or aggregation type.

    In the data grids below the aggregation and group profile, you can see a sample of the members of the group or aggregation.

    Any cleaning operations that are made to the fields are tracked in the Changes pane.

(Back to top)