How Forecasting Works in Tableau
Forecasting in Tableau uses a technique known as exponential smoothing. Forecast algorithms try to find a regular pattern in measures that can be continued into the future.
You typically add a forecast to a view that contains a date field and at least one measure. However, in the absence of a date, Tableau can create a forecast for a view that contains a dimension with integer values in addition to at least one measure.
All forecast algorithms are simple models of a real-world data generating process (DGP). For a high-quality forecast, a simple pattern in the DGP must match the pattern described by the model reasonably well. Quality metrics measure how well the model matches the DGP. If the quality is low, the precision measured by the confidence bands is not important because it measures the precision of an inaccurate estimate.
Tableau automatically selects the best of up to eight models, the best being the one that generates the highest quality forecast. The smoothing parameters of each model are optimized before Tableau assesses forecast quality. The optimization method is global, but it is still not impossible for it to choose locally optimal smoothing parameters that are not also globally optimal. The initial value parameters, by contrast, are selected according to best practices and are not further optimized, so they may be less than optimal. The eight models available in Tableau are among those described in A taxonomy of exponential smoothing methods on the OTexts website.
When there is not enough data in the visualization, Tableau automatically tries to forecast at a finer temporal granularity, and then aggregates the forecast back to the granularity of the visualization. Tableau provides prediction bands which may be simulated or calculated from a closed form equation. All models with a multiplicative component or with aggregated forecasts have simulated bands, while all other models use the closed form equations.
Exponential smoothing models iteratively forecast future values of a regular time series of values from weighted averages of past values of the series. The simplest model, Simple Exponential Smoothing, computes the next level or smoothed value from a weighted average of the last actual value and the last level value. The method is exponential because the value of each level is influenced by every preceding actual value to an exponentially decreasing degree—more recent values are given greater weight.
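The update rule for Simple Exponential Smoothing can be sketched in a few lines of Python. This is a minimal illustration, not Tableau's implementation: the function name, the smoothing parameter name `alpha`, and the choice to initialise the first level with the first observation are all assumptions made for the example.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    alpha is the smoothing parameter (0 < alpha <= 1). The first level is
    set to the first observation, which is one common convention and an
    assumption of this sketch.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [float(values[0])]
    for actual in values[1:]:
        # New level = weighted average of the last actual value and the last level.
        levels.append(alpha * actual + (1 - alpha) * levels[-1])
    return levels


levels = simple_exponential_smoothing([10, 12, 11, 15], alpha=0.5)
forecast = levels[-1]  # this model forecasts the last level for every future period
```

Because each level folds in the previous level, an observation made k steps ago contributes with weight proportional to (1 - alpha) raised to the power k, which is the exponential decay the method is named for.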
Exponential smoothing models with trend or seasonal components are effective when the measure to be forecast exhibits trend or seasonality over the period of time on which the forecast is based. Trend is a tendency in the data to increase or decrease over time. Seasonality is a repeating, predictable variation in value, such as an annual fluctuation in temperature relative to the season.
In general, the more data points you have in your time series, the better the resulting forecast will be. Having enough data is particularly important if you want to model seasonality, because the model is more complicated and requires more proof in the form of data to achieve a reasonable level of precision. On the other hand, if you forecast using data generated by two or more different DGPs, you will get a lower quality forecast because a model can only match one.
Tableau tests for a seasonal cycle with the length most typical for the time aggregation of the time series for which the forecast is estimated. So if you aggregate by months, Tableau will look for a 12-month cycle; if you aggregate by quarters, Tableau will search for a four-quarter cycle; and if you aggregate by days, Tableau will search for weekly seasonality. Therefore, if there is a six-month cycle in your monthly time series, Tableau will probably find a 12-month pattern that contains two similar sub-patterns. However, if there is a seven-month cycle in your monthly time series, Tableau will probably find no cycle at all. Luckily, seven-month cycles are uncommon.
Tableau can use either of two methods for deriving season length. The original temporal method uses the natural season length of the temporal granularity (TG) of the view. Temporal granularity means the finest unit of time expressed by the view. For example, if the view contains either a continuous green date truncated to month or discrete blue year and month date parts, the temporal granularity of the view is month. The new non-temporal method, introduced with Tableau 9.3, uses periodic regression to check season lengths from 2 to 60 for candidate lengths.
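To give a feel for how candidate season lengths might be ranked from the data alone, the sketch below fits one mean per position in the cycle for each length (a very simple form of periodic regression) and ranks the lengths by residual variance. The exact regression and scoring that Tableau uses are not described here, so the function names, the scoring rule, and the requirement of two full cycles are all assumptions of this example.

```python
def seasonal_fit_error(values, length):
    """Residual variance after fitting one mean per position in the cycle."""
    sse = 0.0
    for start in range(length):
        group = values[start::length]  # every value at this position in the cycle
        mean = sum(group) / len(group)
        sse += sum((v - mean) ** 2 for v in group)
    # Dividing by the residual degrees of freedom penalises longer cycles,
    # which use more parameters.
    return sse / (len(values) - length)


def rank_season_lengths(values, min_length=2, max_length=60):
    """Return candidate season lengths, best fit first."""
    upper = min(max_length, len(values) // 2)  # require at least two full cycles
    lengths = range(min_length, upper + 1)
    # sorted() is stable, so shorter lengths win ties.
    return sorted(lengths, key=lambda m: seasonal_fit_error(values, m))


series = [1, 3, 2, 5] * 6          # a series that repeats every 4 points
ranking = rank_season_lengths(series)
best_length = ranking[0]
```

On this noise-free series the true length and its multiples all fit perfectly, and the tie is resolved in favour of the shortest length.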
Tableau automatically selects the most appropriate method for a given view. When Tableau is using a date to order the measures in a view, if the temporal granularity is quarterly, monthly, weekly, daily or hourly, the season lengths are almost certainly 4, 12, 13, 7 or 24, respectively. So only the length natural to the TG is used to construct the five seasonal exponential smoothing models supported by Tableau. The AIC values of the five seasonal models and the three non-seasonal models are compared, and the model with the lowest AIC is returned. (For an explanation of the AIC metric, see Forecast Descriptions.)
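The natural season lengths listed above amount to a simple lookup by temporal granularity. The sketch below restates that list in Python; the dictionary and function names are illustrative only.

```python
# Natural season length for each temporal granularity, as listed in the text.
NATURAL_SEASON_LENGTH = {
    "quarter": 4,   # four quarters per year
    "month": 12,    # twelve months per year
    "week": 13,     # thirteen weeks per quarter
    "day": 7,       # seven days per week
    "hour": 24,     # twenty-four hours per day
}


def natural_season_length(granularity):
    """Return the natural season length, or None if the text lists none."""
    return NATURAL_SEASON_LENGTH.get(granularity.lower())
```

Granularities that are not in the table, such as year, return `None` here, which corresponds to the cases where a season length has to be derived from the data.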
When Tableau is using an integer dimension for forecasting, the second method is used. In this case there is no temporal granularity (TG), so potential season lengths must be derived from the data.
The second method is also used if the temporal granularity is yearly. Yearly series rarely have seasonality, but, if they do, it must also be derived from the data.
The second method is also used for views with temporal granularity of minute or second. If such series have seasonality, the season lengths are likely 60. However, when measuring a regular real world process, the process may have a regular repetition which does not correspond to the clock. So, for minutes and seconds, Tableau also checks for a length different from 60 in the data. This does not mean that Tableau can model two different season lengths at the same time. Rather, ten seasonal models are estimated, five with a season length of 60 and another five with the season length derived from the data. Whichever of the ten seasonal models or three non-seasonal models has the lowest AIC, that model is used to compute the forecast.
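Choosing among the estimated models by lowest AIC can be sketched as follows. The AIC formula used here is the common least-squares form, n ln(SSE / n) + 2k; whether Tableau computes AIC in exactly this way is not stated in this article, and the model names, error sums, and parameter counts are made-up values for illustration.

```python
import math


def aic(sse, n_obs, n_params):
    """Least-squares form of the Akaike information criterion (an assumption)."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params


def select_model(candidates, n_obs):
    """Return the candidate with the lowest AIC."""
    return min(candidates, key=lambda c: aic(c["sse"], n_obs, c["n_params"]))


# Hypothetical fit results for a series of 120 points ordered by minute.
candidates = [
    {"name": "non-seasonal", "sse": 1200.0, "n_params": 2},
    {"name": "seasonal, length 60", "sse": 900.0, "n_params": 62},
    {"name": "seasonal, length 8 (derived)", "sse": 600.0, "n_params": 10},
]
best = select_model(candidates, n_obs=120)
```

The parameter penalty is what keeps a long season length from winning merely because it has more freedom to fit the data.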
For series ordered by year, minute, or second, a single season length from the data is tested if the pattern is fairly clear. For integer ordered series, up to nine somewhat less clear potential season lengths are estimated for all five seasonal models, and the model with the lowest AIC is returned. If there are no likely season length candidates, only the non-seasonal models are estimated.
Since all selection is automatic when Tableau is deriving potential season lengths from the data, the default Model Type of “Automatic” in the Forecast Options Dialog Model Type menu does not change. Selecting “Automatic without seasonality” improves performance by eliminating all season length searching and estimation of seasonal models.
The heuristic that Tableau uses to decide when to use season lengths derived from the data depends on the distribution of errors for the periodic regression of each candidate season length. Since the assembly of season length candidates by periodic regression usually produces one or two clear winning lengths if seasonality actually exists in the data, the return of a single candidate indicates likely seasonality. In this case, Tableau estimates seasonal models with this candidate for year, minute and second granularity. The return of fewer than the maximum of ten candidates indicates possible seasonality. In this case, Tableau estimates seasonal models with all returned candidates for integer-ordered views. The return of the maximum number of candidates indicates that the errors for most lengths are similar. Therefore, the existence of any seasonality is unlikely. In this case, Tableau estimates only non-seasonal models for an integer-ordered or yearly ordered series, and only the seasonal models with a natural season length for other temporally ordered views.
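The decision rules in this paragraph can be summarised as a small function. This is one reading of the text rather than Tableau's code: the function name, the `ordering` labels, and the convention that an empty list means only non-seasonal models are estimated are assumptions of the sketch.

```python
MAX_CANDIDATES = 10  # maximum number of candidates, as stated in the text


def season_lengths_to_estimate(ordering, candidates):
    """Return the season lengths for which seasonal models are estimated.

    ordering is 'integer', 'year', 'minute', or 'second'. An empty result
    means that only the non-seasonal models are estimated.
    """
    # Minute and second views always test the natural length of 60.
    natural = [60] if ordering in ("minute", "second") else []
    if ordering == "integer":
        # Possible seasonality: use every candidate unless the maximum was returned.
        derived = list(candidates) if len(candidates) < MAX_CANDIDATES else []
    else:
        # Year, minute, second: use a derived length only when it is a single clear winner.
        derived = list(candidates) if len(candidates) == 1 else []
    return natural + [m for m in derived if m not in natural]
```

For example, a yearly view with the single candidate 5 yields `[5]`, while a minute view that returns the maximum number of candidates falls back to `[60]` alone.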
For Model Type “Automatic” in integer-, year-, minute- and second-ordered views, candidate season lengths are always derived from the data whether or not they are used. Since model estimation is much more time consuming than periodic regression, the performance impact should be moderate.
In the Forecast Options dialog box, you can choose the model type Tableau uses for forecasting. The Automatic setting is typically optimal for most views. If you choose Custom, then you can specify the trend and season characteristics independently, choosing either None, Additive, or Multiplicative.
An additive model is one in which the contributions of the model components are summed, whereas a multiplicative model is one in which at least some component contributions are multiplied. Multiplicative models can significantly improve forecast quality for data where the trend or seasonality is affected by the level (magnitude) of the data.
Keep in mind that you do not need to create a custom model to generate a forecast that is multiplicative: the Automatic setting can determine if a multiplicative forecast is appropriate for your data. However, a multiplicative model cannot be computed when the measure to be forecast has one or more values that are less than or equal to zero.
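The difference between additive and multiplicative components, and the restriction on non-positive values, can be illustrated with a short sketch. The functions below combine a level, an additive trend, and a seasonal component in the two standard ways; the names and the sample numbers are illustrative and do not come from Tableau.

```python
def additive_forecast(level, trend, seasonal_effect, steps_ahead):
    """All component contributions are summed."""
    return level + steps_ahead * trend + seasonal_effect


def multiplicative_season_forecast(level, trend, seasonal_factor, steps_ahead):
    """The seasonal contribution scales with the level instead of being added."""
    return (level + steps_ahead * trend) * seasonal_factor


def multiplicative_allowed(values):
    """A multiplicative model needs every value to be greater than zero."""
    return all(v > 0 for v in values)


additive = additive_forecast(100, 10, 20, steps_ahead=1)
multiplicative = multiplicative_season_forecast(100, 10, 1.5, steps_ahead=1)
```

With an additive season the peak adds a fixed 20 units whatever the level is; with a multiplicative season the peak is 50 percent above the level, so its absolute size grows as the series grows.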
When you are forecasting with a date, there can be only one base date in the view. Part dates are supported, but all parts must refer to the same underlying field. Dates can be on Rows, Columns, or Marks (with the exception of the Tooltip target).
Tableau supports three types of dates, two of which can be used for forecasting:
Truncated dates reference a particular point in history with specific temporal granularity, such as February 2017. They are usually continuous, with a green background in the view. Truncated dates are valid for forecasting.
Date parts refer to a particular member of a temporal measure such as February. Each date part is represented by a different, usually discrete field (with a blue background). Forecasting requires at least a Year date part. Specifically, it can use any of the following sets of date parts for forecasting:
Year + quarter
Year + month
Year + quarter + month
Year + week
Custom: Month/Year, Month/Day/Year
Other date parts, such as Quarter or Quarter + month, are not valid for forecasting. See Convert Fields between Discrete and Continuous for more details about different date types.
Exact dates refer to a particular point in history with maximum temporal granularity such as February 1, 2012 at 14:23:45.0. Exact dates are invalid for forecasting.
It is also possible to forecast without a date. See Forecasting When No Date is in the View.
When you create a forecast, you select a date dimension that specifies a unit of time at which date values are to be measured. Tableau dates support a range of such time units, including Year, Quarter, Month, and Day. The unit you choose for the date value is known as the granularity of the date.
The data in your measure typically does not align precisely with your unit of granularity. You might set your date value to quarters, but your actual data may terminate in the middle of a quarter—for example, at the end of November. This can cause a problem because the value for this fractional quarter is treated by the forecasting model as a full quarter, which will typically have a lower value than a full quarter would. If the forecasting model is allowed to consider this data, the resulting forecast will be inaccurate. The solution is to trim the data, such that the trailing periods that could mislead the forecast are ignored. Use the Ignore Last option in the Forecast Options dialog box to remove—or trim—such partial periods. The default is to trim one period.
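The effect of Ignore Last is easy to picture as dropping trailing values before the model is estimated. The helper below is a hypothetical stand-in for that option, with made-up quarterly totals in which the final quarter is incomplete.

```python
def ignore_last(values, periods=1):
    """Return the series without its last `periods` values."""
    if periods < 0:
        raise ValueError("periods must not be negative")
    return list(values[:len(values) - periods])


# Quarterly totals; the last quarter only contains data through November.
quarterly_totals = [300, 310, 320, 210]
trimmed = ignore_last(quarterly_totals)  # default trims one period
```

Without trimming, the partial value of 210 would look like a sharp drop to the model and pull the forecast downward.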
Tableau requires at least five data points in the time series to estimate a trend, and enough data points for at least two seasons or one season plus five periods, whichever is greater, to estimate seasonality. For example, at least nine data points are required to estimate a model with a four-quarter seasonal cycle (4 + 5), and at least 24 to estimate a model with a twelve-month seasonal cycle (2 * 12).
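The two worked examples are consistent with taking the larger of the two thresholds, and the sketch below encodes that reading. The constant and function names are illustrative.

```python
MIN_POINTS_FOR_TREND = 5


def min_points_for_seasonality(season_length):
    """Larger of two full seasons and one season plus five periods."""
    return max(2 * season_length, season_length + 5)
```

For a season length of 4 this gives max(8, 9) = 9, and for a season length of 12 it gives max(24, 17) = 24, matching the figures in the text.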
If you turn on forecasting for a view that does not have enough data points to support a good forecast, Tableau can sometimes retrieve enough data points to produce a valid forecast by querying the data source for a finer level of granularity:
If your view contains fewer than nine years of data, by default, Tableau will query the data source for quarterly data, estimate a quarterly forecast, and aggregate to a yearly forecast to display in your view. If there are still not enough data points, Tableau will estimate a monthly forecast and return the aggregated yearly forecast to your view.
If your view contains fewer than nine quarters of data, by default, Tableau will estimate a monthly forecast and return the aggregated quarterly forecast results to your view.
If your view contains fewer than nine weeks of data, by default, Tableau will estimate a daily forecast and return the aggregated weekly forecast results to your view.
If your view contains fewer than nine days of data, by default, Tableau will estimate an hourly forecast and return the aggregated daily forecast results to your view.
If your view contains fewer than nine hours of data, by default, Tableau will estimate a minutely forecast and return the aggregated hourly forecast results to your view.
If your view contains fewer than nine minutes of data, by default, Tableau will estimate a secondly forecast and return the aggregated minutely forecast results to your view.
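The fallback rules in this list can be sketched as a lookup plus a threshold check. The text only states the threshold of nine for the granularity of the view; reusing it to decide whether a finer granularity has "enough" points, and returning `None` when nothing qualifies, are assumptions of this example, as are the names.

```python
MIN_POINTS = 9

# Finer granularities that are tried, in order, for each view granularity.
FINER_GRANULARITIES = {
    "year": ["quarter", "month"],
    "quarter": ["month"],
    "week": ["day"],
    "day": ["hour"],
    "hour": ["minute"],
    "minute": ["second"],
}


def forecast_granularity(view_granularity, points_available):
    """Pick the granularity at which the forecast would be estimated.

    points_available maps a granularity name to the number of data points
    the data source can supply at that granularity.
    """
    if points_available.get(view_granularity, 0) >= MIN_POINTS:
        return view_granularity
    for finer in FINER_GRANULARITIES.get(view_granularity, []):
        if points_available.get(finer, 0) >= MIN_POINTS:
            return finer
    return None  # this sketch does not model what happens when nothing qualifies
```

For instance, a view with five years of data would be forecast at quarterly granularity (20 points), while a view with two years would fall through to monthly (24 points).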
These adjustments happen behind the scenes and require no configuration. Tableau does not change the appearance of your visualization, and does not actually change your date value. However, the summary of the forecast time period in the Forecast Describe and Forecast Options dialog boxes will reflect the actual granularity used.
Tableau can only get more data when the aggregation for the measure you are forecasting is SUM or COUNT. See Data Aggregation in Tableau for information on available aggregation types and information on how to change the aggregation type.
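The article does not explain the restriction, but one property that sets SUM and COUNT apart is that values computed at a finer granularity add up exactly to the value at the coarser granularity, which is what aggregating a finer forecast back to the view requires. The sketch below contrasts this with an average, using made-up rows; treating this as the reason for the restriction is an assumption.

```python
# Hypothetical row-level values for the three months of one quarter.
monthly_rows = {"Jan": [10, 20], "Feb": [30], "Mar": [40, 50, 60]}
all_rows = [v for rows in monthly_rows.values() for v in rows]

# SUM: adding the monthly sums reproduces the quarterly sum.
sum_of_monthly_sums = sum(sum(rows) for rows in monthly_rows.values())
quarterly_sum = sum(all_rows)

# COUNT behaves the same way.
sum_of_monthly_counts = sum(len(rows) for rows in monthly_rows.values())
quarterly_count = len(all_rows)

# AVG: averaging the monthly averages does not reproduce the quarterly average
# when the months contain different numbers of rows.
monthly_means = [sum(rows) / len(rows) for rows in monthly_rows.values()]
mean_of_monthly_means = sum(monthly_means) / len(monthly_means)
quarterly_mean = quarterly_sum / quarterly_count
```

Here both sums are 210 and both counts are 6, while the quarterly average is 35 and the average of the monthly averages is lower.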