`library(lmForc)`

Linear forecasting models have two main features: simplicity and interpretability. These features allow the performance of linear forecasting models to be tested in multiple ways, both in-sample and out-out-sample. The *lmForc* package brings these performance tests to the R language with a philosophy that compliments the features of linear models: One, the syntax for creating linear models is simple and the syntax for testing them should be equally simple. Two, performance tests should be realistic and when possible replicate what it would have been like to forecast in real-time.

At the heart of the *lmForc* package is the `Forecast`

class. Base R does not provide a good format for working with forecasts, so *lmForc* addresses this by introducing a new class for storing forecasts that is simple and rigorous. The `Forecast`

class is paramount to the *lmForc* philosophy of simple syntax and realistic tests. `Forecast`

is an S4 class that contains equal length vectors with the following data:

`origin`

Â Â Â The time when the forecast was made.`future`

Â Â Â The time that is being forecasted.`forecast`

The forecast itself.`realized`

Â If available, the realized value at the time being forecasted.

The `Forecast`

class also includes an additional length-one slot `h_ahead`

for representing how many periods ahead are being forecasted. This slot is optional, but becomes useful for documentation and performing out-of-sample forecast tests.

We demonstrate the `Forecast`

class by constructing a simple `Forecast`

object. This forecast contains four observations at the quarterly frequency.

```
<- Forecast(
my_forecast origin = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31")),
future = as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31")),
forecast = c(4.21, 4.27, 5.32, 5.11),
realized = c(4.40, 4.45, 4.87, 4.77),
h_ahead = 4L
)
```

Note that we have chosen Date objects at the quarterly frequency for our `origin`

and `future`

slots. This forecast is for four quarters ahead, so we fill in the `h_ahead`

slot with the integer four.

The `origin`

, `future`

, and `h_ahead`

slots can be filled with any values. This is where the general nature of the `Forecast`

class shines. In the `origin`

and `future`

slots we could use dates at a daily or yearly frequency, the `POSIXct`

class to include minutes and seconds, or integers to represent discrete periods. We can also store different types of forecasts. In the example above, we have forecasts made at different `origin`

times for a constant four quarter ahead forecast horizon. We could also store a forecast made at a single `origin`

time for a horizon of `future`

times by setting all of the `origin`

values to one time and using `future`

to represent the horizon of times that we are forecasting. In this case the `h_ahead`

slot becomes irrelevant and it is left as `NULL`

. The flexibility of these slots allows us to represent any type of numeric forecast.

The `forecast`

and `realized`

slots take numeric vectors. In the `forecast`

slot we see the forecast that was made at each `origin`

time and in the `realized`

slot we see the true value that was realized at each `future`

time. The realized values may not exist yet, so this slot may be partially populated or not populated at all. If the `realized`

slot is set to `NULL`

then it will be populated with a vector of `NA`

values.

The `Forecast`

class strikes a balance between simplicity and rigor. It is simple enough to store any numeric forecast, but it is rigorous enough to create a useful data structure. For example, we can quickly calculate a number of forecast accuracy metrics for the `Forecast`

object we created above using only one argument.

```
mse(my_forecast)
#> [1] 0.09665
rmse(my_forecast)
#> [1] 0.3108858
mae(my_forecast)
#> [1] 0.29
mape(my_forecast)
#> [1] 0.06182814
R2(my_forecast)
#> [1] 0.9973145
```

Because the `forecast`

and `realized`

slots must be numeric vectors, and all slots must be of the same length, we can calculate forecast accuracy metrics without having to worry about input validation or coercing multiple vectors to the correct format. The forecast accuracy metrics available in the *lmForc* package are calculated as follows where: \[
n = \text{forecast vector length}\\
\quad Y_i = \text{realized values}\\
\hat{Y_i} = \text{forecast values}
\]

MSE is calculated as: \[ MSE = \frac{1}{n} \sum_{i=1}^{n}{(Y_i - \hat{Y_i})^2} \]

RMSE is calculated as: \[ RMSE = \sqrt{MSE} \] MAE is calculated as: \[ MAE = \frac{1}{n} \sum_{i=1}^{n}{|\hat{Y_i} - Y_i|} \] MAPE is calculated as: \[ MAPE = \frac{1}{n} \sum_{i=1}^{n}\frac{|Y_i - \hat{Y_i}|}{{Y_i}} \] R2 is calculated as: \[ R^2 = cor(\hat{Y_i}, \ Y_i)^2 \]

Note that these equations require two inputs, but because both inputs are already stored in the `Forecast`

object we only need to pass one argument to the `mse()`

and `rmse()`

functions. Calculating forecast accuracy is a simple use case of the `Forecast`

class. More complex use cases exist in the *lmForc* package, where many of the functions require inputs to be of the `Forecast`

class. When weighting multiple forecasts or testing a linear model that is conditional on another forecast, the consistent structure of the class results in simple functions that execute correctly, no matter the type of forecast passed to the function. Furthermore, all *lmForc* functions return objects of the `Forecast`

class which creates synergy between functions. One can take two linear models, test their performance out-of-sample with the `oos_realized_forc()`

function which returns `Forecast`

objects, and then pass these two objects to the `performance_weighted_forc()`

function to find the weighted out-of-sample performance of both models.

One fear may be that the novel `Forecast`

class will not play well with functions and packages that already exist in the R language. The *lmForc* package provides methods for accessing all of the vectors stored in a `Forecast`

object as well as the `forc2df()`

function which returns one or multiple `Forecast`

objects as a data frame. With these methods, one can easily pass the data in a `Forecast`

object to other functions.

```
forc2df(my_forecast)
#> origin future my_forecast realized
#> 1 2010-03-31 2011-03-31 4.21 4.40
#> 2 2010-06-30 2011-06-30 4.27 4.45
#> 3 2010-09-30 2011-09-30 5.32 4.87
#> 4 2010-12-31 2011-12-31 5.11 4.77
origin(my_forecast)
#> [1] "2010-03-31" "2010-06-30" "2010-09-30" "2010-12-31"
future(my_forecast)
#> [1] "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31"
forc(my_forecast)
#> [1] 4.21 4.27 5.32 5.11
realized(my_forecast)
#> [1] 4.40 4.45 4.87 4.77
```

Examples throughout the rest of the vignette will use a stylized dataset with a `date`

column of ten quarterly dates, a dependent variable `y`

, and two independent variables `x1`

and `x2`

. Equations are also written in terms of the variables `y`

, `x1`

, and `x2`

.

```
<- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
date "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30"))
<- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)
y
<- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)
x1
<- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)
x2
<- data.frame(date, y, x1, x2)
data
head(data)
#> date y x1 x2
#> 1 2010-03-31 1.09 4.22 10.03
#> 2 2010-06-30 1.71 3.86 10.49
#> 3 2010-09-30 1.09 4.27 10.85
#> 4 2010-12-31 2.46 5.60 10.47
#> 5 2011-03-31 1.78 5.11 9.09
#> 6 2011-06-30 1.35 4.31 10.91
```

The `is_forc()`

function produces an in-sample forecast based on a linear model. The function takes a linear model call and an optional vector of time data associated with the linear model as inputs. The linear model is estimated once over the entire sample period and the coefficients are multiplied by the realized values in each period of the sample. This is functionally identical to the `predict()`

function in the *stats* package, but it returns a `Forecast`

object instead of a numeric vector.

For all observations *i* in the sample, coefficients are estimated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all } i \] And forecasts are estimated as:

\[ forecast_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i \qquad \]

```
is_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date
)#> h_ahead = 0
#>
#> origin future forecast realized
#> 1 2010-03-31 2010-03-31 1.394370 1.09
#> 2 2010-06-30 2010-06-30 1.138708 1.71
#> 3 2010-09-30 2010-09-30 1.423339 1.09
#> 4 2010-12-31 2010-12-31 2.358107 2.46
#> 5 2011-03-31 2011-03-31 2.024964 1.78
#> 6 2011-06-30 2011-06-30 1.450924 1.35
#> 7 2011-09-30 2011-09-30 1.894861 2.89
#> 8 2011-12-31 2011-12-31 2.502394 2.11
#> 9 2012-03-31 2012-03-31 2.867846 2.97
#> 10 2012-06-30 2012-06-30 1.384488 0.99
```

Note that because we are creating an in-sample forecast, `h_ahead`

is set to 0 and the `origin`

time equals the `future`

time. This test evaluates how well a linear forecast model fits the historical data.

The `oos_realized_forc()`

function produces an *h* period ahead out-of-sample forecast that is conditioned on realized values. The function takes a linear model call, an integer number of periods ahead to forecast, a period to end the initial coefficient estimation and begin forecasting, an optional vector of time data associated with the linear model, and an optional integer number of past periods to estimate the linear model over. The linear model is originally estimated with data up to `estimation_end`

minus the number of periods specified in the `estimation_window`

argument. For instance, if the linear model is being estimated on quarterly data and the `estimation_window`

is set to `20L`

, coefficients will be estimated using five years of data up to `estimation_end`

. If `estimation_window`

is set to `NULL`

then the linear model is estimated with all available data up to `estimation_end`

. Coefficients are multiplied by realized values of the covariates `h_ahead`

periods ahead to create an `h_ahead`

period ahead forecast. This process is iteratively repeated for each period after `estimation_end`

with coefficients updating in each period as more information would have become available to the forecaster. In each period, coefficients are updated based on all available information if `estimation_window`

is set to `NULL`

, or a rolling window of past periods if `estimation_window`

is set to an integer value.

In the sample *i*, for each period *p* greater than or equal to `estimation_end`

, coefficients are updated as:

\[
Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i
\qquad \text{for all} \quad p-w \leq i \leq \text{p}
\] Where *w* is the `estimation_window`

. `h_ahead`

*h* forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 X1_{p+h} + \beta_2 X2_{p+h} \qquad \]

```
oos_realized_forc(
lm_call = lm(y ~ x1 + x2, data),
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 1.623750 2.89
#> 2 2011-06-30 2011-12-31 2.341664 2.11
#> 3 2011-09-30 2012-03-31 3.415198 2.97
#> 4 2011-12-31 2012-06-30 2.708308 0.99
```

Note that the `oos_realized_forc`

function returns an out-of-sample forecast that conditions on realized values that **would not** have been available to the forecaster at the forecast origin. This test evaluates the performance of a linear forecast model had it been conditioned on perfect information.

The `oos_lag_forc()`

function produces an *h* period ahead out-of-sample forecast conditioned on present period values. The function takes a linear model call, an integer number of periods ahead to forecast, a period to end the initial coefficient estimation and begin forecasting, an optional vector of time data associated with the linear model, and an optional integer number of past periods to estimate the linear model over. The linear model data is lagged by `h_ahead`

periods and the linear model is re-estimated with data up to `estimation_end`

minus the number of periods specified in the `estimation_window`

argument to create a lagged linear model. If `estimation_window`

is left `NULL`

then the linear model is estimated with all available lagged data up to `estimation_end`

. Coefficients are multiplied by present period realized values of the covariates to create a forecast for `h_ahead`

periods ahead. This process is iteratively repeated for each period after `estimation_end`

with coefficients updating in each period as more information would have become available to the forecaster. In each period, coefficients are updated based on all available information if `estimation_window`

is set to `NULL`

, or a rolling window of past periods if `estimation_window`

is set to an integer value.

In the sample *i*, for each period *p* greater than or equal to `estimation_end`

, coefficients are updated as:

\[
Y_i = \beta_0 + \beta_1 X1_{i-h} + \beta_2 X2_{i-h} + \epsilon_i
\qquad \text{for all} \quad p-w \leq i \leq \text{p}
\] Where *w* is the `estimation_window`

and *h* is `h_ahead`

. `h_ahead`

forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 X1_{p} + \beta_2 X2_{p} \qquad \]

```
oos_lag_forc(
lm_call = lm(y ~ x1 + x2, data),
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 -2.100528 2.89
#> 2 2011-06-30 2011-12-31 2.174392 2.11
#> 3 2011-09-30 2012-03-31 2.813745 2.97
#> 4 2011-12-31 2012-06-30 1.807014 0.99
```

This test evaluates the performance of a lagged linear model had it been conditioned on present values that **would** have been available to the forecaster at the forecast origin. This is in contrast to conditioning on realized values or a forecast of the covariates.

The `oos_vintage_forc()`

function produces an out-of-sample forecast conditioned on *h* period ahead forecasts of the linear model covariates. The function takes a linear model call, a vector of time data associated with the linear model, a vintage forecast for each covariate in the linear model, and an optional integer number of past periods to estimate the linear model over. For each period in the vintage forecasts, coefficients are updated based on information that would have been available to the forecaster at the forecast origin. Coefficients are estimated over information from the last `estimation_window`

number of periods. If `estimation_window`

is left `NULL`

then coefficients are estimated over all of the information that would have been available to the forecaster. Coefficients are then multiplied by vintage forecast values to produce a replication of real time forecasts.

In the sample *i*, for each period *p* in the vintage forecasts *VF1* and *VF2*, coefficients are updated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all} \quad p-w \leq i \leq \text{p} \]

And `h_ahead`

*h* forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 VF1_p + \beta_2 VF2_p \qquad \]

We introduce stylized vintage forecasts of X1 and X2 to demonstrate the `oos_vintage_forc()`

function. Using four quarter ahead forecasts of the covariates X1 and X2, we create an out-of-sample forecast based on the coefficients and covariate forecasts that the forecaster would have used in each period.

```
<- Forecast(
x1_forecast_vintage origin = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
future = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
forecast = c(6.30, 4.17, 5.30, 4.84),
realized = c(4.92, 5.80, 6.30, 4.17),
h_ahead = 4L
)
<- Forecast(
x2_forecast_vintage origin = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
future = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
forecast = c(7.32, 6.88, 6.82, 6.95),
realized = c(8.68, 9.91, 7.87, 6.63),
h_ahead = 4L
)
oos_vintage_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date,
x1_forecast_vintage, x2_forecast_vintage,estimation_window = NULL,
return_betas = FALSE
)#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2010-09-30 2011-09-30 -2.497310 2.89
#> 2 2010-12-31 2011-12-31 1.194088 2.11
#> 3 2011-03-31 2012-03-31 1.620716 2.97
#> 4 2011-06-30 2012-06-30 1.470027 0.99
```

This test replicates the forecasts that a linear model conditional on forecasts of covariates would have produced in real-time. Here we see the strength of the `Forecast`

class. Because the vintage forecasts of X1 and X2 share the same data structure, we can calculate a forecast that is conditional on these objects without fearing inconsistency across inputs.

The `conditional_forc()`

function produces a forecast conditioned on forecasts of the linear model covariates. The function takes a linear model call, a vector of time data associated with the linear model, and a forecast for each covariate in the linear model. The linear model is estimated once over the entire sample period and the coefficients are multiplied by the forecasts of each covariate.

For all observations *i* in the sample, coefficients are estimated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all } i \]

And for all periods *p* in the covariate forecasts *F1* and *F2*, forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 F1_p + \beta_2 F2_p \qquad \]

The difference between `conditional_forc()`

and `oos_vintage_forc()`

is that in the `conditional_forc()`

function coefficients are only estimated once over all observations. Coefficients do not update based on what information would have been available to the forecaster at any given point in time. We introduce stylized forecasts of X1 and X2 to demonstrate the `conditional_forc()`

function. Because in this example we are making a conditional forecast for the future instead of testing past forecasts, we can condition on a horizon of forecasts. This is in contrast to the `oos_vintage_forc()`

example where we test the performance of four quarter ahead vintage forecasts.

```
<- Forecast(
x1_forecast origin = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
future = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
forecast = c(4.14, 4.04, 4.97, 5.12),
realized = NULL,
h_ahead = NULL
)
<- Forecast(
x2_forecast origin = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
future = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
forecast = c(6.01, 6.05, 6.55, 7.45),
realized = NULL,
h_ahead = NULL
)
conditional_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date,
x1_forecast, x2_forecast
)#> h_ahead =
#>
#> origin future forecast realized
#> 1 2012-06-30 2012-09-30 1.368054 NA
#> 2 2012-06-30 2012-12-31 1.297686 NA
#> 3 2012-06-30 2013-03-31 1.945655 NA
#> 4 2012-06-30 2013-06-30 2.044105 NA
```

This function is used to create a forecast for the present period or replicate a forecast made at a specific period in the past. Note that because we are forecasting into the future, `realized`

is `NULL`

. Also, because we are forecasting a horizon of dates, `h_ahead`

is `NULL`

.

The `historical_average_forc()`

function produces an *h* period ahead forecast based on the historical average of the series that is being forecasted. The function takes an average function, a vector of realized values, an integer number of periods ahead to forecast, a period to end the initial average estimation and begin forecasting, an optional vector of time data associated with the realized values, and an optional integer number of past periods to estimate the average over. The historical average is originally calculated with realized values up to `estimation_end`

minus the number of periods specified in `estimation_window`

. If `estimation_window`

is left `NULL`

then the historical average is calculated with all available realized values up to `estimation_end`

. In each period the historical average is set as the `h_ahead`

period ahead forecast. This process is iteratively repeated for each period after `estimation_end`

with the historical average updating in each period as more information would have become available to the forecaster.

If `avg_function`

is set to `mean`

, in the sample *i*, for each period *p* greater than or equal to `estimation_end`

, `h_ahead`

*h* forecasts are calculated as:

\[ forecast_{p+h} = \frac{1}{p-w} \sum_{i=p-w}^{p}{Y_i} \qquad \]

Where *Y* is the series being forecasted and *w* is the `estimation_window`

.

```
historical_average_forc(
avg_function = "mean",
realized_vec = data$y,
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = 4L
)#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 1.626 2.89
#> 2 2011-06-30 2011-12-31 1.678 2.11
#> 3 2011-09-30 2012-03-31 1.914 2.97
#> 4 2011-12-31 2012-06-30 2.118 0.99
```

`historical_average_forc()`

returns a historical average forecast where the `h_ahead`

period ahead forecast is simply the historical average or rolling window average of the series being forecasted. This replicates the historical average forecast that would have been produced in real-time and can serve as a benchmark for other forecasting models.

The `random_walk_forc()`

function produces an *h* period ahead forecast based on the last realized value in the series that is being forecasted. The function takes a vector of realized values, an integer number of periods ahead to forecast, and an optional vector of time data associated with the realized values. In each period, the current period value of the `realized_vec`

series is set as the `h_ahead`

period ahead forecast.

In the sample *i*, for each period *p*, `h_ahead`

*h* forecasts are calculated as:

\[ forecast_{p+h} = Y_p \]

Where *Y* is the series being forecasted.

```
random_walk_forc(
realized_vec = data$y,
h_ahead = 6L,
time_vec = data$date
)#> h_ahead = 6
#>
#> origin future forecast realized
#> 1 2010-03-31 2011-09-30 1.09 2.89
#> 2 2010-06-30 2011-12-31 1.71 2.11
#> 3 2010-09-30 2012-03-31 1.09 2.97
#> 4 2010-12-31 2012-06-30 2.46 0.99
```

`random_walk_forc()`

returns a random walk forecast where the `h_ahead`

period ahead forecast is simply the present value of the series being forecasted. This replicates the random walk forecast that a forecaster would have produced in real-time and can serve as a benchmark for other forecasting models.

The `autoreg_forc()`

function produces an *h* period ahead forecast based on an autoregressive (AR) model. The function takes a vector of realized values, an integer number of periods ahead to forecast, an integer number of lags to include in the autoregressive model, a period to end the initial model estimation and begin forecasting, an optional vector of time data associated with the realized values, and an optional integer number of past periods to estimate the autoregressive model over. An AR(`ar_lags`

) autoregressive model is originally estimated with realized values up to `estimation_end`

minus the number of periods specified in `estimation_window`

. If `estimation_window`

is left `NULL`

then the autoregressive model is estimated with all realized values up to `estimation_end`

. The AR(`ar_lags`

) model is estimated by regressing the vector of realized values on vectors of the same realized values that have been lagged by one to `ar_lags`

steps. The AR coefficients of this model are multiplied by lagged values and the present period realized value to create a one period ahead forecast. If `h_ahead`

is greater than one, the one period ahead forecasting process is iteratively repeated so that the two period ahead forecast conditions on the one period ahead forecasted value. This process of rolling one period ahead forecasts forward continues until an `h_ahead`

forecast is obtained. The `h_ahead`

forecasting process is repeated for each period after `estimation_end`

with AR model coefficients updating as more information would have become available to the forecaster. In each period, coefficients are updated based on all available realized values if `estimation_window`

is set to `NULL`

, or a rolling window of past periods if `estimation_window`

is set to an integer value.

In the sample *i* with `ar_lags`

set to two and `h_ahead`

set to two. For each period *p* greater than or equal to `estimation_end`

, coefficients are calculated as:

\[
Y_i = \beta_0 + \beta_1 Y_{i-1} + \beta_2 Y_{i-2}
\qquad \text{for all} \quad p-w \leq i \leq \text{p}
\] Where *Y* is the series being forecasted and *w* is the `estimation_window`

. `h_ahead`

two step ahead forecasts are estimated as:

\[ Y_{p+1} = \beta_0 + \beta_1 Y_p + \beta_2 Y_{p-1} \\ forecast_{p+h} = Y_{p+2} = \beta_0 + \beta_1 Y_{p+1} + \beta_2 Y_{p} \]

```
autoreg_forc(
realized_vec = data$y,
h_ahead = 2L,
ar_lags = 2L,
estimation_end = as.Date("2011-06-30"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-06-30 2011-12-31 1.649380 2.11
#> 2 2011-09-30 2012-03-31 2.376138 2.97
#> 3 2011-12-31 2012-06-30 1.944882 0.99
```

`autoreg_forc()`

returns an autoressive forecast based on information that would have been available at the forecast origin. This function replicates the forecasts than an AR model would have produced in real-time and can serve as a benchmark for other forecasting models.

The `performance_weighted_forc()`

function produces a weighted average of multiple forecasts based on the recent performance of each forecast. The function takes two or more forecasts of the `Forecast`

class, an evaluation window, and an error function. For each forecast period, the error function is used to calculate forecast accuracy over the past `eval_window`

number of periods. The forecast accuracy of each forecast is then used to weight forecasts based on a weighting function. In each period, weights are calculated and used to create a weighted average forecast.

We use a stylized example in which we create a weighted forecast of two forecasts: Y1 and Y2.

For all periods *p* in the *k* number of forecasts *Y*, weights *W* are calculated over the `eval_window`

*e* as:

\[ W_k = \frac{1/MSE(Y_{ki})}{1/\sum_{k=1}^{k}MSE(Y_{ki})} \qquad \text{where} \quad i = p-e \leq i \leq p \]

Forecasts are estimated as:

\[ forecast_p = Y1_pW_1 + Y2_pW_2 \]

```
<- Forecast(
y1_forecast origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
"2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30")),
future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30")),
forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
h_ahead = 4L
)
<- Forecast(
y2_forecast origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
"2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30")),
future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30")),
forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
h_ahead = 4L
)
performance_weighted_forc(
y1_forecast, y2_forecast,eval_window = 2L,
errors = "mse",
return_weights = FALSE
)#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2009-03-31 2010-03-31 NA 1.09
#> 2 2009-06-30 2010-06-30 NA 1.71
#> 3 2009-09-30 2010-09-30 NA 1.09
#> 4 2009-12-31 2010-12-31 NA 2.46
#> 5 2010-03-31 2011-03-31 NA 1.78
#> 6 2010-06-30 2011-06-30 1.421244 1.35
#> 7 2010-09-30 2011-09-30 1.234979 2.89
#> 8 2010-12-31 2011-12-31 1.186461 2.11
#> 9 2011-03-31 2012-03-31 1.078011 2.97
#> 10 2011-06-30 2012-06-30 0.893773 0.99
```

The `performance_weighted_forc()`

function returns a weighted forecast of the Y1 and Y2 forecasts based on performance in recent periods. The weights used in each period can be returned to the Global Environment by setting `return_weights`

to `TRUE`

. Note that although we were only weighting performance over the past two periods, we have five `NA`

forecasts. This reflects the *lmForc* philosophy of replicating what it would be like to forecast in real-time. If a forecaster was making a forecast at `2010-06-30`

, they would only have access to realized values up to `2010-06-30`

, in this case the first two rows. This is why a weighted forecast with an `eval_window`

of two can only be computed once the forecast origin becomes `2010-06-30`

and the forecaster has access to two realized values. This function can be used to compute a weighted forecast for the present period or to test how a weighted forecast would have performed historically. The weighted forecasts are based on information that **would** have been available to the forecaster at the forecast origin.

The `states_weighted_forc()`

function produces a weighted average of multiple forecasts based on how each forecast performed during the past state of the world that is most similar to the current state of the world. The function takes two or more forecasts, a data frame, matrix, or array of matching variables, an optional vector of time data associated with the matching variables, a matching window size, a matching function, and an error function. The first step of the weighted forecast process is to match the current state of the world to a similar past state of the world. For each forecast period, the `matching_vars`

are standardized and the past `matching_window`

periods of the matching variables are considered as the current state of the world. This current state of the world is compared to all past `matching_window`

size periods of the matching variables. The current state is matched to the past state that minimizes the user selected matching function. For example, if `matching`

is set to `euclidean`

then the matched past state is the past state which has the minimum euclidean distance to the current state of the world. The objective is to select a past period that is similar to the current state of the world as given by the matching variables. Once a past state has been matched, the accuracy of each forecast is calculated over the periods of the past state according to the user selected error function. Forecast weights are then computed based on forecast accuracy during the past state. The objective is to give more weight to the forecasts that perform better in conditions that reflect the current state of the world. The forecast weights are then used to create a weighted forecast for the current period.

We use a stylized example with one matching variable *x* as well as two forecasts *Y1* and *Y2*.

The matching variable *x* is first standardized using the function:

\[ x_i = \frac{x_i - mean(x)}{sd(x)} \]

For all periods *p*, the current state of the world *c* and past states of the world *p* are calculated as:

\[ c_i = x_i \qquad \text{where} \quad p-w \leq i \leq p \\ \qquad \qquad \qquad \qquad p_{id} = x_i \qquad \text{where} \quad d-w \leq i \leq d \quad \text{for all} \quad d \lt p-w \]

Where *w* is the `matching_window`

size and *d* are all periods that occur before the beginning of the current state.

All possible past states are passed through the matching function and the matched past state is selected as the past state that minimizes the matching function. If `matching`

is set to `euclidean`

, the matched past state *p* is the past state that satisfies the following:

\[ p = \min \sqrt{\sum_{i=1}^{n}(c_i - p_{id})^2} \qquad \text{for all past states } d \]

Forecast accuracy and forecast weights are computed over observations from the matched past state *p*. If `errors`

is set to `mse`

then the forecast weights *W* for each forecast *k* are calculated as.

\[ W_k = \frac{1/MSE(Y_{kp})}{1/\sum_{k=1}^{k}MSE(Y_{kp})} \]

The current period forecast is then calculated as:

\[ forecast_p = Y1_pW_1 + Y2_pW_2 \]

```
<- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
date "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30"))
<- as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
future "2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31",
"2013-03-31", "2013-06-30"))
<- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)
y <- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)
x1 <- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)
x2
<- data.frame(date, y, x1, x2)
data <- data[, c("x1", "x2")]
matching_vars
<- Forecast(
y1_forecast origin = date,
future = future,
forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
h_ahead = 4L
)
<- Forecast(
y2_forecast origin = date,
future = future,
forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
h_ahead = 4L
)
states_weighted_forc(
y1_forecast, y2_forecast,matching_vars = matching_vars,
time_vec = data$date,
matching_window = 2L,
matching = "euclidean",
errors = "mse",
return_weights = FALSE
)#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2010-03-31 2011-03-31 NA 1.78
#> 2 2010-06-30 2011-06-30 NA 1.35
#> 3 2010-09-30 2011-09-30 NA 2.89
#> 4 2010-12-31 2011-12-31 NA 2.11
#> 5 2011-03-31 2012-03-31 NA 2.97
#> 6 2011-06-30 2012-06-30 1.456977 0.99
#> 7 2011-09-30 2012-09-30 1.211438 1.31
#> 8 2011-12-31 2012-12-31 1.174534 1.41
#> 9 2012-03-31 2013-03-31 1.077066 1.02
#> 10 2012-06-30 2013-06-30 0.932814 1.05
```

The `states_weighted_forc()`

function returns a weighted forecast of the Y1 and Y2 forecasts based on how these forecasts performed in past states of the world that resemble the current state of the world. The weights used in each period and a list of the matched states can be returned to the Global Environment by setting `return_weights`

to `TRUE`

. This function can be used to compute a states weighted forecast for the present period or to test how a states weighted forecast would have performed historically. The states weighted forecasts are based on information that **would** have been available to the forecaster at the forecast origin.