diff --git a/DESCRIPTION b/DESCRIPTION
index 7380da8..25745e5 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -3,17 +3,19 @@ Version: 0.3.0
 Title: Dynamic Factor Models
 Authors@R: c(person("Sebastian", "Krantz", role = c("aut", "cre"), email = "sebastian.krantz@graduateinstitute.ch"),
     person("Rytis", "Bagdziunas", role = "aut"),
-    person("Santtu", "Tikka", role = "rev"))
+    person("Santtu", "Tikka", role = "rev"),
+    person("Eli", "Holmes", role = "rev"))
 Description: Efficient estimation of Dynamic Factor Models using the Expectation Maximization (EM) algorithm
-    or Two-Step (2S) estimation, supporting datasets with missing data. The estimation options follow advances in the
-    econometric literature: either running the Kalman Filter and Smoother once with initial values from PCA -
-    2S estimation as in Doz, Giannone and Reichlin (2011) <doi:10.1016/j.jeconom.2011.02.012> - or via iterated
-    Kalman Filtering and Smoothing until EM convergence - following Doz, Giannone and Reichlin (2012) <doi:10.1162/REST_a_00225>
-    - or using the adapted EM algorithm of Banbura and Modugno (2014) <doi:10.1002/jae.2306>,
-    allowing arbitrary patterns of missing data. The implementation makes heavy use of the 'Armadillo' 'C++' library and
-    the 'collapse' package, providing for particularly speedy estimation. A comprehensive set of methods supports
-    interpretation and visualization of the model as well as forecasting. Information criteria to choose the number
-    of factors are also provided - following Bai and Ng (2002) <doi:10.1111/1468-0262.00273>.
+    or Two-Step (2S) estimation, supporting datasets with missing data. Factors are assumed to follow a stationary VAR
+    process of order p. The estimation options follow advances in the econometric literature: either running the Kalman
+    Filter and Smoother once with initial values from PCA - 2S estimation as in Doz, Giannone and Reichlin (2011)
+    <doi:10.1016/j.jeconom.2011.02.012> - or via iterated Kalman Filtering and Smoothing until EM convergence - following
+    Doz, Giannone and Reichlin (2012) <doi:10.1162/REST_a_00225> - or using the adapted EM algorithm of Banbura and
+    Modugno (2014) <doi:10.1002/jae.2306>, allowing arbitrary patterns of missing data. The implementation makes heavy
+    use of the 'Armadillo' 'C++' library and the 'collapse' package, providing for particularly speedy estimation.
+    A comprehensive set of methods supports interpretation and visualization of the model as well as forecasting.
+    Information criteria to choose the number of factors are also provided - following Bai and Ng (2002)
+    <doi:10.1111/1468-0262.00273>.
 URL: https://sebkrantz.github.io/dfms/
 BugReports: https://github.com/SebKrantz/dfms/issues
 Depends: R (>= 3.5.0)
diff --git a/R/DFM.R b/R/DFM.R
index c550f32..3582d86 100644
--- a/R/DFM.R
+++ b/R/DFM.R
@@ -147,7 +147,8 @@
 #' VARselect(IC_small$F_pca[, 1:2])
 #'
 #' # Estimating the model with 2 factors and 3 lags
-#' dfm_small = DFM(BM14[, BM14_Models$small], 2, 3)
+#' dfm_small = DFM(BM14[, BM14_Models$small], r = 2, p = 3,
+#'     quarterly.vars = BM14_Models %$% series[freq == "Q" & small])
 #'
 #' # Inspecting the model
 #' summary(dfm_small)
@@ -170,7 +171,8 @@
 #' VARselect(IC_medium$F_pca[, 1:3])
 #'
 #' # Estimating the model with 3 factors and 3 lags
-#' dfm_medium = DFM(BM14[, BM14_Models$medium], 3, 3)
+#' dfm_medium = DFM(BM14[, BM14_Models$medium], r = 3, p = 3,
+#'     quarterly.vars = BM14_Models %$% series[freq == "Q" & medium])
 #'
 #' # Inspecting the model
 #' summary(dfm_medium)
@@ -193,7 +195,8 @@
 #' VARselect(IC_large$F_pca[, 1:6])
 #'
 #' # Estimating the model with 6 factors and 3 lags
-#' dfm_large = DFM(BM14, 6, 3)
+#' dfm_large = DFM(BM14, r = 6, p = 3,
+#'     quarterly.vars = BM14_Models %$% series[freq == "Q"])
 #'
 #' # Inspecting the model
 #' summary(dfm_large)
diff --git a/R/dfms.R b/R/dfms.R
index 5cb4aeb..db5dfc6 100644
--- a/R/dfms.R
+++ b/R/dfms.R
@@ -1,27 +1,19 @@
 #' Dynamic Factor Models
 #'
-#' *dfms* provides efficient estimation of Dynamic Factor Models via the EM Algorithm.
+#' @description
 #'
-#' Estimation can be done in 3 different ways following:
+#' *dfms* provides efficient estimation of Dynamic Factor Models via the EM Algorithm --- following Doz, Giannone & Reichlin (2011, 2012) and Banbura & Modugno (2014). The package has the following contents:
 #'
-#' - Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. *Journal of Econometrics, 164*(1), 188-205. <doi:10.1016/j.jeconom.2011.02.012>
+#' **Information Criteria**
 #'
-#' - Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. *Review of Economics and Statistics, 94*(4), 1014-1024. <doi:10.1162/REST_a_00225>
-#'
-#' - Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. *Journal of Applied Econometrics, 29*(1), 133-160. <doi:10.1002/jae.2306>
-#'
-#' The default is `em.method = "auto"`, which chooses `"BM"` following Banbura & Modugno (2014) with missing data or mixed frequency, and `"DGR"` following Doz, Giannone & Reichlin (2012) otherwise. Using `em.method = "none"` generates Two-Step estimates following Doz, Giannone & Reichlin (2011). This is extremely efficient on bigger datasets. PCA and Two-Step estimates are also reported in EM-estimation. All methods support missing data, but `em.method = "DGR"` does not model them in EM iterations.
-#'
-#' @section Package Contents:
-#'
-#' **Functions to Specify/Estimate Model and Key Methods**
-#'
-#' \code{\link[=ICr]{ICr()}} --- Information Criteria\cr
+#' \code{\link[=ICr]{ICr()}}\cr
 #'
 #' - \code{\link[=plot.ICr]{plot()}}\cr
 #' - \code{\link[=screeplot.ICr]{screeplot()}}\cr
 #'
-#' \code{\link[=DFM]{DFM()}} --- Estimate the Model\cr
+#' **Fit a Dynamic Factor Model**
+#'
+#' \code{\link[=DFM]{DFM()}}\cr
 #'
 #' - \code{\link[=summary.dfm]{summary()}}\cr
 #' - \code{\link[=plot.dfm]{plot()}}\cr
@@ -29,20 +21,26 @@
 #' - \code{\link[=residuals.dfm]{residuals()}}\cr
 #' - \code{\link[=fitted.dfm]{fitted()}}
 #'
-#' \code{\link[=predict.dfm]{predict()}} --- Generate Forecasts\cr
+#' **Generate Forecasts**
+#'
+#' \code{\link[=predict.dfm]{predict()}}\cr
 #'
 #' - \code{\link[=plot.dfm_forecast]{plot()}}\cr
 #' - \code{\link[=as.data.frame.dfm_forecast]{as.data.frame()}}\cr
 #'
-#' **Auxiliary Functions**
+#' **Fast Stationary Kalman Filtering and Smoothing**
 #'
-#' \code{\link[=.VAR]{.VAR()}} --- Estimate Vector Autoregression\cr
 #' \code{\link[=SKF]{SKF()}} --- Stationary Kalman Filter\cr
 #' \code{\link[=FIS]{FIS()}} --- Fixed Interval Smoother\cr
 #' \code{\link[=SKFS]{SKFS()}} --- Stationary Kalman Filter + Smoother\cr
+#'
+#' **Helper Functions**
+#'
+#' \code{\link[=.VAR]{.VAR()}} --- (Fast) Barebones Vector-Autoregression\cr
+#' \code{\link[=ainv]{ainv()}} --- Armadillo's Inverse Function\cr
+#' \code{\link[=apinv]{apinv()}} --- Armadillo's Pseudo-Inverse Function\cr
 #' \code{\link[=tsnarmimp]{tsnarmimp()}} --- Remove and Impute Missing Values in a Multivariate Time Series\cr
-#' \code{\link[=ainv]{ainv()}} --- Rcpp Armadillo's Inverse Function\cr
-#' \code{\link[=apinv]{apinv()}} --- Rcpp Armadillo's Pseudo-Inverse Function\cr
+#' \code{\link[=em_converged]{em_converged()}} --- Convergence Test for EM-Algorithm\cr
 #'
 #' **Data**
 #'
@@ -50,6 +48,13 @@
 #' \code{\link{BM14_Q}} --- Quarterly Series by Banbura and Modugno (2014)\cr
 #' \code{\link{BM14_Models}} --- Series Metadata + Small/Medium/Large Model Specifications\cr
 #'
+#' @references
+#' Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. *Journal of Econometrics, 164*(1), 188-205. <doi:10.1016/j.jeconom.2011.02.012>
+#'
+#' Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. *Review of Economics and Statistics, 94*(4), 1014-1024. <doi:10.1162/REST_a_00225>
+#'
+#' Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. *Journal of Applied Econometrics, 29*(1), 133-160. <doi:10.1002/jae.2306>
+#'
 #' @docType package
 #' @name dfms-package
 #' @aliases dfms
diff --git a/R/methods.R b/R/methods.R
index c9b825d..78a1722 100644
--- a/R/methods.R
+++ b/R/methods.R
@@ -457,7 +457,7 @@ fitted.dfm <- function(object,
 #' @param method character. The factor estimates to use: one of \code{"qml"}, \code{"2s"} or \code{"pca"}.
 #' @param standardized logical. \code{FALSE} will return data forecasts on the original scale.
 #' @param resFUN an (optional) function to compute a univariate forecast of the residuals.
-#' The function needs to have a second argument providing the forecast horizon (\code{h}) and return a vector or forecasts. See Examples.
+#' The function needs to have a second argument providing the forecast horizon (\code{h}) and return a vector of forecasts. See Examples.
 #' @param resAC numeric. Threshold for residual autocorrelation to apply \code{resFUN}: only residual series where AC1 > resAC will be forecasted.
 #' @param \dots further arguments to \code{resFUN}.
 #'
diff --git a/README.md b/README.md
index 49545f4..b5070a6 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,8 @@
-*dfms* provides efficient estimation of Dynamic Factor Models via the EM Algorithm. Estimation can be done in 3 different ways following:
+*dfms* provides efficient estimation of Dynamic Factor Models via the EM Algorithm. Factors are assumed to follow a stationary VAR
+process of order `p`. Estimation can be done in 3 different ways following:
 - Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. *Journal of Econometrics, 164*(1), 188-205.
@@ -27,7 +28,7 @@ The package is fully functional though, and you are very welcome to install it u
 The default is `em.method = "auto"`, which chooses `"BM"` following Banbura & Modugno (2014) with missing data or mixed frequency, and `"DGR"` following Doz, Giannone & Reichlin (2012) otherwise. Using `em.method = "none"` generates Two-Step estimates following Doz, Giannone & Reichlin (2011). This is extremely efficient on bigger datasets. PCA and Two-Step estimates are also reported in EM-estimation. All methods support missing data, but `em.method = "DGR"` does not model them in EM iterations.
-The package is stable, but functionality may expand in the future. In particular, mixed-frequency estimation with autoregressive errors is planned for the near future, and generation of the 'news' may be added in the further future.
+The package is currently stable, but functionality may expand in the future. In particular, mixed-frequency estimation with autoregressive errors is planned for the near future, and generation of the 'news' may be added in the further future.
 ### Comparison with Other R Packages
@@ -56,7 +57,7 @@ install.packages('dfms', repos = c('https://sebkrantz.r-universe.dev', 'https://
 library(dfms)
 # Fit DFM with 6 factors and 3 lags in the transition equation
-mod = DFM(diff(BM14_M), r = 6, p = 3)
+mod <- DFM(diff(BM14_M), r = 6, p = 3)
 ```
 ```
@@ -139,7 +140,7 @@ plot(mod)
 ```
-[image: plot of chunk unnamed-chunk-1]
+[image: plot of chunk unnamed-chunk-1]
 ```r
@@ -158,14 +159,14 @@ as.data.frame(mod) |> head()
 ```r
 # Forecasting 20 periods ahead
-fc = predict(mod, h = 20)
+fc <- predict(mod, h = 20)
 # 'dfm_forecast' methods
 plot(fc)
 ```
-[image: plot of chunk unnamed-chunk-1]
+[image: plot of chunk unnamed-chunk-1]
 ```r
diff --git a/man/DFM.Rd b/man/DFM.Rd
index 1d8ff82..2f4a277 100644
--- a/man/DFM.Rd
+++ b/man/DFM.Rd
@@ -157,7 +157,8 @@ screeplot(IC_small)
 VARselect(IC_small$F_pca[, 1:2])
 # Estimating the model with 2 factors and 3 lags
-dfm_small = DFM(BM14[, BM14_Models$small], 2, 3)
+dfm_small = DFM(BM14[, BM14_Models$small], r = 2, p = 3,
+    quarterly.vars = BM14_Models \%$\% series[freq == "Q" & small])
 # Inspecting the model
 summary(dfm_small)
@@ -180,7 +181,8 @@ screeplot(IC_medium)
 VARselect(IC_medium$F_pca[, 1:3])
 # Estimating the model with 3 factors and 3 lags
-dfm_medium = DFM(BM14[, BM14_Models$medium], 3, 3)
+dfm_medium = DFM(BM14[, BM14_Models$medium], r = 3, p = 3,
+    quarterly.vars = BM14_Models \%$\% series[freq == "Q" & medium])
 # Inspecting the model
 summary(dfm_medium)
@@ -203,7 +205,8 @@ screeplot(IC_large)
 VARselect(IC_large$F_pca[, 1:6])
 # Estimating the model with 6 factors and 3 lags
-dfm_large = DFM(BM14, 6, 3)
+dfm_large = DFM(BM14, r = 6, p = 3,
+    quarterly.vars = BM14_Models \%$\% series[freq == "Q"])
 # Inspecting the model
 summary(dfm_large)
diff --git a/man/dfms-package.Rd b/man/dfms-package.Rd
index 08cb52e..fafe4ff 100644
--- a/man/dfms-package.Rd
+++ b/man/dfms-package.Rd
@@ -6,30 +6,19 @@
 \alias{dfms}
 \title{Dynamic Factor Models}
 \description{
-\emph{dfms} provides efficient estimation of Dynamic Factor Models via the EM Algorithm.
-}
-\details{
-Estimation can be done in 3 different ways following:
-\itemize{
-\item Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. \emph{Journal of Econometrics, 164}(1), 188-205. \url{doi:10.1016/j.jeconom.2011.02.012}
-\item Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. \emph{Review of Economics and Statistics, 94}(4), 1014-1024. \url{doi:10.1162/REST_a_00225}
-\item Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. \emph{Journal of Applied Econometrics, 29}(1), 133-160. \url{doi:10.1002/jae.2306}
-}
-
-The default is \code{em.method = "auto"}, which chooses \code{"BM"} following Banbura & Modugno (2014) with missing data or mixed frequency, and \code{"DGR"} following Doz, Giannone & Reichlin (2012) otherwise. Using \code{em.method = "none"} generates Two-Step estimates following Doz, Giannone & Reichlin (2011). This is extremely efficient on bigger datasets. PCA and Two-Step estimates are also reported in EM-estimation. All methods support missing data, but \code{em.method = "DGR"} does not model them in EM iterations.
-}
-\section{Package Contents}{
-
+\emph{dfms} provides efficient estimation of Dynamic Factor Models via the EM Algorithm --- following Doz, Giannone & Reichlin (2011, 2012) and Banbura & Modugno (2014). The package has the following contents:
-\strong{Functions to Specify/Estimate Model and Key Methods}
+\strong{Information Criteria}
-\code{\link[=ICr]{ICr()}} --- Information Criteria\cr
+\code{\link[=ICr]{ICr()}}\cr
 \itemize{
 \item \code{\link[=plot.ICr]{plot()}}\cr
 \item \code{\link[=screeplot.ICr]{screeplot()}}\cr
 }
-\code{\link[=DFM]{DFM()}} --- Estimate the Model\cr
+\strong{Fit a Dynamic Factor Model}
+
+\code{\link[=DFM]{DFM()}}\cr
 \itemize{
 \item \code{\link[=summary.dfm]{summary()}}\cr
 \item \code{\link[=plot.dfm]{plot()}}\cr
@@ -38,21 +27,27 @@ The default is \code{em.method = "auto"}, which chooses \code{"BM"} following Ba
 \item \code{\link[=residuals.dfm]{residuals()}}\cr
 \item \code{\link[=fitted.dfm]{fitted()}}
 }
-\code{\link[=predict.dfm]{predict()}} --- Generate Forecasts\cr
+\strong{Generate Forecasts}
+
+\code{\link[=predict.dfm]{predict()}}\cr
 \itemize{
 \item \code{\link[=plot.dfm_forecast]{plot()}}\cr
 \item \code{\link[=as.data.frame.dfm_forecast]{as.data.frame()}}\cr
 }
-\strong{Auxiliary Functions}
+\strong{Fast Stationary Kalman Filtering and Smoothing}
-\code{\link[=.VAR]{.VAR()}} --- Estimate Vector Autoregression\cr
 \code{\link[=SKF]{SKF()}} --- Stationary Kalman Filter\cr
 \code{\link[=FIS]{FIS()}} --- Fixed Interval Smoother\cr
 \code{\link[=SKFS]{SKFS()}} --- Stationary Kalman Filter + Smoother\cr
+
+\strong{Helper Functions}
+
+\code{\link[=.VAR]{.VAR()}} --- (Fast) Barebones Vector-Autoregression\cr
+\code{\link[=ainv]{ainv()}} --- Armadillo's Inverse Function\cr
+\code{\link[=apinv]{apinv()}} --- Armadillo's Pseudo-Inverse Function\cr
 \code{\link[=tsnarmimp]{tsnarmimp()}} --- Remove and Impute Missing Values in a Multivariate Time Series\cr
-\code{\link[=ainv]{ainv()}} --- Rcpp Armadillo's Inverse Function\cr
-\code{\link[=apinv]{apinv()}} --- Rcpp Armadillo's Pseudo-Inverse Function\cr
+\code{\link[=em_converged]{em_converged()}} --- Convergence Test for EM-Algorithm\cr
 \strong{Data}
@@ -60,4 +55,10 @@ The default is \code{em.method = "auto"}, which chooses \code{"BM"} following Ba
 \code{\link{BM14_Q}} --- Quarterly Series by Banbura and Modugno (2014)\cr
 \code{\link{BM14_Models}} --- Series Metadata + Small/Medium/Large Model Specifications\cr
 }
+\references{
+Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. \emph{Journal of Econometrics, 164}(1), 188-205. \url{doi:10.1016/j.jeconom.2011.02.012}
+
+Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. \emph{Review of Economics and Statistics, 94}(4), 1014-1024. \url{doi:10.1162/REST_a_00225}
+Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. \emph{Journal of Applied Econometrics, 29}(1), 133-160. \url{doi:10.1002/jae.2306}
+}
diff --git a/man/predict.dfm.Rd b/man/predict.dfm.Rd
index 9cfde00..f4aecb1 100644
--- a/man/predict.dfm.Rd
+++ b/man/predict.dfm.Rd
@@ -58,7 +58,7 @@
 \item{standardized}{logical. \code{FALSE} will return data forecasts on the original scale.}
 \item{resFUN}{an (optional) function to compute a univariate forecast of the residuals.
-The function needs to have a second argument providing the forecast horizon (\code{h}) and return a vector or forecasts. See Examples.}
+The function needs to have a second argument providing the forecast horizon (\code{h}) and return a vector of forecasts. See Examples.}
 \item{resAC}{numeric. Threshold for residual autocorrelation to apply \code{resFUN}: only residual series where AC1 > resAC will be forecasted.}
diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd
index 8064b8f..a94d777 100644
--- a/vignettes/introduction.Rmd
+++ b/vignettes/introduction.Rmd
@@ -53,7 +53,7 @@ Prior to estimation, all data is differenced by BM14, and some series are log, d
 library(magrittr)
 # log-transforming and first-differencing the data
 BM14_M[, BM14_Models_M$log_trans] %<>% log()
-BM14_M_diff = diff(BM14_M)
+BM14_M_diff <- diff(BM14_M)
 plot(scale(BM14_M_diff), lwd = 1)
 ```
@@ -62,7 +62,7 @@ plot(scale(BM14_M_diff), lwd = 1)
 Before estimating a model, the `ICr()` function can be applied to determine the number of factors. It computes 3 information criteria proposed in Bai and NG (2002)^[Bai, J., Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. *Econometrica, 70*(1), 191-221. <doi:10.1111/1468-0262.00273>], whereby the second criteria generally suggests the most parsimonious model.
 ```{r}
-ic = ICr(BM14_M_diff)
+ic <- ICr(BM14_M_diff)
 print(ic)
 plot(ic)
 ```
@@ -88,7 +88,7 @@ Estimation can then simply be done using the `DFM()` function with parameters `r
 ```{r}
 # Estimating the model with 4 factors and 3 lags using BM14's EM algorithm
-model1 = DFM(BM14_M_diff, r = 4, p = 3)
+model1 <- DFM(BM14_M_diff, r = 4, p = 3)
 print(model1)
 plot(model1)
 ```
@@ -134,7 +134,7 @@ DFM forecasts can be obtained with the `predict()` method, which dynamically for
 ```{r}
 # 12-period ahead DFM forecast
-fc = predict(model1, h = 12)
+fc <- predict(model1, h = 12)
 print(fc)
 ```
@@ -156,7 +156,7 @@ head(as.data.frame(fc, pivot = "wide"))
 ## Estimation with Mixed Frequency
-*dfms* currently provides no specific adjustments for data at different frequencies. An algorithm that accommodates monthly and quarterly series is planned for summer 2023. In the meantime, users may choose to block the data (creating multiple quarterly series from a monthly series, and duplicating quarterly series to maintain equal representation).
+Since v0.3.0 *dfms* allows monthly and quarterly mixed frequency estimation following Mariano & Murasawa (2003) and Banbura & Modugno (2014). Quarterly variables should be to the right of the monthly variables in the data matrix and need to be indicated using the `quarterly.vars` argument. Quarterly observations should be provided every 3rd period.