MODEL SELECTION AND INFERENCE: FACTS AND FICTION

Hannes Leeb; Benedikt M. Pötscher

doi:10.1017/S0266466605050036

Published online by Cambridge University Press: 08 February 2005

Hannes Leeb: Yale University
Benedikt M. Pötscher: University of Vienna

Abstract

Model selection has an important impact on subsequent inference. Ignoring the model selection step leads to invalid inference. We discuss some intricate aspects of data-driven model selection that do not seem to have been widely appreciated in the literature. We debunk some myths about model selection, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically. We also discuss an “impossibility” result regarding the estimation of the finite-sample distribution of post-model-selection estimators.
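The myth named in the abstract can be made concrete in a very small toy setting. The Python sketch below is an illustration under stated assumptions, not code from the paper. It assumes a Gaussian location model with known unit variance, so the sample mean satisfies ȳ ~ N(θ, 1/n), and a pre-test (post-model-selection) estimator θ̃ that keeps ȳ when |√n ȳ| > c and otherwise selects the restricted model θ = 0. The cutoff c = √(log n) is assumed here because it gives a BIC-type rule, which selects the correct model with probability tending to one for every fixed θ. The function names (`scaled_risk`, `simulated_scaled_risk`, `max_scaled_risk`) are hypothetical. Writing δ = √n θ, the code evaluates the exact scaled risk n E(θ̃ − θ)², checks it by simulation, and reports its maximum over δ.

```python
import math
import random


def norm_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), via erfc for accurate tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def scaled_risk(delta: float, c: float) -> float:
    """Exact n * E[(theta_tilde - theta)^2] in the assumed toy model.

    Toy model (an assumption for illustration): ybar ~ N(theta, 1/n) and
    theta_tilde = ybar if |sqrt(n) * ybar| > c, else 0. With Z ~ N(0, 1) and
    delta = sqrt(n) * theta, the scaled error sqrt(n) * (theta_tilde - theta)
    equals Z if the unrestricted model is selected and -delta otherwise.
    The risk is symmetric in delta, so delta >= 0 suffices.
    """
    a, b = -c - delta, c - delta  # restricted model chosen iff a <= Z <= b
    p_restricted = norm_cdf(b) - norm_cdf(a)
    # E[Z^2; a <= Z <= b] = Phi(b) - Phi(a) + a*phi(a) - b*phi(b)
    z2_restricted = p_restricted + a * norm_pdf(a) - b * norm_pdf(b)
    # E[Z^2; unrestricted] = 1 - E[Z^2; restricted]; restricted loss is delta^2
    return (1.0 - z2_restricted) + delta ** 2 * p_restricted


def simulated_scaled_risk(delta: float, c: float,
                          reps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo check of scaled_risk under the same toy model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        if abs(z + delta) > c:  # |sqrt(n) * ybar| > c: keep ybar
            total += z * z
        else:                   # restricted model: theta_tilde = 0
            total += delta * delta
    return total / reps


def max_scaled_risk(n: int, grid_step: float = 0.01,
                    grid_max: float = 20.0) -> float:
    """Largest scaled risk over a delta grid, with BIC-type cutoff sqrt(log n)."""
    c = math.sqrt(math.log(n))
    steps = int(grid_max / grid_step)
    return max(scaled_risk(k * grid_step, c) for k in range(steps + 1))


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        c = math.sqrt(math.log(n))
        print(f"n={n:>7}  risk at theta=0: {scaled_risk(0.0, c):.3f}  "
              f"max over delta: {max_scaled_risk(n):.3f}")
```

For each fixed θ the scaled risk tends to 0 (θ = 0) or to 1 (θ ≠ 0), matching the pointwise limit of an "oracle" that knows the true model. Yet at δ = c the risk is approximately 1/2 + (log n)/2, so its maximum over θ grows without bound as n increases. In this toy model, that non-uniformity is one simple way to see why consistent selection does not make the selection step asymptotically negligible.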

Type: Research Article
Information: Econometric Theory, Volume 21, Issue 1, February 2005, pp. 21–59

Copyright: © 2005 Cambridge University Press

