Most powerful quadratic tests for high dimensional free alternatives.
Yi He, Sombut Jaidee and Jiti Gao
We propose an unweighted quadratic test for the overall significance of many weak exogenous variables in a dense autoregressive model. By shrinking the classical weighting matrix on the sample moments to the identity, our test is asymptotically correct in high dimensions even when the number of coefficients is (much) larger than the sample size. Our theory allows autoregressors as nuisance variables and an unknown error distribution: we standardize the test statistic into a martingale form and establish its asymptotic normality using a martingale central limit theorem. Using random matrix theory, we show that our unweighted test has the optimal asymptotic power among a large class of competitors against local dense alternatives whose direction is free in the eigenbasis of the sample covariance matrix of the regressors. Our results are fully adaptive to the cross-sectional and temporal dependence among predictors and do not require the limiting spectral law of the sample covariance matrix. We also discuss extensions beyond autoregressive models and to general alternatives. Our simulation study shows good power of the proposed test against high-dimensional dense alternatives for various data generating processes of the predictors. We apply our tests to detect whether the set of more than a hundred exogenous variables in the latest FRED-MD database is highly significant in predicting the monthly growth of the US industrial production index. We show that practitioners may improve forecast accuracy by using a dense learning strategy during periods when the coefficients are significant.
Manuscript available soon
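As a schematic illustration only (not the paper's exact statistic), an identity-weighted quadratic statistic built on the sample moments between the regressors and the regression residuals can be sketched as follows; the null centering and martingale standardization described in the abstract are omitted, and the function name and simulated data are our own.

```python
import numpy as np

def unweighted_quadratic_stat(X, resid):
    """Identity-weighted ("unweighted") quadratic statistic on the sample
    moments g = X' resid / n; large values suggest many weak coefficients
    jointly deviate from zero.  Schematic sketch: the paper's centering
    and martingale standardization are not reproduced here."""
    n = X.shape[0]
    g = X.T @ resid / n        # p-vector of sample moments
    return n * float(g @ g)    # squared Euclidean norm, identity weighting
```

Note that no inverse covariance matrix appears: replacing the classical weighting matrix by the identity is what keeps the statistic well defined when the number of coefficients exceeds the sample size.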
Optimal Forecast with Large Dimensional Regression
We study a universe of Tikhonov-regularized estimators for non-sparse linear regression with a number of predictors comparable to the sample size. Using random matrix theory, we identify within this universe a closed-form ridge estimator that minimizes the asymptotic out-of-sample mean squared forecast loss when the regression coefficients are generated from a random-effects model. The optimal ridge penalty can be estimated directly from the data when the number of predictors is proportional to the sample size. We also provide an explicit optimal linear shrinkage estimator that improves on the classical James-Stein estimator in high dimensions and allows interactions between the regression coefficients and predictor correlations. Our simulation study demonstrates the efficiency of the ridge estimator, which is robust against cross-sectional and time-series dependence among the predictors. We apply our methods to forecast the growth rate of US industrial production using a large-dimensional macroeconomic database. Our optimal ridge estimator displays a non-trivial improvement over the principal-components-based method and the best forecasting performance among all estimators in our analysis.
Manuscript available soon
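A minimal sketch of the ridge forecast underlying the abstract above; the paper's closed-form, data-driven optimal penalty is not reproduced here, so `lam` below is a placeholder penalty supplied by the user.

```python
import numpy as np

def ridge_forecast(X, y, x_new, lam):
    """One-step-ahead ridge forecast with penalty lam (a sketch; the
    paper's closed-form optimal penalty is not reproduced here)."""
    n, p = X.shape
    # Shrunken coefficient estimate: (X'X + lam I)^{-1} X'y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return float(x_new @ beta)
```

As lam grows the forecast shrinks toward zero, and as lam shrinks it approaches the least-squares forecast when the number of predictors is below the sample size; the abstract concerns choosing lam optimally between these extremes.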
Set Inference for Multivariate Extreme Quantile Region under Regular Variation
We consider the extreme quantile region in arbitrary dimension, defined as the half-space depth trimmed region at a very small probability level, which coincides with the envelope of the directional extreme quantiles. The classical asymptotic confidence regions significantly undercover the population set in finite samples, even when the sample size is in the thousands. We propose a second-order correction under regular variation based on the joint convergence of the properly normalized second-order residual process of the directional extreme quantile estimator and the Hill estimator of the tail index. Our approach relies on a dual relationship between the set-valued estimator and its support function and accounts for the shape estimation error of our quantile region. Our second-order confidence regions improve the coverage toward the correct level in simulation studies. We present a financial application to a six-dimensional international stock market dataset. Extensions beyond the half-space depth are discussed.
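The support-function duality above can be illustrated with a plain empirical sketch in two dimensions: the region is the intersection of half-spaces determined by directional quantiles. This omits the paper's extreme-value extrapolation and second-order correction entirely, and the function name is our own.

```python
import numpy as np

def directional_quantile_envelope(X, p, n_dirs=180):
    """Approximate a bivariate quantile region at level p via its support
    function: the region is the intersection of half-spaces
    {x : u'x <= q(u)} over unit directions u, where q(u) is the empirical
    (1 - p)-quantile of the projections u'X.  Sketch only: the paper's
    extreme-value extrapolation and second-order correction are omitted."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    U = np.column_stack([np.cos(thetas), np.sin(thetas)])  # unit directions
    q = np.quantile(X @ U.T, 1.0 - p, axis=0)              # support function
    return U, q  # a point x lies in the region iff np.all(U @ x <= q)
```

At genuinely extreme levels p, where few or no observations fall beyond the directional quantiles, the empirical quantile in this sketch must be replaced by the extrapolated extreme quantile estimator that the abstract studies.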
Extreme Value Statistics for High Dimensional Data
John H.J. Einmahl and Yi He
We propose a novel statistical formulation of the empirical power laws widely observed in high-dimensional data sets. Our approach extends classical extreme value theory to specify the behavior of the empirical distribution of data with a complex dependence structure and possibly different marginal distributions, for a diverging number of dimensions. The main assumption is that, in the intermediate tail, the empirical distribution function approaches some heavy-tailed distribution function in the max-domain of attraction. In this setup the Hill estimator consistently estimates the extreme value index, and extreme quantiles are consistently estimated on a log-scale. We discuss several model examples that satisfy our conditions. We also consider applications to finance.
Manuscript available soon
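The Hill estimator referred to in the abstract above is the classical estimator of the extreme value index from the upper order statistics; a minimal sketch follows (variable names are our own, and the high-dimensional setting of the paper is not reflected here).

```python
import numpy as np

def hill_estimator(x, k):
    """Classical Hill estimator of the extreme value index gamma,
    based on the k largest order statistics of the sample x."""
    xs = np.sort(np.asarray(x, dtype=float))
    top = xs[-(k + 1):]  # the k+1 upper order statistics
    # Average log-spacing above the (k+1)-th largest observation
    return float(np.mean(np.log(top[1:]) - np.log(top[0])))
```

For an exact Pareto sample with tail index alpha, the estimator is consistent for gamma = 1/alpha as k grows with the sample size while k/n tends to zero.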