“Information Theoretic Approach to High Dimensional Multiplicative Models: Stochastic Discount Factor and Treatment Effect”, with Taisuke Otsu (reject & resubmit, Quantitative Economics)
This paper is concerned with the estimation of functionals of a latent weight function that satisfies possibly high dimensional multiplicative moment conditions. The main examples covered are missing data problems, treatment effects, and functionals of the stochastic discount factor in asset pricing. We propose to estimate the latent weight function by an information theoretic approach combined with the l1 penalization technique to deal with high dimensional moment conditions under sparsity. We derive the asymptotic properties of the proposed estimator and illustrate the proposed method with a theoretical example on treatment effect analysis and an empirical example on the stochastic discount factor.
“Cross-Fitted Empirical Likelihood on High Dimensional Semiparametric Models”
We consider the empirical likelihood ratio for low dimensional parameters in the presence of infinite dimensional nuisance parameters. When nuisance parameters are estimated by modern high dimensional machine learning methods, the Donsker conditions used in classical arguments can be rather restrictive. Instead, by using locally robust estimating equations and a cross fitting procedure, we establish a Wilks type theorem that validates empirical likelihood inference in high dimensional models. We construct easy-to-verify low level conditions and show how our results can be applied to many econometric models, including the partly linear model, treatment effect analyses, and partly log linear models. Two simulation exercises demonstrate that our method performs as well as Wald statistics in the linear case and outperforms them when the moment condition becomes nonlinear in parameters.
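The cross fitting idea in the abstract above can be sketched on a toy partly linear model, Y = theta*D + g(X) + error. This is only an illustration under strong assumptions, not the paper's implementation: the covariate is scalar, a simple least-squares line stands in for the machine learning step, and all function names (ols_fit, cross_fitted_residuals, el_ratio) are hypothetical. Nuisance functions are fitted on the other fold, the locally robust (residual-on-residual) moment is formed on the held-out fold, and the empirical likelihood ratio is computed for that scalar moment.

```python
import math
import random

def ols_fit(x, y):
    """Least-squares line y ~ a + b*x; a stand-in for an ML learner (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda t: a + b * t

def cross_fitted_residuals(x, d, y, n_folds=2):
    """Residualise d and y on x, fitting each nuisance on the other folds only."""
    n = len(x)
    res_d, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        fit_d = ols_fit([x[i] for i in train], [d[i] for i in train])
        fit_y = ols_fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - fit_d(x[i])
            res_y[i] = y[i] - fit_y(x[i])
    return res_d, res_y

def el_ratio(theta, res_d, res_y, iters=200):
    """-2 log empirical likelihood ratio for the scalar moment at theta."""
    m = [rd * (ry - theta * rd) for rd, ry in zip(res_d, res_y)]
    if min(m) >= 0.0 or max(m) <= 0.0:
        return math.inf  # zero lies outside the convex hull of the moments
    # The multiplier must keep 1 + lam*m_i > 0 for every i.
    lo, hi = -1.0 / max(m), -1.0 / min(m)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        # The score is strictly decreasing in the multiplier, so bisect on its sign.
        score = sum(mi / (1.0 + mid * mi) for mi in m)
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * mi) for mi in m)

# Simulated data with true theta = 1.5 (illustrative values only).
random.seed(0)
n = 400
x = [random.gauss(0, 1) for _ in range(n)]
d = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [1.5 * di + 2.0 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

res_d, res_y = cross_fitted_residuals(x, d, y)
theta_hat = sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd ** 2 for rd in res_d)
stat_at_hat = el_ratio(theta_hat, res_d, res_y)   # close to zero by construction
stat_at_null = el_ratio(1.5, res_d, res_y)        # statistic at the true value
```

Under a Wilks type result, the statistic at the true parameter would be compared with a chi-squared critical value with one degree of freedom; the sketch does not verify the conditions under which that comparison is valid.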
Research in progress
“Model Selection and Asymptotic Normality of Machine Learning Estimators”
Following the work of Qiu and Otsu (2018) and Qiu (2019), it seems that root-n normality of some plug-in estimators can be achieved in a linear semiparametric framework without correct model selection or a beta-min condition, as long as the model selection mistake is not too large. This paper aims to extend this line of research and to assess the asymptotic distribution of semiparametric estimators within the following general framework. We first use a shrinkage method (for example, lasso) as a model selector. Then, based on the selected model, we construct a semiparametric plug-in estimator using a post-selection method (for example, minimax learning). The key idea is that the post-model-selection procedure can perform at least as well as the first-step selector (under model selection properties that do not require perfect selection; see, for example, Belloni and Chernozhukov, 2011) while significantly reducing the dimension of the problem. This seems to be enough for root-n normality of linear semiparametric estimators, as long as the selector correctly selects some (but not necessarily all) dimensions of the true model.
“Inference on Average Regression Functionals with Many Covariates”
This paper is concerned with inference on average regression functionals when the number of regressors is proportional to the sample size. First, we establish a new distributional result that achieves normality around the population object of interest. This result relies on a central limit theorem validated under conditions weaker than those in Qiu (2019) and on a growing conditional variance that forces remainder terms to vanish after standardization. Second, we explore consistent estimation of the variance under this many-covariates framework when the error term displays unknown conditional heteroscedasticity. Two approaches are being considered. One is to examine the bias of plug-in estimators and adjust the weight of each individual component of the variance accordingly (cf. Cattaneo et al., 2018). The other is to employ the leave-one-out principle, which can be viewed as an extension of Kline et al. (2019) to stochastic regressors and semiparametric frameworks.
“Online Empirical Likelihood” (with Taisuke Otsu)
We propose a new strategy that extends the applicability of empirical likelihood in two directions. First, the strategy achieves a parametric rate even for nonregular target parameters, such as those in optimal treatment strategy and moment inequality models. Second, it addresses situations in which data arrive sequentially in batches, so that researchers learn parameters in an online fashion. This procedure is similar to, but distinct from, “cross-fitted empirical likelihood” (Qiu, 2018); it also shares the spirit of the “block-wise” empirical likelihood in Kitamura (1997) and Kitamura and Stutzer (1997), but with inherently different motivations. The key idea is to estimate the nuisance parameter for each batch from data outside that batch and then reweight each observation by the inverse of the (conditional) standard deviation in its own batch. This procedure is inspired by the recent work of Luedtke and van der Laan (2018), which focuses on constructing online estimators for similar issues. We aim to provide more examples of economic models in which our procedure is applicable for inference.
“Nonparametric Estimation via Entropic Convex Programming”
This paper proposes a general framework for estimating a latent function whose identification relies on a linear structure. This framework is more general than the least squares problem and covers many interesting examples, such as stochastic discount factors, the Riesz representer of a linear functional, nonparametric regression, and nonparametric IV. The key idea is to conduct convex optimization with an entropic function subject to linear constraints. This idea is not new: it has been extensively studied by Borwein and Lewis (1991a, b, 1992) and partially applied to econometrics by Imbens, Johnson and Spady (1998) and Newey and Smith (2004). However, this paper tries to show that the earlier literature does not fully illustrate the strengths of this general approach. In particular, the convergence rate of the derived estimator depends heavily on the shape of the entropy function. Moreover, although the L2 rate of similar estimators has been studied by Newey and Robins (2018) and Qiu and Otsu (2018), it seems that the sup-norm rate (and whether it can achieve optimality) has not been explored yet, which is the focus of this paper.
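As a minimal illustration of convex optimization with an entropic function subject to linear constraints, the sketch below solves a finite-sample version with a single moment constraint. It chooses weights closest to uniform in Kullback-Leibler divergence subject to a zero weighted mean. This is the familiar exponential tilting special case, not the paper's general estimator; the function name entropic_weights and the toy data are assumptions made for illustration.

```python
import math

def entropic_weights(g, tol=1e-12, max_iter=100):
    """Solve  min_w sum_i w_i*log(n*w_i)  s.t.  sum_i w_i = 1,  sum_i w_i*g_i = 0.

    The convex dual is  min_lam log(sum_i exp(lam*g_i)), solved here by plain
    Newton steps (no line search); the primal solution is w_i proportional to
    exp(lam*g_i). Requires zero to lie strictly inside the range of g.
    """
    lam = 0.0
    for _ in range(max_iter):
        shift = max(lam * gi for gi in g)             # stabilise the exponentials
        unnorm = [math.exp(lam * gi - shift) for gi in g]
        total = sum(unnorm)
        w = [u / total for u in unnorm]
        mean = sum(wi * gi for wi, gi in zip(w, g))   # gradient of the dual
        if abs(mean) < tol:
            return w, lam
        var = sum(wi * (gi - mean) ** 2 for wi, gi in zip(w, g))  # dual Hessian
        lam -= mean / var
    raise RuntimeError("Newton iterations did not converge")

# Toy moment values (illustrative only); their unweighted mean is positive.
g = [-2.0, -1.0, 0.5, 1.0, 3.0]
weights, lam = entropic_weights(g)
```

Replacing the Kullback-Leibler objective with another convex entropy function changes the form of the dual and of the implied weights, which is the kind of dependence on the shape of the entropy function that the abstract refers to.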