TOPIC:
FAST INFERENCE FOR QUANTILE REGRESSIONS WITH MILLIONS OF OBSERVATIONS
ABSTRACT
While "Big data" analytics has brought many new opportunities to economic research, on datasets containing millions of observations the usual econometric inference requires computing power and memory that are often not accessible. Analyses are therefore applied only to smaller subsamples, which raises concerns about sample selection bias. In this paper we focus on quantile regression applied to datasets whose sizes are of order n ~ 10^7. We develop a new inference framework that runs very fast, based on stochastic gradient descent (SGD) updates. The cross-sectional data are fed into the model sequentially, the parameter estimate is updated as each "new observation" arrives, and the iterates are finally aggregated as the Polyak-Ruppert average. We leverage insights from time-series regression and construct asymptotically pivotal statistics via random scaling. Our proposed test statistic is computed in a fully online fashion, and its critical values are obtained without any resampling methods. We conduct extensive numerical studies to showcase the computational merits of our proposed inference, and find that alternative inference methods from existing approaches are constrained by either time limits or memory budgets. Our proposed method makes it possible to analyze the entire economic census data without restricting attention to subsamples.
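To fix ideas, the sequential SGD update with Polyak-Ruppert averaging described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the step-size schedule gamma0 * t^(-alpha), and the zero starting value are our assumptions. Each observation contributes one subgradient step of the quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), and the final estimate is the running average of the iterates, so only O(d) memory is needed regardless of n.

```python
import numpy as np

def sgd_quantile(X, y, tau=0.5, gamma0=0.5, alpha=0.501):
    """One-pass SGD for quantile regression with Polyak-Ruppert averaging.

    Each row of X is processed once, as if it were a newly arriving
    observation; beta_bar is the online average of the SGD iterates.
    The decaying step size gamma0 * t**(-alpha), alpha in (1/2, 1),
    is a standard choice for averaged SGD (illustrative here).
    """
    n, d = X.shape
    beta = np.zeros(d)       # assumed starting value
    beta_bar = np.zeros(d)   # running Polyak-Ruppert average
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        # subgradient of the check loss rho_tau(y - x'beta) w.r.t. beta
        grad = x_t * (float(y_t <= x_t @ beta) - tau)
        beta = beta - gamma0 * t ** (-alpha) * grad
        beta_bar += (beta - beta_bar) / t  # online average update
    return beta_bar

# Usage: median regression (tau = 0.5) on simulated data
rng = np.random.default_rng(0)
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta_hat = sgd_quantile(X, y, tau=0.5)
```

Because the loop touches each observation exactly once and stores only the current iterate and its average, the same structure extends to the fully online construction of the random-scaling statistic mentioned in the abstract, which likewise accumulates running sums of the averaged iterates rather than revisiting the data.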