Rank-based Canonical Correlation and Its Application in Feature Screening for a Large Class of Semi-parametric Regression Models
Thursday, November 1, 2018, 15:30-16:15
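He Yong is an Associate Professor at Shandong University of Finance and Economics. He received his Ph.D. from Fudan University in 2017 and was a visiting scholar in the Department of Statistics at the University of Wisconsin-Madison from September 2015 to August 2016. His research interests include statistical inference for high-dimensional data, financial statistics, biostatistics, and large-scale hypothesis testing. He has published more than ten papers in leading statistics journals in China and abroad, including Computational Statistics and Data Analysis, Bioinformatics, BMC Bioinformatics, and Science China Mathematics (中国科学：数学). He currently leads one National Natural Science Foundation of China Young Scientists Fund project and one National Statistical Science Research Project.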
Abstract: Regression analysis has always been a hot research topic in statistics. We propose a very flexible semi-parametric regression model called the Elliptical Copula Regression (ECR) model, which covers a large class of linear and nonlinear regression models such as the additive regression model and the single-index model. In addition, the ECR model can capture heavy tails and tail dependence between variables, so it could be widely applied in areas such as econometrics and finance. In this paper we mainly focus on the feature screening problem for the ECR model in the ultra-high dimensional setting. We propose a doubly robust sure screening procedure for the ECR model that involves two types of correlation coefficient: Kendall's tau correlation and canonical correlation. Theoretical analysis shows that the procedure enjoys the sure screening property, i.e., with probability tending to 1, it selects all important variables while substantially reducing the dimensionality to a moderate size relative to the sample size. Thorough numerical studies illustrate its advantage over existing sure independence screening methods, and thus it can be used as a safe replacement for the existing procedures in practice. The proposed procedure is applied to a real gene-expression data set to show its empirical usefulness. Finally, we generalize the method to achieve feature screening for multi-response ECR models.
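To make the idea of rank-based screening concrete, here is a minimal sketch. It is not the procedure presented in the talk: it is a simplified marginal variant that scores each feature by Kendall's tau alone and omits the canonical correlation step entirely. It relies on the standard relation for elliptical copulas that the latent correlation equals sin(π/2 · τ). The function names, the toy data, and the cutoff `d` are all assumptions made for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau by direct pair counting, O(n^2); assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for k in range(i + 1, n):
            prod = (x[i] - x[k]) * (y[i] - y[k])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def rank_screen(columns, y, d):
    """Score each feature by |sin(pi/2 * tau)| and keep the d largest.

    Illustrative marginal screening only; the cutoff d is a
    user-chosen assumption, not a value taken from the talk.
    """
    scores = [abs(math.sin(math.pi / 2 * kendall_tau(col, y)))
              for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    return sorted(order[:d]), scores


def t3_noise(rng):
    """Heavy-tailed Student-t draw with 3 degrees of freedom."""
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(3))
    return rng.gauss(0, 1) / math.sqrt(chi2 / 3)


# Hypothetical toy data: only features 0 and 1 drive the response.
rng = random.Random(0)
n, p = 120, 30
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
# A monotone (cubic) transform of a noisy linear signal; Kendall's tau
# is invariant to such transforms, which is why ranks suit this model.
y = [(columns[0][i] + columns[1][i] + 0.5 * t3_noise(rng)) ** 3
     for i in range(n)]

keep, scores = rank_screen(columns, y, d=5)
print("retained feature indices:", keep)
```

The pair-counting loop keeps the sketch dependency-free; for real data, `scipy.stats.kendalltau` would be the usual choice since it handles ties and runs faster.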