
Volume 8, Issue 9, September – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

On Some New Hybridized Regression Estimation and Feature Selection Techniques

Adamu Buba1; Umar Usman2; Yakubu Musa3; Murtala Muhammed Hamza4

1Department of Mathematics/Statistics, Federal University Birnin Kebbi, Kebbi State, Nigeria
2,3Department of Statistics, Usmanu Danfodiyo University Sokoto, Sokoto State, Nigeria
4Department of Mathematics, Usmanu Danfodiyo University Sokoto, Sokoto State, Nigeria

Abstract:- Conventional regularization techniques like LASSO, SCAD and MCP have been shown to perform poorly in the presence of extremely large or ultra-high dimensional covariates. This has created the need for, and led to the development and reliance on, filtering techniques like screening. Screening techniques (such as SIS, DC-SIS, and DC-RoSIS) have been shown to reduce the computational complexity in selecting important covariates from ultrahigh dimensional candidates. To this end, there have been various attempts to hybridize the conventional regularization techniques. In this paper, we combine some regularization techniques (LASSO and SCAD) with a screening technique (DC-RoSIS) to form new hybrid methods with a view to achieving better dimension reduction and variable selection simultaneously. Extensive simulation results and real life data performance show that the proposed methods perform better than the conventional methods.

Keywords:- Regularization Techniques, Screening Technique, LASSO DC-RoSIS, SCAD DC-RoSIS.

I. INTRODUCTION

Regression analysis, a form of predictive modeling technique mostly used to investigate the relationship between a dependent variable and a set of predictors, is a widely known technique for fitting models to data. It is a reliable method of identifying which variables have an impact on, or greatly influence, the problem of interest. To explain the functional relationship between the predictor variables and the outcome variable, one needs to select a parsimonious model in order to achieve good prediction performance. When models are fitted by least squares regression, each additional covariate adds to the actual variance of the final regression equation. In medical studies or clinical research, it is common to collect data with numerous variables while the number of observations may be small due to cost or other constraints. Datasets with many variables (features) are known as high dimensional. When the covariate dimension is high, it is natural to assume that some covariates are irrelevant. Specifically, when the number of covariates (predictors) p rivals or exceeds n (the number of observations), we often seek, for the sake of interpretation, a smaller set of variables. Hence, we want our fitting procedure to make only a subset of the coefficients large and the others small or even zero. These are the shortcomings of high dimensionality in the regression setting. The traditional method (OLS) tends to overfit the model, and the method becomes unusable when the coefficient estimate is no longer unique and its variance becomes infinite.

To deal with such problems, coefficient shrinkage (regularization) is employed to shrink the estimated coefficients towards zero relative to the least squares estimates. Depending on what type of shrinkage is performed, these procedures are capable of reducing the variance and can also perform variable selection. Procedures such as the Least Absolute Shrinkage and Selection Operator (LASSO), the SCAD (smoothly clipped absolute deviation) (Fan and Li, 2001)[2] and the MCP (minimax concave penalty) (Zhang, 2010)[3] enable variable selection such that only the important predictor variables stay in the model (Szymczak et al., 2009)[1].

The high volume of data currently processed, due to the great evolution in social media and other data intensive tasks, has led to the collection of extremely large or ultra-high dimensional covariates. This makes conventional regularization techniques fail or underperform in terms of expediency and algorithmic stability (Fan, Samworth and Wu, 2009)[4]. This has led to the use of filtering techniques like screening, which naturally focus on the extremes and consistently outperform the usual form of regression analysis. These screening techniques further reduce the computational complexity in selecting important covariates from ultrahigh dimensional candidates. Such techniques are SIS (Sure Independence Screening) (Fan and Lv, 2008)[5], DC-SIS (SIS based on Distance Correlation) (Li, Zhong and Zhu, 2012)[6] and DC-RoSIS (Robust SIS based on Distance Correlation) (Zhong et al., 2016)[7].

When the covariate dimension is high in regression modelling, it is natural to assume that some covariates are irrelevant. The presence of irrelevant covariates may substantially deteriorate the precision of parameter estimation and the accuracy of response prediction (Altham, 1984)[8]. In the context of linear regression or generalized linear regression, many regularization methods and general penalty functions have been proposed to remove irrelevant covariates and simultaneously estimate the nonzero coefficients. However, when there are outliers in the response data, the above-mentioned techniques do not perform optimally. Freue et al. (2019)[9] introduced a penalized M-Estimation technique for high dimensional data with outliers in the response. However, each of these methods has its shortcomings, ranging from impracticality and poor performance to algorithmic instability. It is expected that incorporating screening with these methods will reduce the computational complexity in selecting important covariates in ultrahigh dimensional settings, leading to improved performance and more stable computations. We perform an extensive simulation study and a real life data demonstration to evaluate the performance of the proposed techniques vis-a-vis existing alternatives.

II. METHODOLOGY

This section presents the methodology employed in this paper with a focus on the traditional linear regression techniques.

 Linear Regression
Consider the multiple linear regression model, where 𝑌 denotes the response variable (also called the dependent variable) and 𝑋1, 𝑋2, …, 𝑋𝑝 denote the explanatory variables (also called predictors, features or independent variables). The relationship between 𝑌 and 𝑋1, 𝑋2, …, 𝑋𝑝 can be expressed as

𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + ⋯ + 𝛽𝑝 𝑋𝑝 + 𝜀 (1)

The parameters 𝛽0, 𝛽1, …, 𝛽𝑝 are called regression coefficients and 𝜀 is the random error term.

Given a data set {(𝑦𝑖, 𝑥𝑖1, 𝑥𝑖2, …, 𝑥𝑖𝑝)}, 𝑖 = 1, …, 𝑛, of 𝑛 statistical units, each statistical unit can be expressed as

𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖1 + 𝛽2𝑋𝑖2 + ⋯ + 𝛽𝑝𝑋𝑖𝑝 + 𝜀𝑖,  𝑖 = 1, 2, …, 𝑛 (2)

where 𝑦𝑖 is the 𝑖th response observation, 𝛽0, 𝛽1, 𝛽2, …, 𝛽𝑝 are the unknown parameters and 𝜀𝑖 ~ 𝑁(0, 𝜎²). Often these 𝑛 equations are rewritten in vector form as

𝑌 = 𝑋𝛽 + 𝜀 (3)

 𝑋 is the design matrix
 𝑌 is the response vector
 𝛽 is the parameter vector
 𝜀 is the error vector

 Assumptions of Multiple Linear Regression

 Linearity:
The relationship between the explanatory variables and the response variable is linear. This is the only restriction on the
parameters (not explanatory variables), since the explanatory variables are regarded as fixed values.

 Independence:
There are two types of independence.

 The explanatory variables and the error terms are independent.

 The error terms are independent of one another. Therefore, 𝐶𝑜𝑟𝑟(𝜀𝑖, 𝜀𝑗) = 0 for all 𝑖 ≠ 𝑗.

 Normality:
The error terms follow a normal distribution,

$$\varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n, \qquad \text{i.e. } \varepsilon \sim N(0, \sigma^2 I_n), \qquad
\sigma^2 I_n = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}.$$

 Equal Variance:
The error terms are assumed to have equal variances:

𝑉𝑎𝑟(𝜀𝑖) = 𝜎² for all 𝑖, and hence 𝑉𝑎𝑟(𝑌𝑖) = 𝜎² for all 𝑖.

Ordinary Least Squares (OLS) is the traditional technique used to estimate the parameters of the multiple linear regression model. The OLS estimator, which minimizes the residual sum of squares

$$RSS = (Y - X\beta)^T (Y - X\beta), \qquad (4)$$

is given as

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T Y.$$
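As an illustration, the closed-form OLS solution can be computed directly in R (the software used for all computations in this paper); the simulated design matrix, coefficients and the comparison with lm() below are illustrative assumptions, not data from this study.

```r
# Minimal sketch: closed-form OLS estimate versus lm(), on simulated data
set.seed(1)
n <- 100; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))          # design matrix with an intercept column
beta <- c(2, 1, -1, 0.5, 0, 0)                     # illustrative true coefficients
y <- as.numeric(X %*% beta + rnorm(n))

beta_ols <- solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
fit_lm <- lm(y ~ X - 1)                            # same fit via lm(), without a second intercept
max(abs(beta_ols - coef(fit_lm)))                  # should be numerically ~0
```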

 Penalization Methods
We consider a linear regression model with 𝑛 observations on a dependent variable 𝑌 and 𝑝 predictors. Penalized regression approaches have been used both in the case where 𝑝 < 𝑛 and in the case where 𝑝 ≥ 𝑛. In general, Penalized Least Squares (PLS) aims at minimizing the residual sum of squares

(𝑌 − 𝑋𝛽)ᵀ(𝑌 − 𝑋𝛽)

subject to 𝑃𝑒𝑛(𝛽) ≤ 𝑡, where 𝑃𝑒𝑛(𝛽) (a specific penalty) is a function of 𝛽 = (𝛽0, 𝛽1, …, 𝛽𝑝)ᵀ and 𝑡 is a tuning parameter. This constrained optimization problem can be solved with the equivalent Lagrangian formulation, which minimizes

𝑃𝐿𝑆 = 𝑂𝐿𝑆 + 𝑃𝑒𝑛𝑎𝑙𝑡𝑦 = (𝑌 − 𝑋𝛽)ᵀ(𝑌 − 𝑋𝛽) + 𝜆𝑃𝑒𝑛(𝛽) (5)

where 𝜆 is a tuning parameter that controls the strength of shrinkage. For example, when 𝜆 = 0, no penalty is applied and we have ordinary least squares regression. When 𝜆 gets larger, more weight is given to the penalty term. Desirable properties of penalization include variable selection and the grouping effect.

 LASSO Penalty
The Least Absolute Shrinkage and Selection Operator (LASSO) regression method was introduced by Tibshirani (1996) as
an estimation and variable selection method. It is also called L1 penalized regression. The LASSO is a penalized least squares
procedure that minimizes RSS subject to the non-differentiable constraint expressed in terms of the L1 norm of the coefficients.
The penalty function is given by

$$Pen(\beta) = \lambda \sum_{i=1}^{p} |\beta_i| \qquad (6)$$

The objective is to minimize

$$\hat{\beta}_{LASSO} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}}\; (Y - X\beta)^T (Y - X\beta) + \lambda \sum_{i=1}^{p} |\beta_i| \qquad (7)$$

where 𝜆 is a non-negative regularization parameter.
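As an illustrative sketch (not the exact code used in this paper), the LASSO fit in (7) can be obtained in R with the glmnet package; the simulated X and y below are assumptions for demonstration only.

```r
# Minimal sketch: LASSO (L1-penalized least squares) via glmnet
library(glmnet)

set.seed(1)
n <- 100; p <- 200
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(3, 5), rep(0, p - 5))        # sparse illustrative truth
y <- as.numeric(X %*% beta + rnorm(n))

fit <- glmnet(X, y, alpha = 1)             # alpha = 1 gives the LASSO penalty
coef(fit, s = 0.5)                         # coefficients at a chosen lambda = 0.5
```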

Since the LASSO penalty term is no longer quadratic, there is no explicit formula for the mean squared error of the LASSO estimator. Generally, the bias, 𝐵𝑖𝑎𝑠(𝛽̂_LASSO), increases as the tuning parameter 𝜆 increases, while the variance, 𝑉𝑎𝑟(𝛽̂_LASSO), decreases. For instance, when 𝜆 = 0,

$$MSE(\hat{\beta}_{LASSO}) = MSE(\hat{\beta}_{OLS}),$$

and as 𝜆 → ∞,

$$MSE(\hat{\beta}_{LASSO}) = \operatorname{trace}\big(Var(\hat{\beta}_{LASSO})\big) + Bias^T(\hat{\beta}_{LASSO})\,Bias(\hat{\beta}_{LASSO}), \quad \text{with } \operatorname{trace}\big(Var(\hat{\beta}_{LASSO})\big) \to 0.$$

Since $Bias^T(\hat{\beta}_{LASSO})Bias(\hat{\beta}_{LASSO})$ and $\operatorname{trace}(Var(\hat{\beta}_{LASSO}))$ move in opposite directions as the tuning parameter 𝜆 increases, we can choose an optimal value of 𝜆 that minimizes $MSE(\hat{\beta}_{LASSO})$.
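In practice the tuning parameter is usually chosen by cross-validation rather than by minimizing a closed-form MSE; a minimal sketch with glmnet's built-in K-fold cross-validation is shown below, assuming the X and y objects from the previous sketch are still available.

```r
# Minimal sketch: choosing lambda for the LASSO by 10-fold cross-validation
library(glmnet)

cvfit <- cv.glmnet(X, y, alpha = 1, nfolds = 10)   # CV error curve over a lambda grid
cvfit$lambda.min                                   # lambda minimizing the CV error
coef(cvfit, s = "lambda.min")                      # coefficients at that lambda
```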

 The Smoothly Clipped Absolute Deviation (SCAD)


The SCAD penalty (Fan and Li, 2001) is

$$Pen_{SCAD}(\beta) = \sum_{i=1}^{p} p_\lambda(\beta_i) \qquad (8)$$

where

$$p_\lambda(\beta_i) = \lambda|\beta_i|\, I(0 \le |\beta_i| \le \lambda) + \frac{a\lambda|\beta_i| - (\beta_i^2 + \lambda^2)/2}{a - 1}\, I(\lambda < |\beta_i| \le a\lambda) + \frac{(a+1)\lambda^2}{2}\, I(|\beta_i| > a\lambda), \quad \text{for some } a > 2,\ \lambda > 0,$$

where 𝐼(∙) is the indicator function and 𝑎 = 3.7 is suggested by Fan and Li (2001).

The SCAD estimator 𝛽̂_SCAD is given as the minimizer of

$$L(\lambda, \beta) = (Y - X\beta)^T (Y - X\beta) + Pen_{SCAD}(\beta) \qquad (9)$$
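As a hedged illustration, SCAD-penalized least squares can be fitted in R with the ncvreg package, whose gamma argument plays the role of the constant 𝑎 above; reusing the X and y from the earlier sketches is an assumption.

```r
# Minimal sketch: SCAD-penalized regression via ncvreg (gamma corresponds to a = 3.7)
library(ncvreg)

fit_scad <- ncvreg(X, y, penalty = "SCAD", gamma = 3.7)      # full solution path
cv_scad  <- cv.ncvreg(X, y, penalty = "SCAD", gamma = 3.7)   # choose lambda by cross-validation
coef(cv_scad)                                                # coefficients at the CV-selected lambda
```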

 Penalized M-Estimation
It is common for the response variable in a regression problem to contain outliers. The OLS procedure and the penalized methods discussed earlier do not perform adequately when there are outliers in the response data. One robust approach that handles the problem of outliers is M-Estimation. The letter M indicates that M-estimation is an estimation of the maximum likelihood type. The M-estimation principle is to minimize a function of the residuals,

$$\hat{\beta}_M = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \rho\!\left(\frac{y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}}{\sigma}\right), \qquad (10)$$

Where 𝜌 is some function with the following properties:

 𝜌(𝑟) ≥ 0 for all r and has a minimum at 0


 𝜌(𝑟) = 𝜌(−𝑟) for all 𝑟
 𝜌(𝑟) increases as r increases from 0, but doesn’t get too large as 𝑟 increases

If the 𝜌 function can be differentiated, the M-estimator is said to be a 𝜓-type. Otherwise, the M-estimator is said to be a 𝜌-
type. Note that the OLS estimator is a special case of the M-estimator.

Common 𝜌 functions are Tukey's bisquare, Andrew's and Huber's functions. Tukey's 𝜌 function is given as

$$\rho(r_i) = \begin{cases} \dfrac{r_i^2}{2} - \dfrac{r_i^4}{2c^2} + \dfrac{r_i^6}{6c^4}, & \text{if } |r_i| \le c \\[6pt] \dfrac{c^2}{6}, & \text{if } |r_i| > c, \end{cases}$$

where 𝑐 is a constant.

Huber's 𝜌 function is given as

$$\rho(r_i) = \begin{cases} \dfrac{1}{2} r_i^2, & \text{if } |r_i| < c \\[6pt] c|r_i| - \dfrac{1}{2}c^2, & \text{if } |r_i| \ge c. \end{cases}$$

Andrew's 𝜌 function is given as

$$\rho(r_i) = \begin{cases} 1 - \cos(r_i), & \text{if } |r_i| \le \pi \\ 2, & \text{if } |r_i| > \pi. \end{cases}$$

The M-estimation algorithm using the Tukey’s bisquare function is given as follows:

 Estimate initial regression coefficients 𝛽̂ on the data using OLS.
 Calculate the residuals 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖.
 Calculate the scale 𝜎̂ = 1.4826 · 𝑀𝐴𝐷(𝑒1, …, 𝑒𝑛), where 𝑀𝐴𝐷(𝑒1, …, 𝑒𝑛) = Median|𝑒𝑖 − Median(𝑒1, …, 𝑒𝑛)|.
 Calculate the standardized residuals 𝑟𝑖 = 𝑒𝑖 / 𝜎̂.
 Calculate the weights

$$w_i = \begin{cases} \left[1 - \left(\dfrac{r_i}{4.685}\right)^2\right]^2, & \text{if } |r_i| \le 4.685 \\ 0, & \text{if } |r_i| > 4.685 \end{cases}$$

 Calculate 𝛽̂𝑀 using the weighted least squares (WLS) method with weights 𝑤𝑖.
 Repeat steps 2-6 until 𝛽̂𝑀 converges. Note that at step 2, 𝑒𝑖 is recalculated from the fitted model in the current iteration.
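A minimal R sketch of this iteratively reweighted least squares (IRLS) scheme with Tukey's bisquare weights is given below; the convergence tolerance, the simulated data and the contamination pattern are illustrative assumptions.

```r
# Minimal sketch: M-estimation by IRLS with Tukey's bisquare weights
set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[1:10] <- y[1:10] + 20                                    # contaminate 10% of responses with outliers

X <- cbind(1, x)
beta <- coef(lm(y ~ x))                                    # step 1: OLS start
for (iter in 1:50) {
  e <- as.numeric(y - X %*% beta)                          # step 2: residuals
  sigma <- 1.4826 * median(abs(e - median(e)))             # step 3: MAD scale
  r <- e / sigma                                           # step 4: standardized residuals
  w <- ifelse(abs(r) <= 4.685, (1 - (r / 4.685)^2)^2, 0)   # step 5: bisquare weights
  beta_new <- coef(lm(y ~ x, weights = w))                 # step 6: WLS refit
  if (max(abs(beta_new - beta)) < 1e-8) break              # step 7: stop at convergence
  beta <- beta_new
}
beta                                                       # robust estimates, close to (1, 2)
```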

While the M-estimation technique may be robust against outliers, it does not cater for other problems associated with regression, such as high-dimensionality and multicollinearity (Freue et al., 2019). In order to solve the problem of high-dimensionality or multicollinearity, a penalized M-Estimation procedure may be used.

A penalized M-Estimator is defined as the minimizer of

$$\sum_{i=1}^{n}\rho\!\left(\frac{y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}}{\sigma}\right) + \lambda\, Pen(\beta). \qquad (11)$$

Freue et al. (2019) introduced efficient algorithms for penalized M-Estimators using the LASSO and Elastic-Net penalties. The pense R package contains an implementation of penalized M-Estimation with the LASSO and Elastic-Net penalties.
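As a rough illustration only (not the algorithm of Freue et al., 2019, and not the pense implementation), one can approximate a LASSO-penalized M-estimate by alternating Tukey-bisquare reweighting with a weighted LASSO fit in glmnet; the data, the fixed lambda and the weight floor below are assumptions for demonstration.

```r
# Rough sketch: robustified LASSO by alternating bisquare weights and weighted glmnet fits
library(glmnet)

set.seed(3)
n <- 100; p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X[, 1:5] %*% rep(3, 5) + rnorm(n))
y[1:10] <- y[1:10] + 20                                      # 10% outliers in the response

w <- rep(1, n)
for (iter in 1:10) {
  fit <- glmnet(X, y, alpha = 1, weights = w, lambda = 0.5)  # weighted LASSO at a fixed lambda
  e <- as.numeric(y - predict(fit, newx = X))
  s <- 1.4826 * median(abs(e - median(e)))                   # robust (MAD) scale of the residuals
  r <- e / s
  w <- ifelse(abs(r) <= 4.685, (1 - (r / 4.685)^2)^2, 1e-6)  # bisquare weights, floored to stay positive
}
coef(fit)[1:10, ]                                            # large coefficients should sit on X1..X5
```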

 Robust Variable Screening based on Distance Correlation (DC-RoSIS)


In this study, a robust feature screening procedure for regression models using distance correlation, proposed by Zhong et al. (2016), is adopted. The definition of distance correlation according to Szekely et al. (2007) is as follows: the distance covariance between random variables 𝑋 and 𝑌 is

$$dcov^2(X, Y) = S_1 + S_2 - 2S_3, \qquad (12)$$

where $S_1 = E(|X - \tilde{X}||Y - \tilde{Y}|)$, $S_2 = E(|X - \tilde{X}|)\,E(|Y - \tilde{Y}|)$, $S_3 = E(|X - \tilde{X}||Y - \breve{Y}|)$, and $(\tilde{X}, \tilde{Y})$ and $(\breve{X}, \breve{Y})$ are independent copies of $(X, Y)$. The distance correlation between 𝑋 and 𝑌 is

$$dcorr(X, Y) = \frac{dcov(X, Y)}{\sqrt{dcov(X, X)\, dcov(Y, Y)}} \qquad (13)$$

Szekely et al (2007) pointed out that 𝑑𝑐𝑜𝑟𝑟(𝑋, 𝑌) = 0 if and only if 𝑋 and 𝑌 are independent and 𝑑𝑐𝑜𝑟𝑟(𝑋, 𝑌 ) is strictly
increasing in the absolute value of the Pearson correlation between 𝑋 and 𝑌. Motivated by these properties, Li et al (2012)
proposed a sure independence screening to rank all predictors using their distance correlations with the response variable, termed
DC-SIS, and proved its sure screening property for ultrahigh-dimensional data.

Following Zhong et al. (2016), let 𝑋𝑘 denote the 𝑘th predictor, 𝑘 = 1, …, 𝑝𝑛. This work quantifies the importance of 𝑋𝑘 through its distance correlation with the marginal distribution function of 𝑌, denoted by 𝐹(𝑌). That is,

$$\omega_k = dcorr\{X_k, F(Y)\},$$

where 𝐹(𝑦) = 𝐸{𝟏(𝑌 ≤ 𝑦)}. This is a modification of the marginal utility in Li et al. (2012) in that 𝐹(𝑌) is used here instead of 𝑌.

The distance correlation has several advantages compared with existing measurements: 𝑑𝑐𝑜𝑟𝑟{𝑋𝑘, 𝐹(𝑌)} = 0 if and only if 𝑋𝑘 and 𝑌 are independent, and, following Li et al. (2012), the screening procedure is model-free and hence applicable in both dense and sparse situations; since 𝐹(𝑌) is a bounded function for all types of 𝑌, the procedure can be expected to perform reliably when the response is heavy-tailed and when extreme values are present in the response; and if one suspects that the covariates also contain some extreme values, then one can use 𝜔𝑘𝑏 = 𝑑𝑐𝑜𝑟𝑟{𝐹𝑘(𝑋𝑘), 𝐹(𝑌)} to rank the importance of 𝑋𝑘, where 𝐹𝑘(𝑥) = 𝐸{𝟏(𝑋𝑘 ≤ 𝑥)}.

Zhong et al. (2016) showed how to implement the marginal utility in the screening procedure as follows. Let {(𝑿𝒊, 𝑌𝑖), 𝑖 = 1, ⋯, 𝑛} be a random sample from the population (𝑿, 𝑌). The distance covariance between 𝑋𝑘 and 𝐹(𝑌) is first estimated through the moment estimation method,

$$\widehat{dcov}^2\{X_k, F(Y)\} = \hat{S}_{k,1} + \hat{S}_{k,2} - 2\hat{S}_{k,3}, \qquad (14)$$

where

$$\hat{S}_{k,1} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|X_{ik} - X_{jk}|\,|F_n(Y_i) - F_n(Y_j)|,$$

$$\hat{S}_{k,2} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|X_{ik} - X_{jk}| \cdot \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|F_n(Y_i) - F_n(Y_j)|,$$

and

$$\hat{S}_{k,3} = \frac{1}{n^3}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{n}|X_{ik} - X_{lk}|\,|F_n(Y_i) - F_n(Y_j)|$$

are the corresponding estimators of $S_{k,1}$, $S_{k,2}$ and $S_{k,3}$, and $F_n(y) = n^{-1}\sum_{i=1}^{n}\mathbf{1}(Y_i \le y)$. We estimate $\omega_k$ with

$$\hat{\omega}_k = \widehat{dcorr}\{X_k, F(Y)\} = \frac{\widehat{dcov}(X_k, F(Y))}{\sqrt{\widehat{dcov}(X_k, X_k)\,\widehat{dcov}(F(Y), F(Y))}}.$$

The independence screening procedure retains the covariates whose $\hat{\omega}_k$ values are larger than a user-specified threshold. Let $\hat{A} = \{k : \hat{\omega}_k \ge c n^{-\kappa}, \text{ for } 1 \le k \le p_n\}$ for some pre-specified constants $c > 0$ and $0 \le \kappa < 1/2$; the constants 𝑐 and 𝜅 control the signal strength (see Zhong et al., 2016). Zhong et al. (2016) referred to this approach as the distance correlation based robust independence screening procedure (DC-RoSIS).

Additionally, in this study, an estimate $\hat{\omega}_k^b$, which is based on the marginal distribution functions of both 𝑋 and 𝑌, is introduced and is defined as

$$\hat{\omega}_k^b = \widehat{dcorr}\{F_k(X_k), F(Y)\} = \frac{\widehat{dcov}(F_k(X_k), F(Y))}{\sqrt{\widehat{dcov}(F_k(X_k), F_k(X_k))\,\widehat{dcov}(F(Y), F(Y))}},$$

where

$$\widehat{dcov}^2(F_k(X_k), F(Y)) = \hat{S}_{k,1}^b + \hat{S}_{k,2}^b - 2\hat{S}_{k,3}^b,$$

$$\hat{S}_{k,1}^b = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|F_n(X_{ik}) - F_n(X_{jk})|\,|F_n(Y_i) - F_n(Y_j)|,$$

$$\hat{S}_{k,2}^b = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|F_n(X_{ik}) - F_n(X_{jk})| \cdot \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|F_n(Y_i) - F_n(Y_j)|,$$

and

$$\hat{S}_{k,3}^b = \frac{1}{n^3}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{n}|F_n(X_{ik}) - F_n(X_{lk})|\,|F_n(Y_i) - F_n(Y_j)|.$$

The use of $\hat{\omega}_k^b$ may produce better results if the covariates also contain some extreme values.
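A minimal R sketch of the DC-RoSIS marginal utilities and the resulting top-d screening is shown below, using the dcor() function from the energy package and the empirical distribution function ecdf(); the simulated data are an assumption, and the screened model size d follows the choice quoted later in the paper (Zhong et al., 2016).

```r
# Minimal sketch: DC-RoSIS utilities omega_k = dcorr{X_k, F_n(Y)} and omega_k^b = dcorr{F_n(X_k), F_n(Y)}
library(energy)

set.seed(4)
n <- 100; p <- 1000
X <- matrix(rnorm(n * p), n, p)
y <- 3 * X[, 1] + 1.5 * X[, 2] + 2 * X[, 7] + rt(n, df = 1)   # heavy-tailed response

Fy <- ecdf(y)(y)                                              # F_n(Y): empirical distribution function of Y
omega  <- apply(X, 2, function(xk) dcor(xk, Fy))              # omega_k
omegab <- apply(X, 2, function(xk) dcor(ecdf(xk)(xk), Fy))    # omega_k^b, ranks of the covariate as well

d <- floor(n / (2 * log(n)))                                  # screened model size used in this paper
head(order(omega, decreasing = TRUE), d)                      # indices of the top-d ranked covariates
```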

 Sure Screening Property of DC-RoSIS


We first state the consistency of $\hat{\omega}_k$, which paves the road to proving the sure screening property of the DC-RoSIS procedure.

 Theorem 1. Under the condition (C1) that there exist positive constants $t_0$ and $C$ such that $\max_{1 \le k \le p_n} E\{\exp(t|X_k|)\} \le C < \infty$ for $0 < t \le t_0$, for any $0 < \gamma < 1/2 - \kappa$ there exist positive constants $c_1$ and $c_2$ such that

$$\Pr\left(\max_{1 \le k \le p_n} |\hat{\omega}_k - \omega_k| \ge c n^{-\kappa}\right) \le O\!\left(p_n\left[\exp\{-c_1 n^{1-2(\kappa+\gamma)}\} + n\exp(-c_2 n^{\gamma})\right]\right). \qquad (15)$$

We remark here that, to derive the consistency of the estimated marginal utility, we do not need any moment condition on the response. To prove the sure screening property, we make use of a further assumption (C6): the marginal utility satisfies $\min_{k \in A} \omega_k \ge 2c n^{-\kappa}$ for some constants $c > 0$ and $0 \le \kappa < 1/2$.

Condition (C6) allows the minimal signal of the active covariates to converge to zero as the sample size diverges, while requiring that the minimum signal of the active covariates is not too small.

 Theorem 2 (Sure Screening Property). Under (C6) and the conditions in Theorem 1, it follows that $\Pr(A \subseteq \hat{A}) \ge 1 - O\!\left(s_n\left[\exp\{-c_1 n^{1-2(\kappa+\gamma)}\} + n\exp(-c_2 n^{\gamma})\right]\right)$, where $s_n$ is the cardinality of $A$. Thus, $\Pr(A \subseteq \hat{A}) \to 1$ as $n \to \infty$.

III. THE PROPOSED DC-ROSIS PENALIZED REGRESSION

In this paper, we propose some new estimators by combining the DC-RoSIS with some penalized regression estimators, namely the LASSO, the SCAD and the LASSO-penalized M-estimator. We achieve this by first utilizing the DC-RoSIS to select the top d = n/[2 log(n)] (see Zhong et al., 2016) ranked covariates and then applying penalized linear regression to estimate the direction of 𝛽. The combination gives rise to LASSO-DCRoSIS, SCAD-DCRoSIS and LASSO-M-DCRoSIS. Hence, the proposed method is a two-stage method. First, DCRoSIS is used to reduce the covariate dimension to a moderate scale and then, based on the reduced model, penalized linear regression further estimates and refines the selection of important covariates.

The need for this hybridization stems from the fact that, from a practical perspective, when the covariate dimension is extremely large, the DCRoSIS offers a useful complement to penalized regression since it helps to reduce the computational complexity in selecting important covariates from ultrahigh dimensional candidates. More so, in our previous work (Buba, Usman, Musa, and Hamza, 2023), the hybridization of the Elastic Net, SCAD and MCP gave rise to some visible improvements.

Since 𝐹 is bounded and monotone, we can reasonably expect that the procedure still works in the presence of outliers or extreme values in the covariate or response variable. It is computationally efficient and hence offers a useful complement, rather than an alternative, to the penalized regression approach, since the proposed independence screening can precede the penalized regression when the latter fails to produce a reliable solution within a tolerable time. Zhong et al. (2016) showed that this independence screening procedure has the sure screening property even when 𝑝 is ultrahigh.

 The LASSO-DCRoSIS Penalized Regression
Considering the model given by (3), 𝑋 is the matrix with 𝑝 columns representing all the predictors. The DCRoSIS technique is used to compute 𝜔̂𝑗 (or 𝜔̂𝑗𝑏), 𝑗 = 1, …, 𝑝. Thereafter, the 𝜔̂𝑗's are ranked. Let 𝑋𝐴 denote the matrix with columns containing the top 𝑑 predictors corresponding to the top 𝑑 ranked 𝜔̂𝑗's.


Also, let 𝛽𝐴 = (𝛽𝐴0, 𝛽𝐴1, 𝛽𝐴2, …, 𝛽𝐴𝑝) denote the regression coefficients associated with 𝑋𝐴. Then, the LASSO-DCRoSIS estimator 𝛽̂_LASSO−DCRoSIS is given as

$$\hat{\beta}_{LASSO-DCRoSIS} = \underset{\beta_A \in \mathbb{R}^p}{\operatorname{argmin}}\; (Y - X_A\beta_A)^T (Y - X_A\beta_A) + \lambda \sum_{i=1}^{p}|\beta_{A_i}|.$$

This minimization problem can be solved by a number of algorithms, including coordinate descent (Fu, 1998), proximal methods (Beck and Teboulle, 2009) and quadratic solvers (Grandvalet et al., 2017).

 The SCAD-DCRoSIS Penalized Regression
The earlier definitions of 𝑋, 𝑋𝐴 and 𝛽𝐴 remain unchanged. Then, the SCAD-DCRoSIS estimator 𝛽̂_SCAD−DCRoSIS is given as

$$\hat{\beta}_{SCAD-DCRoSIS} = \underset{\beta_A \in \mathbb{R}^p}{\operatorname{argmin}}\; (Y - X_A\beta_A)^T (Y - X_A\beta_A) + \sum_{i=1}^{p} p_\lambda(\beta_{A_i}), \qquad (16)$$

where

$$p_\lambda(\beta_{A_i}) = \lambda|\beta_{A_i}|\, I(0 \le |\beta_{A_i}| \le \lambda) + \frac{a\lambda|\beta_{A_i}| - (\beta_{A_i}^2 + \lambda^2)/2}{a-1}\, I(\lambda < |\beta_{A_i}| \le a\lambda) + \frac{(a+1)\lambda^2}{2}\, I(|\beta_{A_i}| > a\lambda)$$

for some 𝑎 > 2 and 𝜆 > 0, and 𝐼(∙) is the indicator function. The minimization problem in (16) can be solved using coordinate descent algorithms.

 The LASSO-M-DCRoSIS Penalized Regression

Given that 𝑋, 𝑋𝐴 and 𝛽𝐴 are as earlier defined, the LASSO-M-DCRoSIS estimator 𝛽̂_LASSO−M−DCRoSIS is given as

$$\hat{\beta}_{LASSO-M-DCRoSIS} = \underset{\beta_A \in \mathbb{R}^p}{\operatorname{argmin}}\; \rho\!\left(\frac{Y - X_A\beta_A}{\sigma}\right) + \lambda \sum_{i=1}^{p}|\beta_{A_i}|, \qquad (17)$$

where 𝜌(∙) is the Tukey's bisquare function defined earlier under Penalized M-Estimation.

The minimization problem in (17) can be solved by the weighted LASSO least squares technique proposed by Freue et al. (2019).
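A hedged end-to-end sketch of the two-stage procedure (DC-RoSIS screening followed by a penalized fit on the screened covariates) is given below; the data, the package choices (energy, glmnet, ncvreg) and the tuning defaults are illustrative assumptions rather than the exact implementation used in this paper.

```r
# Hedged sketch: two-stage LASSO-DCRoSIS and SCAD-DCRoSIS
library(energy); library(glmnet); library(ncvreg)

set.seed(5)
n <- 100; p <- 1000
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(3 * X[, 1] + 1.5 * X[, 2] + 2 * X[, 7] + rnorm(n))

# Stage 1: DC-RoSIS screening, keep the top-d covariates
Fy <- ecdf(y)(y)
omega <- apply(X, 2, function(xk) dcor(xk, Fy))
d <- floor(n / (2 * log(n)))
A <- order(omega, decreasing = TRUE)[1:d]
XA <- X[, A, drop = FALSE]

# Stage 2: penalized regression on the screened set
lasso_fit <- cv.glmnet(XA, y, alpha = 1)                      # LASSO-DCRoSIS
scad_fit  <- cv.ncvreg(XA, y, penalty = "SCAD", gamma = 3.7)  # SCAD-DCRoSIS

A[which(as.numeric(coef(lasso_fit, s = "lambda.min"))[-1] != 0)]  # selected original covariate indices
```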

IV. ANALYSIS AND RESULTS

This section presents a detailed description of the proposed LASSO-DCRoSIS, LASSO-M-DCRoSIS and SCAD-DCRoSIS. The section also shows the results of the evaluation of the proposed hybrid methods against themselves and other classical methods under different sample size settings and outlier severity. It is worthy of note that all implementations of the methods, simulations and computations were carried out using R (R Core Team, 2019), while tables and plots are used to present the results.

 Simulation Design
The performances of the LASSO-DCRoSIS, LASSO-M-DCRoSIS and SCAD-DCRoSIS for variable selection and estimation are evaluated via simulation at various sample sizes and levels of contamination by outliers. Each simulated data set consists of a training set for fitting the model, a validation set for selecting the tuning parameters, and a test set on which the test errors are computed for the evaluation of performance. The notation ·/·/· is used to represent the number of observations in the training, validation and test set, respectively.

 Case 1
The true underlying regression model from which we simulate data is given by

Y = Xᵀβ* + σ*ϵ, ϵ ~ N(0, 1).

In this case, the simulated data sets consist of n/10n/100 observations and 200 predictors, and we set β = (5, …, 5, 0, …, 0), with the first 20 coefficients equal to 5 and the remaining 180 equal to 0, n = 100, σ = 12 and ρ(i, j) = 0.5^|i−j| for all i, j.

 Case 2
In this case, only a linear model is considered, given by

Y_i = β1 X_i1 + β2 X_i2 + β7 X_i7 + ε_i, i = 1, 2, …, n.

X = (X1, X2, …, Xp)ᵀ was generated from 𝒩(0, Σ), where Σ = (σ_ij)_{p×p} with σ_ij = 0.5^|i−j|. Here, p was set to 1000 and n = 50, 100 and 200. It should be noted that, out of the 1000 generated covariates, only three (X1, X2 and X7) are useful in the model. Hence, β was set such that β = (3, 1.5, 0, 0, 0, 0, 2, 0, …, 0)ᵀ.

 Case 3
In this case, the simulated data sets consist of n/10n/200 observations and 1000 predictors, and we set β = (0, …, 0, 2, …, 2, 0, …, 0, 2, …, 2), consisting of 485 zeros, 15 twos, 485 zeros and 15 twos, n ∈ {50, 100}, σ = 2 and ρ(i, j) = 0.5^|i−j| for all i, j. In this case there are 1000 sparse grouped predictors, with only 30 being relevant.

 Case 4
In this case, the simulated data sets consist of n/10n/200 observations and 1000 predictors, and we set β = (3, …, 3, 0, …, 0), consisting of 15 threes and 985 zeros, n ∈ {50, 100} and σ = 15. The predictors X are generated as follows:

X_i = Z1 + w_i^x, Z1 ~ N(0, 1), i = 1, …, 5,
X_i = Z2 + w_i^x, Z2 ~ N(0, 1), i = 6, …, 10,
X_i = Z3 + w_i^x, Z3 ~ N(0, 1), i = 11, …, 15.

The X_i are independent identically distributed (iid) N(0, 1) for i = 16, …, 1000 and the w_i^x are iid N(0, 0.01). This setting implies that there are three equally important groups, each containing 5 members. Under each case, the situation where the observations on the response variable Y contain outliers is also considered. In order to contaminate Y with outliers, 90% of the errors ε_i were independently generated from N(0, 1) while the remaining 10% were generated from N(20, 2).

The proposed LASSO-DCRoSIS, LASSO-M-DCRoSIS and SCAD-DCRoSIS were applied to estimate β. To facilitate comparison, the classical LASSO and SCAD were applied too. The data simulation, variable screening and estimation were replicated 100 times and the performance of the techniques is evaluated based on the following criteria (an illustrative data-generation sketch for Case 2 is given after this list):

 S: the average number of non-zero estimated regression coefficients
 SE: the absolute difference between S and the actual size of the model, defined here by |S − TS|, where TS is the true model size
 C: the average number of truly non-zero coefficients correctly estimated to be non-zero
 IC: the average number of truly zero coefficients incorrectly estimated to be non-zero
 MSE_Y: the prediction mean-squared error, defined as (1/n_test)‖Y_test − X_test β̂‖²
 MSE_β: the mean-squared error of the estimates, defined as ‖β̂ − β‖²
 AE: the total average absolute estimation error of β̂, defined by Σ_{j=1}^{p} |E(β̂_j) − β_j|
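As a hedged illustration of the simulation design (here Case 2, with the 10% response contamination described above), the data can be generated in R as follows; the use of MASS::mvrnorm, the seed and the reading of N(20, 2) as mean 20 and variance 2 are assumptions for demonstration.

```r
# Hedged sketch: generating Case 2 data with AR(1)-type correlation and 10% response outliers
library(MASS)

set.seed(6)
n <- 100; p <- 1000
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))                    # sigma_ij = 0.5^|i - j|
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
beta <- numeric(p); beta[c(1, 2, 7)] <- c(3, 1.5, 2)      # only X1, X2 and X7 are active

eps <- rnorm(n)                                           # clean N(0, 1) errors for 90% of cases
out <- sample(n, size = 0.1 * n)                          # 10% of observations to contaminate
eps[out] <- rnorm(length(out), mean = 20, sd = sqrt(2))   # outlying errors from N(20, 2)
y <- as.numeric(X %*% beta + eps)
```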

 Case 1
The simulation results are presented in this section. The results are based on 100 replications and the evaluation criteria are S, SE, C, IC, MSE_Y, MSE_β and AE.

Table 1 Simulation Results for Case 1 at 𝑛 = 50, 100, 150, 200, with no Outliers, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 26 6 16 10 207.348 30.567 213.200
SCAD-DCRoSIS 19 1 13 6 348.516 26.994 270.275
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 23 3 16 8 166.630 18.358 129.842
LASSO 28 8 20 8 57.500 21.075 65.991
SCAD 17 3 10 7 547.868 29.072 463.314
LASSO-DCRoSIS 41 21 20 21 3.108 3.680 6.856
SCAD-DCRoSIS 20 0 20 0 2.050 3.271 5.810
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 31 11 20 11 1.116 3.566 5.018
LASSO 41 21 20 21 3.183 3.571 7.029
SCAD 20 0 20 0 1.638 1.790 5.288
LASSO-DCRoSIS 42 22 20 22 1.658 2.637 5.609
SCAD-DCRoSIS 20 0 20 0 0.998 0.770 4.802
𝒏 = 𝟏𝟓𝟎 LASSO-M-DCRoSIS 31 11 20 11 0.623 1.890 4.496
LASSO 43 23 20 23 1.802 2.726 5.697
SCAD 20 0 20 0 0.957 0.499 4.753
LASSO-DCRoSIS 36 16 20 16 1.145 1.033 4.830
SCAD-DCRoSIS 20 0 20 0 0.751 0.514 4.445
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 31 11 20 11 0.466 1.428 4.396
LASSO 45 25 20 25 1.186 2.197 5.155
SCAD 20 0 20 0 0.741 0.439 4.447

Simulation results when there are no outliers in the response variable for Case 1 are given in Table 1. The table contains medians of S, SE, C, IC, MSE_Y, AE and MSE_β over 100 replications at sample sizes 50, 100, 150 and 200. The true size of the model for this case is 20. In terms of variable selection, SCAD and SCAD-DCRoSIS correctly select the important variables and correctly leave out the unimportant ones. However, SCAD-DCRoSIS outperforms the SCAD in terms of estimation and prediction at sample size 50. Also, LASSO tends to select larger models compared to the proposed LASSO-DCRoSIS and LASSO-M-DCRoSIS. Similar behaviour can be observed at sample sizes 150 and 200.


Table 2 Simulation Results for case 1 at 𝑛 = 50, 100, 150, 200, with 10% Outliers in 𝑌, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 29 9 17 13 228.839 35.226 270.052
ENET-DCRoSIS 28 8 15 13 225.676 30.699 271.707
SCAD-DCRoSIS 21 1 12 9 471.181 32.602 433.153
MCP-DCRoSIS 19 1 12 8 476.287 32.941 438.463
LASSO-M-DCRoSIS 24 4 16 8 156.260 14.845 149.378
𝒏 = 𝟓𝟎
ENET-M-DCRoSIS 24 4 16 8 154.718 14.976 149.378
LASSO 44 24 18 27 224.143 30.550 270.027
ENET 47 27 19 28 143.432 28.087 200.692
SCAD 29 9 14 15 573.576 27.475 469.810
MCP 25 5 14 12 598.667 25.885 471.252
LASSO-DCRoSIS 37 17 20 14 52.831 10.445 75.917
SCAD-DCRoSIS 27 7 20 9 70.970 8.346 84.054
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 26 6 19 7 31.108 4.283 52.914
LASSO 42 22 20 22 33.792 11.560 77.480
SCAD 41 21 20 21 93.324 10.848 101.754
LASSO-DCRoSIS 33 13 20 14 10.855 6.426 53.967
SCAD-DCRoSIS 23 3 20 3 14.289 4.305 50.453
𝒏 = 𝟏𝟓𝟎 LASSO-M-DCRoSIS 27 7 20 7 0.502 1.526 45.011
LASSO 43 23 20 23 11.621 7.129 54.537
SCAD 47 27 20 27 38.879 8.527 63.101
LASSO-DCRoSIS 35 15 20 15 6.025 4.435 49.315
SCAD-DCRoSIS 20 0 20 0 7.503 2.627 46.542
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 29 9 20 9 0.363 1.323 44.757
LASSO 42 22 20 22 6.611 5.178 50.391
SCAD 31 11 20 11 10.934 5.009 49.401

Simulation results for Case 1 with outliers introduced into the response are given in Table 2. SCAD-DCRoSIS outperforms SCAD in terms of estimation and prediction. SCAD seems to be strongly affected by the presence of outliers. At sample sizes 150 and 200, LASSO-M-DCRoSIS significantly outperforms the others, showing that it is superior when outliers are present.

 Case 2
The simulation results are presented in this section. The results are based on 100 replications and the evaluation criteria are S, SE, C, IC, MSE_Y, MSE_β and AE.

Simulation results when there are no outliers in the response variable for Case 2 are given in Table 3. The true size of this model is 3. At sample size 50, LASSO-M-DCRoSIS outperforms the rest in terms of prediction and estimation accuracy, but SCAD-DCRoSIS has the best performance in terms of variable selection. At sample sizes 100, 150 and 200, SCAD-DCRoSIS has the best performance in terms of variable selection, estimation and prediction. In this setting, all methods correctly select the important variables into the model; however, larger models are selected by LASSO and SCAD.

Table 3 Simulation Results for Case 2 at 𝑛 = 50, 100, 150, 200, with no Outliers, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 13 10 3 10 1.797 3.296 6.049
SCAD-DCRoSIS 9 6 3 6 1.799 2.304 5.485
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 7 4 3 4 0.462 1.629 4.524
LASSO 21 18 3 18 2.333 3.691 6.452
SCAD 17 14 3 14 2.481 2.603 5.737
LASSO-DCRoSIS 14 11 3 11 0.867 2.045 4.816
SCAD-DCRoSIS 8 5 3 5 0.301 0.909 4.209
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 9 6 3 6 0.251 1.066 4.212
LASSO 19 16 3 16 0.871 2.167 4.862
SCAD 19 16 3 16 0.466 1.297 4.408
𝒏 = 𝟏𝟓𝟎 LASSO-DCRoSIS 12 9 3 9 0.439 1.493 4.375

SCAD-DCRoSIS 6 3 3 3 0.109 0.420 4.108
LASSO-M-DCRoSIS 9 6 3 6 0.156 0.844 4.251
LASSO 19 16 3 16 0.503 1.730 4.605
SCAD 12 9 3 9 0.181 0.846 4.305
LASSO-DCRoSIS 12 9 3 9 0.322 1.269 4.353
SCAD-DCRoSIS 6 3 3 3 0.092 0.314 4.075
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 9 6 3 6 0.129 0.764 4.080
LASSO 21 18 3 18 0.364 1.442 4.374
SCAD 9 6 3 6 0.110 0.480 4.086

Table 4 Simulation Results for Case 2 at 𝑛 = 50, 100, 150, 200, with 10% Outliers in 𝑌, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 6 3 1 5 12.092 7.055 58.209
SCAD-DCRoSIS 14 11 1 13 39.305 15.409 76.091
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 6 3 3 3 0.312 1.605 45.207
LASSO 10 7 1 9 11.663 7.303 57.176
SCAD 26 23 2 24 57.192 15.742 92.875
LASSO-DCRoSIS 10 7 2 8 7.132 5.809 51.648
SCAD-DCRoSIS 29 26 2 27 33.056 15.089 71.621
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 8 5 3 5 0.101 0.787 44.190
LASSO 14 11 2 12 7.512 6.147 51.868
SCAD 47 44 2 45 54.290 17.512 91.307
LASSO-DCRoSIS 12 9 3 9 5.067 4.861 49.362
SCAD-DCRoSIS 30 27 3 27 17.286 11.647 58.306
𝒏 = 𝟏𝟓𝟎 LASSO-M-DCRoSIS 9 6 3 6 0.064 0.632 43.472
LASSO 16 13 3 13 4.951 5.122 49.192
SCAD 64 61 2 61 46.938 17.123 80.462
LASSO-DCRoSIS 13 10 3 10 2.048 3.208 46.450
SCAD-DCRoSIS 33 30 3 30 6.480 6.796 47.835
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 9 6 3 6 0.049 0.477 44.033
LASSO 20 17 3 17 2.325 3.537 46.323
SCAD 79 76 3 76 11.007 9.217 53.441

Table 4 presents simulation results for Case 2 with 10% outliers introduced into the response variable. Across all sample sizes, LASSO-M-DCRoSIS outperformed the rest in terms of variable selection, prediction and estimation accuracy, while SCAD produced the worst performance, indicating that it does not do well in the presence of outliers. In this setting also, SCAD always selects larger models while all the proposed methods always select more parsimonious models compared to the existing methods.

 Case 3

Table 5 Simulation Results for Case 3 at 𝑛 = 50, 100, 150, 200, with no Outliers, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 22 8 9 12 118.178 51.043 219.490
SCAD-DCRoSIS 14 16 7 8 162.370 49.502 249.613
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 19 11 11 8 110.334 34.909 151.395
LASSO 35 5 24 11 56.869 20.174 61.112
SCAD 18 12 7 9 125.117 53.103 249.119
LASSO-DCRoSIS 43 13 22 21 57.359 25.915 66.021
SCAD-DCRoSIS 29 1 17 12 101.739 23.268 91.621
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 36 6 23 13 44.165 13.223 38.862
LASSO 76 46 30 46 13.545 17.941 16.593
SCAD 34 4 15 19 145.221 36.416 125.094
LASSO-DCRoSIS 52 22 28 24 16.553 11.280 16.756
𝒏 = 𝟏𝟓𝟎 SCAD-DCRoSIS 37 7 27 10 18.981 6.790 14.961
LASSO-M-DCRoSIS 44 14 28 16 13.315 6.187 12.927

LASSO 85 55 30 55 4.580 7.534 8.864
SCAD 50 20 22 28 71.154 21.348 39.160
LASSO-DCRoSIS 58 28 29 29 6.668 6.567 7.901
SCAD-DCRoSIS 33 3 30 3 1.875 2.657 5.695
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 50 20 29 20 5.814 3.837 7.347
LASSO 83 53 30 53 2.520 5.449 6.707
SCAD 32 2 30 2 1.170 0.832 4.804

Simulation results when there are no outliers in the response variable for Case 3 are given in Table 5. The true size of this model is 30, but the values of the coefficients are relatively small and the importance of the corresponding predictors may be harder to detect. At sample sizes 50, 100 and 150, the LASSO outperforms the rest in terms of prediction, estimation accuracy and selection of important variables. However, at sample size 200, SCAD followed by SCAD-DCRoSIS has the best performance in terms of variable selection, estimation and prediction. In this setting, all the methods except LASSO correctly select the important variables into the model at small sample sizes. This is an indication that the LASSO is quite conservative in terms of variable selection.

Table 6 presents simulation results for Case 3 with 10% outliers introduced into the response variable. Across all sample sizes, LASSO-M-DCRoSIS outperforms the rest in terms of prediction and estimation accuracy, while SCAD produced the worst performance.

Table 6 Simulation Results for Case 3 at 𝑛 = 50, 100, 150, 200, with 10% Outliers in 𝑌, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 17 13 7 10 125.476 56.116 287.257
SCAD-DCRoSIS 18 12 6 13 269.223 65.477 390.465
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 19 11 9 9 109.214 35.836 196.114
LASSO 30 0 13 17 113.895 53.630 214.896
SCAD 36 6 0 36 829.179 93.528 1045.335
LASSO-DCRoSIS 40 10 19 21 79.008 34.611 147.307
SCAD-DCRoSIS 38 8 16 22 176.564 37.161 176.564
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 36 6 22 14 45.706 12.422 80.393
LASSO 83 53 23 60 77.679 33.938 132.499
SCAD 62 32 2 60 706.323 100.389 875.419
LASSO-DCRoSIS 52 22 27 25 41.602 22.631 84.648
SCAD-DCRoSIS 47 17 22 25 79.892 21.809 102.168
𝒏 = 𝟏𝟓𝟎 LASSO-M-DCRoSIS 44 14 27 17 17.283 6.794 55.770
LASSO 79 49 28 51 41.121 22.821 89.129
SCAD 80 50 7 73 498.179 80.622 576.196
LASSO-DCRoSIS 57 27 29 28 17.282 13.351 60.556
SCAD-DCRoSIS 50 20 29 22 25.882 9.305 59.041
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 49 19 29 19 5.486 3.556 46.806
LASSO 84 54 30 54 15.058 13.895 61.126
SCAD 85 55 21 63 96.376 26.282 103.093

 Case 4

Table 7 Simulation Results for Case 4 at 𝑛 = 50, 100, 150, 200, with no Outliers, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 10 5 5 6 404.660 6.619 5.062
SCAD-DCRoSIS 3 12 3 0 531.851 6.855 4.635
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 9 6 5 3 373.798 7.040 4.395
LASSO 28 13 4 23 393.286 23.812 7.249
SCAD 3 12 3 0 538.689 71.909 4.378
LASSO-DCRoSIS 13 2 5 8 369.792 7.686 4.587
SCAD-DCRoSIS 3 12 3 0 538.511 7.292 4.318
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 11 4 6 5 334.217 6.225 4.292
LASSO 24 9 5 18 379.969 6.6767 5.134
SCAD 3 12 3 0 539.963 71.878 4.371
𝒏 = 𝟏𝟓𝟎 LASSO-DCRoSIS 14 1 6 8 343.050 7.792 4.505

SCAD-DCRoSIS 3 12 3 0 537.645 8.817 4.023
LASSO-M-DCRoSIS 14 1 7 7 282.709 7.059 4.082
LASSO 27 12 6 21 327.214 7.855 4.770
SCAD 3 12 3 0 538.411 71.925 4.025
LASSO-DCRoSIS 15 0 7 8 302.801 5.810 4.188
SCAD-DCRoSIS 3 12 3 0 537.110 5.815 4.046
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 16 1 7 9 251.832 5.095 4.176
LASSO 28 13 6 22 303.060 4.790 4.470
SCAD 15 0 7 8 302.801 5.810 4.188

Simulation results when there are no outliers in the response variable for case 4 are given in Table 7. The true size of this
model here is 15 and the important predictors are divided into three groups such that predictors within each group are strongly
correlated. All the methods perform similarly with respect to prediction. However, LASSO, LASSO-DCRoSIS, LASSO-M-
DCRoSIS, SCAD, and SCAD-DCRoSIS tend to select one of the important variables in each group with none having the ability to
do group selection.

Table 8 Simulation Results for Case 4 at 𝑛 = 50, 100, 150, 200, with 10% Outliers in 𝑌, based on 100 Replications
𝑺 𝑺𝑬 𝑪 𝑰𝑪 𝑴𝑺𝑬𝜷 AE 𝑴𝑺𝑬𝒀
LASSO-DCRoSIS 9 6 3 6 434.349 15.012 58.394
SCAD-DCRoSIS 3 12 3 0 510.214 12.619 45.612
𝒏 = 𝟓𝟎 LASSO-M-DCRoSIS 8 7 5 3 392.046 7.673 44.489
LASSO 24 9 3 21 76.333 15.370 368.988
SCAD 3 12 3 0 537.854 73.312 43.118
LASSO-DCRoSIS 11 4 4 7 416.201 12.853 51.732
SCAD-DCRoSIS 3 12 3 0 537.374 12.021 42.660
𝒏 = 𝟏𝟎𝟎 LASSO-M-DCRoSIS 12 3 6 6 323.962 5.098 43.871
LASSO 22 7 4 18 412.615 10.085 55.300
SCAD 3 12 3 0 543.058 72.047 41.198
LASSO-DCRoSIS 12 3 4 8 407.867 8.031 49.090
SCAD-DCRoSIS 3 12 3 0 530.967 10.459 41.530
𝒏 = 𝟏𝟓𝟎 LASSO-M-DCRoSIS 13 2 7 6 277.646 4.997 44.103
LASSO 22 7 4 19 410.733 11.271 50.458
SCAD 3 12 3 0 537.219 71.810 40.594
LASSO-DCRoSIS 12 3 4 8 432.346 9.929 46.403
SCAD-DCRoSIS 3 12 3 0 537.457 12.747 41.320
𝒏 = 𝟐𝟎𝟎 LASSO-M-DCRoSIS 14 1 7 7 252.355 4.848 43.932
LASSO 23 8 4 19 413.570 14.299 46.930
SCAD 3 12 3 0 543.027 71.980 40.471

Table 8 presents simulation results for Case 4 with 10% outliers introduced into the response variable. Across all sample sizes, SCAD has the worst performance in all criteria and, just like when there were no outliers, LASSO, LASSO-DCRoSIS, LASSO-M-DCRoSIS, SCAD, and SCAD-DCRoSIS select one of the important variables in each group.

 Application to Real Life Datasets
In this section, the application of the proposed methods (LASSO-DCRoSIS, LASSO-M-DCRoSIS, and SCAD-DCRoSIS) on a real life dataset is considered. The dataset is the gene expression data from the microarray experiments on 120 mammalian eye tissue samples by Scheetz et al. (2006). The dataset consists of 200 predictors, representing 200 gene probes of 120 rats. The response is the expression level of the TRIM32 gene.

Firstly, the data were randomly split into a training set with 100 observations and a test set with 20 observations. The training dataset was used for model fitting and selection of the tuning parameters by 10-fold cross validation. The performance of the methods is then compared based on their prediction mean squared error (MSE_Y) on the test dataset and the number of non-zero coefficients. The process of data splitting, model fitting and computation of MSE_Y was repeated 100 times. The results are summarized in Table 9.

The boxplot and the histogram of Y (the TRIM32 gene) are displayed in Figures 1 and 2. Both indicate that the response distribution may be heavy-tailed and that the data contain outliers.
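A hedged sketch of this evaluation protocol (random train/test splits, 10-fold cross-validated tuning on the training part, prediction error on the test part) is given below, shown only for the LASSO-DCRoSIS branch; the simulated X and y are a placeholder stand-in for the eye tissue data, which are not reproduced here.

```r
# Hedged sketch: repeated train/test evaluation of LASSO-DCRoSIS, mirroring the protocol above
library(energy); library(glmnet)

# Placeholder stand-in for the 120 x 200 eye tissue data (the real data are not reproduced here)
set.seed(7)
X <- matrix(rnorm(120 * 200), 120, 200)
y <- as.numeric(X[, 1:3] %*% c(1, 0.5, 0.5) + rnorm(120, sd = 0.1))

evaluate_once <- function(X, y, d = 50) {            # d = 50 predictors kept after DC-RoSIS, as in the text
  idx <- sample(nrow(X), 100)                        # 100 training and 20 test observations
  Xtr <- X[idx, ];  ytr <- y[idx]
  Xte <- X[-idx, ]; yte <- y[-idx]

  Fy    <- ecdf(ytr)(ytr)                            # DC-RoSIS screening on the training data only
  omega <- apply(Xtr, 2, function(xk) dcor(xk, Fy))
  A     <- order(omega, decreasing = TRUE)[1:d]

  fit  <- cv.glmnet(Xtr[, A], ytr, alpha = 1, nfolds = 10)   # 10-fold CV for the tuning parameter
  pred <- predict(fit, newx = Xte[, A], s = "lambda.min")
  c(MSEY = mean((yte - pred)^2),
    S    = sum(as.numeric(coef(fit, s = "lambda.min"))[-1] != 0))
}

res <- replicate(100, evaluate_once(X, y))           # 100 random splits
apply(res, 1, median)                                # median MSE_Y and median model size
```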

Fig 1 Histogram of the Response Variable (the Expression Level of TRIM32 Gene) for the Gene Expression Data

Fig 2 Boxplot of the Response Variable (the Expression Level of TRIM32 Gene) for the Gene Expression Data

After the DC-RoSIS screening, only 50 predictors were left in the model. The penalized regression was further used to select important predictors and estimate the coefficients using the considered penalty functions. Table 9 gives the size (number of predictors selected) of the model produced by the considered penalty functions, the selected predictors and the corresponding estimates.

Table 9 Median mean squared errors of prediction (MSE_Y) and median estimated model sizes (S), based on 100 replications

Eye Tissue Data
Method            MSE_Y    S
LASSO-DCRoSIS     0.0091   13
SCAD-DCRoSIS      0.0101   7
LASSO-M-DCRoSIS   0.0077   10
LASSO             0.0077   25
SCAD              0.0094   11

Table 9 shows that the proposed methods select sparser models compared to the corresponding existing versions with no substantial loss in prediction accuracy. The results indicate that the LASSO and LASSO-M-DCRoSIS yielded the same prediction error, which is the lowest, but the LASSO-M-DCRoSIS selected fewer predictors, underscoring the superiority of the proposed method for this data. The LASSO-DCRoSIS selected 12 fewer predictors than the LASSO without significant loss in prediction accuracy.

The results from this section further show that the proposed methods perform considerably well for prediction and variable selection.

V. CONCLUSION

Evidence abounds in the literature to show that traditional screening techniques perform poorly in the presence of outliers, necessitating the need to develop new approaches that improve the performance of legacy screening techniques. In this paper, we attempt to enhance the performance of traditional approaches (LASSO and SCAD) by combining them with a robust screening technique (DC-RoSIS) that does well in the presence of outliers, with a view to achieving better dimension reduction and variable selection simultaneously. The simulation and the performance on real life data show that our proposed LASSO-DCRoSIS performs better than the rest in both circumstances.

REFERENCES

[1]. Szymczak, S., Biernacka, J. M., Cordell, H. J., González-Recio, O., König, I. R., Zhang, H. and Sun, Y. V. (2009). Machine learning in genome-wide association studies. Genet. Epidemiol., 33, S51-S57.
[2]. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96, 1348-1360.
[3]. Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist., 101, 1418-1429.
[4]. Fan, J., Samworth, R. and Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond the linear model. J. Machine Learn. Res., 10, 1829-1853.
[5]. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). J. Roy. Statist. Soc. Ser. B, 70, 849-911.
[6]. Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc., 107, 1129-1139.
[7]. Zhong, W., Zhu, L., Li, R. and Cui, H. (2016). Regularized quantile regression and robust feature screening for single index models. Statistica Sinica, 26, 69-95.
[8]. Altham, P. M. (1984). Improving the precision of estimation by fitting a model. Journal of the Royal Statistical Society: Series B (Methodological), 46(1), 118-119.
[9]. Freue, G. V. C., Kepplinger, D., Salibián-Barrera, M.
and Smucler, E. (2019). Robust elastic net estimators
for variable selection and identification of proteomic
biomarkers. The Annals of Applied Statistics, 13(4),
2065-2090.
[10]. Buba, A., Usman, U., Musa, Y., & Hamza, M.
M.(2023). Hybrid Regression Estimation and Feature
Selection Technique Using Robust Variable
Screening Technique and Regularization.
International Journal of Mathematics and Statistics
Invention (IJMSI), 11(5), 10-16.
