“…We then apply STG and evaluate the classification accuracy and the number of selected features. We use the architecture [200,50,10] with tanh activations. The experiment was repeated 10 times, the extracted features and accuracies were consistent over 20 trials.…”
“…(b) The comparison of accuracy and sparsity level performance for λ in the range of $[10^{-3}, 10^{-2}]$ between using our proposed method (STG) and its variant using the Hard-Concrete (HC) distribution. [50]. Following the same format as [50], the following functions are used to generate synthetic data: (SE1: 100/5) $y = \sin(x_1 + x_3)^2 \sin(x_7 x_8 x_9) + \mathcal{N}(0, 0.1)$.…”
Section: Regression Using Synthetic and Real Datasets
mentioning
confidence: 99%
“…[50]. Following the same format as [50], the following functions are used to generate synthetic data: (SE1: 100/5) $y = \sin(x_1 + x_3)^2 \sin(x_7 x_8 x_9) + \mathcal{N}(0, 0.1)$. (SE2: 18/5) $y = \log\big(\big(\sum_{s=11}^{15} x_s\big)^2\big) + \mathcal{N}(0, 0.1)$.…”
Section: Regression Using Synthetic and Real Datasets
mentioning
confidence: 99%
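The SE1 and SE2 generators quoted above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the quoted text does not say how the inputs are distributed, so standard normal inputs are assumed; `N(0, 0.1)` is read here as noise with standard deviation 0.1 (it could equally mean variance 0.1); `sin(x_1 + x_3)^2` is read as a squared sine; and "100/5" and "18/5" are read as total features over relevant features. The function names `make_se1` and `make_se2` are hypothetical.

```python
import numpy as np

def make_se1(n, rng, noise_std=0.1):
    """SE1 sketch: 100 features, target depends on x1, x3, x7, x8, x9 (1-indexed)."""
    X = rng.standard_normal((n, 100))  # assumption: standard normal inputs
    # Assumption: sin(x1 + x3)^2 is read as (sin(x1 + x3))^2.
    signal = np.sin(X[:, 0] + X[:, 2]) ** 2 * np.sin(X[:, 6] * X[:, 7] * X[:, 8])
    y = signal + noise_std * rng.standard_normal(n)
    return X, y

def make_se2(n, rng, noise_std=0.1):
    """SE2 sketch: 18 features, target depends on x11..x15 (1-indexed)."""
    X = rng.standard_normal((n, 18))   # assumption: standard normal inputs
    s = X[:, 10:15].sum(axis=1)        # x11 + ... + x15
    y = np.log(s ** 2) + noise_std * rng.standard_normal(n)
    return X, y

rng = np.random.default_rng(0)
X1, y1 = make_se1(1000, rng)
X2, y2 = make_se2(1000, rng)
print(X1.shape, y1.shape, X2.shape, y2.shape)
```

A feature selection method applied to data like this would ideally recover only the five columns that enter each formula.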
“…For each dataset, we generate 30 different replications and randomly split the data into train, validation, and test set. For data preparation, we use the code publicly made available by [50]. The root mean squared error on the test set averaged over 30 random replicated datasets are reported in Table 1.…”
Section: Regression Using Synthetic and Real Datasets
mentioning
confidence: 99%
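The evaluation protocol in the statement above (30 replications, a random train/validation/test split, test RMSE averaged over replications) could look roughly like the sketch below. The split fractions are assumptions, since the quoted text does not give them. The data generator `toy_data` and the least-squares baseline are hypothetical stand-ins used only to make the loop runnable; they are not the datasets or models of the cited work.

```python
import numpy as np

def split_indices(n, rng, frac_train=0.6, frac_val=0.2):
    # Assumed fractions; the quoted text does not state the split sizes.
    perm = rng.permutation(n)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def run_replications(make_data, fit_predict, n_reps=30, n=1000, seed=0):
    """Draw a fresh dataset per replication, split it, and average test RMSE."""
    scores = []
    for rep in range(n_reps):
        rng = np.random.default_rng(seed + rep)
        X, y = make_data(n, rng)
        tr, va, te = split_indices(n, rng)
        y_hat = fit_predict(X[tr], y[tr], X[va], y[va], X[te])
        scores.append(rmse(y[te], y_hat))
    return float(np.mean(scores)), float(np.std(scores))

def toy_data(n, rng):
    # Hypothetical stand-in generator: linear signal plus Gaussian noise.
    X = rng.standard_normal((n, 5))
    y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)
    return X, y

def least_squares(X_tr, y_tr, X_va, y_va, X_te):
    # Stand-in model; a real method would use the validation set for tuning.
    A = np.column_stack([X_tr, np.ones(len(X_tr))])
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([X_te, np.ones(len(X_te))]) @ w

mean_rmse, std_rmse = run_replications(toy_data, least_squares)
print(f"test RMSE over 30 replications: {mean_rmse:.3f} +/- {std_rmse:.3f}")
```

Seeding each replication separately keeps every run reproducible while still giving 30 independent datasets and splits.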
“…N is 1000 for SE1,2,3 and 6000 for RCP. The values for LASSO and SRFF are borrowed from [50]. namely the Naive and regulatory T-cells.…”
Section: Purified Populations of Peripheral Blood Monocytes (PBMCs)
mentioning
Feature selection problems have been extensively studied for linear estimation, for instance, Lasso, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that represent whether a feature is selected or not. Our approach relies on the continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification of incorporating the Bernoulli distribution into our approach and demonstrate the potential of the approach on synthetic and real-life applications.
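To make the idea of a continuous relaxation of Bernoulli indicators concrete, here is a minimal sketch of one possible form: a clipped Gaussian gate per feature, with the expected number of open gates used as a differentiable surrogate for the $\ell_0$ penalty. The abstract does not specify the relaxation, so the clipped-Gaussian form, the value `sigma=0.5`, and the names `sample_gates` and `expected_active_gates` are all assumptions made for illustration, not a description of the authors' implementation.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)  # element-wise error function for NumPy arrays

def sample_gates(mu, sigma, rng):
    """Relaxed indicator per feature: clip(mu + Gaussian noise) into [0, 1]."""
    eps = sigma * rng.standard_normal(mu.shape)
    return np.clip(mu + eps, 0.0, 1.0)

def expected_active_gates(mu, sigma):
    """Surrogate for the l0 norm: sum over features of P(gate > 0) = Phi(mu / sigma)."""
    phi = 0.5 * (1.0 + _erf(mu / (sigma * sqrt(2.0))))
    return float(np.sum(phi))

rng = np.random.default_rng(0)
mu = np.array([-5.0, 0.0, 5.0])      # hypothetical gate parameters for 3 features
z = sample_gates(mu, sigma=0.5, rng=rng)
x = np.array([1.0, 2.0, 3.0])
print(z * x)                          # gated input passed on to the model
print(expected_active_gates(mu, 0.5)) # approximately 1.5
```

Because the surrogate is a smooth function of `mu`, it can be added to the task loss with a weight λ and both can be minimized together by gradient descent, which is the behaviour the abstract describes.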