The k nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and significant classification performance. However, it is impractical for traditional kNN methods to assign a fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples by cross validation but are usually time-consuming. This paper proposes a kTree method to learn different optimal k values for different test/new samples by adding a training stage to kNN classification. Specifically, in the training stage, the kTree method first learns optimal k values for all training samples with a new sparse reconstruction model, and then constructs a decision tree (namely, kTree) using the training samples and the learned optimal k values. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and kNN classification is then conducted using the learned optimal k value and all training samples. As a result, the proposed kTree method has a similar running cost but higher classification accuracy compared with traditional kNN methods, which assign a fixed k value to all test samples. Moreover, the proposed kTree method has a lower running cost but achieves similar classification accuracy compared with recent kNN methods, which assign different k values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k*Tree method) that speeds up its test stage by additionally storing information about the training samples in the leaf nodes of kTree, such as the training samples located in the leaf nodes, their kNNs, and the nearest neighbors of these kNNs. We call the resulting decision tree k*Tree; it enables kNN classification using a subset of the training samples in the leaf nodes rather than all training samples, as used in the recent kNN methods. This reduces the running cost of the test stage. Finally, the experimental results on 20 real data sets showed that our proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods in terms of classification tasks.
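The idea of classifying each test sample with its own k can be illustrated with a minimal Python sketch. It is not the kTree method itself: the sparse reconstruction model is replaced here by a simple leave-one-out search over candidate k values, and the decision tree lookup is replaced by reusing the k of the nearest training sample. The function names, the candidate set (1, 3, 5), and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def learn_k_per_sample(train_X, train_y, candidates=(1, 3, 5)):
    """Stand-in for the training stage (NOT sparse reconstruction):
    for each training sample, keep the smallest candidate k whose
    leave-one-out prediction is correct, else fall back to the first."""
    ks = []
    for i, (x, y) in enumerate(zip(train_X, train_y)):
        rest_X = train_X[:i] + train_X[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        chosen = candidates[0]
        for k in candidates:
            if knn_predict(rest_X, rest_y, x, k) == y:
                chosen = k
                break
        ks.append(chosen)
    return ks


def predict_with_learned_k(train_X, train_y, ks, x):
    """Stand-in for the tree lookup (NOT a decision tree): reuse the k
    learned for the training sample nearest to x, then run plain kNN."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    k = ks[nearest]
    return knn_predict(train_X, train_y, x, k), k


# Hypothetical toy data: one noisy 'b' point sits inside the 'a' cluster.
train_X = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b", "b"]

ks = learn_k_per_sample(train_X, train_y)
label_near_noise, k_near_noise = predict_with_learned_k(train_X, train_y, ks, (0.1, 0.1))
label_clean, k_clean = predict_with_learned_k(train_X, train_y, ks, (5.2, 5.1))
```

On this toy data, samples next to the noisy point end up with a larger k than samples in the clean cluster, which is the behavior a fixed k cannot provide.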
This paper presents an efficient method for mining both positive and negative association rules in databases. The method extends traditional associations to include association rules of the forms A ⇒ ¬B, ¬A ⇒ B, and ¬A ⇒ ¬B, which indicate negative associations between itemsets. With a pruning strategy and an interestingness measure, our method scales to large databases. The method has been evaluated using both synthetic and real-world databases, and our experimental results demonstrate its effectiveness and efficiency.
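As a rough illustration of the three negative rule forms, the Python sketch below derives their support and confidence from the supports of A, B, and A ∪ B using standard identities, for example supp(A ∪ ¬B) = supp(A) − supp(A ∪ B). It does not reproduce the paper's pruning strategy or interestingness measure; the function names and the toy transactions are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def negative_rule_stats(transactions, A, B):
    """Support and confidence of A => not B, not A => B, not A => not B.

    Uses only supp(A), supp(B), and supp(A u B), so no extra pass over
    the data is needed for the negated itemsets.
    """
    s_a = support(transactions, A)
    s_b = support(transactions, B)
    s_ab = support(transactions, set(A) | set(B))
    # Each entry: (support of the rule, support of its antecedent).
    rules = {
        "A => not B": (s_a - s_ab, s_a),
        "not A => B": (s_b - s_ab, 1 - s_a),
        "not A => not B": (1 - s_a - s_b + s_ab, 1 - s_a),
    }
    return {
        name: {
            "support": s,
            # Confidence is undefined when the antecedent never occurs.
            "confidence": s / antecedent if antecedent > 0 else None,
        }
        for name, (s, antecedent) in rules.items()
    }


# Hypothetical toy database of eight transactions.
transactions = [
    {"tea"}, {"tea"}, {"tea", "coffee"}, {"coffee"},
    {"coffee"}, {"milk"}, {"tea"}, {"coffee"},
]
stats = negative_rule_stats(transactions, {"tea"}, {"coffee"})
```

In this toy database tea and coffee rarely co-occur, so the rule tea ⇒ ¬coffee has a much higher confidence than ¬tea ⇒ ¬coffee.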
Na-ion batteries have been considered promising alternatives to Li-ion batteries due to the natural abundance of sodium resources. The search for high-performance anode materials has become a hot topic and also a great challenge in developing Na-ion batteries. In this work, a novel hybrid anode consisting of ultrafine, few-layered SnS₂ anchored on few-layered reduced graphene oxide (rGO) is synthesized by a facile solvothermal route. The SnS₂/rGO hybrid exhibits a high capacity, ultralong cycle life, and superior rate capability. The hybrid can deliver a high charge capacity of 649 mAh g⁻¹ at 100 mA g⁻¹. At 800 mA g⁻¹ (1.8 C), it can yield an initial charge capacity of 469 mAh g⁻¹, which can be maintained at 89% and 61% after 400 and 1000 cycles, respectively. The hybrid can also sustain a current density up to 12.8 A g⁻¹ (≈28 C), where the charge process can be completed in only 1.3 min while still delivering a charge capacity of 337 mAh g⁻¹. The fast and stable Na-storage ability of SnS₂/rGO makes it a promising anode for Na-ion batteries.
Immune infiltration of tumors is closely associated with clinical outcome in renal cell carcinoma (RCC). Tumor‐infiltrating immune cells (TIICs) regulate cancer progression and are appealing therapeutic targets. The purpose of this study was to determine the composition of TIICs in RCC and further reveal the independent prognostic values of TIICs. CIBERSORT, an established algorithm, was applied to estimate the proportions of 22 immune cell types based on gene expression profiles of 891 tumors. Cox regression was used to evaluate the association of TIICs and immune checkpoint modulators with overall survival (OS). We found that CD8+ T cells were associated with prolonged OS (hazard ratio [HR] = 0.09, 95% confidence interval [CI] 0.01‐0.53; P = 0.03) in chromophobe carcinoma (KICH). A higher proportion of regulatory T cells was associated with a worse outcome (HR = 1.59, 95% CI 1.23‐.06; P < 0.01) in renal clear cell carcinoma (KIRC). In renal papillary cell carcinoma (KIRP), M1 macrophages were associated with a favorable outcome (HR = 0.43, 95% CI 0.25‐0.72; P < 0.01), while M2 macrophages indicated a worse outcome (HR = 2.55, 95% CI 1.45‐4.47; P < 0.01). Moreover, the immunomodulator molecules CTLA4 and LAG3 were associated with a poor prognosis in KIRC, and IDO1 and PD‐L2 were associated with a poor prognosis in KIRP. This study indicates that TIICs are important determinants of prognosis in RCC and reveals potential targets and biomarkers for immunotherapy development.
Data preparation is a fundamental stage of data analysis. While a lot of low-quality information is available in various data sources and on the Web, many organizations or companies are interested in how to transform the data into cleaned forms which can be used for high-profit purposes. This goal generates an urgent need for data analysis aimed at cleaning the raw data. In this paper, we first show the importance of data preparation in data analysis, then introduce some research achievements in the area of data preparation. Finally, we suggest some future directions of research and development.