4th International Conference on Artificial Neural Networks, 1995
DOI: 10.1049/cp:19950579

Pruning and growing hierarchical mixtures of experts

Cited by 8 publications (5 citation statements)
References 0 publications
“…The ME models have three issues: (1) the gating mechanism does not explicitly leverage the input-output dependencies of the data. Rather, it performs probabilistic input-space partitioning, based on assumed data distributions such as the multinomial distribution (Jordan & Jacobs, 1994), Gaussian distribution (Yuan & Neubauer, 2009), Dirichlet process (Rasmussen & Ghahramani, 2002), Gaussian process (Tresp, 2001), etc.; (2) in ME models strong experts are often needed to gain good performance (Yuksel et al., 2012); (3) the structure of the ME models, namely the tree depth and the number of experts, is often optimized through extra procedures, such as pruning (Waterhouse & Robinson, 1995) and Bayesian model selection (Bishop & Svensén, 2002; Kanaujia & Metaxas, 2006). This increases the complexity of model learning.…”
Section: Related Work
confidence: 99%
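
The probabilistic input-space partitioning mentioned in this excerpt can be made concrete with a small sketch. The following Python snippet is our own illustrative assumption (not code from any cited paper): a softmax gate over linear gating scores softly partitions the input space, and the model output is the gate-weighted combination of the expert predictions.

```python
# Illustrative sketch (assumption, not from any cited paper): a mixture-of-experts
# gate performs soft input-space partitioning via softmax mixing proportions,
# and the prediction is the gate-weighted sum of the expert outputs.
import numpy as np

rng = np.random.default_rng(0)
n_experts, dim = 4, 3
V = rng.normal(size=(n_experts, dim))   # gating parameters (linear gate assumed)
W = rng.normal(size=(n_experts, dim))   # per-expert linear regression weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_predict(x):
    """Return the mixture prediction and the gate's soft partition of x."""
    g = softmax(V @ x)        # mixing proportions g_i(x), non-negative, sum to 1
    y_experts = W @ x         # one scalar prediction per expert
    return g @ y_experts, g

x = rng.normal(size=dim)
y_hat, gate = moe_predict(x)
print(y_hat, gate)
```

A hierarchical mixture of experts stacks such gates in a tree, which is why the tree depth and number of experts become structural choices to be pruned or grown.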
“…Figure 3 demonstrates that tailored characteristic kernels on the LCA group work better than the Gaussian kernel, which is merely characteristic. We compared our best results on the above datasets to the results given by GPR [14], K-Nearest Neighbor (K-NN), Linear Regression (LR), Multi-Layer Perceptrons (MLP) with a single hidden layer and early stopping [14], and mixtures of experts trained by Bayesian methods (HME) [22]. The results are reported in Table 1.…”
Section: Applications Of Regression
confidence: 99%
“…Figure 4 shows that the justified characteristic kernels perform better than the Gaussian kernel. We compared our best results to those obtained by GPR [14], K-Nearest Neighbor (K-NN), Linear Regression (LR), MLP with early stopping and a single hidden layer [14], and mixtures of experts trained by Bayesian methods (HME) [22] in Table 2. Results of 25 methods (by Ghahramani) are available at http://www.cs.toronto.edu/~delve/data/pumadyn/desc.html.…”
Section: Forward Dynamics
confidence: 99%
“…Thus, for a gate implemented as a multilayer perceptron, the GEM algorithm must be employed. If the gate is trained through gradient descent (backpropagation), the error backpropagated to the input side of the softmax is the posterior probability h_i minus the gate output g_i (Eq. 27). This is the same equation that would result from a mean-square-error criterion if h_i were interpreted as the desired signal for the output of a trainable network. Thus, the posterior probabilities act as targets for the gate.…”
Section: Expectation-Maximization (EM) Algorithm
confidence: 99%
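
The claim in this excerpt is easy to verify numerically. The sketch below is our own illustration, assuming the standard M-step objective for the gate, sum_i h_i * log g_i with fixed posteriors h: its gradient with respect to the softmax pre-activations is h - g, exactly the error one would get from treating h as the desired output.

```python
# Numerical check (illustrative, not tied to any cited paper's code) that the
# gradient of  Q(xi) = sum_i h_i * log softmax(xi)_i  w.r.t. the pre-softmax
# activations xi equals h - g, so the posteriors h act as targets for the gate.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
xi = rng.normal(size=5)            # pre-softmax gate activations
h = rng.dirichlet(np.ones(5))      # posterior responsibilities, fixed in the M-step

def objective(xi):
    return float(h @ np.log(softmax(xi)))

g = softmax(xi)
analytic = h - g                   # claimed backpropagated error

eps = 1e-6                         # central finite differences for comparison
numeric = np.array([
    (objective(xi + eps * np.eye(5)[i]) - objective(xi - eps * np.eye(5)[i])) / (2 * eps)
    for i in range(5)
])
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```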
“…In most cases, however, the number of experts is unknown. In these cases, pruning or growing algorithms [10], [27] can be employed but are beyond the scope of this paper. The number of principal components required per expert should be chosen on the basis of the number of experts.…”
Section: Practical Implementation Issues
confidence: 99%
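
As a rough illustration of what such a procedure involves, the sketch below shows one hypothetical pruning criterion: remove experts whose average gate activation over the data falls below a threshold. This is not the algorithm of Waterhouse & Robinson (1995); it only conveys the general idea of pruning underused experts.

```python
# Hypothetical pruning criterion (illustration only, NOT the cited algorithm):
# keep experts whose average gating activation over the dataset exceeds a threshold.
import numpy as np

def prune_experts(gate_activations, threshold=0.05):
    """gate_activations: (n_samples, n_experts) array whose rows sum to 1.
    Returns indices of experts to keep."""
    usage = gate_activations.mean(axis=0)    # average responsibility per expert
    return np.flatnonzero(usage >= threshold)

rng = np.random.default_rng(2)
g = rng.dirichlet(np.array([5.0, 5.0, 5.0, 0.1]), size=200)  # last expert rarely used
print(prune_experts(g))   # typically keeps experts 0, 1, 2
```

Growing works in the opposite direction, splitting a heavily used or poorly fitting expert into several new ones and retraining.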