1998
DOI: 10.1117/12.304812

Use of localized gating in mixture of experts networks

Citations: cited by 12 publications (11 citation statements)
References: 0 publications
“…Earlier related approaches which use a mapping to predict the input-output relationship of the solar wind driven auroral westward electrojet index (AL-VB,) data used local linear ARMA filters [Price and Prichard, 1993 [1993] by including a larger set of data when training and by developing a selection band on a new architecture which takes into account activity level by using a gated network which makes a prediction based on the outputs from networks trained on intervals with differing levels of activity [Ramamurti and Ghosh, 1998]. This architecture is able to account for the scaling problem intrinsic to neural networks with nonlinear activation functions.…”
Section: Introduction (mentioning)
confidence: 99%
“…(21)]). In [18], [15], we reason out why the use of gating network based on (1) leads to difficulties while modeling many nontrivial function approximation tasks. In brief, for inputs that are not very close to one of the (soft) hyperplanes implied by the gating network, typically several of the s are substantially greater than zero, since any point will be on the positive side of 50% of the hyperplanes on the average.…”
Section: A Generic Mixture Of Experts Architecture (mentioning)
confidence: 99%
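
The statement above contrasts hyperplane-based gating with the localized gating named in the paper's title. The sketch below is only an illustration of that contrast, not the cited authors' implementation. It assumes that equation (1) in the citing paper is the standard softmax gate over linear functions of the input, and that "localized gating" means a normalized Gaussian kernel gate. The function names, the `width` parameter, and the `effective_experts` measure are hypothetical and introduced here for illustration.

```python
import numpy as np


def softmax_gate(x, V, b):
    """Assumed form of a hyperplane-based gate: softmax over linear scores v_j . x + b_j."""
    z = V @ x + b
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def localized_gate(x, centers, width, priors=None):
    """Assumed form of a localized gate: normalized Gaussian kernels around expert centers."""
    m = centers.shape[0]
    priors = np.full(m, 1.0 / m) if priors is None else np.asarray(priors, dtype=float)
    d2 = ((centers - x) ** 2).sum(axis=1)           # squared distance to each center
    logits = np.log(priors) - d2 / (2.0 * width ** 2)
    logits = logits - logits.max()                  # stability, same trick as above
    e = np.exp(logits)
    return e / e.sum()


def effective_experts(g):
    """Hypothetical summary: inverse participation ratio, 1 for one-hot, m for uniform."""
    g = np.asarray(g, dtype=float)
    return 1.0 / np.sum(g ** 2)
```

With a Gaussian gate, an input that sits on one expert's center and far from the others is handled almost entirely by that expert, so `effective_experts` is close to 1. A softmax gate only reaches that regime when one linear score dominates the rest, which is the difficulty the quoted passage describes for inputs away from the soft hyperplanes.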
“…The Mackey-Glass chaotic time series is generated by the delay differential equation (18) Two stationary operating modes are established by using different delays, and 23, respectively. After operating 1000 steps in the first mode, the system drifts to the second mode.…”
Section: Table V On-line Pruning and Growing On 2-d Gabor Functions (mentioning)
confidence: 99%
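
A two-mode series of the kind described in the statement above could be generated roughly as follows. This is a sketch under several assumptions. The excerpt does not reproduce equation (18), so the code uses the commonly cited Mackey-Glass form dx/dt = a x(t - tau) / (1 + x(t - tau)^n) - b x(t) with a = 0.2, b = 0.1, n = 10. The first delay is missing from the excerpt, so `tau_first=17` is a placeholder, not the cited value; only the second delay (23) and the 1000 steps in the first mode come from the quoted text. The mode change is modeled as an abrupt switch, and the integration is a simple Euler step of size 1.

```python
import numpy as np


def mackey_glass_two_modes(n_first=1000, n_second=1000,
                           tau_first=17,   # ASSUMPTION: not given in the excerpt
                           tau_second=23,  # from the quoted statement
                           a=0.2, b=0.1, n=10, x0=1.2):
    """Euler-integrate the assumed Mackey-Glass equation, switching the delay once."""
    max_tau = max(tau_first, tau_second)
    total = n_first + n_second
    x = np.empty(total + max_tau + 1)
    x[:max_tau + 1] = x0                     # constant history before the first step
    for i in range(total):
        t = max_tau + i
        tau = tau_first if i < n_first else tau_second
        delayed = x[t - tau]
        # unit-step Euler update of dx/dt = a*x(t-tau)/(1+x(t-tau)^n) - b*x(t)
        x[t + 1] = x[t] + a * delayed / (1.0 + delayed ** n) - b * x[t]
    return x[max_tau + 1:]                   # drop the artificial history
```

Calling `mackey_glass_two_modes()` returns 2000 samples, the first 1000 generated with the placeholder delay and the rest with delay 23. A finer integration step or a gradual change of delay would be reasonable refinements if the goal is to match the cited experiment more closely.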
“…The mixture-of-experts framework [12,21] simultaneously partitions the input space while learning models for each partition. The partitioning is soft however, i.e., multiple models are involved in varying amounts for producing any particular input-output map, which makes the system less interpretable or actionable as compared to our proposed approach.…”
Section: Related Work (mentioning)
confidence: 99%