Use of localized gating in mixture of experts networks

Ramamurti, V.; Ghosh, Joydeep

doi:10.1117/12.304812

Cited by 12 publications (11 citation statements); references: 0 publications

“…Earlier related approaches which use a mapping to predict the input-output relationship of the solar wind driven auroral westward electrojet index (AL-VBs) data used local linear ARMA filters [Price and Prichard, 1993] … [1993] by including a larger set of data when training and by developing a selection band on a new architecture which takes into account activity level by using a gated network which makes a prediction based on the outputs from networks trained on intervals with differing levels of activity [Ramamurti and Ghosh, 1998]. This architecture is able to account for the scaling problem intrinsic to neural networks with nonlinear activation functions.…”

Section: Introduction (mentioning, confidence: 99%)

Forecasting auroral electrojet activity from solar wind input with neural networks

Weigel, Horton, Tajima, et al. 1999

Geophysical Research Letters

“…(21)]). In [18], [15], we reason out why the use of gating network based on (1) leads to difficulties while modeling many nontrivial function approximation tasks. In brief, for inputs that are not very close to one of the (soft) hyperplanes implied by the gating network, typically several of the [gating outputs] are substantially greater than zero, since any point will be on the positive side of 50% of the hyperplanes on the average.…”

Section: A Generic Mixture Of Experts Architecture (mentioning, confidence: 99%)
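The quoted argument concerns the softmax gating of the standard mixture of experts. Assuming equation (1) is the usual form g_j(x) = exp(v_j · x) / Σ_i exp(v_i · x) (the quote does not reproduce it), the Python sketch below estimates how many gating outputs exceed a small threshold for random inputs. The random gating weights, the input range, and the 0.05 threshold are illustrative assumptions, not values from the paper.

```python
import math
import random

def softmax_gate(x, V):
    """Softmax gating outputs g_j = exp(v_j . x) / sum_i exp(v_i . x).

    V holds one gating weight vector per expert; the last component of x
    is a constant 1 so that each hyperplane has a bias term.
    """
    scores = [sum(vk * xk for vk, xk in zip(v, x)) for v in V]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_active_experts(n_experts=8, dim=2, n_points=2000, threshold=0.05, seed=0):
    """Average number of gating outputs above `threshold` over random inputs.

    Gating weights are drawn at random (an illustrative assumption).
    """
    rng = random.Random(seed)
    V = [[rng.gauss(0.0, 1.0) for _ in range(dim + 1)] for _ in range(n_experts)]
    count = 0
    for _ in range(n_points):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)] + [1.0]  # append bias input
        count += sum(g > threshold for g in softmax_gate(x, V))
    return count / n_points

if __name__ == "__main__":
    print(mean_active_experts())
```

Because every hyperplane score enters one global normalization, a single output dominates only when its score exceeds all the others by a wide margin, so for most inputs several experts contribute noticeably.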

“…The Mackey-Glass chaotic time series is generated by the delay differential equation (18). Two stationary operating modes are established by using different delays, … and 23, respectively. After operating 1000 steps in the first mode, the system drifts to the second mode.…”

Section: Table V On-line Pruning and Growing On 2-d Gabor Functions (mentioning, confidence: 99%)
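The delay differential equation itself did not survive extraction. Assuming it is the standard Mackey-Glass form dx/dt = a·x(t - τ) / (1 + x(t - τ)^10) - b·x(t) with the common choices a = 0.2 and b = 0.1, a two-mode series like the one described can be sketched with forward-Euler integration. The first delay is missing from the quote; τ = 17 below is only a widely used choice, not the paper's value. The step size, initial value, and one sample per time unit are also assumptions.

```python
def mackey_glass(n_steps=2000, switch_at=1000, tau1=17.0, tau2=23.0,
                 a=0.2, b=0.1, n=10, dt=0.1, x0=1.2):
    """Two-mode Mackey-Glass series via forward-Euler integration.

    dx/dt = a * x(t - tau) / (1 + x(t - tau)**n) - b * x(t)
    The delay is tau1 for the first `switch_at` samples, then tau2
    (tau1 = 17 is an assumed value; the source elides it).
    One sample is recorded per unit of time, i.e. every 1/dt Euler steps.
    """
    sub = round(1.0 / dt)          # Euler steps per recorded sample
    history = [x0]                 # full trajectory at Euler resolution
    samples = []
    for k in range(n_steps):
        tau = tau1 if k < switch_at else tau2
        lag = round(tau / dt)      # delay expressed in Euler steps
        for _ in range(sub):
            x = history[-1]
            # Constant history x0 is assumed for times before t = 0.
            x_lag = history[-1 - lag] if len(history) > lag else x0
            history.append(x + dt * (a * x_lag / (1.0 + x_lag ** n) - b * x))
        samples.append(history[-1])
    return samples
```

Only the delay changes at the switch point, so the first `switch_at` samples are identical to a single-mode run, and the dynamics then drift toward the second operating mode.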

Structurally adaptive modular networks for nonstationary environments

Ramamurti, Ghosh 1999

IEEE Trans. Neural Netw.

This paper introduces a neural network capable of dynamically adapting its architecture to realize time variant nonlinear input-output maps. This network has its roots in the mixture of experts framework but uses a localized model for the gating network. Modules or experts are grown or pruned depending on the complexity of the modeling problem. The structural adaptation procedure addresses the model selection problem and typically leads to much better parameter estimation. Batch mode learning equations are extended to obtain on-line update rules enabling the network to model time varying environments. Simulation results are presented throughout the paper to support the proposed techniques.
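One common form of localized gating replaces the softmax with normalized kernels, g_j(x) = α_j P(x | j) / Σ_i α_i P(x | i), where each P(x | j) is a Gaussian centred on the region handled by expert j. The sketch below assumes isotropic Gaussian kernels and experts given as plain callables; `localized_gate` and `mixture_output` are hypothetical names, and this is an illustration of the idea rather than the paper's exact formulation.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / var - 0.5 * d * math.log(2.0 * math.pi * var)

def localized_gate(x, alphas, means, variances):
    """g_j(x) = alpha_j P(x|j) / sum_i alpha_i P(x|i), computed in log space."""
    logs = [math.log(a) + log_gaussian(x, m, v)
            for a, m, v in zip(alphas, means, variances)]
    top = max(logs)  # log-sum-exp shift avoids underflow far from all kernels
    w = [math.exp(l - top) for l in logs]
    total = sum(w)
    return [wj / total for wj in w]

def mixture_output(x, experts, alphas, means, variances):
    """Gate-weighted blend of the expert predictions at input x."""
    g = localized_gate(x, alphas, means, variances)
    return sum(gj * expert(x) for gj, expert in zip(g, experts))

if __name__ == "__main__":
    # Two hypothetical linear experts, each owning one side of the input line.
    experts = [lambda x: 1.0 + 0.5 * x[0], lambda x: -3.0 + 2.0 * x[0]]
    alphas, means, variances = [0.5, 0.5], [[-2.0], [2.0]], [0.5, 0.5]
    for x0 in (-2.0, 0.0, 2.0):
        g = localized_gate([x0], alphas, means, variances)
        print(x0, [round(v, 4) for v in g],
              mixture_output([x0], experts, alphas, means, variances))
```

Each gate behaves like a posterior responsibility, so it decays toward zero away from its kernel. Adding or pruning an expert therefore mostly changes the model near that kernel, which is one plausible reason a localized gate suits the grow-and-prune procedure described in the abstract.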

“…The mixture-of-experts framework [12,21] simultaneously partitions the input space while learning models for each partition. The partitioning is soft however, i.e., multiple models are involved in varying amounts for producing any particular input-output map, which makes the system less interpretable or actionable as compared to our proposed approach.…”

Section: Related Work (mentioning, confidence: 99%)

A framework for simultaneous co-clustering and learning from complex data

Deodhar, Ghosh 2007

Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Self Cite

For difficult classification or regression problems, practitioners often segment the data into relatively homogeneous groups and then build a model for each group. This two-step procedure usually results in simpler, more interpretable and actionable models without any loss in accuracy. We consider problems such as predicting customer behavior across products, where the independent variables can be naturally partitioned into two groups. A pivoting operation can now result in the dependent variable showing up as entries in a "customer by product" data matrix. We present a model-based co-clustering (meta)-algorithm that interleaves clustering and construction of prediction models to iteratively improve both cluster assignment and fit of the models. This algorithm provably converges to a local minimum of a suitable cost function. The framework not only generalizes co-clustering and collaborative filtering to model-based co-clustering, but can also be viewed as simultaneous co-segmentation and classification or regression, which is better than independently clustering the data first and then building models. Moreover, it applies to a wide range of bimodal or multimodal data, and can be easily specialized to address classification and regression problems. We demonstrate the effectiveness of our approach on both these problems through experimentation on real and synthetic data.
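The interleaving of clustering and model fitting described above can be illustrated with a deliberately simplified, one-way version: points are clustered (no column clustering), each cluster gets a one-variable least-squares line, and the two steps alternate until the assignments stop changing. All function names are hypothetical; the actual algorithm co-clusters a two-mode data matrix and supports other model families. Neither step can increase the total squared error, which mirrors the local-minimum convergence claim in miniature.

```python
def fit_line(points):
    """Least-squares fit of y = w * x + c to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    w = sxy / sxx if sxx > 0 else 0.0  # degenerate cluster: flat line through the mean
    return w, my - w * mx

def squared_error(model, point):
    """Squared residual of one (x, y) point under a (w, c) line."""
    w, c = model
    x, y = point
    return (y - (w * x + c)) ** 2

def cluster_and_fit(points, labels, k, max_iter=50):
    """Alternate (1) refitting one line per cluster and (2) reassigning each
    point to the cluster whose line predicts it best, until labels are stable."""
    labels = list(labels)
    for _ in range(max_iter):
        models = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            models.append(fit_line(members) if members else (0.0, 0.0))
        new_labels = [min(range(k), key=lambda j: squared_error(models[j], p))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models
```

For example, starting from an assignment with one mislabeled point, two clean lines such as y = 2x and y = 20 - x are typically separated after a couple of refit-and-reassign rounds.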