Learning to Teach Reinforcement Learning Agents

Fachantidis, Anestis; Taylor, Matthew E.; Vlahavas, Ioannis

doi:10.3390/make1010002

Cited by 50 publications

(46 citation statements)

References 15 publications

“…However, this requires a lot of resources and sometimes does not really work out well. One interesting approach is to use data mining methods [11,12] which, from the available data, use an analytic process to give information about a problem in the future.…”

Section: Introduction (mentioning)

confidence: 99%

The Factors Affecting Acceptance of E-Learning: A Machine Learning Algorithm Approach

2020

Education Sciences

The Covid-19 epidemic is affecting all areas of life, including the training activities of universities around the world. Therefore, the online learning method is an effective method in the present time and is used by many universities. However, not all training institutions have sufficient conditions, resources, and experience to carry out online learning, especially in under-resourced developing countries. Therefore, the construction of traditional courses (face to face), e-learning, or blended learning in limited conditions that still meet the needs of students is a problem faced by many universities today. To solve this problem, we propose a method of evaluating the influence of these factors on the e-learning system. From there, it is a matter of clarifying the importance and prioritizing construction investment for each factor based on the K-means clustering algorithm, using the data of students who have been participating in the system. At the same time, we propose a model to support students to choose one of the learning methods, such as traditional, e-learning or blended learning, which is suitable for their skills and abilities. The data classification method with the algorithms multilayer perceptron (MP), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM) and naïve bayes (NB) is applied to find the model fit. The experiment was conducted on 679 data samples collected from 303 students studying at the Academy of Journalism and Communication (AJC), Vietnam. With our proposed method, the results are obtained from experimentation for the different effects of infrastructure, teachers, and courses, also as features of these factors. At the same time, the accuracy of the prediction results which help students to choose an appropriate learning method is up to 81.52%.

“…A simplified representation of process implemented by the Q-learning algorithm in order to control a PV system for the implementation of the GMPPT process is presented in Figure 3. In Q-learning, an agent interacts with the unknown environment (i.e., the PV system) and gains experience through a specific set of states, actions and rewards encountered during this interaction [24][25][26][27]. Q-learning strives to learn the Q-values of state-actions pairs, which represent the expected total discounted reward in the long term.…”

Section: The Proposed Q-learning-based Methods For Photovoltaic (Pv) G (mentioning)

confidence: 99%

“…Typically, experience for learning is recorded in terms of samples (S_t, a_t, R_t, S_{t+1}), meaning that at some time step t, action a_t was executed in state S_t and a transition to the next state S_{t+1} was observed, while reward R_t was received. The Q-learning update rule, given a sample (S_t, a_t, R_t, S_{t+1}) at time step t, is defined as follows: In Q-learning, an agent interacts with the unknown environment (i.e., the PV system) and gains experience through a specific set of states, actions and rewards encountered during this interaction [24][25][26][27]. Q-learning strives to learn the Q-values of state-actions pairs, which represent the expected total discounted reward in the long term.…”

Section: The Proposed Q-learning-based Methods For Photovoltaic (Pv) G (mentioning)

confidence: 99%

Global MPPT Based on Machine-Learning for PV Arrays Operating under Partial Shading Conditions

2020

A global maximum power point tracking (GMPPT) process must be applied for detecting the position of the GMPP operating point in the minimum possible search time in order to maximize the energy production of a photovoltaic (PV) system when its PV array operates under partial shading conditions. This paper presents a novel GMPPT method which is based on the application of a machine-learning algorithm. Compared to the existing GMPPT techniques, the proposed method has the advantage that it does not require knowledge of the operational characteristics of the PV modules comprising the PV system, or the PV array structure. Additionally, due to its inherent learning capability, it is capable of detecting the GMPP in significantly fewer search steps and, therefore, it is suitable for employment in PV applications, where the shading pattern may change quickly (e.g., wearable PV systems, building-integrated PV systems etc.). The numerical results presented in the paper demonstrate that the time required for detecting the global MPP, when unknown partial shading patterns are applied, is reduced by 80.5%–98.3% by executing the proposed Q-learning-based GMPPT algorithm, compared to the convergence time required by a GMPPT process based on the particle swarm optimization (PSO) algorithm.
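The citation statements for this paper describe learning from samples (S_t, a_t, R_t, S_{t+1}) with a Q-learning update rule, but the rule itself is not reproduced on this page. As a rough illustration only, the sketch below applies the standard textbook tabular update, Q(S_t, a_t) ← Q(S_t, a_t) + α [R_t + γ max_a Q(S_{t+1}, a) − Q(S_t, a_t)]. The function name, the integer states, the action labels, and the values of the learning rate `alpha` and discount factor `gamma` are assumptions made for this example; they are not taken from the cited paper.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update for a sample (S_t, a_t, R_t, S_{t+1}).

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values from the cited paper.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen Q-values default to 0.0.
q_table = defaultdict(float)
actions = ["decrease", "increase"]

first = q_learning_update(q_table, state=0, action="increase",
                          reward=1.0, next_state=1, actions=actions)
second = q_learning_update(q_table, state=1, action="increase",
                           reward=2.0, next_state=0, actions=actions)
print(first, second)
```

The second call bootstraps from the value written by the first one, which is how reward information propagates backwards through the table over repeated samples.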

“…• Q-Teaching Reward (QTR): The QTR advising-level reward extends Q-Teaching (Fachantidis, Taylor, and Vlahavas 2017) to MARL by using…”

Section: Contribution (mentioning)

confidence: 99%

Learning to Teach in Cooperative Multiagent Reinforcement Learning

Omidshafiei, Kim, Liu et al.

2019

AAAI

Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for problems they can apply to. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; these agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.
