Optimizing Human Learning

Tabibian, Behzad; Upadhyay, Utkarsh; De, Abir; Zarezade, Ali; Schoelkopf, Bernhard; Gomez-Rodriguez, Manuel

doi:10.48550/arxiv.1712.01856

Cited by 1 publication

(10 citation statements)

References 18 publications

(42 reference statements)

“…It is well known in the psychology literature that repeated and temporally distributed reviewing of information aids long term memorization [14,16,19,18]. Following recent work in the machine learning literature [18,22,27], we will consider the following setting: an online learning platform needs to teach one student some number of items with varying difficulty, say, words from the vocabulary of a foreign language. To this aim, the platform interacts with the student during a studying period by asking her to review each item multiple times, i.e., show a word to the student, ask for its translation, and then show the correct answer.…”

Section: Proposition 1 Given An Agent With P * (mentioning)

confidence: 99%

“…Interestingly, the above setting has been recently studied from the point of view of stochastic optimal control [27], where the authors have derived the optimal scheduling algorithm for a set of items. However, their solution assumes that the difficulty of the items and the student model are known [24] and that the objective function, the reward, has a particular functional form which depends on the average recall probability over time (and not the actual sampled recall at test time).…”

Section: Proposition 1 Given An Agent With P * (mentioning)

confidence: 99%

“…Experimental setup. Since we cannot make real interventions in an online learning platform, we use data from Duolingo to fit a probabilistic student model, as reported in previous work [24,27], which we then use to simulate a student's performance over time (refer to Appendix E for further details on the student model). Here, the optimal policy $p^*_{\mathcal{A};\theta} = (\lambda^*_\theta(t), m^*_\theta(t))$ comprises of a reviewing intensity function and a multinomial mark distribution.…”

Section: Proposition 1 Given An Agent With P * (mentioning)

confidence: 99%
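
The statement above describes a policy made of a reviewing intensity and a mark distribution over items, evaluated against a simulated student. A minimal sketch of that idea follows, under strong simplifying assumptions that are not taken from the cited papers: the intensity is a constant `rate`, the mark distribution `item_probs` is fixed, and the student follows an exponential forgetting curve whose forgetting rate is multiplied by a hypothetical `decay` factor after each review. All function and parameter names are illustrative.

```python
import math
import random


def sample_review_schedule(rate, item_probs, horizon, rng):
    """Sample (time, item) review events on [0, horizon].

    Assumption: constant intensity `rate` (events per unit time) and a
    fixed categorical mark distribution `item_probs`. A learned policy
    would make both depend on time and history.
    """
    items = list(range(len(item_probs)))
    events = []
    t = rng.expovariate(rate)  # first inter-event gap
    while t <= horizon:
        item = rng.choices(items, weights=item_probs, k=1)[0]
        events.append((t, item))
        t += rng.expovariate(rate)  # next inter-event gap
    return events


def recall_probability(forgetting_rate, elapsed):
    """Exponential forgetting curve (an assumed student model)."""
    return math.exp(-forgetting_rate * elapsed)


def simulate_test(events, forgetting_rates, test_time, decay=0.5):
    """Return the recall probability of each item at `test_time`.

    Assumption: every review multiplies the item's forgetting rate by
    `decay`; items that were never reviewed have recall probability 0.
    """
    rates = list(forgetting_rates)
    last_review = [None] * len(rates)
    for t, item in events:
        rates[item] *= decay
        last_review[item] = t
    return [
        0.0 if last is None else recall_probability(r, test_time - last)
        for r, last in zip(rates, last_review)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    schedule = sample_review_schedule(2.0, [0.5, 0.3, 0.2], 10.0, rng)
    print(simulate_test(schedule, [1.0, 1.0, 1.0], test_time=12.0))
```

Scoring the sampled recall at `test_time`, rather than an average recall probability over the study period, mirrors the distinction drawn in the citation statement above about the form of the reward.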

“…In this context, a recent line of work [13,27,29,30,33,34] has exploited an alternative view of MTPPs as stochastic differential equations (SDEs) with jumps [10] to design online, adaptive interventions using stochastic optimal control. While this line of work has shown promise at enhancing the functioning of social and information systems, their wide spread use and deployment is precluded mainly by two drawbacks.…”

Section: Introduction (mentioning)

confidence: 99%

“…In contrast, previous works considered the policy to be a probability distribution or, more rarely, a deterministic function [4,9,28]. Finally, we apply our methodology to two different applications in personalized teaching [14,22,27] and viral marketing [12,25,29,33,34], respectively. For simple dynamics and objective functions, which allow for stochastic optimal control approaches, our method achieves a comparable performance even though it does not have access to the true underlying dynamics.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Reinforcement Learning of Marked Temporal Point Processes

Upadhyay, De, Gomez-Rodriguez

2018

Preprint

Self Cite

In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives.
