2019
DOI: 10.1007/978-3-030-12738-1_16

Generating Reward Functions Using IRL Towards Individualized Cancer Screening

Abstract: Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to …
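
To make the POMDP idea concrete, the sketch below shows one common way a reward function can be turned into a screening policy: solve the underlying MDP by value iteration and act greedily on the belief over hidden health states (a QMDP approximation). The transition model P, the reward matrix R, and the tiny state space are hypothetical placeholders, not the paper's actual screening model.

```python
import numpy as np

def qmdp_policy(P, R, gamma=0.95, iters=200):
    """Toy QMDP approximation: P[a, s, s'] are transition probabilities,
    R[s, a] is the (possibly IRL-learned) reward for action a in state s."""
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):                      # value iteration on the underlying MDP
        V = Q.max(axis=1)
        Q = R + gamma * np.einsum('ast,t->sa', P, V)

    def act(belief):
        # Weight Q-values by the belief over hidden health states and
        # pick the screening action with the highest expected value.
        return int(np.argmax(belief @ Q))

    return act
```

A full POMDP solver would plan over belief states directly; QMDP is used here only to keep the illustration short.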

Cited by 3 publications (6 citation statements)
References 17 publications

“…When it comes to early cancer prediction (e.g., predicting screening t₂ cancer from screening t₀), the POMDP outperforms experts, indicating that the model and associated reward function are discriminating between positive and negative cases in a different way. This difference may be attributed to the dynamic observation model used with this POMDP; when independent observations are instead assumed, we have found kappa scores close to 1 in other domains, indicating high correlation between the model and experts' decisions [12]. Indeed, error analysis of the POMDP's FPs shows a different subset from the physicians: cases with smaller nodule sizes but more years of smoking and older baseline age are predicted as false positives by the POMDP.…”
Section: Discussion (mentioning)
confidence: 83%
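
The agreement described above is typically quantified with Cohen's kappa; a minimal, purely illustrative check might look like the following (the decision vectors are made up, not the study's data).

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary screening decisions (1 = recall / work-up, 0 = routine
# screening) from the model and from physicians -- illustrative values only.
model_decisions     = [1, 0, 0, 1, 0, 1, 0, 0]
physician_decisions = [1, 0, 1, 1, 0, 0, 0, 0]

# Kappa near 1 means the model reproduces the experts' decisions; lower values
# suggest it is discriminating cases differently, as discussed above.
print(cohen_kappa_score(model_decisions, physician_decisions))
```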
“…The POMDP we designed makes use of a reward function learned through analysis of physicians' past decisions. We recently presented an adaptive maximum entropy inverse reinforcement learning (MaxEnt IRL) algorithm to inform a reward function in different cancers [12]. Using MaxEnt IRL, we established an optimization function explicitly modeling experts' actions.…”
Section: Discussion (mentioning)
confidence: 99%
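
For readers unfamiliar with MaxEnt IRL, the sketch below implements the standard formulation of Ziebart et al. on a small, fully observable MDP: a backward pass yields a soft-optimal policy under the current reward weights, a forward pass yields expected state visitations, and the weights are nudged toward the experts' feature counts. It is a simplified stand-in, not the adaptive algorithm of [12], and the transition model, features, and trajectories are placeholders.

```python
import numpy as np
from scipy.special import logsumexp

def maxent_irl(P, features, expert_trajs, horizon, lr=0.1, iters=100):
    """Minimal MaxEnt IRL sketch on a small MDP.

    P            : array (A, S, S), P[a, s, s'] = transition probability
    features     : array (S, K), one feature vector per state
    expert_trajs : list of state-index sequences recorded from experts
    """
    n_actions, n_states, _ = P.shape

    # Empirical feature expectations of the expert demonstrations.
    f_expert = np.mean(
        [features[traj].sum(axis=0) for traj in expert_trajs], axis=0)

    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        reward = features @ theta                    # r(s) = theta . phi(s)

        # Backward pass: soft value iteration -> stochastic policy pi(a|s).
        V = np.zeros(n_states)
        for _ in range(horizon):
            Q = reward[None, :] + P @ V              # Q[a, s]
            V = logsumexp(Q, axis=0)                 # soft max over actions
        policy = np.exp(Q - V[None, :])              # pi(a|s)

        # Forward pass: expected state visitation frequencies D(s).
        start_counts = np.bincount([t[0] for t in expert_trajs],
                                   minlength=n_states)
        d_t = start_counts / start_counts.sum()
        D = d_t.copy()
        for _ in range(horizon - 1):
            d_t = np.einsum('s,as,ast->t', d_t, policy, P)
            D += d_t

        # Gradient: expert feature counts minus expected feature counts.
        theta += lr * (f_expert - D @ features)

    return theta                                     # learned reward weights
```

The learned weights define a reward r(s) = θ·φ(s) that a decision model such as the POMDP sketched earlier can then optimize.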
“…Yaylali and Karamustafa [13] modeled obesity levels based on BMI, cancer, and death to assess the impact of obesity on cancer and mortality using an MDP model. Petousis et al. [14] used inverse reinforcement learning to create reward functions for Markov decision processes for lung and breast cancer screening. Sun et al. [15] evaluated the efficacy of breast cancer control compared to the lack of screening among women in urban and rural areas of China.…”
Section: Literature Review (mentioning)
confidence: 99%