Background: Recent trials of novel agents in 'rare' molecular subtypes of non-small cell lung cancer (NSCLC) have used single-arm trial designs and benchmarked outcomes against historical controls. We assessed the consistency of historical control outcomes using docetaxel data from published NSCLC randomized controlled trials (RCTs). Material and methods: Advanced NSCLC RCTs including a docetaxel monotherapy arm were included. Heterogeneity in tumor objective response rates (ORRs), progression-free survival (PFS) and overall survival (OS), and correlations between outcomes and year of trial commencement were assessed. Results: Among 63 trials (N = 10,633) conducted between 2000 and 2017, ORR ranged from 0% to 26% (I² = 76.1%, p for heterogeneity < .0001). The mean of the median PFS was 3.0 months (range: 1.4-6.4); 3-month PFS ranged from 25% to 85% (I² = 86.0%, p for heterogeneity < .0001). The mean of the median OS was 9.1 months (range: 4.7-22.9); 9-month OS ranged from 23% to 79% (I² = 83.0%, p for heterogeneity < .0001). Each later year of trial commencement was associated with a 0.3% (p = .046), 0.5% (p = .11) and 0.9% (p = .001) improvement in ORR, 3-month PFS and 9-month OS rates, respectively. Conclusions: There was significant heterogeneity and an improving trend in docetaxel outcomes across trials conducted over 20 years. Benchmarking biomarker-targeted agents against historical controls may not be a valid approach to replace RCTs. Innovative study designs involving a concurrent control arm should be considered.
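For readers unfamiliar with the I² statistic reported above, the following is a minimal sketch of how it can be derived from Cochran's Q for a set of single-arm response rates. The abstract does not state which estimator the authors used, so the choices here are assumptions: inverse-variance weights on raw proportions, with a 0.5 continuity correction so that arms with zero responders keep a finite weight. The function name `i_squared` and the trial counts are hypothetical and are not data from the study.

```python
def i_squared(events, totals):
    """Return (pooled proportion, Cochran's Q, I^2 in percent).

    Assumption: fixed-effect inverse-variance weighting of raw proportions,
    with a 0.5 continuity correction applied to every arm.
    """
    props, weights = [], []
    for e, n in zip(events, totals):
        p = (e + 0.5) / (n + 1.0)        # corrected proportion, never exactly 0 or 1
        var = p * (1.0 - p) / (n + 1.0)  # binomial variance of the corrected proportion
        props.append(p)
        weights.append(1.0 / var)

    # Inverse-variance weighted pooled proportion
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1

    # I^2 = max(0, (Q - df) / Q), expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical responder counts and arm sizes for five docetaxel arms
responders = [0, 5, 12, 26, 8]
arm_sizes = [50, 100, 120, 100, 90]
pooled, q, i2 = i_squared(responders, arm_sizes)
print(f"pooled ORR = {pooled:.3f}, Q = {q:.1f}, I^2 = {i2:.1f}%")
```

I² expresses the share of between-trial variation that exceeds what sampling error alone would produce, so values near the 76% to 86% reported here indicate that most of the spread in docetaxel outcomes reflects real differences between trials. Published meta-analyses of proportions often use a logit or double-arcsine transformation instead of raw proportions, which would give somewhat different numbers.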