Background
Risk prediction models for colorectal cancer (CRC) detection in symptomatic patients based on available biomarkers may improve CRC diagnosis. Our aim was to develop a CRC prediction model, COLONPREDICT, based on clinical and laboratory variables, to compare it with the NICE referral criteria, and to validate it externally.

Methods
This prospective cross-sectional study included consecutive patients with gastrointestinal symptoms referred for colonoscopy between March 2012 and September 2013 in a derivation cohort and between March 2014 and March 2015 in a validation cohort. In the derivation cohort, we assessed symptoms and the NICE referral criteria, and determined levels of faecal haemoglobin and calprotectin, blood haemoglobin, and serum carcinoembryonic antigen before performing an anorectal examination and a colonoscopy. The model was developed using multivariate logistic regression analysis, with diagnostic accuracy for CRC detection as the main outcome.

Results
We included 1572 patients in the derivation cohort and 1481 in the validation cohort, with CRC prevalences of 13.6 % and 9.1 %, respectively. The final prediction model included 11 variables: age (years) (odds ratio [OR] 1.04, 95 % confidence interval [CI] 1.02–1.06), male gender (OR 2.2, 95 % CI 1.5–3.4), faecal haemoglobin ≥20 μg/g (OR 17.0, 95 % CI 10.0–28.6), blood haemoglobin <10 g/dL (OR 4.8, 95 % CI 2.2–10.3), blood haemoglobin 10–12 g/dL (OR 1.8, 95 % CI 1.1–3.0), carcinoembryonic antigen ≥3 ng/mL (OR 4.5, 95 % CI 3.0–6.8), acetylsalicylic acid treatment (OR 0.4, 95 % CI 0.2–0.7), previous colonoscopy (OR 0.1, 95 % CI 0.06–0.2), rectal mass (OR 14.8, 95 % CI 5.3–41.0), benign anorectal lesion (OR 0.3, 95 % CI 0.2–0.4), rectal bleeding (OR 2.2, 95 % CI 1.4–3.4) and change in bowel habit (OR 1.7, 95 % CI 1.1–2.5). The area under the curve (AUC) of the model was 0.92 (95 % CI 0.91–0.94), higher than that of the NICE referral criteria (AUC 0.59, 95 % CI 0.55–0.63; p < 0.001).
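
As an illustration of how the reported odds ratios act within a logistic regression model, the following minimal Python sketch converts each OR to a log-odds coefficient and compares two hypothetical patients. It is not the published COLONPREDICT score: the intercept is not reported here, the dictionary keys and the function name `relative_odds` are illustrative assumptions, and the sketch assumes no interaction terms.

```python
import math

# Adjusted odds ratios as reported for the derivation cohort.
# The key names are illustrative, not taken from the study.
ODDS_RATIOS = {
    "age_years": 1.04,            # per additional year of age
    "male": 2.2,
    "faecal_hb_ge_20": 17.0,      # faecal haemoglobin >= 20 ug/g
    "blood_hb_lt_10": 4.8,        # blood haemoglobin < 10 g/dL
    "blood_hb_10_to_12": 1.8,     # blood haemoglobin 10-12 g/dL
    "cea_ge_3": 4.5,              # carcinoembryonic antigen >= 3 ng/mL
    "aspirin": 0.4,               # acetylsalicylic acid treatment
    "previous_colonoscopy": 0.1,
    "rectal_mass": 14.8,
    "benign_anorectal_lesion": 0.3,
    "rectal_bleeding": 2.2,
    "change_in_bowel_habit": 1.7,
}

# In logistic regression, each coefficient is the natural log of its OR.
COEFFICIENTS = {name: math.log(value) for name, value in ODDS_RATIOS.items()}


def relative_odds(patient, reference):
    """Odds of CRC for `patient` relative to `reference` under the model.

    Both arguments map variable names to values (0/1 for binary variables,
    years for age); missing variables are treated as 0. The intercept
    cancels out in this ratio, so it is not needed here. Assumption: the
    model has no interaction terms.
    """
    log_odds_difference = sum(
        COEFFICIENTS[name] * (patient.get(name, 0) - reference.get(name, 0))
        for name in COEFFICIENTS
    )
    return math.exp(log_odds_difference)


# Example: a man with rectal bleeding versus a woman of the same age
# without rectal bleeding; the ORs multiply (2.2 * 2.2).
example = relative_odds({"male": 1, "rectal_bleeding": 1}, {})
```

Because the odds ratios are adjusted estimates from a single multivariable model, they combine multiplicatively on the odds scale. Converting to an absolute probability would additionally require the model intercept.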
Using the score thresholds corresponding to 90 % sensitivity (5.6) and 99 % sensitivity (3.5), we divided the derivation cohort into three risk groups for CRC detection: high (30.9 % of the cohort, positive predictive value [PPV] 40.7 %, 95 % CI 36.7–45.9 %), intermediate (29.5 %, PPV 4.4 %, 95 % CI 2.8–6.8 %) and low (39.5 %, PPV 0.2 %, 95 % CI 0.0–1.1 %). The discriminatory ability was equivalent in the validation cohort (AUC 0.92, 95 % CI 0.90–0.94; p = 0.7).

Conclusions
COLONPREDICT is a highly accurate prediction model for CRC detection.

Electronic supplementary material
The online version of this article (doi:10.1186/s12916-016-0668-5) contains supplementary material, which is available to authorized users.
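
The three-tier risk stratification reported in the Results can be sketched in Python as follows. The function name `risk_group` and the handling of scores exactly at a threshold (assigned to the higher-risk group) are assumptions; only the cut-off values 5.6 and 3.5 come from the text.

```python
# Score cut-offs reported for the derivation cohort.
HIGH_RISK_THRESHOLD = 5.6  # threshold with 90 % sensitivity for CRC
LOW_RISK_THRESHOLD = 3.5   # threshold with 99 % sensitivity for CRC


def risk_group(score):
    """Assign a COLONPREDICT score to a risk group.

    Assumption: a score equal to a threshold falls into the higher-risk
    group; the text does not state how boundary values are handled.
    """
    if score >= HIGH_RISK_THRESHOLD:
        return "high"
    if score >= LOW_RISK_THRESHOLD:
        return "intermediate"
    return "low"


# Hypothetical scores, one per group.
groups = [risk_group(s) for s in (6.2, 4.0, 1.5)]
```

In this scheme, a higher sensitivity corresponds to a lower cut-off: scores below 3.5 exclude only about 1 % of the cancers in the derivation cohort, which is consistent with the 0.2 % PPV reported for the low-risk group.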