Abstract Objective To summarise available evidence regarding the performance metrics of validated prognostic models on cardiovascular and kidney outcomes in adults with type 2 diabetes mellitus. Design Living systematic review and meta-analysis of observational studies. Data sources Medline, Embase, Central, and the Cochrane Database of Systematic Reviews from 1 January 2020 to 17 January 2024. Eligibility criteria for selecting studies Studies validating prognostic models that predicted all cause and cardiovascular mortality, admission to hospital for heart failure, kidney failure, myocardial infarction, or ischaemic stroke in adults with type 2 diabetes mellitus, including people with established cardiovascular disease or chronic kidney disease, or both. Risk models evaluating composite outcomes were not eligible. Data synthesis For each model and outcome, using a random effects model, the reported discrimination measures were pooled, reported as c statistics. Furthermore, when available, calibration plots were reconstructed and interpreted narratively. The Prediction Model Risk of Bias Assessment (PROBAST) tool was used to assess the risk of bias of each analysed study cohort and the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach to evaluate our certainty in the evidence. Results 6529 publications were identified, of which 35 studies reporting on 13 models were included, all of which were developed for general populations with type 2 diabetes but no established cardiovascular disease or chronic kidney disease. Among the identified models, the Risk Equations for Complications of Type 2 Diabetes (RECODe) and the UK Prospective Diabetes Study Outcomes Model 2 (UKPDS-OM2) evaluated all outcomes except for admission to hospital for heart failure. 
Relative to a threshold c statistic of 0.7, RECODe has acceptable discrimination for cardiovascular mortality (0.79, high certainty), probably has acceptable discrimination for myocardial infarction (0.72, moderate certainty) and stroke (0.71, moderate certainty), and may have acceptable discrimination for kidney failure (0.76, low certainty). High certainty evidence suggests that UKPDS-OM2 has unacceptable discrimination for myocardial infarction (0.64) and stroke (0.65). RECODe showed acceptable calibration for cardiovascular mortality (high certainty), myocardial infarction (high certainty), and kidney failure (moderate certainty) but had unacceptable calibration for stroke (moderate certainty). UKPDS-OM2 showed acceptable calibration for cardiovascular mortality (moderate certainty), stroke (moderate certainty), and kidney failure (low certainty), but probably has unacceptable calibration for myocardial infarction (moderate certainty). Conclusion 13 unique models were identified that evaluated cardiovascular and kidney outcomes in patients with type 2 diabetes. Two models, RECODe and UKPDS-OM2, evaluated all outcomes except for admission to hospital for heart failure. Of all the appraised prognostic models, RECODe had acceptable discrimination and calibration in validation studies for most outcomes, although additional studies directly comparing models are needed. Study registration number PROSPERO, CRD42023423075. Readers’ note This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is the original article.
What is already known on this topic Several risk prediction models incorporating multiple risk factors have been developed and validated for people with type 2 diabetes mellitus. Risk stratification of key patient groups, through the use of risk prediction models, is a core component in the development of clinical practice guidelines. What this study adds Although several prediction models for cardiovascular and kidney outcomes in people with type 2 diabetes have been validated, only two models, the Risk Equations for Complications of Type 2 Diabetes (RECODe) and the UK Prospective Diabetes Study Outcomes Model 2 (UKPDS-OM2), assessed most of the patient important outcomes. RECODe had acceptable discrimination and calibration in validation studies for most outcomes, and UKPDS-OM2 had variable discrimination and calibration across outcomes. How this study might affect research, practice or policy When risk stratifying their patients with type 2 diabetes, clinicians should consider patients' individual anticipated risk of cardiovascular and kidney related outcomes using validated risk prediction models. Introduction Type 2 diabetes mellitus affects approximately 537 million adults worldwide, and this number is projected to rise to 643 million by 2030, and 783 million by 2045.1 Diabetes related healthcare expenditures represent a significant economic burden, costing US$966 billion annually worldwide.1 In 2021, diabetes was the cause of 6.7 million deaths globally, accounting for 12.2% of all deaths in individuals aged 20-79 years.1 Furthermore, individuals living with diabetes have high rates of comorbidity, with approximately 32% of people also having cardiovascular disease and 27% having chronic kidney disease.2–4 However, substantial variation exists in prognoses across patients living with type 2 diabetes mellitus; factors such as age, sex, glycaemic control, obesity, and pre-existing cardiovascular and kidney diseases affect an individual's risk of future
cardiovascular and kidney outcomes. To account for the impact of these variables, many risk prediction models incorporating multiple risk factors have been developed and validated for people with type 2 diabetes mellitus.5–7 These models show great promise: clinicians can use them to estimate patients' prognoses and to support clinical decision making, and researchers and policy makers can use them to better understand the risk of events across different groups of patients. However, these prognostic models need to yield valid and reliable risk estimates to inform decision making. Numerous randomised controlled trials have also shown benefits of several novel antidiabetic treatments, including sodium glucose cotransporter-2 inhibitors (SGLT2-i), glucagon-like peptide-1 receptor agonists (GLP-1RA), and non-steroidal mineralocorticoid receptor antagonists, in reducing the risk of cardiovascular and kidney outcomes in adults with diabetes, with weight loss as another outcome of global interest.8 A continued flow of new randomised controlled trials as well as new medications, combined with the rising prevalence of diabetes worldwide, prompted an update to a previous clinical practice guideline (BMJ Rapid Recommendations) on antidiabetic treatments, to become a living guideline.2 In developing that living clinical practice guideline, the panel confirmed that individuals living with diabetes with various levels of risk (eg, high v low) for cardiovascular and kidney outcomes experience different absolute magnitudes of benefit. These variations warrant potentially different recommendations depending on risk strata. As for the first version of the guideline, the panel identified the need to determine the most trustworthy and best performing prognostic models assessing individual outcomes to obtain the most credible baseline risks to inform the development of recommendations.
Therefore, to inform the living guideline—as well as other guidelines—on drugs for type 2 diabetes mellitus, we conducted this living systematic review and meta-analysis to identify, critically appraise, and summarise the available evidence regarding the performance of validated prognostic models on cardiovascular and kidney outcomes in people with type 2 diabetes mellitus (box 1). Box 1: Linked articles in this BMJ Rapid Recommendation cluster Practice article: Agarwal A, Mustafa R, Manja V, et al. Cardiovascular, kidney-related, and weight loss effects of therapeutics for type 2 diabetes: a living clinical practice guideline. BMJ 2025;390:e082071. doi:10.1136/bmj-2024-082071 Research article: Nong K, Jeppesen BT, Shi Q, et al. Medications for adults with type 2 diabetes: a living systematic review and network meta-analysis. BMJ 2025;390:e083039. doi:10.1136/bmj-2024-083039 Methods This living systematic review and meta-analysis follows Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidance and established guidance on prognostic model reviews.9,10 11 We report our systematic review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement12 and the Meta-analyses Of Observational Studies in Epidemiology (MOOSE) checklist13 (see online supplemental appendix 1 for the completed MOOSE checklist). We prospectively registered our protocol on PROSPERO (CRD42023423075). Search strategy and selection criteria A previous systematic review and meta-analysis of prognostic models in adults with type 2 diabetes mellitus identified 15 observational studies reporting on seven risk models, of which one showed adequate calibration and discrimination.14 We updated this systematic review and meta-analysis and transitioned to a living evidence model. 
Electronic database searches using Medline, Embase, Central, and the Cochrane Database of Systematic Reviews were conducted; this iteration incorporates a search update from 1 January 2020 to 17 January 2024 (see online supplemental appendix 2 for search strategies). Relevant search terms included “diabetes mellitus”, “cardiovascular”, “MACE”, “kidney failure”, “mortality”, and “admission” as well as search terms for clinical prediction guides (“prognosis”, “diagnosed”, “cohort”, and “predictor”). We included observational studies and post-hoc analyses of randomised controlled trials that enrolled ambulatory adults (≥18 years) with type 2 diabetes mellitus (with or without established cardiovascular disease or chronic kidney disease) and assessed prognostic models with at least two predictors. Specifically, we included validation (internal and external) studies that assessed the performance of models for all cause mortality, cardiovascular mortality, admission to hospital for heart failure, kidney failure, myocardial infarction, or stroke, and reported model discrimination or calibration measures. Studies validating prognostic models used to predict composite outcomes were not eligible for inclusion; for example, studies evaluating the SCORE2-Diabetes model15 evaluated a composite outcome of cardiovascular disease events including cardiovascular mortality, non-fatal myocardial infarction, and non-fatal stroke, making them ineligible for inclusion. Furthermore, we restricted our included studies to those reporting prognostic models validated in three or more cohorts. To identify additional studies, we searched the reference lists of included studies and consulted clinical experts and methodologists participating on the guideline panel for the linked living BMJ Rapid Recommendation on medications for type 2 diabetes mellitus.
Study selection and data extraction Pairs of calibrated reviewers (DGR, DS, SD, DG, and JZXC) independently assessed titles and abstracts of identified citations as well as full texts of articles that were deemed potentially eligible using Covidence (Veritas Health Innovation, Melbourne, Australia). Pairs of reviewers (DGR, DS, SD, DG, and JZXC) independently extracted data using prepiloted, structured Excel forms. Reviewers resolved conflicts through discussion or, if necessary, through adjudication by a third reviewer (FF). Reviewers collected information related to the data source, time frame of recruitment, duration of patient follow-up, characteristics of the validation cohorts (eg, age, sex, body mass index, comorbidities, and laboratory values), details of the prognostic model assessed (eg, predictors included, and definition and measurement of the outcome), and measures of model performance (model discrimination (ie, c statistics or areas under the curve) and calibration (ie, calibration plots)). When studies did not report relevant data in tabular or narrative formats, we used WebPlotDigitizer v4.7 (Pacifica, CA, USA) to extract values from figures and graphs. If two or more publications reported on the validation of the same model using the same cohort, we included the publication with the largest analytical sample size. Risk of bias assessment Pairs of reviewers (DGR, DS, SD, DG, and JZXC) independently used the Prediction Model Risk of Bias Assessment Tool (PROBAST) to assess the risk of bias of the individual cohorts at the outcome level.16 Disagreements were resolved through discussion between reviewers or, if necessary, adjudication by a third reviewer (FF). PROBAST considers four domains: participants, predictors, outcomes, and analysis. Domains were rated as low, high, or unclear risk of bias.
We categorised a study as having an overall high risk of bias if reviewers judged one or more domains to be at high risk of bias, or two or more to be at an unclear risk of bias; otherwise, the study was classified as having a low risk of bias. Data synthesis and subgroup analyses Stata SE (v18) was used to perform all analyses. We considered a two sided P value of 0.05 or less statistically significant. Using the “metan” function,17 we pooled estimates and 95% confidence intervals (CIs) of discrimination statistics (eg, c statistics) for prognostic models validated in three or more cohorts, using restricted maximum likelihood random effects models with Hartung-Knapp-Sidik-Jonkman corrections.10 11 We followed guidance from Debray and colleagues to estimate the standard error for discrimination statistics for studies in which the authors did not report the 95% CI.10 Extracted data from all calibration plots assessing the same model on the same outcome were re-plotted and calibration was assessed through visual inspection of these reconstructed plots, considering people at low and high risk. We did not use statistical measures (eg, observed to expected ratios or Hosmer-Lemeshow χ2 tests) to assess model calibration, as these measures are not optimal when calculated for the entire cohort without regard to varying risk: for example, in half of the patients (who may be high risk) the model may underestimate the true risk and in the other half (who may be low risk) it may overestimate the true risk.
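To make the pooling step and the concern about whole-cohort calibration measures in this section concrete, the following is a minimal Python sketch, not the code used in the review (the analyses were run in Stata). It assumes c statistics are pooled on the logit scale and, for brevity, swaps in the DerSimonian-Laird estimator of between-study variance in place of restricted maximum likelihood, combined with a Hartung-Knapp style confidence interval. All function names, the cohort values, and the observed to expected ratios are hypothetical, and the t quantile is passed in by the caller rather than computed.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a 95% CI that is already on the analysis scale."""
    return (upper - lower) / (2.0 * z)


def pool_random_effects(estimates, ses, t_crit):
    """Random effects pooled estimate with a Hartung-Knapp style interval.

    Assumption: tau^2 uses the DerSimonian-Laird estimator, not REML.
    t_crit is the 97.5% quantile of the t distribution with k - 1 degrees
    of freedom, supplied by the caller.
    Returns (pooled estimate, lower limit, upper limit, tau^2).
    """
    k = len(estimates)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two estimates with matching standard errors")
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    # Hartung-Knapp variance: weighted residual variance around the pooled mean.
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, estimates)) / (
        (k - 1) * sum(w_re)
    )
    half_width = t_crit * math.sqrt(var_hk)
    return mu, mu - half_width, mu + half_width, tau2


# Hypothetical c statistics (estimate, lower, upper) from three validation cohorts.
cohorts = [(0.70, 0.66, 0.74), (0.74, 0.71, 0.77), (0.78, 0.73, 0.83)]
y = [logit(est) for est, _, _ in cohorts]
se = [se_from_ci(logit(low), logit(high)) for _, low, high in cohorts]
c_mu, c_lo, c_hi, c_tau2 = pool_random_effects(y, se, t_crit=4.303)  # t quantile, 2 df
pooled_c = tuple(inv_logit(v) for v in (c_mu, c_lo, c_hi))

# Hypothetical observed to expected ratios: two studies underpredict, two overpredict.
oe_ratios = [0.5, 0.5, 2.0, 2.0]
log_oe = [math.log(r) for r in oe_ratios]
oe_mu, oe_lo, oe_hi, _ = pool_random_effects(log_oe, [0.1] * 4, t_crit=3.182)  # 3 df
pooled_oe = math.exp(oe_mu)
```

In this hypothetical example `pooled_c` holds the pooled c statistic and its interval back-transformed to the original scale, and `pooled_oe` comes out at 1.0 even though no individual study is well calibrated, which is the masking effect that motivated relying on calibration plots instead.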
In a meta-analysis of observed to expected ratios where half the studies reported a ratio of less than 1.0 and the other half reported more than 1.0, the pooled ratio may incorrectly suggest perfect calibration (observed to expected ratio=1.0).18 Likewise, Hosmer-Lemeshow χ2 tests have several drawbacks, including low statistical power and a lack of information regarding the type or extent of miscalibration.19 We assessed heterogeneity through visual inspection of the individual point estimates and their 95% CIs. To explore the observed heterogeneity, we conducted prespecified subgroup analyses to evaluate the effect of high versus low risk of bias on model performance and relied on studies at low risk of bias if a significant difference was observed. Certainty of the evidence In order to assess the credibility of the risk prediction models, we evaluated the certainty of the evidence using GRADE.18 20 We rated certainty in relation to an acceptable discrimination threshold informed by clinician intuition. We selected a threshold of 0.7 to represent clinician intuition, based on a previous systematic review of studies evaluating discrimination of clinicians in disease areas similar to diabetes.21 GRADE rates certainty drawn on the performance of prognostic models, starting as high for a body of evidence informed by observational studies. Certainty may be decreased due to issues related to risk of bias, imprecision, inconsistency, indirectness, and publication bias. When appropriate, we assessed publication bias through visual inspection of funnel plots. Living model of evidence synthesis To iteratively incorporate new evidence regarding performance metrics of prognostic models and newly available prognostic models for adults with type 2 diabetes mellitus, we commit to a living systematic review model. The systematic review is planned for updates if practice changing evidence is made available. 
Our dynamically updated systematic review will directly inform the linked living BMJ Rapid Recommendation on medications for type 2 diabetes mellitus, planned for update every six months, and other international practice guideline development endeavours addressing type 2 diabetes mellitus management. We will collaborate with members of the linked living guideline to monitor for practice changing evidence and to determine whether an update is warranted. The search strategy and core team of methodologists informing the development and conduct of the living systematic review will remain consistent and convene on at least an annual basis to review the scope and methods of the review and to determine if and when the living review should be retired. We aim to publish major updates of this review on prognostic models in a scientific journal with minor or more frequent updates available through the living guideline as published in the open access authoring and publication platform MAGICapp.22 Patient and public involvement Patient partners participated as part of the living BMJ Rapid Recommendation guideline panel informing the scope and prioritised clinical outcomes for this living review. This study had no public participation. On publication, the study findings will be disseminated to related patients and the public as linked evidence for the paralleled BMJ Rapid Recommendation (https://www.bmj.com/rapid-recommendations) on the use of antidiabetic treatments in people with type 2 diabetes mellitus. Results The systematic search yielded 6529 unique citations and 224 potentially relevant full texts. 
Ultimately, 35 studies were eligible by reporting on the internal or external validation of 13 prognostic models across 52 validation cohorts (figure 1).23–57 Three models predicted all cause mortality, four predicted cardiovascular mortality, four predicted kidney failure, two predicted myocardial infarction, three predicted stroke, and five predicted admission to hospital for heart failure. Two identified models, RECODe and UKPDS-OM2, evaluated all outcomes except for admission to hospital for heart failure. All studies reported the validation of prognostic models in people with type 2 diabetes mellitus with the presence of various risk factors for the development of cardiovascular and kidney outcomes; none reported validation of models specifically for patients with established cardiovascular disease or chronic kidney disease, or both, for our outcomes of interest. Figure 1 PRISMA flowchart for study selection Characteristics of synthesised studies The cohorts included a median of 4329 patients (range 125-94 946). The median of mean ages was 61.4 years (range of means 50.9-70.7), the median of mean HbA1C was 7.4% (range of means 6.1% to 8.7%), and a median of 54% of participants were male (range 0-100%). Among the cohorts enrolling participants from a single country, the most common countries were the United States (n=9), Italy (n=8), and China (n=7) (online supplemental appendix 3). Risk of bias of individual studies Online supplemental appendix 4 presents the risk of bias assessments for each validation cohort and respective outcome across the 35 synthesised studies, leading to 119 separate risk of bias assessments. We found 47 assessments to be at a low risk of bias, and 72 to be at a high risk of bias.
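For readers who want to see how each of these overall judgements follows from the four PROBAST domain ratings, the rule stated in the Methods (overall high risk of bias if one or more domains are rated high, or two or more are rated unclear; otherwise low) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the example ratings are hypothetical and are not taken from the review's data.

```python
from collections import Counter

# The four PROBAST domains named in the Methods.
DOMAINS = ("participants", "predictors", "outcomes", "analysis")
ALLOWED = {"low", "high", "unclear"}


def overall_risk_of_bias(ratings):
    """Return 'high' or 'low' for one cohort-outcome assessment.

    ratings: dict mapping each PROBAST domain to 'low', 'high', or 'unclear'.
    Rule: high if >= 1 domain is high or >= 2 domains are unclear, else low.
    """
    for domain in DOMAINS:
        if ratings.get(domain) not in ALLOWED:
            raise ValueError(f"missing or invalid rating for domain: {domain}")
    counts = Counter(ratings[domain] for domain in DOMAINS)
    if counts["high"] >= 1 or counts["unclear"] >= 2:
        return "high"
    return "low"


# Hypothetical assessment: a single unclear domain does not trigger 'high'.
example = {
    "participants": "low",
    "predictors": "low",
    "outcomes": "unclear",
    "analysis": "low",
}
example_overall = overall_risk_of_bias(example)
```

Under this rule a single unclear domain leaves the assessment at low risk of bias, whereas a second unclear domain or any high rating moves it to high.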
Common study limitations included inappropriate methods for handling missing data (eg, complete case analysis and use of a separate category to capture missing data), an inadequate number of events observed, and use of administrative databases and codes from International Classification of Diseases (ICD) 9th edition for identifying outcomes, which may face high rates of missing data and poor specificity. All cause mortality Three models predicted all cause mortality in people with type 2 diabetes mellitus: estimation of mortality risk in type 2 diabetic patients (ENFORCE), risk equations for complications of type 2 diabetes (RECODe), and UK prospective diabetes study outcomes model (UKPDS-OM)-2 (table 1; online supplemental appendix 5). Compared with a threshold c statistic of 0.7, high certainty evidence indicates that RECODe has acceptable discrimination for all cause mortality. Moderate certainty evidence suggests that ENFORCE and UKPDS-OM2 probably have acceptable discrimination. Table 1 Summary of findings table for prediction model discrimination With regards to model calibration, high certainty evidence showed that UKPDS-OM2 overestimates risk for all cause mortality. Moderate certainty evidence suggests that ENFORCE probably underestimates risk, and that RECODe probably has acceptable calibration (table 2; online supplemental appendix 6). Table 2 Summary of findings table for prediction model calibration Cardiovascular mortality Four models predicted cardiovascular mortality in people with type 2 diabetes mellitus: Framingham score (FHS), RECODe, systematic coronary risk evaluation (SCORE), and UKPDS-OM2 (table 1; online supplemental appendix 5). High certainty evidence indicates that RECODe has acceptable discrimination for cardiovascular mortality. FHS may have acceptable discrimination, and UKPDS-OM2 may have unacceptable discrimination (both low certainty). We are very uncertain about the discriminatory capability of SCORE (very low certainty). 
High certainty evidence shows that RECODe has acceptable calibration. Moderate certainty evidence suggests that both SCORE and UKPDS-OM2 probably have acceptable calibration. No studies evaluated the calibration of FHS through calibration plots (table 2; online supplemental appendix 6). Kidney failure Four models predicted kidney failure in individuals with type 2 diabetes mellitus: ADVANCE (action in diabetes and vascular disease: preterax and diamicron MR controlled evaluation), the New Zealand model, RECODe, and UKPDS-OM2 (table 1; online supplemental appendix 5). Moderate certainty evidence indicates that ADVANCE and the New Zealand model probably have acceptable discrimination. RECODe may have acceptable discrimination, and UKPDS-OM2 may have unacceptable discrimination (low certainty). Moderate certainty evidence suggests that ADVANCE and the New Zealand model probably underestimate the risk of kidney failure. RECODe probably has acceptable calibration (moderate certainty), while UKPDS-OM2 may have acceptable calibration for predicting kidney failure (low certainty) (table 2; online supplemental appendix 6). Myocardial infarction Two models predicted myocardial infarction in people with type 2 diabetes mellitus: RECODe and UKPDS-OM2. High certainty evidence indicates that UKPDS-OM2 has unacceptable discrimination for myocardial infarction. Meanwhile, RECODe probably has acceptable discrimination (moderate certainty) (table 1; online supplemental appendix 5). High certainty evidence shows that RECODe has acceptable calibration for myocardial infarction, and moderate certainty evidence suggests that UKPDS-OM2 probably overestimates risk (table 2; online supplemental appendix 6). Stroke Three models predicted stroke in individuals with type 2 diabetes mellitus: RECODe, UKPDS-OM1, and UKPDS-OM2. High certainty evidence indicates that UKPDS-OM2 has unacceptable discrimination for stroke.
Moderate certainty evidence suggests that RECODe probably has acceptable discrimination and UKPDS-OM1 probably has unacceptable discrimination for stroke (table 1; online supplemental appendix 5). Moderate certainty evidence suggests that RECODe probably overestimates the risk of stroke, and that both UKPDS-OM1 and UKPDS-OM2 probably have acceptable calibration (table 2; online supplemental appendix 6). Admission to hospital with heart failure Five models predicted admission to hospital with heart failure in people with type 2 diabetes mellitus: DM-CURE (socio-demographic variables, metabolic, diabetes-related complication factors, and health care utilization for risk evaluation), TRS-HFDM (thrombolysis in myocardial infarction risk score for heart failure in diabetes), and three WATCH-DM models (machine learning, regression, and integer based). Moderate certainty evidence suggests that DM-CURE and TRS-HFDM probably have acceptable discrimination for admission to hospital with heart failure. The machine learning and regression based WATCH-DM models may have acceptable discrimination, but the integer based WATCH-DM model may have unacceptable discrimination (all low certainty). No studies evaluated the calibration of DM-CURE using calibration plots. High certainty evidence showed that all other models for admission to hospital with heart failure had acceptable calibration. Other analyses We did not identify any significant effect modification on model discrimination (online supplemental appendix 7) or model calibration (online supplemental appendix 8) based on overall risk of bias. Similarly, we did not identify any evidence of publication bias (online supplemental appendix 9). Discussion Principal findings This systematic review summarised the discrimination and calibration of prognostic models for adults with type 2 diabetes mellitus validated in three or more cohorts.
We compared the discriminatory performance of each model to our best estimate of clinician intuition (c statistic=0.7) in predicting mortality (all cause and cardiovascular related), kidney failure, myocardial infarction, stroke, and admission to hospital with heart failure. Among the 13 identified prognostic models, RECODe has the most acceptable discriminatory performance and calibration across the evaluated outcomes; this finding aligns with a previous systematic review.14 Strengths and limitations Strengths of this review include the use of rigorous and comprehensive methods for systematic reviews and meta-analyses of prognostic models10 11 and the use of formal GRADE guidance to assess certainty of evidence for discrimination and calibration.18 20 The GRADE approach allowed us to contextualise our findings in relation to the average discriminatory performance of clinicians, enabling our findings to have direct relevance to clinical practice. Furthermore, this updated review continues to be linked to a multidisciplinary BMJ Rapid Recommendation panel composed of clinical experts, methodologists, and patient partners. The panel has prespecified patient important outcomes of interest and has been consulted to ensure the comprehensiveness of our systematic literature search, ensuring that our review's findings are directly relevant to clinical practice. Potential limitations of this systematic review stem from current limitations of prognosis literature. Firstly, although the discriminatory performance of prognostic models can be quantitatively assessed by pooling the reported c statistics in each study, the methods used to assess the calibration of each model vary substantially. Given the limitations of statistical measures of calibration, including observed to expected ratios and Hosmer-Lemeshow χ2 tests,18 19 we only assessed calibration through the studies' reported calibration plots. 
This approach involved a visual assessment of reconstructed calibration plots and a narrative summary of each model's calibration, which made the assessment of calibration more subjective. Secondly, the included studies assessed model calibration among patients with relatively low cardiovascular risk, limiting assessment of model calibration across the full spectrum of risk, including among those with established cardiovascular disease or chronic kidney disease. In the absence of credible risk prediction models for this large population with type 2 diabetes mellitus, the linked guideline panel had to use its clinical experience and perform pragmatic modelling to estimate risk for cardiovascular and kidney outcomes for people at moderate to high risk, the patients who will benefit most from medications such as SGLT2-i and GLP-1RA (unpublished). Thirdly, we were unable to assess the influence on predictive model performance of several potential sources of heterogeneity, including ascertainment of outcomes across studies, the time periods in which the studies were conducted and the antidiabetic treatments available during these periods, and differences in healthcare systems and health outcomes across geographical regions. Fourthly, our review excluded prognostic models solely evaluating composite outcomes, such as major adverse cardiovascular events, as our linked guideline focused on individual cardiovascular and kidney outcomes. As a result, several robust risk prediction models for type 2 diabetes mellitus, such as SCORE2-Diabetes,15 which may assist clinicians and patients with risk stratification in clinical practice, were not assessed in our review. Finally, our review assessed model performance relative to a threshold inferred by clinician intuition and did not directly compare performance between different models.
Given the potential intransitivity between validation cohorts used to assess each model, future research is needed to directly compare promising models, such as those that predict multiple patient important outcomes (eg, RECODe, UKPDS-OM2), using the same validation cohort. Implications for current practice and research Models identified by our systematic review, such as RECODe, should help the development and updating of clinical practice guidelines, health technology assessments, and subsequently clinicians, patients, policy makers and payers in informing individual decision making. These models enable appropriate risk stratification for patients with type 2 diabetes mellitus, enable identification of risk stratified baseline risks (ie, likelihood of events occurring without treatment) across patient important outcomes, and facilitate estimation of risk stratified absolute effect estimates for treatments (applying the relative effects anticipated with treatments to baseline risks) and cost-effectiveness analyses. The models also allow clinicians and adults with type 2 diabetes mellitus to define a given individual's risk profile and individualised estimates of benefits or harms with treatment, facilitating evidence informed shared decision making. Additionally, previous clinical practice guidelines on the management of type 2 diabetes mellitus, including those from the American Diabetes Association58 and the American Association of Clinical Endocrinology/American College of Endocrinology,59 recommend the use of prognostic models developed and validated in non-diabetic populations, such as the Pooled Cohort Equation and the FHS. Our findings may enable the adoption of diabetes specific prognostic models in the development of future clinical practice guidelines for diabetes management. Our systematic review identified several areas for future research on prognostic models for adults with type 2 diabetes mellitus. 
Firstly, our review compared the discriminatory performance of identified prognostic models to a threshold informed by clinician intuition alone (c statistic=0.7).21 In clinical practice, clinicians may use prognostic models in addition to other clinical factors to inform their risk estimation and decision making. One systematic review suggested that clinician intuition enhanced by prognostic models may be superior to clinician intuition alone.21 Future research should investigate the usefulness of adding trustworthy prognostic models for diabetes, such as RECODe, to routine clinical practice. Secondly, most models had limited data assessing their calibration for patients at higher risk. Future research should focus on validating identified prognostic models in cohorts of adults at higher risk. Finally, our review was unable to assess the clinical usefulness of using risk stratification, by leveraging these prognostic models, to guide treatment of type 2 diabetes mellitus. Future research should evaluate the benefit of these identified prognostic models in clinical practice and their impact on patient important outcomes. Conclusion We identified 13 unique prognostic models evaluating cardiovascular and kidney outcomes in patients with type 2 diabetes mellitus, with no models explicitly reporting validation in patients with established cardiovascular disease or chronic kidney disease. We identified two models, RECODe and UKPDS-OM2, which evaluated all outcomes except for admission to hospital with heart failure. Of all the identified prognostic models, RECODe showed acceptable discrimination and calibration in validation studies for most outcomes. Ethics approval Not applicable. Contributors: All authors were involved in the design of the study. FF conceived the idea for this systematic review. DGR, DS, S-CD, DG, and JZXC screened records. DGR, DS, S-CD, DG, and JZXC completed data extraction and assessed risk of bias and certainty.
DGR conducted data analysis and presented findings. DGR and FF drafted the initial manuscript. All authors provided critical revisions of the manuscript based on important intellectual content, and FF supervised the project. DGR and FF are the guarantors.
Transparency: The lead author (the guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Funding: We declare no specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement: All data relevant to the study are included in the article or uploaded as supplementary information.
References
1. Sun H, Saeedi P, Karuranga S, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract 2022; 183: 109119.
2. Shi Q, Nong K, Vandvik PO, et al. Benefits and harms of drug treatment for type 2 diabetes: systematic review and network meta-analysis of randomised controlled trials. BMJ 2023; 381.
3. Einarson TR, Acs A, Ludwig C, et al. Prevalence of cardiovascular disease in type 2 diabetes: a systematic literature review of scientific evidence from across the world in 2007-2017. Cardiovasc Diabetol 2018; 17.
4. Fenta ET, Eshetu HB, Kebede N, et al. Prevalence and predictors of chronic kidney disease among type 2 diabetic patients worldwide, systematic review and meta-analysis. Diabetol Metab Syndr 2023; 15.
5. Dziopa K, Asselbergs FW, Gratton J, et al. Cardiovascular risk prediction in type 2 diabetes: a comparison of 22 risk scores in primary care settings. Diabetologia 2022; 65: 644–56.
6. Jitraknatee J, Ruengorn C, Nochaiwong S, et al. Prevalence and Risk Factors of Chronic Kidney Disease among Type 2 Diabetes Patients: A Cross-Sectional Study in Primary Care Practice. Sci Rep 2020; 10.
7. Read SH, van Diepen M, Colhoun HM, et al. Performance of Cardiovascular Disease Risk Scores in People Diagnosed With Type 2 Diabetes: External Validation Using Data From the National Scottish Diabetes Register. Diabetes Care 2018; 41: 2010–8.
8. Li S, Vandvik PO, Lytvyn L, et al. SGLT-2 inhibitors or GLP-1 receptor agonists for adults with type 2 diabetes: a clinical practice guideline. BMJ 2021; 373.
9. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336: 924–6.
10. Debray TPA, Damen JAAG, Snell KIE, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017; 356.
11. Rayner DG, Kim B, Foroutan F, et al. A brief step-by-step guide on conducting a systematic review and meta-analysis of prognostic model studies. J Clin Epidemiol 2024; 170: 111360.
12. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021; 372.
13. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 2000; 283: 2008–12.
14. Buchan TA, Malik A, Chan C, et al. Predictive models for cardiovascular and kidney outcomes in patients with type 2 diabetes: systematic review and meta-analyses. Heart 2021; 107: 1962–73.
15. SCORE2-Diabetes Working Group and the ESC Cardiovascular Risk Collaboration. SCORE2-Diabetes: 10-year cardiovascular risk estimation in type 2 diabetes in Europe. Eur Heart J 2023; 44: 2544–56.
16. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med 2019; 170: W1–33.
17. Harris RJ, Deeks JJ, Altman DG, et al. Metan: fixed- and random-effects meta-analysis. Stata J 2008; 8: 3–28.
18. Foroutan F, Guyatt G, Trivella M, et al. GRADE concept paper 2: Concepts for judging certainty on the calibration of prognostic models in a body of validation studies. J Clin Epidemiol 2022; 143: 202–11.
19. Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med 2019; 17.
20. Foroutan F, Mayer M, Guyatt G, et al. GRADE concept paper 8: judging the certainty of discrimination performance estimates of prognostic models in a body of validation studies. J Clin Epidemiol 2024; 170: 111344.
21. Colunga-Lozano LE, Foroutan F, Rayner D, et al. Clinical judgment shows similar and sometimes superior discrimination compared to prognostic clinical prediction models. A systematic review. J Clin Epidemiol 2023; 2024: 111200.
22. MAGIC evidence ecosystem foundation. Available: here
23. Agarwal S, Cox AJ, Herrington DM, et al. Coronary calcium score predicts cardiovascular mortality in diabetes: diabetes heart study. Diabetes Care 2013; 36: 972–7.
24. Aminian A, Zajichek A, Arterburn DE, et al. Predicting 10-Year Risk of End-Organ Complications of Type 2 Diabetes With and Without Metabolic Surgery: A Machine Learning Approach. Diabetes Care 2020; 43: 852–9.
25. Bannister CA, Poole CD, Jenkins-Jones S, et al. External validation of the UKPDS risk engine in incident type 2 diabetes: a need for new type 2 diabetes-specific risk equations. Diabetes Care 2014; 37: 537–45.
26. Basu S, Sussman JB, Berkowitz SA, et al. Development and validation of Risk Equations for Complications Of type 2 Diabetes (RECODe) using individual participant data from randomised trials. Lancet Diabetes Endocrinol 2017; 5: 788–98.
27. Basu S, Sussman JB, Berkowitz SA, et al. Validation of Risk Equations for Complications of Type 2 Diabetes (RECODe) Using Individual Participant Data From Diverse Longitudinal Cohorts in the U.S. Diabetes Care 2018; 41: 586–95.
28. Berg DD, Wiviott SD, Scirica BM, et al. A Biomarker-Based Score for Risk of Hospitalization for Heart Failure in Patients With Diabetes. Diabetes Care 2021; 44: 2573–81.
29. Bergmark BA, Bhatt DL, Braunwald E, et al. Risk Assessment in Patients With Diabetes With the TIMI Risk Score for Atherothrombotic Disease. Diabetes Care 2018; 41: 577–85.
30. Coleman RL, Stevens RJ, Retnakaran R, et al. Framingham, SCORE, and DECODE risk equations do not provide reliable cardiovascular risk estimates in type 2 diabetes. Diabetes Care 2007; 30: 1292–3.
31. Copetti M, Shah H, Fontana A, et al. Estimation of Mortality Risk in Type 2 Diabetic Patients (ENFORCE): An Inexpensive and Parsimonious Prediction Model. J Clin Endocrinol Metab 2019; 104: 4900–8.
32. Copetti M, Biancalana E, Fontana A, et al. All-cause mortality prediction models in type 2 diabetes: applicability in the early stage of disease. Acta Diabetol 2021; 58: 1425–8.
33. Copetti M, Baroni MG, Buzzetti R, et al. Validation in type 2 diabetes of a metabolomic signature of all-cause mortality. Diabetes Metab Res Rev 2024; 40.
34. Davis WA, Colagiuri S, Davis TME, et al. Comparison of the Framingham and United Kingdom Prospective Diabetes Study cardiovascular risk equations in Australian patients with type 2 diabetes from the Fremantle Diabetes Study. Med J Aust 2009; 190: 180–4.
35. Elley CR, Robinson T, Moyes SA, et al. Derivation and validation of a renal risk score for people with type 2 diabetes. Diabetes Care 2013; 36: 3113–20.
36. Jin Q, Lau ESH, Luk AO, et al. High-density lipoprotein subclasses and cardiovascular disease and mortality in type 2 diabetes: analysis from the Hong Kong Diabetes Biobank. Cardiovasc Diabetol 2022; 21.
37. Keng MJ, Leal J, Mafham M, et al. Performance of the UK Prospective Diabetes Study Outcomes Model 2 in a Contemporary UK Type 2 Diabetes Trial Cohort. Value Health 2022; 25: 435–42.
38. Laxy M, Schöning VM, Kurz C, et al. Performance of the UKPDS Outcomes Model 2 for Predicting Death and Cardiovascular Events in Patients with Type 2 Diabetes Mellitus from a German Population-Based Cohort. Pharmacoeconomics 2019; 37: 1485–94.
39. Li T-C, Wang H-C, Li C-I, et al. Establishment and validation of a prediction model for ischemic stroke risks in patients with type 2 diabetes. Diabetes Res Clin Pract 2018; 138: 220–8.
40. Lin Y, Shao H, Shi L, et al. Predicting incident heart failure among patients with type 2 diabetes mellitus: The DM-CURE risk score. Diabetes Obes Metab 2022; 24: 2203–11.
41. Pagano E, Konings SRA, Di Cuonzo D, et al. Prediction of mortality and major cardiovascular complications in type 2 diabetes: External validation of UK Prospective Diabetes Study outcomes model version 2 in two European observational cohorts. Diabetes Obes Metab 2021; 23: 1084–91.
42. Prausmüller S, Resl M, Arfsten H, et al. Performance of the recommended ESC/EASD cardiovascular risk stratification model in comparison to SCORE and NT-proBNP as a single biomarker for risk prediction in type 2 diabetes mellitus. Cardiovasc Diabetol 2021; 20.
43. Quan J, Ng CS, Kwok HHY, et al. Development and validation of the CHIME simulation model to assess lifetime health outcomes of prediabetes and type 2 diabetes in Chinese populations: A modeling study. PLoS Med 2021; 18.
44. Razaghizad A, Sharma A, Ni J, et al. External validation and extension of the TIMI risk score for heart failure in diabetes for patients with recent acute coronary syndrome: An analysis of the EXAMINE trial. Diabetes Obes Metab 2023; 25: 229–37.
45. Scarale MG, Copetti M, Garofolo M, et al. The Synergic Association of hs-CRP and Serum Amyloid P Component in Predicting All-Cause Mortality in Patients With Type 2 Diabetes. Diabetes Care 2020; 43: 1025–32.
46. Scarale MG, Mastroianno M, Prehn C, et al. Circulating Metabolites Associate With and Improve the Prediction of All-Cause Mortality in Type 2 Diabetes. Diabetes 2022; 71: 1363–70.
47. Segar MW, Patel KV, Hellkamp AS, et al. Validation of the WATCH-DM and TRS-HFDM Risk Scores to Predict the Risk of Incident Hospitalization for Heart Failure Among Adults With Type 2 Diabetes: A Multicohort Analysis. J Am Heart Assoc 2022; 11.
48. Tanaka S, Tanaka S, Iimuro S, et al. Predicting macro- and microvascular complications in type 2 diabetes: the Japan Diabetes Complications Study/the Japanese Elderly Diabetes Intervention Trial risk engine. Diabetes Care 2013; 36: 1193–9.
49. Tao L, Wilson ECF, Griffin SJ, et al. Performance of the UKPDS outcomes model for prediction of myocardial infarction and stroke in the ADDITION-Europe trial cohort. Value Health 2013; 16: 1074–80.
50. van der Heijden AAWA, Ortegon MM, Niessen LW, et al. Prediction of coronary heart disease risk in a general, pre-diabetic, and diabetic population during 10 years of follow-up: accuracy of the Framingham, SCORE, and UKPDS risk functions: The Hoorn Study. Diabetes Care 2009; 32: 2094–8.
51. Wan EYF, Fong DYT, Fung CSC, et al. Prediction of new onset of end stage renal disease in Chinese patients with type 2 diabetes mellitus - a population-based retrospective cohort study. BMC Nephrol 2017; 18.
52. Willis M, Asseburg C, Slee A, et al. Macrovascular Risk Equations Based on the CANVAS Program. Pharmacoeconomics 2021; 39: 447–61.
53. Xiong K, Zhang S, Zhong P, et al. Serum cystatin C for risk stratification of prediabetes and diabetes populations. Diabetes Metab Syndr 2023; 17: 102882.
54. Yang X, So W-Y, Kong APS, et al. Development and validation of stroke risk equation for Hong Kong Chinese patients with type 2 diabetes: the Hong Kong Diabetes Registry. Diabetes Care 2007; 30: 65–70.
55. Yew SQ, Chia YC, Theodorakis M, et al. Assessing 10-Year Cardiovascular Disease Risk in Malaysians With Type 2 Diabetes Mellitus: Framingham Cardiovascular Versus United Kingdom Prospective Diabetes Study Equations. Asia Pac J Public Health 2019; 31: 622–32.
56. Zhang X, Lv X, Wang N, et al. WATCH-DM risk score predicts the prognosis of diabetic phenotype patients with heart failure and preserved ejection fraction. Int J Cardiol 2023; 385: 34–40.
57. Zhuo X, Melzer Cohen C, Chen J, et al. Validating the UK prospective diabetes study outcome model 2 using data of 94,946 Israeli patients with type 2 diabetes. J Diabetes Complications 2022; 36: 108086.
58. American Diabetes Association Professional Practice Committee. 10. Cardiovascular Disease and Risk Management: Standards of Care in Diabetes-2024. Diabetes Care 2024; 47: S179–218.
59. Garber AJ, Handelsman Y, Grunberger G, et al. Consensus statement by the American Association of Clinical Endocrinologists and American College of Endocrinology on the comprehensive type 2 diabetes management algorithm - 2020 executive summary. Endocr Pract 2020; 26: 107–39.
Received: 20 January 2025 Accepted: 7 May 2025 First published: 14 August 2025