Readers may wish to know how the sample size was determined and whether the assumptions used in the calculation are consistent with the scientific and clinical background and objectives of the study. Readers will also want to know whether the study authors succeeded in recruiting the targeted number of participants. Methods for calculating sample size in diagnostic research are widely available,74-76 but these calculations are not always performed or reported in diagnostic accuracy study reports.77,78

In this application, the purpose of the rater agreement study is generally not to estimate the accuracy of ratings by a single rater. That can be done directly in a validity study that compares the ratings with a definitive diagnosis from a biopsy.

Post-hoc analyses, chosen after the data have been examined, carry a high risk of spurious findings, and their results are particularly likely not to be confirmed by subsequent studies. Analyses specified in advance, before data collection, have greater credibility.72

Bossuyt, P.M., Reitsma, J.B., Bruns, D.E., Gatsonis, C.A., Glasziou, P.P., Irwig, L.M., Lijmer, J.G., Moher, D., Rennie, D., & de Vet, H.C.W. (2003). Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clinical Chemistry, 49(1), 1-6.
(Also appears in Annals of Internal Medicine (2003) 138(1), W1-12 and British Medical Journal (2003) 326(7379), 41-44.)

To assess the validity and applicability of these classifications, readers will want to know the positivity cut-offs or result categories, how they were determined, and whether they were defined before the study or after data collection. Predefined thresholds may be based on (1) previous studies, (2) cut-offs used in clinical practice, (3) thresholds recommended by clinical practice guidelines, or (4) thresholds recommended by the manufacturer. If no such thresholds exist, the authors may be tempted to explore the accuracy of different thresholds after data collection. . . .
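
The difference between a predefined threshold and one chosen after data collection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the test values and disease labels are invented, the function name `accuracy_at_cutoff` is hypothetical, and the data-driven rule (maximizing sensitivity + specificity - 1, often called the Youden index) is just one of several ways a threshold might be picked post hoc.

```python
from typing import Sequence, Tuple


def accuracy_at_cutoff(
    values: Sequence[float], diseased: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); a result is positive when value >= cutoff."""
    if len(values) != len(diseased):
        raise ValueError("values and diseased must have the same length")
    tp = fn = tn = fp = 0
    for value, has_disease in zip(values, diseased):
        positive = value >= cutoff
        if has_disease and positive:
            tp += 1
        elif has_disease:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased participant")
    return tp / (tp + fn), tn / (tn + fp)


# Invented illustrative data, not taken from any study.
values = [1.2, 3.4, 2.8, 0.9, 4.1, 2.1, 3.9, 1.7]
diseased = [False, True, True, False, True, False, True, False]

# Threshold fixed before looking at the data (hypothetical value).
PRESPECIFIED_CUTOFF = 3.0
sens, spec = accuracy_at_cutoff(values, diseased, PRESPECIFIED_CUTOFF)

# Threshold chosen after data collection by scanning every observed value
# and keeping the one with the largest sensitivity + specificity - 1.
best_cutoff = max(
    sorted(set(values)),
    key=lambda c: sum(accuracy_at_cutoff(values, diseased, c)) - 1,
)
best_sens, best_spec = accuracy_at_cutoff(values, diseased, best_cutoff)

print(f"prespecified cutoff {PRESPECIFIED_CUTOFF}: sens={sens:.2f}, spec={spec:.2f}")
print(f"data-driven cutoff {best_cutoff}: sens={best_sens:.2f}, spec={best_spec:.2f}")
```

In this toy data set the data-driven threshold looks better than the prespecified one on the same sample, which is the kind of optimistic estimate readers need to be able to recognize when a report does not say how the threshold was chosen.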