Journal List > Ann Lab Med > v.45(3) > 1516090413

Lee: Agreement Evaluation in Statistical Analyses: Misconceptions and Key Features
Developing and measuring disease-predictive biomarkers is essential for making accurate clinical decisions. Comprehensive statistical analyses are vital for illustrating the distributions of specific biomarker concentrations in patient groups, assessing correlations between biomarkers and clinical outcomes, predicting diseases, and evaluating the variability across testing methods. To this end, researchers set research hypotheses, select suitable analytical methods, perform the statistical analyses, and draw conclusions by interpreting the data and the P-values derived from the statistical tests. Many researchers simplify their interpretation of P-values, considering results significant when P<0.05 and nonsignificant otherwise, and few truly understand what the P-value means. Therefore, we first define what a P-value is.
The P-value represents the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true [1]. When the null hypothesis is true, failing to reject it is the right decision. However, in hypothesis testing, when the probability of encountering values at least as extreme as the observed statistic falls below 5%, such an occurrence is considered unlikely under the null hypothesis, and the result meets the criterion for significance at the 5% level. The threshold chosen as the criterion for decision-making is called the significance level α. This level is determined by the researcher and can vary [2]; it may be set at 10% in some clinical studies or as strict as 0.1% in genomic research.
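The definition above can be made concrete with a small exact calculation. The following is a minimal sketch, not taken from the article: it assumes a hypothetical experiment with 16 successes in 20 trials and a null hypothesis that the success probability is 0.5, and it sums the null probabilities of every outcome at least as far from the expected count as the observed one. The function name and the data are illustrative assumptions.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided P-value for H0: success probability = 0.5."""
    expected = trials / 2
    observed_distance = abs(successes - expected)
    # Count every outcome at least as far from the expected count as the
    # observed one; under H0 each sequence of trials has probability 1/2**trials.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** trials


# Hypothetical data: 16 successes in 20 trials.
p_value = binomial_two_sided_p(successes=16, trials=20)

# The same P-value leads to different decisions at different significance levels.
for alpha in (0.10, 0.05, 0.001):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha={alpha}: P={p_value:.4f} -> {decision}")
```

The loop shows why the significance level must be stated: the decision depends on the α chosen by the researcher, not on the P-value alone.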
A review of the 94 articles published in ALM in 2024 revealed that original articles accounted for the largest proportion (N=37), followed by letters to the editor (N=26), brief communications (N=14), and reviews (N=7) (Fig. 1). The most frequently used statistical methods were group comparison tests, such as the t-test and the Wilcoxon rank-sum test (also known as the Mann–Whitney U-test) (N=23), and correlation analyses (N=13). Regression analyses, including Passing–Bablok, Deming, and logistic regression, were applied in nine articles, ROC analysis in eight articles, and survival analysis methods (Kaplan–Meier, log-rank, and proportional hazard regression) in four articles. Other methods, such as Bland–Altman plots and cubic spline models, were also used.
Of the 94 articles, 42 reported statistical analyses and presented results, of which only 19 adequately reported and interpreted their results. Misinterpretation occurred most commonly in correlation analysis. Three major types of misinterpretation were observed: 1) interpretation of P-values as evidence of association without reporting correlation coefficients; 2) claims of significant associations based solely on P<0.05 despite low correlation coefficients; and 3) inappropriate application of Pearson correlation analysis to variables exhibiting non-linear relationships.
Multiple methodological concerns regarding the application of regression analyses were identified. The primary issues included: 1) misinterpretation of logistic regression coefficients as hazard ratios; 2) confusion between correlation coefficients (r) and coefficients of determination (r²); and 3) incomplete reporting of Passing–Bablok and Deming regression analyses, with graphs and regression equations being presented without 95% confidence intervals (CIs). Additional methodological deficiencies included reporting P-values without specifying the statistical method and failing to present results for analytical procedures that were stated. These findings provide the basis for summarizing the characteristics of, and methodological considerations for, frequently misused statistical methods.
Correlation analysis is used to evaluate the magnitude of linear association between two continuous variables. The correlation coefficient is calculated by dividing the covariance of the two variables by the product of their standard deviations. The population correlation coefficient is denoted by ρ, and the sample correlation coefficient by r; both range between –1 and 1 [3]. Although the CI of the correlation coefficient can be calculated, it is often not presented in papers.
$$\text{Correlation coefficient} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$
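As an illustration of the definition of the correlation coefficient and of the CI that is often left unreported, the sketch below computes the sample coefficient r directly from its definition and an approximate 95% CI using the Fisher z-transformation. The data are hypothetical, the function names are illustrative, and the Fisher approach is one common approximation chosen here for convenience; it is not stated in the article.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Sample correlation: covariance divided by the product of the SDs."""
    mx, my = mean(x), mean(y)
    # The 1/(n-1) factors of the covariance and the two SDs cancel out.
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cross / (sx * sy)


def fisher_ci(r, n, level=0.95):
    """Approximate CI for the correlation via the Fisher z-transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)  # requires n > 3
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.9, 2.5, 4.8, 4.1, 6.5, 6.0, 8.3]

r = pearson_r(x, y)
lower, upper = fisher_ci(r, n=len(x))
print(f"r = {r:.3f}, 95% CI: {lower:.3f} to {upper:.3f}")
```

With only eight pairs the interval is wide even though r is high, which is the information a bare coefficient or a bare P-value does not convey.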
To interpret the P-value resulting from a correlation analysis, one must first consider the hypothesis being tested. The null hypothesis in correlation analysis posits the absence of a linear relationship between two variables, represented as ρ=0 [4]. When the P-value is sufficiently small to reject the null hypothesis, the conclusion is that ρ is not 0. Consider the following two scenarios. First, the sample correlation coefficient is low, and the P-value indicates significance (r=0.2, P<0.0001); second, the sample correlation coefficient is high, and the P-value does not indicate significance (r=0.8, P=0.5). In the first scenario, the correlation coefficient indicates a very weak linear relationship between the two variables, and the P-value demonstrates only that ρ is not 0, i.e., that a very weak linear relationship exists. In the second scenario, the two variables show a strong linear relationship in the current sample, but the P-value does not allow rejection of the null hypothesis that ρ=0. In terms of the meaning of the P-value, an association this strong could plausibly arise by chance when ρ=0, so the strength observed in the current sample cannot be generalized to the population. Notably, the P-value in correlation analysis does not validate the numerical magnitude of the correlation coefficient.
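The two scenarios can be reproduced qualitatively by noting that the P-value depends on the sample size as well as on the coefficient. The sketch below uses the Fisher z-transformation as a normal approximation to the test of ρ=0; the sample sizes (1,000 and 5) are assumptions chosen for illustration and are not given in the article, so the resulting P-values illustrate the pattern rather than the exact figures quoted above.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided P-value for H0: rho = 0 (Fisher z, needs n > 3)."""
    z_stat = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


# Scenario 1: weak correlation, large (assumed) sample size.
p_weak_large = approx_p_value(r=0.2, n=1000)

# Scenario 2: strong correlation, very small (assumed) sample size.
p_strong_small = approx_p_value(r=0.8, n=5)

print(f"r=0.2, n=1000: P={p_weak_large:.2e}")   # significant, yet weak
print(f"r=0.8, n=5:    P={p_strong_small:.3f}")  # strong, yet not significant
```

Reporting r together with its CI and the sample size lets the reader judge both the strength of the association and the certainty of the estimate.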
Regression analysis explores how the dependent variable (Y) is explained or predicted by independent variables (X). The effect size of X on Y is denoted by the regression coefficient β, interpreted as the amount of change in Y when X increases by one unit. The null hypothesis is that the regression coefficient β=0; when the calculated P-value falls below the significance level, the independent variable is considered to have a significant effect on the dependent variable. In regression analysis, the most suitable model must be identified from several candidate models that differ in the composition of the independent variables, which involves assessing each model’s fit. Fit is evaluated using the coefficient of determination (r²), which ranges from 0 to 1, with values closer to 1 indicating that the regression model explains the data well [5]. Distinguishing between the correlation coefficient r used in correlation analysis and the coefficient of determination r² used in regression analysis is essential to avoid confusion.
$$r^2 = \frac{SSR}{TSS} = 1 - \frac{SSE}{TSS}$$

TSS: total sum of squares, SSR: sum of squares due to regression, SSE: sum of squares due to error.
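The decomposition can be checked numerically. The sketch below fits a simple least-squares line to hypothetical data, computes TSS, SSR, and SSE, and confirms that both forms of the formula give the same value. The function name and the data are illustrative assumptions.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for one independent variable."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.9, 2.5, 4.8, 4.1, 6.5, 6.0, 8.3]

intercept, slope = simple_ols(x, y)
fitted = [intercept + slope * a for a in x]
y_bar = mean(y)

tss = sum((b - y_bar) ** 2 for b in y)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained by the regression
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # left in the residuals

r2_from_ssr = ssr / tss
r2_from_sse = 1 - sse / tss
print(f"SSR/TSS = {r2_from_ssr:.4f}, 1 - SSE/TSS = {r2_from_sse:.4f}")
```

In this special case of a single independent variable fitted by least squares, r² equals the square of the Pearson correlation coefficient, which is one reason the two quantities are easily confused; with several independent variables, r² describes the fit of the whole model.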

Passing–Bablok regression analysis is a nonparametric method for analyzing the agreement between two continuous measurement values. Deming regression is a similar method. The Deming model assumes that both the independent variable x and the dependent variable y are measured with error and also assumes that the errors are normally distributed [6, 7]. In contrast, Passing–Bablok regression assumes neither normality nor homoscedasticity and is robust to outliers [8, 9]. The null hypothesis in Passing–Bablok regression is that the measured values are connected by a line with a unit slope. When the 95% CI of the slope includes 1 and that of the intercept includes 0, the two measurement methods are considered equivalent within the investigated range. A constant systematic difference between the two measurement methods is indicated when the CI of the intercept does not include 0. When the CI of the slope does not include 1, a proportional difference exists between the methods. As a hypothetical example, suppose that Passing–Bablok regression analysis yields slope and intercept estimates of 0.947 (95% CI: 0.899–1.135) and 0.028 (95% CI: 0.023–0.037), respectively, so that the regression line is y=0.028+0.947*x. The CI of the slope includes 1, but the CI of the intercept does not include 0. In this case, the hypothesis that the measurements of the two methods are identical is rejected because of a constant difference. Therefore, the CIs should be provided along with the estimates for the slope and intercept when presenting Passing–Bablok analysis results.
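The decision rules described above can be written down as a short check. The sketch below does not perform Passing–Bablok regression itself; it only applies the interpretation rules to CIs that have already been estimated, using the hypothetical values from the example. The class and function names are illustrative assumptions.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def interpret_method_comparison(slope_ci: Interval, intercept_ci: Interval) -> List[str]:
    """Apply the CI-based rules for slope (vs. 1) and intercept (vs. 0)."""
    findings = []
    if not intercept_ci.contains(0.0):
        findings.append("constant difference: intercept CI excludes 0")
    if not slope_ci.contains(1.0):
        findings.append("proportional difference: slope CI excludes 1")
    if not findings:
        findings.append("no evidence against agreement within the investigated range")
    return findings


# CIs from the hypothetical example in the text.
result = interpret_method_comparison(
    slope_ci=Interval(0.899, 1.135),
    intercept_ci=Interval(0.023, 0.037),
)
print(result)
```

Neither conclusion can be drawn from the point estimates or the fitted line alone, which is why both CIs need to be reported.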
In summary, in correlation analysis, interpretation should focus on the magnitude of the correlation coefficient (r) rather than on the significance of the P-value. In regression analysis, the coefficient of determination (r²) represents the explanatory power of the regression equation and indicates the proportion of variation that can be explained by the regression model; however, it does not indicate the correlation between two variables. In Passing–Bablok regression analysis, the CIs for both the slope and the intercept should be presented to interpret the agreement between methods. Finally, when a P-value is reported, the analytical method used must be specified.

ACKNOWLEDGEMENTS

None.

Notes

AUTHOR CONTRIBUTIONS

Lee S: conceptualization and writing.

CONFLICTS OF INTEREST

None declared.

RESEARCH FUNDING

None declared.

References

1. Wasserstein RL, Lazar NA. 2016; The ASA's statement on p-values: context, process, and purpose. Am Stat. 70:129–33. DOI: 10.1080/00031305.2016.1154108.
2. Fisher RA. Kotz S., Johnson N.L., editors. 1992. Statistical Methods for Research Workers. Breakthroughs in Statistics. Springer Series in Statistics. Springer;New York: p. 66–70. DOI: 10.1007/978-1-4612-4380-9_6.
3. Schober P, Boer C, Schwarte LA. 2018; Correlation coefficients: appropriate use and interpretation. Anesth Analg. 126:1763–8. DOI: 10.1213/ANE.0000000000002864. PMID: 29481436.
4. Illowsky B. 2018; Testing the significance of the correlation coefficient. OpenStax. DOI: 10.1080/09720502.2015.1084778.
5. Nagelkerke NJD. 1991; A note on a general definition of the coefficient of determination. Biometrika. 78:691–2. DOI: 10.1093/biomet/78.3.691.
6. Deming WE. 1943. Statistical adjustment of data. Dover Publications edition. 1985 ed. John Wiley & Sons;New York: DOI: 10.2307/2965635.
7. Wicklin R. Deming regression for comparing different measurement methods. https://blogs.sas.com/content/iml/2019/01/07/deming-regression-sas.html. updated on Dec 2024.
8. Passing H, Bablok W. 1983; A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in clinical chemistry, part I. J Clin Chem Clin Biochem. 21:709–20. DOI: 10.1515/cclm.1983.21.11.709. PMID: 6655447.
9. Passing H, Bablok W. 1984; Comparison of several regression procedures for method comparison studies and determination of sample sizes. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part II. J Clin Chem Clin Biochem. 22:431–45. DOI: 10.1515/cclm.1984.22.6.431. PMID: 6481307.

Fig. 1

Distribution of article types (A) and statistical methods used (B) in articles published in ALM in 2024.

Abbreviations: NA, not applicable; Chisquare, Chi-square test; exact, Fisher’s exact test; Corr, Correlation analysis; Reg, Regression; KM, Kaplan–Meier survival curve; PHREG, Cox’s proportional hazard regression.