Information

  • Publication Type: Master Thesis
  • Workgroup(s)/Project(s): not specified
  • Date: 2022
  • Open Access: yes
  • First Supervisor: Renata Raidou
  • Pages: 97
  • Keywords: artificial intelligence, statistical methodologies, performance assessment, continuous variable, bland-altman, interchangeability

Abstract

The radiological determination of bone age (BA) from a left-hand x-ray continues to be the reference standard for skeletal maturity assessment related to short or tall stature and underlying conditions. Artificial intelligence (AI) algorithms are becoming more prevalent due to the subjectivity and time-consuming nature of BA assessment. Therefore, we proposed methods and statistical recommendations for assessing the standalone performance of an AI tool. Our strategy was verified in a retrospective study using PANDA, a fully automated AI software used to estimate BA on hand radiographs. We analyzed radiographs of 342 patients retrospectively. Three board-certified pediatric radiologists independently made blind reads of BA using the Greulich & Pyle (GP) method. The AI software, PANDA, was subsequently used to provide automated estimations of BA from the same set of images. The ground truth was established based on the mean of the estimations. We assessed agreement of AI with readers based on comparison of Bland-Altman limits of agreement (LOA), orthogonal linear regression, and interchangeability. Bland-Altman assessment showed a mean difference between readers and AI of -0.72 months (95% CI: -1.46, 0.02), indicating no fixed bias. Using orthogonal linear regression, the slope between readers and AI software was 1.02 (95% CI: 1.00, 1.03). No proportional bias was observed. The square root of the absolute value of the equivalence index of the AI software compared to assessments made by readers was observed to be -5.8 months. This indicates that the AI software is interchangeable with expert readers. The proposed framework is generalizable to applications other than bone age. If one wants to find bias between two techniques of measurement, regression analysis should be performed. If the purpose is to see if one method may be safely replaced by another, especially in clinical practice, a Bland-Altman plot is preferred. If there is no adequate reference standard to compare to, interchangeability can be used. This statistical method does not rely on a reference standard.
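
As a rough illustration of two of the statistics named in the abstract, the sketch below computes the Bland-Altman bias with its limits of agreement and the slope of an orthogonal regression line. It is not the thesis's implementation: the function names, the fixed multiplier of 1.96, the equal-error-variance assumption behind the slope formula, and the example values are all assumptions made for illustration.

```python
import math
import statistics


def bland_altman(reader, ai, z=1.96):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    bias is the mean of the paired differences (reader - ai); the limits
    of agreement are bias +/- z times the sample SD of the differences.
    """
    diffs = [r - a for r, a in zip(reader, ai)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - z * sd, bias + z * sd


def orthogonal_slope(x, y):
    """Slope of the orthogonal (total least squares) line through (x, y).

    Assumes equal error variance in both methods; undefined when the
    sample covariance is zero.
    """
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


if __name__ == "__main__":
    # Hypothetical bone ages in months, not data from the study.
    reader_mean = [100, 120, 140, 160]
    ai_estimate = [101, 119, 142, 160]
    print(bland_altman(reader_mean, ai_estimate))
    print(orthogonal_slope(reader_mean, ai_estimate))
```

A bias near zero suggests no fixed bias, and a slope near one suggests no proportional bias; confidence intervals for both would be needed before drawing the kind of conclusion stated in the abstract.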

BibTeX

@mastersthesis{chung-2022-sma,
  title =      "Statistical methodologies for assessing an artificial
               intelligence (AI) software in a diagnostic setting",
  author =     "Tek Sin Chung",
  year =       "2022",
  abstract =   "The radiological determination of bone age (BA) from a
               left-hand x-ray continues to be the reference standard for
               skeletal maturity assessment related to short or tall
               stature and underlying conditions. Artificial intelligence
               (AI) algorithms are becoming more prevalent due to the
               subjectivity and time-consuming nature of BA assessment.
               Therefore, we proposed methods and statistical
               recommendations in assessing standalone performance of an AI
               tool. Our strategy was verified in a retrospective study
               using the AI model, PANDA, a fully automated AI software
               used to estimate bone age (BA) on hand radiographs. We
               analyzed radiographs of 342 patients retrospectively. Three
               board-certified pediatric radiologists made blind reads of
               BA using the Greulich \& Pyle (GP) method independently. The
               AI-software, PANDA, was subsequently used to provide
               automated estimations of BA from the same set of images. The
               ground truth was established based on the mean of the
               estimations. We assessed agreement of AI with readers based
               on comparison of Bland-Altman limits of agreement (LOA),
               orthogonal linear regression, and interchangeability.
               Bland-Altman assessment showed a mean difference between
               readers and AI of -0.72 months (95% CI: -1.46, 0.02),
               indicating no fixed bias. Using orthogonal linear
               regression, the slope between readers and AI software was
               1.02 (95% CI: 1.00, 1.03). No proportional bias was
               observed. The square root of the
               absolute value of the equivalence index of the AI software
               compared to assessments made by readers was observed to be
               -5.8 months. This indicates that the AI software is
               interchangeable with expert readers. The proposed framework
               is generalizable to applications other than bone
               age. If one wants to find bias between two techniques of
               measurement, regression analysis should be performed. If the
               purpose is to see if one method may be safely replaced by
               another, especially in clinical practice, a Bland-Altman plot
               is preferred. If there is no adequate reference standard to
               compare to, interchangeability can be used. This statistical
               method does not rely on a reference standard.",
  pages =      "97",
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Research Unit of Computer Graphics, Institute of Visual
               Computing and Human-Centered Technology, Faculty of
               Informatics, TU Wien",
  keywords =   "artificial intelligence, statistical methodologies,
               performance assessment, continuous variable, bland-altman,
               interchangeability",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2022/chung-2022-sma/",
}