Information
- Publication Type: Master Thesis
- Workgroup(s)/Project(s): not specified
- Date: 2022
- Open Access: yes
- First Supervisor: Renata Raidou
- Pages: 97
- Keywords: artificial intelligence, statistical methodologies, performance assessment, continuous variable, bland-altman, interchangeability
Abstract
The radiological determination of bone age (BA) from a left-hand x-ray continues to be the reference standard for skeletal maturity assessment related to short or long stature, and underlying conditions. Artificial intelligence (AI) algorithms are becoming more prevalent due to the subjectivity and time-consuming nature of BA assessment. Therefore, we proposed methods and statistical recommendations in assessing standalone performance of an AI tool. Our strategy was verified in a retrospective study using the AI model, PANDA, a fully automated AI software used to estimate bone age (BA) on hand radiographs. We analyzed radiographs of 342 patients retrospectively. Three board-certified pediatric radiologists made blind reads of BA using the Greulich & Pyle (GP) method independently. The AI-software, PANDA, was subsequently used to provide automated estimations of BA from the same set of images. The ground truth was established based on the mean of the estimations. We assessed agreement of AI with readers based on comparison of Bland-Altman limits of agreement (LOA), orthogonal linear regression, and interchangeability. Bland-Altman assessment displayed a mean difference between readers and AI to be -0.72 with 95% CI (-1.46; 0.02) months displaying no fixed bias. Using orthogonal linear regression, the slope between readers and AI software was reported to be 1.02 95% CI (1.00, 1.03). No proportional bias was observed. The square root of the absolute value of the equivalence index of the AI software compared to assessments made by readers was observed to be -5.8 months. This indicates that the AI software is interchangeable with expert readers. The proposed framework is generalizable to other applications aside from bone age. If one wants to find bias between two techniques of measurement, regression analysis should be performed. If the purpose is to see if one method may be safely replaced by another, especially in clinical practice, Bland-Altman plot is preferred.
If there is no adequate reference standard to compare to, interchangeability can be used. This statistical method does not rely on a reference standard.
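As a rough illustration of the Bland-Altman quantities mentioned in the abstract (mean difference, its 95% CI, and the limits of agreement), the following minimal Python sketch may help. It is not the thesis code: the function name `bland_altman`, the sign convention (reader minus AI), the normal-approximation factor 1.96, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def bland_altman(reader, ai):
    """Return the bias, an approximate 95% CI of the bias, and 95% limits of agreement.

    Assumption: differences are taken as reader minus AI, and a normal
    approximation (factor 1.96) is used for both intervals.
    """
    if len(reader) != len(ai) or len(reader) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [r - a for r, a in zip(reader, ai)]
    n = len(diffs)
    bias = statistics.mean(diffs)   # mean difference (fixed bias)
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    half_width = 1.96 * sd / math.sqrt(n)
    return {
        "bias": bias,
        "bias_ci": (bias - half_width, bias + half_width),
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Made-up bone age values in months, NOT data from the thesis.
reader_mean = [120.0, 96.0, 150.0, 60.0, 132.0]
ai_estimate = [122.0, 95.0, 151.0, 62.0, 131.0]
result = bland_altman(reader_mean, ai_estimate)
```

Under this convention, a bias CI that contains zero is read as showing no fixed bias, which matches how the abstract interprets its interval of (-1.46; 0.02) months.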
BibTeX
@mastersthesis{chung-2022-sma,
title = "Statistical methodologies for assessing an artificial
intelligence (AI) software in a diagnostic setting",
author = "Tek Sin Chung",
year = "2022",
abstract = "The radiological determination of bone age (BA) from a
left-hand x-ray continues to be the reference standard for
skeletal maturity assessment related to short or long
stature, and underlying conditions. Artificial intelligence (AI)
algorithms are becoming more prevalent due to the
subjectivity and time-consuming nature of BA assessment.
Therefore, we proposed methods and statistical
recommendations in assessing standalone performance of an AI
tool. Our strategy was verified in a retrospective study
using the AI model, PANDA, a fully automated AI software
used to estimate bone age (BA) on hand radiographs. We
analyzed radiographs of 342 patients retrospectively. Three
board-certified pediatric radiologists made blind reads of
BA using the Greulich \& Pyle (GP) method independently. The
AI-software, PANDA, was subsequently used to provide
automated estimations of BA from the same set of images. The
ground truth was established based on the mean of the
estimations. We assessed agreement of AI with readers based
on comparison of Bland-Altman limits of agreement (LOA),
orthogonal linear regression, and
interchangeability. Bland-Altman assessment displayed a mean
difference between readers and AI to be -0.72 with 95% CI
(-1.46; 0.02) months displaying no fixed bias. Using
orthogonal linear regression, the slope between readers and
AI software was reported to be 1.02 95% CI (1.00, 1.03). No
proportional bias was observed. The square root of the
absolute value of the equivalence index of the AI software
compared to assessments made by readers was observed to be
-5.8 months. This indicates that the AI software is
interchangeable with expert readers. The proposed framework
is generalizable to the other applications aside from bone
age. If one wants to find bias between two techniques of
measurement, regression analysis should be performed. If the
purpose is to see if one method may be safely replaced by
another, especially in clinical practice, Bland-Altman plot
is preferred. If there is no adequate reference standard to
compare to, interchangeability can be used. This statistical
method does not rely on a reference standard.",
pages = "97",
address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
school = "Research Unit of Computer Graphics, Institute of Visual
Computing and Human-Centered Technology, Faculty of
Informatics, TU Wien",
keywords = "artificial intelligence, statistical methodologies,
performance assessment, continuous variable, bland-altman,
interchangeability",
URL = "https://www.cg.tuwien.ac.at/research/publications/2022/chung-2022-sma/",
}