A retrospective study from Germany evaluated three AI products, BoneXpert (Visiana), IB Lab PANDA (ImageBiopsy Lab), and BoneView (Gleamer), for determining bone age (BA) according to the Greulich and Pyle (G&P) standards. The study used 306 hand radiographs of children aged 1–18 years from a Central European tertiary center. Each product's BA predictions were compared with assessments by three human experts.
For the full cohort, BoneXpert showed the lowest RMSE (0.62 years), followed by BoneView (0.65 years) and PANDA (0.75 years). Dropout rates differed: BoneXpert rejected 2.3% of cases through its internal quality control, BoneView excluded 20.3% of cases as outside its intended range of ≥3 to ≤17 years, and PANDA had no dropouts. A subgroup of children aged 4.8–15.5 years (females) and 4.9–17.0 years (males) was formed to cover the age range that accounts for 90% of examinations in clinical practice. In this subgroup the differences narrowed: PANDA had the lowest RMSE (0.65 years), marginally ahead of BoneXpert (0.66 years) and BoneView (0.68 years). AI predictions correlated highly with the ground truth (R² ≥ 0.98), and the standard deviation between the AI readers was lower than that between the human readers (0.54 vs. 0.62 years, p < 0.01).
The main conclusion is that all three products estimate BA with similarly high reliability in the main age range, with differences arising mainly at the boundaries of childhood. The study highlights the potential of AI for accurate BA estimation in support of clinical decision-making.
A critical comparative study of the performance of three AI-assisted programs for bone age determination
European Radiology, 2024
Abstract
Objectives
To date, AI-supported programs for bone age (BA) determination for medical use in Europe have almost only been validated separately, according to Greulich and Pyle (G&P). Therefore, the current study aimed to compare the performance of three programs, namely BoneXpert, PANDA, and BoneView, on a single Central European population.
Materials and methods
For this retrospective study, hand radiographs of 306 children aged 1–18 years, stratified by gender and age, were included. A subgroup consisting of the age group accounting for 90% of examinations in clinical practice was formed. The G&P BA was estimated by three human experts (as ground truth) and by three AI-supported programs. The mean absolute deviation, the root mean squared error (RMSE), and dropouts by the AI were calculated.
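As a rough illustration of how the three reported quantities relate, the following minimal Python sketch computes them for one program. It is not the authors' analysis code. It assumes that "mean absolute deviation" means the mean absolute difference between a program's prediction and the ground truth, and that a dropout is represented by `None`. The function name `agreement_metrics` and the sample values are hypothetical and are not study data.

```python
import math


def agreement_metrics(ground_truth, predictions):
    """Return (mean absolute deviation, RMSE, dropout rate) in years.

    Assumption: a prediction of None marks a radiograph the program
    rejected (a dropout); such cases are excluded from the error metrics.
    """
    pairs = [(g, p) for g, p in zip(ground_truth, predictions) if p is not None]
    if not pairs:
        raise ValueError("no evaluable predictions")
    dropout_rate = 1 - len(pairs) / len(predictions)
    errors = [p - g for g, p in pairs]
    mad = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mad, rmse, dropout_rate


# Made-up bone ages in years, for illustration only.
truth = [5.0, 8.5, 12.0, 15.5]
pred = [5.5, 8.0, None, 16.5]
mad, rmse, dropout = agreement_metrics(truth, pred)
print(f"MAD={mad:.2f} y, RMSE={rmse:.2f} y, dropouts={dropout:.0%}")
```

Because RMSE squares each error before averaging, it penalizes large individual deviations more heavily than the mean absolute deviation does, so the two can rank programs differently when a few cases are far off.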
Results
The correlation between all programs and the ground truth was prominent (R² ≥ 0.98). In the total group, BoneXpert had a lower RMSE than BoneView and PANDA (0.62 vs. 0.65 and 0.75 years) with a dropout rate of 2.3%, 20.3% and 0%, respectively. In the subgroup, there was less difference in RMSE (0.66 vs. 0.68 and 0.65 years, max. 4% dropouts). The standard deviation between the AI readers was lower than that between the human readers (0.54 vs. 0.62 years, p < 0.01).
Conclusion
All three AI programs predict BA after G&P in the main age range with similar high reliability. Differences arise at the boundaries of childhood.
Key points
Question: There is a lack of comparative, independent validation for artificial intelligence-based bone age estimation in children.
Findings: Three commercially available programs estimate bone age after Greulich and Pyle with similarly high reliability in a central European cohort.
Clinical relevance: The comparative study will help the reader choose a software for bone age estimation approved for the European market depending on the targeted age group and economic considerations.