November 7, 2016 — Researchers from the Netherlands recently completed a study that explored the variation between radiologists when judging fibroglandular breast density visually on serially-acquired mammograms, and whether an automated density assessment would provide better consistency. Results of the study, “Consistency of breast density categories in serial screening mammograms: A comparison between automated and human assessment,” recently published in the Breast Journal, suggest that an automated computer measurement, VolparaDensity, provides better consistency than density readings performed visually by radiologists.
In the study, Katharina Holland and a team from the Radboud University Medical Center in the Netherlands investigated how human readers and an automated breast density assessment system categorized pairs of subsequent screening mammograms into density classes. One thousand mammograms belonging to 500 pairs of subsequent screening exams were randomly selected from the Dutch breast screening program. The serial mammograms, which had an average 30-month interval between them, were categorized on both a two-category scale (“fatty” versus “dense”) and a four-category scale (BI-RADS 4th Edition: 1, 2, 3 or 4). The categorization was done by four readers, including three specialized breast radiologists, and by Volpara Density Grades (VDG), which are analogous to the BI-RADS (Breast Imaging-Reporting and Data System) density categories.
In addition, to better replicate clinical practice, a “group reading” was performed. In these group readings, each mammogram was scored by randomly selecting one of the four readers’ scores and assigning it to that exam; the intention was to emulate the fact that serial mammograms are usually read by different radiologists.
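The group reading described above can be illustrated with a short sketch. This is only an illustration, not the study's code: the function name, the seed and the scores are hypothetical, and the sketch assumes that one reader is drawn independently for every exam, which the article does not spell out.

```python
import random

def group_reading(scores_by_reader, rng):
    """Return one score per mammogram, each taken from a randomly chosen reader.

    scores_by_reader: one list of scores per reader, all of the same length.
    rng: a random.Random instance, passed in so the draw is reproducible.
    """
    n_exams = len(scores_by_reader[0])
    # For every exam, pick one reader at random and use that reader's score.
    return [rng.choice(scores_by_reader)[i] for i in range(n_exams)]

# Hypothetical four-category scores (1-4) from four readers on five mammograms.
scores_by_reader = [
    [1, 2, 3, 4, 2],
    [1, 2, 3, 3, 2],
    [2, 2, 3, 4, 1],
    [1, 3, 3, 4, 2],
]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for the example
group_scores = group_reading(scores_by_reader, rng)
print(group_scores)
```

Because the prior and the subsequent exam of a woman are drawn separately, they will often come from different readers, which is the situation the group reading is meant to mimic.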
Volpara identified a significantly higher percentage of women who did not change category between exams on the two-category density scale (90.4 percent) than the group reading of radiologists did (86.8 percent). Volpara’s agreement with its own readings between serial exams was significantly higher than the agreement between the group radiologist readings. Volpara maintained higher kappa agreement values on both the two- and four-category scales, prompting researchers to suggest that Volpara produced more consistent density readings than the radiologists.
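For readers unfamiliar with the two consistency measures mentioned here, the following sketch shows how a percentage of unchanged categories and a kappa value can be computed for paired exams. It assumes plain, unweighted Cohen's kappa; the article does not say which kappa variant the study used, and the function names and data below are made up for illustration.

```python
from collections import Counter

def percent_unchanged(prior, current):
    """Percentage of pairs whose density category is the same on both exams."""
    same = sum(p == c for p, c in zip(prior, current))
    return 100.0 * same / len(prior)

def cohen_kappa(prior, current):
    """Unweighted Cohen's kappa between two equally long lists of categories.

    Kappa corrects the observed agreement for the agreement expected by chance.
    Undefined (division by zero) if the expected agreement equals 1.
    """
    n = len(prior)
    observed = sum(p == c for p, c in zip(prior, current)) / n
    prior_counts = Counter(prior)
    current_counts = Counter(current)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(prior_counts[k] * current_counts[k] for k in prior_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical two-category readings for ten women (prior and subsequent exam).
prior = ["dense"] * 4 + ["fatty"] * 6
current = ["dense"] * 3 + ["fatty"] * 7

print(percent_unchanged(prior, current))
print(round(cohen_kappa(prior, current), 3))
```

In this made-up example nine of the ten women keep their category, and kappa is lower than the raw agreement because some of that agreement would be expected by chance alone.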
When women did exhibit a density change between screens, most of the changes were from the “dense” to the “fatty” category (approximately 70 percent of cases of density change). Change in the other direction (“fatty” to “dense”) occurred as well, though less frequently; this could potentially be caused by use of hormone replacement therapy, weight loss or measurement error. There was no significant difference in the direction of density change between Volpara measurements and those of the radiologists.
Overall, the indication was that an automated computer measurement such as Volpara provides better consistency than density readings done by radiologists, when emulating clinical screening practice. “These results are particularly important given the recent work presented by Sprague et al. The use of an automated density measurement algorithm could prove desirable in screening practice — not only because of improved efficiency in terms of time and cost, but also because of enhanced reliability. Furthermore, the results indicate that Volpara could prove useful in temporal measurements of breast density,” noted Prof. Nico Karssemeijer, Radboud University Nijmegen. “These could be valuable for examining breast cancer risk, by looking at the occurrence of age-related involution or looking at response to adjuvant or neoadjuvant hormonal therapy, which can be reflected in breast density.”
For more information: www.onlinelibrary.wiley.com