Clinical validation of a targeted methylation-based multi-cancer early detection test

Background

A multi-cancer early detection (MCED) test could complement existing screening methods.
The Circulating Cell-free Genome Atlas study (CCGA; NCT02889978) demonstrated that a blood-based MCED test could detect cancer signals across multiple cancer types and predict cancer signal origin (CSO) with high accuracy.
The objective of the third and final CCGA substudy was to validate an MCED test version further refined for use as a screening tool.

Patients and Methods

This pre-specified substudy included 4077 participants in an independent validation set (cancer: n = 2823; non-cancer: n = 1254, non-cancer status confirmed at year-one follow-up).
Specificity, sensitivity, and CSO prediction accuracy were measured.

Results

Specificity for cancer signal detection was 99.5% [95% confidence interval (CI): 99.0% to 99.8%].
Overall sensitivity for cancer signal detection was 51.5% (49.6% to 53.3%); sensitivity increased with stage [stage I: 16.8% (14.5% to 19.5%), stage II: 40.4% (36.8% to 44.1%), stage III: 77.0% (73.4% to 80.3%), stage IV: 90.1% (87.5% to 92.2%)].
Stage I-III sensitivity was 67.6% (64.4% to 70.6%) in 12 pre-specified cancers that account for approximately two-thirds of annual USA cancer deaths and was 40.7% (38.7% to 42.9%) in all cancers.
Cancer signals were detected across >50 cancer types.
Overall accuracy of CSO prediction in true positives was 88.7% (87.0% to 90.2%).

Conclusion

In this pre-specified, large-scale, clinical validation substudy, the MCED test demonstrated high specificity and accuracy of CSO prediction and detected cancer signals across a wide diversity of cancers.
These results support the feasibility of this blood-based MCED test as a complement to existing single-cancer screening tests.
Clinical trial number: NCT02889978.
Key words: cancer, multi-cancer early detection, liquid biopsy, methylation, cell-free nucleic acids, machine learning

Introduction

Cancer will soon be the leading cause of mortality globally.
Improving population-scale early detection reduces disease- and treatment-related morbidity, increases the likelihood of treatment success, improves quality of life and reduces treatment cost and complexity.
Currently, only five cancer screening tests are available in the USA (breast, colorectal, cervical, lung, and prostate), collectively accounting for only 42% of annual cancer incidence in people aged 50-79 years.
These screening tests are associated with high false-positive rates, overdiagnosis and overtreatment, disparities in adherence, and low positive predictive value (PPV) as well as a high cumulative false-positive rate when used sequentially.
It has been estimated that cancer detection before stage IV could reduce cancer-related deaths by approximately 15% within 5 years.
A new approach called multi-cancer early detection (MCED) has the potential to achieve this goal by detecting signals for multiple cancers from cell-free DNA (cfDNA) or other circulating analytes shed by tumors into the blood.
These assays use genomic sequencing or other approaches, sometimes in combination with machine learning, to detect signals from cfDNA methylation, mutations, and/or fragmentation, or from other circulating analytes such as proteins.
Results from early studies of these tests have shown promise in detecting cancers at earlier stages, although none are yet available outside of clinical trials.
The Circulating Cell-free Genome Atlas (CCGA; NCT02889978) study was designed to develop and validate an MCED test to detect cancer signals across multiple cancer types and predict cancer signal origin (CSO) via a single blood draw.
Modeled data from this test have shown that its use in the general population could shift cancer detection from stage IV to earlier stages (stages I-III), potentially reducing cancer mortality.
CCGA was divided into three substudies; in the first, a comprehensive comparison of genomic sequencing approaches identified that whole-genome bisulfite sequencing (WGBS; detecting genome-wide DNA methylation status) outperformed other methods.
In the second substudy, the selected WGBS assay was refined into a targeted methylation assay, and machine learning classifiers for cancer detection and CSO prediction were developed.
The third and final CCGA substudy, reported herein, is a large clinical validation study of this MCED test.

Patients and Methods

CCGA (NCT02889978) is a prospective, multi-center, case-control, observational study with longitudinal follow-up that enrolled 15 254 participants (8584 with cancer; 6670 without cancer) from 142 sites in North America between August 2016 and February 2019.
All participants were required to provide written informed consent.
The study was approved by the Institutional Review Board or an independent ethics committee at each participating trial site and conducted in accordance with the International Conference on Harmonization for Good Clinical Practice guidelines and the Declaration of Helsinki.
CCGA was divided into three pre-specified substudies: (i) discovery, (ii) training and validation with the selected and updated assay and classifiers, and (iii) clinical validation within an independent validation set with a further refined assay and classifiers optimized for screening.
A total of 5309 qualified participants were included in the third substudy.
Adults (>=20 years of age) were enrolled as previously described.
Participants eligible for the cancer arm included individuals diagnosed with cancer and/or who were scheduled to undergo biopsy and/or surgical resection for known or highly suspected malignancy.
Individuals who had received chemotherapy or radiotherapy, undergone definitive local therapy, or received more extensive surgery than that required to establish the diagnosis, before study blood draw, were ineligible.
Non-cancer participants were enrolled from participating sites to control for confounding factors.

Study objectives and corresponding measures of test performance

The primary objectives of this substudy were to evaluate test performance for cancer signal detection, CSO prediction, and both combined.
Test performance was defined as:
- Cancer signal detection: measured by sensitivity (proportion of participants with a positive test result among all cancer participants in an analysis set) and specificity (proportion of participants with a negative test result among non-cancer participants)
- CSO prediction: measured by overall accuracy of CSO prediction (proportion of participants with a correct predicted CSO label among true positive participants, excluding those who had an unknown origin) and displayed in a confusion matrix (comparing target CSO label to predicted CSO label).
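The definitions above can be illustrated with a minimal Python sketch. The function names and the toy records are hypothetical and are not study data; the sketch only shows how sensitivity, specificity, and overall CSO accuracy (with unknown-origin cases excluded) follow from per-participant results.

```python
from collections import Counter

def detection_metrics(records):
    """records: list of (has_cancer, test_positive) booleans, one per participant."""
    tp = sum(1 for cancer, pos in records if cancer and pos)
    fn = sum(1 for cancer, pos in records if cancer and not pos)
    tn = sum(1 for cancer, pos in records if not cancer and not pos)
    fp = sum(1 for cancer, pos in records if not cancer and pos)
    return {
        "sensitivity": tp / (tp + fn),  # positives among all cancer participants
        "specificity": tn / (tn + fp),  # negatives among all non-cancer participants
    }

def cso_accuracy(pairs):
    """pairs: list of (target_cso, predicted_cso) for true positives.
    A target of None marks an unknown origin and is excluded."""
    known = [(target, pred) for target, pred in pairs if target is not None]
    correct = sum(1 for target, pred in known if target == pred)
    confusion = Counter(known)  # (target, predicted) -> count
    return correct / len(known), confusion

# Hypothetical toy data, for illustration only.
toy_records = [(True, True)] * 3 + [(True, False)] + [(False, False)] * 4 + [(False, True)]
toy_pairs = [("lung", "lung"), ("colon/rectum", "colon/rectum"), ("anus", "cervix"), (None, "lung")]

metrics = detection_metrics(toy_records)
accuracy, confusion = cso_accuracy(toy_pairs)
print(metrics, round(accuracy, 3))
```

The `confusion` counter holds the cells of a confusion matrix, keyed by (target label, predicted label).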
Secondary objectives included:
- Test performance by age group
- Test performance for cancer signal detection by method of cancer diagnosis [screening test (breast, cervical, colorectal, lung, and prostate) or clinical presentation]
- Test performance for cancer signal detection in a pre-specified group of 12 cancer classes (anus, bladder, colon/rectum, esophagus, head and neck, liver/bile duct, lung, lymphoma, ovary, pancreas, plasma cell neoplasm, and stomach) that account for approximately two-thirds of annual USA cancer deaths. Previous studies identified these cancers as having higher cancer signal detection, consistent with their ability to release higher amounts of cfDNA into circulation.
Exploratory objectives included an extrapolation of the PPV and negative predictive value (NPV; defined as the proportion of non-cancer participants among those with a negative test result) adjusted by the Surveillance, Epidemiology, and End Results Program (SEER) incidence rates for those aged 50-79 years and an evaluation of the test positive rate by American Joint Committee on Cancer (AJCC) cancer types.
A post hoc analysis of test performance for cancer signal detection was carried out for the following three categories: hematologic malignancies, solid tumors with common screening options, and solid tumors without common screening options.

Sample collection, processing and analysis

Plasma and tumor tissue sample collection, accessioning, storage, and processing were carried out as described previously. To minimize bias, blood samples from cancer and non-cancer participants were randomized for processing across batches, operators, and reagent lots.
The targeted methylation assay was conducted as previously described, with adjustments. Briefly, plasma cfDNA (up to 75 ng) was subjected to a customized bisulfite conversion reaction, prepared as a dual-indexed sequencing library, and enriched using standard hybridization capture conditions for 150-bp paired-end sequencing on the Illumina NovaSeq.

Clinical data collection

Clinical, pathology, and radiology data were collected from participant questionnaires and abstracted from medical records, including reports of adverse events from the study blood draw. Additionally, the World Health Organization (WHO) International Classification of Diseases for Oncology (ICD-O) morphologic and behavior codes were assigned to cancers by pathologists.
Clinical stage was assigned by the treating physician or a certified cancer registry professional according to the AJCC Staging Manual (7th or 8th edition). Cancers without an AJCC staging classification were analyzed without staging information.
Participant follow-up for clinical information was carried out annually (within approximately 2 months from anniversary of enrollment) from a search of medical records or direct contact with participants by clinical research staff.

Classification of cancer versus non-cancer and CSO

Custom software was built to classify samples using source models that recognized methylation patterns per region as similar to those derived from a particular cancer class but potentially shared across multiple cancer classes, followed by a pair of machine learning modules – one to determine cancer/non-cancer status and the other to predict the CSO label (Supplementary Table S4, available at https://doi.org/10.1016/j.annonc.2021.05.806).
Three key classifier modifications were implemented to improve performance in a screening application:
- The specificity threshold (i.e. false-positive rejection) was refined to account for cancer-like signals from prevalent non-malignant hematological conditions in non-cancer individuals
- CSOs were refined to improve signal identification, resulting in a new CSO class – ‘Neuroendocrine Cells of Lung or Other Organs’
- The CSO classifier was modified to remove an ‘indeterminate’ CSO category and return a CSO prediction for all test positive samples.
The classifier was trained on 17 339 samples [12 185 from the first and second CCGA substudies, 4891 from a separate clinical study (STRIVE; NCT03085888), and 263 obtained commercially] from 6383 distinct individuals; 1014 samples were analyzed using WGBS, and 16 325 samples were analyzed using a targeted methylation assay.
The locked classifier was trained to target 99.4% specificity. Samples from participants in the third CCGA substudy were entirely reserved for independent validation.
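How the 99.4% specificity target was translated into a decision threshold is not described here. As a generic illustration only, one common approach is to place the threshold at the corresponding quantile of classifier scores from non-cancer training samples. The sketch below assumes a single scalar score per sample; the function names and scores are hypothetical and do not represent the study's classifier.

```python
import math

def threshold_for_specificity(non_cancer_scores, target_specificity):
    """Return the smallest observed score such that at least the target
    fraction of non-cancer scores fall at or below it."""
    ordered = sorted(non_cancer_scores)
    # Number of non-cancer samples that must be called negative.
    # round() guards against floating-point noise before taking the ceiling.
    k = max(1, math.ceil(round(target_specificity * len(ordered), 9)))
    return ordered[k - 1]

def empirical_specificity(non_cancer_scores, threshold):
    """Fraction of non-cancer samples called negative (score <= threshold)."""
    negatives = sum(1 for score in non_cancer_scores if score <= threshold)
    return negatives / len(non_cancer_scores)

# Hypothetical scores for 1000 non-cancer training samples.
hypothetical_scores = [i / 1000 for i in range(1000)]
threshold = threshold_for_specificity(hypothetical_scores, 0.994)
achieved = empirical_specificity(hypothetical_scores, threshold)
print(threshold, achieved)
```

A threshold fixed on training data in this way is only an estimate, which is why specificity is then measured on an independent validation set.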

Statistical analysis

For demographics and baseline characteristics, descriptive statistics are reported. For categorical variables, the number and percentage of participants in each category were calculated; for continuous variables, the total number of participants (n), mean, standard deviation (SD) or standard error (SE), median, first quartile (Q1), third quartile (Q3), minimum, and maximum were calculated.
The 95% confidence intervals (CIs) for these test performance measures (sensitivity, specificity, and overall accuracy of CSO prediction) were calculated using the Wilson (score) method, unless otherwise specified.
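For readers who wish to reproduce the intervals, the Wilson score method can be sketched in a few lines of Python. The analyses were run in R, so this is an illustrative reimplementation and not the study code. The counts are the specificity (1248/1254) and overall sensitivity (1453/2823) counts reported in the Results.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

spec_lo, spec_hi = wilson_interval(1248, 1254)  # specificity counts from Results
sens_lo, sens_hi = wilson_interval(1453, 2823)  # overall sensitivity counts from Results
print(f"specificity 95% CI: {spec_lo:.1%} to {spec_hi:.1%}")
print(f"sensitivity 95% CI: {sens_lo:.1%} to {sens_hi:.1%}")
```

Unlike the simple normal-approximation interval, the Wilson interval remains within 0% to 100% and behaves well for proportions close to 1, such as specificity in this substudy.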
No formal statistical tests were conducted. All analyses were carried out using R software, version 3.6 or higher.

Results

A total of 5309 participants (enrolled as cancer, n = 3237; enrolled as non-cancer, n = 2069; missing enrollment status, n = 3) were included in this third CCGA validation substudy. Of these, 4077 (cancer, n = 2823; non-cancer, n = 1254) were included in the Confirmed Status analysis set.
The most common reasons for exclusion were incomplete year-one follow-up for non-cancer participants (n = 324), presence of non-malignant conditions at enrollment (n = 283), and unconfirmed cancer or treatment status at blood draw (n = 171).
All exclusion categories were pre-specified before unblinding. The assay failure rate was low [0.8% (45/5309)]. A total of 0.4% (20/5309) of participants reported an adverse event related to the blood draw; 17 of 20 were mild in severity, and 3 of 20 were moderate. No serious adverse events related to the blood draw were reported.

Participant demographics and baseline characteristics

Participant demographics and baseline characteristics were comparable between groups, with expected differences in age group distribution between the cancer and non-cancer groups (i.e. there were more cancers than non-cancers in the older age groups).
Mean (SD) age was 60.6 (12.4) years, 55.4% (2258/4077) were female (with a higher percentage in the non-cancer versus cancer group), and 81.2% (3312/4077) were classified as non-Hispanic white. In the cancer group, 54.9% (1552/2823) had stage I/II cancer.

Cancer signal detection

Specificity was 99.5% (95% CI: 99.0% to 99.8%; 1248/1254), indicating a low false-positive rate of 0.5%.
Overall sensitivity across cancer classes and stages was 51.5% (49.6%-53.3%; 1453/2823).
Sensitivity in the pre-specified group of 12 cancer classes was 76.3% (74.0%-78.5%) across all stages and was 67.6% (64.4%-70.6%) for stages I-III. Across cancer classes, sensitivity was 51.9% (50.0%-53.8%) for stages I-IV and was 40.7% (38.7%-42.9%) for stages I-III.
As expected, sensitivity of cancer signal detection increased with increasing stage [stage I, 16.8% (14.5%-19.5%); stage II, 40.4% (36.8%-44.1%); stage III, 77.0% (73.4%-80.3%); stage IV, 90.1% (87.5%-92.2%)].

Test positive rate by AJCC cancer type

Cancer types defined by the more granular AJCC criteria were assigned to each cancer participant; because of the small sample sizes for certain cancer types, the test positive rate for each cancer type (not sensitivity) is reported.
Participants with multiple primaries or unknown primary were excluded because there was insufficient information to assign an AJCC cancer type. The test positive rate across all AJCC cancer types and clinical stages was 51.0% (1420/2786; 49.1%-52.8%).
Overall, cancer signals were detected across >50 AJCC cancer types.

CSO prediction

The overall accuracy of CSO prediction was 88.7% (87.0%-90.2%) in true positives (excluding participants with an unknown origin) based on the top CSO label prediction.

Performance in subgroups

Specificity, sensitivity, and accuracy of CSO prediction showed similar results across age groups.
When evaluating test performance in groups by method of cancer diagnosis, overall sensitivity in cancers identified by clinical presentation [63.9% (61.8%-66.0%)] was higher than that in cancers identified by screening tests [18.0% (15.5%-20.8%)], likely due to a preponderance of early-stage prostate and breast cancers in the screen-detected cancer classes.
In a post hoc analysis that categorized cancers into three groups to better characterize findings in solid versus hematological cancers, overall sensitivity for solid tumors without common screening options [65.6% (876/1336; 63.0%-68.1%)] was nearly twice that for solid tumors with common screening options, including breast, colorectal, cervical, and prostate [33.7% (396/1175; 31.1%-36.5%)]. Overall sensitivity for hematologic malignancies was 55.1% (156/283; 49.3%-60.8%).

Extrapolated positive and negative predictive values

To further understand performance in a potential screening population, PPV and NPV were extrapolated (adjusted to SEER cancer incidence and stage distribution in the 50-79 years age group). In this analysis, PPV for cancer signal detection was 44.4% (28.6%-79.9%), and NPV was 99.4% (99.4%-99.5%).
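The dependence of PPV and NPV on sensitivity, specificity, and cancer prevalence can be sketched with Bayes' rule. This sketch is not the extrapolation used in this substudy, which additionally weights sensitivity by SEER cancer type and stage distribution. The 1% prevalence below is an assumed illustrative value, not a SEER estimate, so the output is not expected to reproduce the 44.4% reported above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity are the overall values from this substudy;
# the prevalence is a hypothetical placeholder.
ppv, npv = predictive_values(sensitivity=0.515, specificity=0.995, prevalence=0.01)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

Because most screened individuals do not have cancer, the false-positive term dominates the PPV denominator, which is why small changes in specificity have a large effect on PPV.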
To understand the value of the test in the event of accurate cancer signal detection but incorrect CSO prediction, residual PPV of incorrect predicted CSO (conditional probability given a positive test result but an incorrect top CSO prediction when one or two CSO predictions were generated) was found to be 8.0% (4.2%-29.7%); when implemented clinically, this could potentially be high enough to warrant a workup.

Discussion

The ability to detect cancer at earlier stages has the potential to reduce cancer mortality.
The introduction of MCED tests together with current screening protocols on a population scale may increase the absolute number of cancers detected through screening and shift detection to earlier stages when outcomes are better and mortality is lower.
CCGA is a large-scale study (N = 15 254) that included systematic comparison of genomic methods to identify the best analytic approach and large-scale validation of a refined assay and classifiers to support the development of an MCED test (Galleri®).
The low false-positive rate of 0.5% suggests that the test may limit additional harms due to unnecessary diagnostic workups when implemented clinically; this compares favorably to the false-positive rates associated with current recommended single-cancer screening tests (9.0%-14.5%), which generally optimize sensitivity over specificity.
By contrast, this MCED test was designed to maintain a high specificity while detecting common signals across many cancer types, allowing for an overall increase in the population cancer detection rate.
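To put the false-positive rates cited above on a common scale, the following sketch converts each rate into the expected number of false-positive results in a cohort of people without cancer. The cohort size of 100 000 is an arbitrary illustrative assumption, and the comparison ignores that single-cancer tests are often repeated or combined.

```python
def expected_false_positives(false_positive_rate, n_without_cancer):
    """Expected number of positive results among people who do not have cancer."""
    return round(false_positive_rate * n_without_cancer)

cohort = 100_000  # hypothetical number of screened individuals without cancer

rates = {
    "MCED test (this substudy)": 0.005,
    "single-cancer screening, lower bound cited": 0.090,
    "single-cancer screening, upper bound cited": 0.145,
}

false_positives = {label: expected_false_positives(rate, cohort) for label, rate in rates.items()}
for label, count in false_positives.items():
    print(f"{label}: {count} false positives per {cohort} screened")
```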
The extrapolated PPV reported here based on SEER cancer incidence and clinical stage distribution was 44.4% in the screening-eligible 50-79-year age group, which is higher than that of currently recommended screening tests, as PPV is driven by specificity and population incidence.
Studies in intended-use populations that will provide more accurate PPV estimates are ongoing (STRIVE, NCT03085888; SUMMIT, NCT03934866; PATHFINDER, NCT04241796). It should be noted that the CCGA study includes 5-year longitudinal follow-up, and these data will provide additional insight into test performance.
A true ‘multi-cancer’ test should be able to detect cancer signals in as many cancer types as possible to maximize the population cancer detection rate (the fraction of cancers detected from the total expected cancers in the population).
There are limitations to using sensitivity to measure performance of an MCED test in that the absolute number of cancers detected increases with each additional cancer class, even if the average sensitivity over all cancer classes decreases. In other words, overall sensitivity of 51.5% would represent more absolute cancer cases detected than the 76.3% sensitivity in the restricted pre-specified set of 12 cancer classes. These observations reinforce the limitation of the sensitivity metric, which may not reflect the total clinical utility of an MCED test. Thus, PPV may be a more clinically relevant metric.
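The point that pooled sensitivity can fall while the absolute number of detected cancers rises can be shown with a small worked example. The class sizes and detection counts below are hypothetical, chosen only so that the pooled values resemble the percentages discussed above; they are not study data.

```python
def pooled_performance(classes):
    """classes: list of (label, n_cancers, n_detected).
    Returns (total detected, pooled sensitivity)."""
    total = sum(n for _, n, _ in classes)
    detected = sum(d for _, _, d in classes)
    return detected, detected / total

# Hypothetical restricted set of high-shedding cancer classes.
restricted = [("high-signal classes", 1000, 760)]
# Adding further classes with lower sensitivity (hypothetical counts).
expanded = restricted + [("additional classes", 1000, 270)]

restricted_detected, restricted_sens = pooled_performance(restricted)
expanded_detected, expanded_sens = pooled_performance(expanded)

print(restricted_detected, restricted_sens)
print(expanded_detected, expanded_sens)
```

In this toy example, broadening the set of cancer classes lowers pooled sensitivity from 76.0% to 51.5% while raising the number of cancers detected from 760 to 1030.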
Cancer incidence increases with age, so it is important to ensure consistent performance across age groups. In this substudy, test performance, including specificity, was similar across age groups, such that no particular age group would be expected to be at higher risk of additional harms associated with false positives.
This MCED test is meant to complement, and not replace, existing screening tests.
Sensitivity of this test was higher for cancers identified by clinical presentation than by existing single-cancer screening tests; this observed lower sensitivity in screen-detected cancers was driven by a preponderance of early-stage breast and prostate cancers, both of which have a recommended screening test with proven survival benefit. The intended complementarity is also supported by the post hoc analysis showing a higher sensitivity in solid tumors without common screening methods relative to solid tumors with common screening methods, including breast and prostate.
For context, a reported 71% of cancer deaths in those aged 50-79 years are attributed to cancers without recommended screening options.
Lastly, this test may also provide a complementary screening option for individuals ineligible for or non-compliant with current screening tests, as well as for underserved communities with poor access to screening facilities.
Cancer classes that tend to be aggressive, such as pancreatic and esophageal cancers, were included in the pre-specified group of 12 cancers that contribute to a large proportion of cancer deaths and are more likely than others to shed more cfDNA into circulation. Indeed, sensitivity in the pre-specified group of 12 cancer classes, most of which currently lack screening tests, was higher than that observed in all cancers. These 12 cancers account for approximately two-thirds of US cancer deaths, underscoring the potential for this test to provide population-scale benefits.
Conversely, some indolent cancers, like early-stage prostate cancer, shed less cfDNA and are thus less detectable by this approach. Together, these findings suggest that this MCED test has the potential to minimize overdiagnosis.
As expected, accuracy of CSO prediction was slightly lower in this substudy compared with the second substudy (88.7% versus 93.3%), in part because indeterminate CSO predictions were removed in the refined test. However, accuracy was still high, and the few incorrect CSO calls were often the result of a biological phenomenon that complicated CSO assignment [e.g. mismatched CSO predictions between human papillomavirus (HPV)-driven cancers, like cervical and anal].
Providing CSO predictions is intended to help health care providers define diagnostic workups after a positive test result. One exception is the ‘Neuroendocrine Cells of Lung or Other Organs’ CSO class, which may require a whole-body computed tomography (CT) or positron emission tomography (PET)-CT scan to localize the primary tumor. Importantly, the estimated likelihood of having cancer even with an unresolved positive signal (i.e. workup of the CSO did not confirm a cancer diagnosis) was still sufficiently high (8.0%, a PPV that is at least twice as high as that reported for current screening modalities for breast and colorectal cancers) to potentially warrant further diagnostic evaluation.
The assay and classifiers used here underwent multiple refinements for use as a multi-cancer screening tool. In previous work with an earlier version of this test, a hematopoietic CSO was associated with a larger number of false positives. To mitigate this, the cancer detection classifier was adjusted depending on whether the CSO prediction was consistent with a solid tumor or a hematological cancer. This allowed for increased sensitivity for solid tumors while maintaining specificity and was supported by the post hoc analysis reported here separating solid cancers (with or without common screening options) and hematological cancers. Additionally, neuroendocrine tumors of all organs were associated with a common methylation pattern; as such, ‘Neuroendocrine Cells of Lung or Other Organs’ was created as a separate, distinct CSO label. Finally, improvements in classification removed the need for an indeterminate CSO category, meaning that all positive test results had an associated CSO prediction, which could help physicians direct diagnostic workups.

Limitations of this study include:

- Blood samples from participants with cancer were collected after biopsy, which raises the possibility that the tumor cfDNA fraction was higher than it would have been before the biopsy.
- CCGA is a case-control study and, as such, is not reflective of performance in a screening population; a larger clinical development program that includes other studies evaluating test performance and/or clinical utility in target-use populations (STRIVE, NCT03085888; SUMMIT, NCT03934866; PATHFINDER, NCT04241796) is underway. Importantly, an ongoing interventional return-of-results study (PATHFINDER, NCT04241796) will also assess clinical implementation (e.g. time to diagnostic resolution) as well as safety.

Conclusion

Taken together, these results demonstrate that this targeted methylation-based MCED test has high specificity that is generalizable across study populations, detects cancer signals across a broad range of cancer types with diverse biologic features (including those that currently lack screening tests), and provides accurate CSO prediction that may inform patient management.
These results suggest that this blood-based MCED test may complement existing single-cancer screening tests and may reduce cancer mortality.