Athiti 211171101005 project phase 1 ppt
Introduction
Early cancer diagnosis improves survival rates.
Study develops 10 diagnostic models for common cancers using extreme gradient boosting and 66 laboratory parameters.
Study Design and Enrollment
Data Collection
Timeframe: January 1, 2017 - October 31, 2020.
Total data points: 14,949,191 diagnostic and 122,365,478 test data points.
Feature Selection
Initial selection: removed features with data available in < 1‰ of records.
Model feature selection: removed features with data available in < 50% of records for each cancer type.
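The two filtering steps can be sketched as below. This is a minimal illustration, not the study's code: it assumes the thresholds refer to the fraction of records in which a feature has a value, and the record layout, the feature names, and the helper names `coverage` and `filter_features` are all hypothetical.

```python
from typing import Dict, List, Optional

# One patient's lab results; None means the test was not measured.
Record = Dict[str, Optional[float]]


def coverage(records: List[Record], feature: str) -> float:
    """Fraction of records in which `feature` has a non-missing value."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(feature) is not None)
    return present / len(records)


def filter_features(records: List[Record], features: List[str],
                    min_coverage: float) -> List[str]:
    """Keep only the features whose coverage is at least `min_coverage`."""
    return [f for f in features if coverage(records, f) >= min_coverage]


# Toy data, illustrative only (not taken from the study).
records: List[Record] = [
    {"test_a": 30.0, "test_b": None, "test_c": None},
    {"test_a": 42.0, "test_b": 2.1, "test_c": None},
    {"test_a": None, "test_b": None, "test_c": None},
    {"test_a": 25.0, "test_b": None, "test_c": None},
]
features = ["test_a", "test_b", "test_c"]

# Step 1: drop features present in fewer than 1 per mille of all records.
initial = filter_features(records, features, min_coverage=0.001)
# Step 2: per cancer type, drop features present in fewer than 50% of records.
per_cancer = filter_features(records, initial, min_coverage=0.5)
print(initial, per_cancer)
```

In a real pipeline the second step would run once per cancer type, on that cancer's cases and controls only.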
Model Development
Used XGBoost for building binary classification models.
Parameters selected via forward stepwise method.
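Forward stepwise selection can be sketched as a greedy loop: start with no features and repeatedly add the one that improves a score the most. The sketch below is an assumption about how such a loop might look, not the study's implementation. The `score` callback is a placeholder; in practice it would be something like the cross-validated AUC of an XGBoost classifier trained on the candidate feature set. The toy gains and the `min_gain` stopping rule are hypothetical.

```python
from typing import Callable, List, Sequence


def forward_stepwise(candidates: Sequence[str],
                     score: Callable[[List[str]], float],
                     min_gain: float = 1e-4) -> List[str]:
    """Greedy forward selection: add the feature that raises `score` the most,
    and stop when the best addition improves the score by less than `min_gain`."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        # Score every one-feature extension of the current set.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_score, trial_feature = max(trials)
        if selected and trial_score - best < min_gain:
            break
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best = trial_score
    return selected


# Toy scoring function (hypothetical): each feature adds a fixed gain.
TOY_GAIN = {"feature_a": 0.20, "feature_b": 0.10, "feature_c": 0.0}


def toy_score(features: List[str]) -> float:
    return 0.5 + sum(TOY_GAIN[f] for f in features)


chosen = forward_stepwise(list(TOY_GAIN), toy_score)
print(chosen)
```

The uninformative `feature_c` is never added because it does not change the score.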
Model Performance
Performance Metrics by Cancer Type
Lung Cancer: AUC 0.896, Sensitivity 0.773, Specificity 0.902
Bowel Cancer: AUC 0.800, Sensitivity 0.722, Specificity 0.753
Gastric Cancer: AUC 0.806, Sensitivity 0.743, Specificity 0.731
Liver Cancer: AUC 0.835, Sensitivity 0.773, Specificity 0.759
Pancreatic Cancer: AUC 0.918, Sensitivity 0.778, Specificity 0.908
Biliary Tract Malignancy: AUC 0.763, Sensitivity 0.716, Specificity 0.723
Prostate Cancer: AUC 0.976, Sensitivity 0.925, Specificity 0.952
Urological Cancers: AUC 0.862, Sensitivity 0.866, Specificity 0.700
Breast Cancer: AUC 0.968, Sensitivity 0.991, Specificity 0.882
Thyroid Cancer: AUC 0.993, Sensitivity 0.987, Specificity 0.969
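For reference, the three metrics in the list above can be computed from true labels and predicted scores as follows. This is a minimal stdlib sketch on toy data; the 0.5 threshold and the sample values are illustrative assumptions, not the study's operating points.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_score: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = cancer, 0 = control) and model scores, illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score, threshold=0.5)
area = auc(y_true, y_score)
print(round(sens, 3), round(spec, 3), round(area, 3))
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds.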
Feature Importance
Key Findings
Significant contributions from 54 nontumor markers identified via SHAP analysis.
Top features included both tumor and nontumor markers.
Urinary leukocyte count was the most weighted feature in urological cancers.
Fecal occult blood and blood were significant features in the gastric and bowel cancer models.
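A common way to turn per-sample SHAP values into a feature ranking is the mean absolute SHAP value per feature. The sketch below assumes the SHAP values have already been computed (for example with a tree explainer) and only shows the aggregation step; the marker names and values are hypothetical, and the source does not state which aggregation the study used.

```python
from typing import Dict, List, Tuple


def rank_by_mean_abs_shap(shap_rows: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP value| across samples, highest first."""
    totals: Dict[str, float] = {}
    for row in shap_rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(shap_rows)
    means = {feature: total / n for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP values for two samples, illustrative only.
shap_rows = [
    {"marker_a": 0.5, "marker_b": -0.25, "marker_c": 0.0},
    {"marker_a": -0.5, "marker_b": 0.25, "marker_c": 0.125},
]

ranking = rank_by_mean_abs_shap(shap_rows)
print(ranking)
```

Taking absolute values matters: `marker_a` pushes predictions in opposite directions for the two samples, so a plain mean would cancel to zero even though it is the most influential feature.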
Cosine Similarity Analysis
Pancreatic & Biliary Tract Malignancy: Highest similarity score (0.52), consistent with their shared embryological origin.
Lung & Gastric Cancer: Similarity score of 0.34, which places lung cancer in the same cluster as the digestive system cancers.
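Cosine similarity between two vectors is their dot product divided by the product of their lengths. The sketch below assumes each cancer model is represented by a vector of feature importances over a shared feature list; that representation and the toy vectors are assumptions, since the source does not specify what was compared.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy importance vectors over three shared features, illustrative only.
model_x = [1.0, 0.0, 1.0]
model_y = [1.0, 1.0, 0.0]

similarity = cosine_similarity(model_x, model_y)
print(round(similarity, 2))
```

For non-negative importance vectors the score ranges from 0 (no shared important features) to 1 (identical importance profiles).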
Cluster Analysis
Identified Clusters
Cluster 1: Pancreatic cancer, biliary tract malignancy, liver cancer, bowel cancer, lung cancer, gastric cancer.
Cluster 2: Prostate cancer, breast cancer, urological cancers, thyroid cancer.
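One way to derive clusters like these from pairwise similarity scores is agglomerative clustering with average linkage. The source does not name the clustering method, so the sketch below is only an assumption about how such a grouping could be produced; the item names and similarity values are toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def average_linkage(sim: Dict[FrozenSet[str], float],
                    a: List[str], b: List[str]) -> float:
    """Mean pairwise similarity between the members of clusters a and b."""
    pairs = [(x, y) for x in a for y in b]
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)


def agglomerate(items: List[str], sim: Dict[FrozenSet[str], float],
                n_clusters: int) -> List[List[str]]:
    """Start with one cluster per item and repeatedly merge the two
    most similar clusters until only `n_clusters` remain."""
    clusters = [[item] for item in items]
    while len(clusters) > n_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(sim, clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy pairwise similarities between four models, illustrative only.
sim = {
    frozenset(("w", "x")): 0.9,
    frozenset(("y", "z")): 0.8,
    frozenset(("w", "y")): 0.1,
    frozenset(("w", "z")): 0.2,
    frozenset(("x", "y")): 0.15,
    frozenset(("x", "z")): 0.1,
}

groups = agglomerate(["w", "x", "y", "z"], sim, n_clusters=2)
print(groups)
```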
Feature Relation Diagram
Recommended Testing Parameters
For Bowel/Gastric Cancer: Test stool blood, serum prealbumin, and total hemoglobin; follow up on any abnormal result.
For Pancreatic/Biliary Tract Malignancy: Test serum amylase, cholyglycine, direct bilirubin, and alkaline phosphatase; follow up on any abnormal result.
Multicancer Early Warning System
Key Aspects
Offers flexibility for clinical applications.
Potential to address 73.82% of cancer deaths in China through early detection.
Utilizes machine learning to identify relationships among biomarkers.
Future Directions
Real-Time Monitoring: Integrate the models into electronic health records so that warnings are raised as new laboratory results arrive.
Model Optimization: Improve model performance in real clinical settings.
Conclusion
Establishes a machine learning-based multicancer early warning system for 10 cancers using laboratory results.
Identifies potential shared pathological processes among cancers.