Module 9 - Diagnostic Accuracy Studies

Introduction to Diagnostic Accuracy

This module introduces the concept of diagnosis and its relation to assessment, measurement, and validity, incorporating detailed explanations of how these elements interact within the diagnostic process.
It emphasizes the importance of diagnostic accuracy in ensuring effective patient care, the role of diagnostic research in advancing medical knowledge, and the significance of diagnostic study designs in producing reliable and valid results.

Elaboration on Diagnostic Accuracy

Diagnostic accuracy refers to the ability of a diagnostic test or procedure to correctly classify individuals as either having or not having a particular condition or disease.
It involves evaluating the performance of diagnostic tests in terms of their sensitivity, specificity, positive predictive value, and negative predictive value.

Elaboration on Diagnostic Research

Diagnostic research aims to improve the accuracy, reliability, and efficiency of diagnostic tests and procedures.
It involves conducting studies to evaluate the performance of new diagnostic tests, compare different diagnostic strategies, and identify factors that may affect diagnostic accuracy.

Elaboration on Diagnostic Study Designs

Diagnostic study designs are frameworks for conducting research studies to evaluate the performance of diagnostic tests and procedures.
Common diagnostic study designs include cross-sectional studies, case-control studies, and cohort studies.
These designs help researchers systematically collect and analyze data to assess the accuracy and reliability of diagnostic tests.

Definition of Diagnosis

Diagnosis is the process of determining the nature of a disorder, which requires a systematic approach involving the assessment of symptoms, signs, and other relevant information.
This involves considering the patient's signs and symptoms, medical background, and results of laboratory tests and X-ray examinations, integrating all available data to arrive at an accurate conclusion.
Although diagnostic concepts are grounded in medicine, they apply equally to allied health, highlighting the interdisciplinary nature of diagnostic processes.

Differential Diagnosis

Differential diagnosis involves identifying the correct condition when signs and symptoms are shared by various other conditions, necessitating a process of elimination and comparison.
It requires ruling out plausible but incorrect diagnoses to identify the real problem, a clinical challenge that demands thorough evaluation and clinical judgment.

Diagnosis, Assessment, and Measurement

Diagnosis is a part of the assessment process, serving as a crucial step in determining the patient's overall health status and needs.
It involves determining which health conditions the patient has (ruling in) and which they do not have (ruling out), using various diagnostic tools and techniques to gather relevant information.
This requires structured observation, interviewing, and testing, ensuring that all aspects of the patient's condition are thoroughly evaluated.
Assessment and diagnostic procedures lead to decisions about the patient's condition, and these decisions, based on the evidence gathered during the diagnostic process, inform the treatment, management, or patient care plan.
Diagnosis is a form of measurement and must therefore be valid; validity implies reliability, so a test must reflect the patient's true condition both accurately and consistently.
It involves classification of the patient's condition based on diagnostic criteria, using established guidelines to categorize the patient's health status.
This can be nominal or ordinal measurement, such as binary (condition present or not) or a spectrum of severity (e.g., autism), providing a structured framework for understanding and addressing the patient's needs.

Evidence-Based Decisions

Diagnosis relies on evidence from the patient and diagnostic procedures, emphasizing the importance of both subjective and objective data in the diagnostic process.
Multiple sources of evidence include subjective evidence from the patient (symptoms) and objective evidence from clinical investigation (signs, investigative tests), ensuring a comprehensive understanding of the patient's condition.
A diagnostic test is a procedure that provides evidence supporting a decision, using a system of nominal or ordinal measurement, offering a structured approach to evaluating the patient's health status.
There isn't always a one-to-one link between a sign or symptom and an underlying disorder, complicating diagnosis and requiring careful consideration of all available information.
One sign or symptom could indicate multiple disorders, or one disorder could have multiple signs or symptoms, necessitating a thorough differential diagnosis to identify the correct condition.

Importance of Accurate Diagnosis

Assessment and diagnosis help determine the patient's health conditions, forming the foundation for effective treatment and management strategies.
They are essential for developing a treatment or management plan tailored to the patient's specific needs, ensuring the best possible outcomes.
An incorrect diagnosis can lead to wrong or missed treatment, wasting time and money, and delaying proper treatment, underscoring the critical importance of accurate diagnoses.
Effective treatment relies on accurate initial diagnosis, ensuring that patients receive the appropriate care from the outset.

Diagnostic Errors

Diagnostic errors, including missed, wrong, or delayed diagnoses, affect a significant percentage of hospital admissions and outpatient clinic visits, posing a substantial challenge to healthcare systems.
Diagnostic error can lead to patient harm and preventable deaths, highlighting the severe consequences of diagnostic inaccuracies.
Malpractice claims against general practitioners often involve diagnostic error, with a high percentage deemed preventable, emphasizing the need for improved diagnostic practices.

Principles of Diagnostic Accuracy

The module focuses on the principles of diagnostic accuracy common across clinical specialties, rather than specific diagnostic techniques, providing a broad understanding of diagnostic assessment.
The outcome of any diagnostic or assessment procedure is a decision, which should be valid, ensuring that diagnostic decisions are based on reliable and accurate information.
Diagnostic accuracy research examines the validity of diagnostic tests and procedures, aiming to improve the accuracy and reliability of diagnostic tools.
Clinical questions about diagnostic accuracy become research questions, informing practice through evidence, underscoring the importance of evidence-based diagnostic practices.

Research into Accuracy of Diagnostic Tests

The ideal research design for diagnostic accuracy is a diagnostic accuracy study, which offers a structured approach to evaluating diagnostic tests and procedures.
This design involves two diagnostic tests: a reference test of known high accuracy (gold standard) and an index test being evaluated, allowing for a direct comparison of the two tests.
Agreement between the index test and the reference test provides evidence of diagnostic accuracy, indicating the extent to which the index test correctly identifies cases and controls.
Diagnostic yield studies, which lack a reference test, are considered inferior, as they do not provide a reliable measure of diagnostic accuracy.

Cases, Controls, Positives, and Negatives

Reference test results classify individuals as cases (having the health condition) or controls (not having the health condition), forming the basis for evaluating the performance of the index test.
Index test results classify individuals as positive (predicted to be a case) or negative (predicted to be a control), providing a basis for comparison with the reference test results.
In diagnostic accuracy research, the index test predicts the results of the reference test, allowing for an assessment of the index test's ability to accurately identify cases and controls.
Only the reference test definitively determines whether a person is a case or a control, serving as the gold standard against which the index test is evaluated.
Positive and negative terms are commonly used in everyday language to refer to any test result, not just index tests, requiring careful attention to the context in which these terms are used.

Diagnostic Study Designs - Levels of Evidence

Level I: Systematic review of Level II studies, providing the highest level of evidence by synthesizing findings from multiple high-quality studies.
Level II: Diagnostic accuracy study with blinded, independent testing of random or consecutive patients (a "consecutive cohort"), offering strong evidence due to the rigorous methodology and reduced risk of bias.
Level III-1: Diagnostic accuracy with non-consecutive patients, providing moderate evidence but with a higher risk of selection bias.
Level III-2: Diagnostic case-control study – cases identified from reference test then compared on index test (over-estimates accuracy – biased), offering lower-quality evidence due to the potential for bias and overestimation of accuracy.
Level IV: Diagnostic yield study – no reference test, providing the lowest level of evidence due to the lack of a gold standard for comparison.
Diagnostic accuracy studies are typically cross-sectional, focusing on the present rather than tracking people over time, providing a snapshot of diagnostic accuracy at a single point in time.

Valid Discrimination

Valid discrimination in diagnostic tests distinguishes between people who have a condition and those who do not, ensuring that the test accurately classifies individuals based on their health status.
For a valid index test:
- All positive results should be cases (people who have the condition), indicating that the test correctly identifies individuals with the condition.
- All negative results should be controls (people who do not have the condition), indicating that the test correctly identifies individuals without the condition.

Biases in Diagnostic Studies

Incorporation bias: Occurs when the index test forms part of the reference test, creating a circular relationship that inflates the apparent accuracy of the index test.
Verification bias: A sampling problem in which patients are eligible only because they have had both tests (for example, when only patients with a positive index test are referred for the reference test), leading to a non-representative sample and biased results.
Non-consecutive cases: Sampling bias arises if not all eligible patients get both tests, resulting in a sample that does not accurately reflect the population of interest.
Reference test limitations: The reference test may be harmful or invasive, leading to selection bias, as only certain patients may be willing or able to undergo the reference test.

Statistical Methods to Measure Diagnostic Accuracy

Continuous data must be categorized before accuracy can be measured: cut-off points classify individuals as cases or controls on the reference test and as positives or negatives on the index test.
Categorical data is used to classify patients into cases/controls (reference test) and positives/negatives (index test), providing a structured framework for evaluating diagnostic accuracy.
Statistical measures of association link reference and index test categories, allowing for the assessment of the relationship between the two tests.
Diagnostic accuracy measures assess agreement between reference and index tests, providing an indication of the extent to which the index test accurately identifies cases and controls.

Two-by-Two Contingency Table

This table is essential for understanding diagnostic accuracy studies, providing a clear and organized way to present the results of diagnostic tests.
It cross-tabulates the results of reference and index tests, allowing for the calculation of various measures of diagnostic accuracy.
True positive (TP): Index test correctly identifies a case, indicating that the test accurately identifies individuals with the condition.
False negative (FN): Index test incorrectly identifies a case as a control, indicating that the test fails to identify individuals with the condition.
False positive (FP): Index test incorrectly identifies a control as a case, indicating that the test incorrectly identifies individuals without the condition as having the condition.
True negative (TN): Index test correctly identifies a control, indicating that the test accurately identifies individuals without the condition.
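The cross-tabulation can be sketched in a few lines of Python. This is a minimal illustration, assuming each person's reference result is recorded as True (case) or False (control) and each index result as True (positive) or False (negative); the function name `two_by_two` and the eight-person data set are hypothetical, not taken from any study in this module.

```python
from collections import Counter


def two_by_two(reference, index):
    """Cross-tabulate paired results into TP, FN, FP and TN counts.

    reference: list of bools, True = case (has the condition on the reference test)
    index:     list of bools, True = positive on the index test
    """
    if len(reference) != len(index):
        raise ValueError("reference and index results must be paired")
    cells = Counter()
    for is_case, is_positive in zip(reference, index):
        if is_case and is_positive:
            cells["TP"] += 1  # case correctly flagged
        elif is_case:
            cells["FN"] += 1  # case missed
        elif is_positive:
            cells["FP"] += 1  # control wrongly flagged
        else:
            cells["TN"] += 1  # control correctly cleared
    # Counter returns 0 for cells that never occurred
    return {cell: cells[cell] for cell in ("TP", "FN", "FP", "TN")}


# Hypothetical paired results for 8 people
reference = [True, True, True, False, False, False, False, True]
index = [True, True, False, True, False, False, False, True]
table = two_by_two(reference, index)

# Conventional layout: index result in rows, reference result in columns
print("          Case  Control")
print(f"Positive  {table['TP']:>4}  {table['FP']:>7}")
print(f"Negative  {table['FN']:>4}  {table['TN']:>7}")
```

Every person lands in exactly one cell, so the four counts always add up to the sample size.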

Categorical Measures of Diagnostic Accuracy

Sensitivity (Sn): Index test ability to detect people with the condition (percentage of cases with a positive result).

Sensitivity = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

Specificity (Sp): Index test ability to identify people without the condition (percentage of controls with a negative result).

Specificity = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}

Positive Predictive Value (PPV): Percentage of positive tests who are cases.

PPV = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

Negative Predictive Value (NPV): Percentage of negative tests who are controls.

NPV = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Negatives}}

All four measures range from 0% to 100%, providing a standardized scale for evaluating diagnostic accuracy.
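The four formulas translate directly into code. A minimal sketch, assuming the 2×2 counts are already available; the helper `accuracy_measures` and the counts (100 hypothetical cases and 100 hypothetical controls) are illustrative only.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV as percentages (0-100)."""

    def pct(numerator, denominator):
        # A measure is undefined (NaN) when its denominator is zero,
        # e.g. PPV when nobody tests positive.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "Sn": pct(tp, tp + fn),   # cases who test positive
        "Sp": pct(tn, tn + fp),   # controls who test negative
        "PPV": pct(tp, tp + fp),  # positives who are cases
        "NPV": pct(tn, tn + fn),  # negatives who are controls
    }


# Hypothetical counts: 100 cases (90 TP, 10 FN) and 100 controls (20 FP, 80 TN)
m = accuracy_measures(tp=90, fn=10, fp=20, tn=80)
print({name: round(value, 1) for name, value in m.items()})
```

With these counts the sketch gives Sn = 90%, Sp = 80%, PPV ≈ 81.8% and NPV ≈ 88.9%. Note that Sn and Sp divide by column totals (cases, controls) while PPV and NPV divide by row totals (positives, negatives).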

Interpreting Diagnostic Accuracy Measures

Sensitivity:
- High sensitivity: Finds true positives; few false negatives, indicating that the test is effective at identifying individuals with the condition.
- Low sensitivity: Misses true positives; many false negatives, indicating that the test is not effective at identifying individuals with the condition.
Specificity:
- High specificity: Finds true negatives; few false positives, indicating that the test is effective at identifying individuals without the condition.
- Low specificity: Misses true negatives; many false positives, indicating that the test is not effective at identifying individuals without the condition.
Positive Predictive Value:
- High PPV: Most people who test positive will be cases, indicating that a positive test result is likely to be accurate.
- Low PPV: Most people who test positive will be controls, indicating that a positive test result is likely to be inaccurate.
Negative Predictive Value:
- High NPV: Most people who test negative will be controls, indicating that a negative test result is likely to be accurate.
- Low NPV: Most people who test negative will be cases, indicating that a negative test result is likely to be inaccurate.

Simplified Interpretation of Sn, Sp, PPV, and NPV

Sensitivity: Do cases test positive? (Higher percentage = more cases test positive).
Specificity: Do controls test negative? (Higher percentage = more controls test negative).
Positive Predictive Value: Are positives cases? (Higher percentage = more positives are cases).
Negative Predictive Value: Are negatives controls? (Higher percentage = more negatives are controls).

Likelihood Ratios to Measure Accuracy

A likelihood ratio is the probability of a particular test result in people with the condition divided by the probability of the same result in people without it, providing a measure of the strength of evidence for or against a diagnosis.
Positive Likelihood Ratio (LR+): For ruling in disease. LR+ = \frac{\text{True Positives} / \text{cases}}{\text{False Positives} / \text{controls}} = \frac{Sn}{1 - Sp}
- Ideally: TPs per case much higher than FPs per control.
- Conventionally: Good if > 2; excellent if > 10.
Negative Likelihood Ratio (LR-): For ruling out disease. LR- = \frac{\text{False Negatives} / \text{cases}}{\text{True Negatives} / \text{controls}} = \frac{1 - Sn}{Sp}
- Ideally: FNs per case much lower than TNs per control.
- Conventionally: Good if < 0.5; excellent if < 0.1.
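Because true positives per case is the sensitivity and false positives per control is 1 - specificity, both ratios can be computed from Sn and Sp alone. The sketch below assumes Sn and Sp are given as proportions between 0 and 1 (not percentages); the function names and the label "weak" for values outside the "good" band are hypothetical, while the band limits are the conventional ones listed above.

```python
def likelihood_ratios(sn, sp):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions.

    LR+ = (TP / cases) / (FP / controls) = Sn / (1 - Sp)
    LR- = (FN / cases) / (TN / controls) = (1 - Sn) / Sp
    """
    if not (0 <= sn <= 1) or not (0 < sp < 1):
        # Sp of exactly 0 or 1 would make one of the ratios divide by zero
        raise ValueError("need 0 <= Sn <= 1 and 0 < Sp < 1")
    return sn / (1 - sp), (1 - sn) / sp


def rate_lr_pos(lr_pos):
    """Label LR+ with the conventional bands (good > 2, excellent > 10)."""
    if lr_pos > 10:
        return "excellent"
    if lr_pos > 2:
        return "good"
    return "weak"  # hypothetical label for anything below the 'good' band


def rate_lr_neg(lr_neg):
    """Label LR- with the conventional bands (good < 0.5, excellent < 0.1)."""
    if lr_neg < 0.1:
        return "excellent"
    if lr_neg < 0.5:
        return "good"
    return "weak"  # hypothetical label for anything above the 'good' band


# Median ankle cut-off for overweight: Sn = 70%, Sp = 68%
lr_pos, lr_neg = likelihood_ratios(0.70, 0.68)
print(f"LR+ = {lr_pos:.2f} ({rate_lr_pos(lr_pos)}), LR- = {lr_neg:.2f} ({rate_lr_neg(lr_neg)})")
```

For the median ankle cut-off reported later in this module, LR+ = 0.70 / 0.32 ≈ 2.19 and LR- = 0.30 / 0.68 ≈ 0.44, in line with the reported 2.18 and 0.44 (the small gap presumably comes from Sn and Sp being rounded before publication).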

Index and Reference Example: Anterior Cruciate Ligament Tears

Gold standard: Arthroscopy or MRI scan (accepted accuracy).
Index tests: Anterior drawer test, Lachman’s test, Pivot shift test (non-invasive, inexpensive).
"Index test" and "reference test" are research concepts; these labels are not used in everyday clinical practice.

How Diagnostic Decisions Affect Accuracy

Clinical decisions affect diagnostic accuracy, influencing the likelihood of false positives and false negatives.
Example: Diagnosing overweight and obesity using abdomen and ankle circumference.
Body Mass Index (BMI) is the reference test: BMI = \frac{\text{weight (kg)}}{\text{height (m)}^2}
- BMI < 25 = normal weight or underweight.
- BMI ≥ 25 = overweight.
- BMI ≥ 30 = obese.
- BMI ≥ 40 = morbidly obese.
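The reference test is easy to express in code. A minimal sketch, assuming weight in kilograms and height in metres; the function names, the `threshold` parameter and the four example people are hypothetical. Choosing the threshold (25, 30 or 40) decides which condition the reference test defines, which is how the same BMI data can serve as the reference for overweight, obesity and morbid obesity.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Label a BMI using the thresholds listed above (highest band checked first)."""
    if value >= 40:
        return "morbidly obese"
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "normal weight or underweight"


def is_case(value, threshold=25):
    """Reference-test classification: case if BMI is at or above the threshold.

    threshold=25 defines 'at least overweight'; 30 and 40 give the
    obesity and morbid obesity reference tests.
    """
    return value >= threshold


# Hypothetical people: (weight in kg, height in m)
for weight, height in [(70, 1.80), (85, 1.75), (100, 1.70), (125, 1.72)]:
    b = bmi(weight, height)
    print(f"{weight} kg, {height} m -> BMI {b:.1f}: {bmi_category(b)}; "
          f"overweight case: {is_case(b)}")
```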

Sample Characteristics

Descriptive statistics were calculated for a subset of variables to give an overview of the study population.
Body mass index (BMI) was calculated from height and weight, allowing for the classification of individuals based on their weight status.
Half of the sample were at least overweight and about 10% were at least obese, but only 1 in 250 was morbidly obese, showing how common overweight and obesity were in this sample.

Relationships between Variables: Correlations

Correlation ranges from -1 to +1: its sign gives the direction of the relationship and its size (0 to 1, ignoring the sign) gives the strength.
A stronger correlation (further from 0) means a stronger association and better prediction.
- ±1 = perfect prediction.
- 0 = no (linear) prediction.
Abdomen circumference predicts BMI very well.
Ankle circumference predicts BMI much less well.
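Assuming the correlation used here is Pearson's r (the module does not say which coefficient was used), it can be computed as below. The function `pearson_r` and the small data sets are hypothetical illustrations, not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists of equal length (n >= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative relationship
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8: weaker positive relationship
```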

Diagnosing Overweight: Ankle Circumference

Lower 25% Cut-Off (Ankle value of 22):
- Sensitivity = 90%.
- Specificity = 34%.
- Positive Likelihood = 1.38.
- Negative Likelihood = 0.28.
Median Cut-Off (Ankle value of nearly 23):
- Sensitivity = 70%.
- Specificity = 68%.
- Positive Likelihood = 2.18.
- Negative Likelihood = 0.44.
Upper 25% Cut-Off:
- Sensitivity = 40%.
- Specificity = 88%.
- Positive Likelihood = 3.37.
- Negative Likelihood = 0.68.
Diagnostic accuracy remains far from perfect at every cut-off: a low decision threshold gives high sensitivity but poor specificity, and a high threshold gives the reverse.
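The cut-off trade-off can be reproduced in a few lines. This sketch assumes a person is positive when the index value is at or above the cut-off, uses `statistics.quantiles` to find the lower-quartile, median and upper-quartile cut-offs, and runs on a hypothetical 12-person data set (abdomen-like circumferences in cm and BMI values) invented for illustration; it does not reproduce the module's figures.

```python
from statistics import quantiles


def sn_sp_at_cutoff(index_values, is_case, cutoff):
    """Sensitivity and specificity (proportions), treating index >= cutoff as positive."""
    tp = fn = fp = tn = 0
    for value, case in zip(index_values, is_case):
        positive = value >= cutoff
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: abdomen-like circumference (cm) and BMI for 12 people
abdomen = [78, 82, 85, 88, 90, 91, 93, 95, 98, 101, 104, 110]
bmi = [21, 23, 24, 26, 24, 27, 25.5, 28, 27, 31, 29, 33]
overweight = [b >= 25 for b in bmi]  # reference test: case if BMI >= 25

cutoffs = quantiles(abdomen, n=4)  # [lower quartile, median, upper quartile]
results = []
for label, cut in zip(["lower 25%", "median", "upper 25%"], cutoffs):
    sn, sp = sn_sp_at_cutoff(abdomen, overweight, cut)
    results.append((sn, sp))
    print(f"{label:>9} cut-off {cut:6.2f}: Sn = {sn:.0%}, Sp = {sp:.0%}")
```

On this toy data the lowest cut-off catches every case (Sn = 100%) but misclassifies one control (Sp = 75%), while the highest cut-off has no false positives but misses more than half of the cases (Sn = 37.5%): the same direction of trade-off as the figures above, exaggerated by the tiny sample.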

Diagnosing Overweight: Abdomen Circumference

Lower 25% Cut-Off:
- Sensitivity = 99%.
- Specificity = 50%.
- Positive Likelihood = 1.97.
- Negative Likelihood = 0.02.
Median Cut-Off (Abdomen value of about 90):
- Sensitivity = 87%.
- Specificity = 88%.
- Positive Likelihood = 7.28.
- Negative Likelihood = 0.14.
Upper 25% Cut-Off:
- Sensitivity = 48%.
- Specificity = 99%.
- Positive Likelihood = 60.52.
- Negative Likelihood = 0.52.

Diagnosing Obesity: Abdomen Circumference

Median Cut-Off:
- Sensitivity = 100%.
- Specificity = 56%.
- Positive Likelihood = 2.25.
- Negative Likelihood = 0.00.

Diagnosing Morbid Obesity: Abdomen Circumference

Median Cut-Off:
- Sensitivity = 100%.
- Specificity = 50%.
- Positive Likelihood = 2.02.
- Negative Likelihood = 0.00.

Receiver Operating Characteristic (ROC) Curves

An ROC curve plots sensitivity (the true positive rate) against the false positive rate (1 - specificity) across all possible cut-off values, giving a visual summary of a diagnostic test's performance.
A good test reaches high sensitivity while few controls test positive, so its curve lies close to the upper-left corner; a test performing at chance follows the diagonal line.
The abdomen test works better than the ankle test for diagnosing overweight, highlighting the importance of selecting appropriate diagnostic tests for specific conditions.
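An ROC curve can be traced by sweeping the cut-off across every observed index value. The sketch below is a minimal illustration, assuming positive means index ≥ cut-off, and summarises each curve by the area under it (AUC, computed with the trapezoidal rule), a standard summary that the module itself does not report. The function names and toy data are hypothetical. An AUC of 1 means perfect discrimination and 0.5 means chance.

```python
def roc_points(index_values, is_case):
    """(false positive rate, sensitivity) pairs, one per distinct cut-off."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    points = [(0.0, 0.0)]  # cut-off above every value: nobody tests positive
    for cut in sorted(set(index_values), reverse=True):
        tp = sum(1 for v, c in zip(index_values, is_case) if c and v >= cut)
        fp = sum(1 for v, c in zip(index_values, is_case) if not c and v >= cut)
        points.append((fp / n_controls, tp / n_cases))
    return points  # the last point is (1.0, 1.0): everybody tests positive


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical data: a well-separating index and a poorly separating one
cases = [False, False, False, True, True, True]
good_index = [80, 84, 88, 92, 96, 100]
poor_index = [22, 24, 23, 22.5, 23.5, 22.8]

print("good index AUC:", auc(roc_points(good_index, cases)))
print("poor index AUC:", auc(roc_points(poor_index, cases)))
```

The good index gives an AUC of 1 (its curve runs straight up the left edge, then along the top), whereas the poor index gives about 0.44, roughly chance level.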

Diagnosing Overweight: Randomised Ankle

Median Cut-Off:
- Sensitivity = 50%.
- Specificity = 48%.
- Positive Likelihood = 0.96.
- Negative Likelihood = 1.04.
Randomising the ankle values destroys the relationship between the index and reference tests, so all measures perform at chance level (likelihood ratios close to 1).
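The randomisation check can be simulated. The sketch below assumes the randomised ankle values were produced by shuffling the real ones (the module does not say exactly how they were randomised) and uses simulated, hypothetical data: 10,000 BMI values and an ankle-like index partly driven by BMI. Shuffling keeps the index values themselves but breaks their pairing with the reference test.

```python
import random
from statistics import median


def likelihood_ratios(index_values, is_case, cutoff):
    """LR+ and LR-, treating index >= cutoff as a positive result."""
    tp = fn = fp = tn = 0
    for value, case in zip(index_values, is_case):
        positive = value >= cutoff
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sn, sp = tp / (tp + fn), tn / (tn + fp)
    return sn / (1 - sp), (1 - sn) / sp


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000
# Hypothetical simulated data: BMI (reference) and an ankle-like index linked to BMI
bmi = [rng.gauss(25, 4) for _ in range(n)]
ankle = [22.5 + 0.25 * (b - 25) + rng.gauss(0, 1) for b in bmi]
is_case = [b >= 25 for b in bmi]  # reference test: overweight

cut = median(ankle)  # median cut-off, as in the module's example
lr_real = likelihood_ratios(ankle, is_case, cut)

shuffled = ankle[:]
rng.shuffle(shuffled)  # break the pairing between index and reference results
lr_shuffled = likelihood_ratios(shuffled, is_case, cut)

print("real:     LR+ = %.2f, LR- = %.2f" % lr_real)
print("shuffled: LR+ = %.2f, LR- = %.2f" % lr_shuffled)
```

With a sample this large the shuffled ratios land very close to 1, whereas the real pairing gives an LR+ well above 2.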

Conclusions: BMI, Body Fat, Ankles and Tummies in Males

Abdominal circumference makes a better diagnostic test of overweight than ankle circumference.
Best combination: diagnosing overweight (BMI ≥ 25, which in this sample roughly coincides with the median BMI) using the median abdomen circumference as the cut-off.
For obesity and morbid obesity, the median abdomen cut-off gave too many false positives; a higher decision threshold was needed.

Key Factors Affecting Diagnostic Accuracy

Diagnostic accuracy depends on:
- How common the health condition is (its prevalence).
- How easy or hard it is for someone to test positive (where the decision threshold is set).
- The validity of the test.
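The effect of how common the condition is can be made concrete with Bayes' theorem: holding sensitivity and specificity fixed, PPV and NPV shift with prevalence. This is a minimal sketch, assuming Sn and Sp stay constant across populations (often only approximately true in practice); the function name and the 90%/90% test are hypothetical.

```python
def predictive_values(sn, sp, prevalence):
    """PPV and NPV (proportions) for a test with the given Sn and Sp,
    applied in a population with the given prevalence (Bayes' theorem)."""
    tp = sn * prevalence              # expected share of people who are true positives
    fn = (1 - sn) * prevalence        # ... false negatives
    fp = (1 - sp) * (1 - prevalence)  # ... false positives
    tn = sp * (1 - prevalence)        # ... true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with Sn = Sp = 90%, used in populations of different prevalence
for prevalence in (0.5, 0.1, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:>5.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

For this test PPV falls from 90% at a prevalence of 50% to 50% at 10% and about 8% at 1%, while NPV climbs above 99%. This is one reason a test applied to a low-prevalence population can have a high NPV yet produce many false positives.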

Considerations for Decision Criteria

Decision rules should be based on evidence, not guesswork.
Diagnostic cut-offs should come from diagnostic accuracy studies.
Accurate, evidence-based guidelines for clinicians require research on large samples, not experience with a single patient.
Decision rules should account for the consequences of a wrong decision, namely treatment errors.
Those errors are false positives or false negatives, and the implications of both types must be weighed.

Beyond the Test: Other Factors

Accuracy (validity) is not just about the test, but also depends on how the test is performed and interpreted.
Validity threats apply to clinical practice as well as research, so high standards matter in both settings.
A test of known accuracy (from research) will be invalid if:
- It is performed poorly in clinical practice, through sloppy technique or biased observation and assessment by the clinician.
- The clinician has a conflict of interest, such as earning money from the treatment (a financial incentive to bias decisions towards positive).
- It is used with the wrong population for that test.

Screening Tests vs Diagnostic Tests

Screening aims for early detection of cases among large numbers of asymptomatic or apparently healthy people who are nonetheless at risk, providing an opportunity for early intervention and improved outcomes.
Screening tests resemble diagnostic tests but have a different purpose: they identify individuals who may have a condition rather than confirming a diagnosis.
A screening test should avoid false negatives, so its decision threshold is deliberately biased towards positive results, ensuring that individuals who may have the condition are not missed.
People who test positive on a screening test then receive more rigorous diagnostic testing, which identifies the false positives and confirms who truly has the condition.

Over-Diagnosis

Over-diagnosis occurs when health practitioners are too quick to classify people as positive, leading to unnecessary treatment and potential harm.
One opinion piece argues that there are clear examples of health practitioners too readily classifying people as positive.

COVID-19 and Temperature Screening

Infrared thermometers were used for screening asymptomatic people for COVID-19 in public, aiming to identify individuals who may be infected.
Recent article: Thermal screening has reasonable diagnostic accuracy in the detection of fever, although it may vary with changes in subject characteristics, setting, index test and the reference standard used. Thermal screening has a good NPV even during a pandemic.