Vocabulary flashcards covering key terms, technologies, data types, AI techniques, workflow steps, and challenges discussed in Module 1 of the Certificate Course on Artificial Intelligence in Medicine (Big Data Analytics in Healthcare).
Data
Raw facts, figures, or observations that can be processed to generate insights.
Structured Data
Organized data stored in predefined formats such as tables, spreadsheets, or relational databases.
Unstructured Data
Data without a fixed format, e.g., emails, physician notes, images, audio, and video.
Healthcare Data
Medical and health-related information collected to improve patient care, decision-making, and operations.
Patient Data
Demographics, medical history, diagnoses, and treatment records about an individual patient.
Clinical Data
Lab results, imaging reports, and physician notes gathered during clinical care.
Operational Data
Information on hospital resources, staff schedules, and workflow efficiency.
Genomic Data
DNA sequencing information used for precision and personalized medicine.
Wearable & IoT Data
Real-time physiological readings from smart devices and sensors (e.g., heart rate, glucose).
Importance of Healthcare Data
Enhances diagnosis, improves outcomes, supports prediction, optimizes resources, and fuels AI innovation.
Big Data
Extremely large, complex datasets that traditional systems cannot process efficiently.
Volume (Big Data V)
The massive quantity of data generated and stored.
Velocity (Big Data V)
The speed at which data is produced, transmitted, and processed.
Variety (Big Data V)
The diversity of data types—structured, semi-structured, and unstructured.
Veracity (Big Data V)
The reliability, accuracy, and trustworthiness of data.
Value (Big Data V)
The actionable benefits and insights derived from data.
Electronic Health Records (EHR)
Digital patient charts that consolidate medical histories across care settings.
Medical Imaging Data
Large unstructured files from X-rays, MRIs, and CT scans, used for diagnostics.
Clinical Trials Data
Structured and unstructured information collected during medical research studies.
Social Media & Patient Feedback
Unstructured posts, reviews, and comments reflecting patient experience and public health trends.
Health Insurance & Billing Data
Structured claims and payment records used for reimbursement and fraud detection.
Kilobyte (KB)
1,024 bytes (binary convention); a small storage unit for simple text files.
Megabyte (MB)
1,024 KB; stores images or small datasets.
Gigabyte (GB)
1,024 MB; typical size for hospital databases or imaging studies.
Terabyte (TB)
1,024 GB; capacity for years of EHR or imaging archives.
Petabyte (PB)
1,024 TB; scale of nationwide health systems or genomic repositories.
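The unit ladder above (each step is a factor of 1,024) can be sketched as a small conversion helper. This is only an illustration; the function name `to_human_readable` and the `UNITS` list are made up for this example.

```python
# Units in ascending order; each is 1,024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

def to_human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1,024 until the value fits the current unit or we reach PB.
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"

print(to_human_readable(1024))           # 1.0 KB
print(to_human_readable(5 * 1024**3))    # 5.0 GB
print(to_human_readable(2 * 1024**5))    # 2.0 PB
```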
Small Data in Healthcare
KB–MB datasets processed locally with tools like Excel or Access.
Medium Data in Healthcare
GB–TB datasets managed with enterprise SQL databases or cloud solutions.
Big Data in Healthcare
TB–PB+ datasets requiring distributed storage and parallel processing frameworks.
Hadoop
Open-source framework with HDFS & MapReduce for distributed big data storage and processing.
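To make the MapReduce idea concrete, here is a toy, single-process sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or any Hadoop API; the record layout and the function names are assumptions chosen for illustration. In a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (diagnosis, 1) pair for every diagnosis in one record.
    return [(dx, 1) for dx in record["diagnoses"]]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single count.
    return key, sum(values)

def count_diagnoses(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample records.
records = [
    {"patient": "A", "diagnoses": ["diabetes", "hypertension"]},
    {"patient": "B", "diagnoses": ["hypertension"]},
]
print(count_diagnoses(records))  # {'diabetes': 1, 'hypertension': 2}
```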
Hive
SQL-like data warehouse on Hadoop enabling easy querying with HiveQL.
Pig
High-level scripting platform (Pig Latin) that simplifies complex MapReduce tasks.
Spark
In-memory big data engine offering fast batch, streaming, ML, and SQL analytics.
HBase
NoSQL database on Hadoop providing real-time read/write access to massive tables.
Machine Learning (ML)
AI technique that learns patterns from data to make predictions (e.g., readmission risk).
Natural Language Processing (NLP)
AI method for extracting insights from unstructured text like physician notes.
Deep Learning
Neural-network approach (e.g., CNNs) for tasks such as tumor detection in MRI.
Predictive Analytics
Using historical data and models to forecast outcomes (e.g., disease outbreaks).
Real-Time Analytics
Instant analysis of streaming data, enabling immediate clinical interventions.
Data Collection (Workflow)
Gathering EHRs, lab results, demographics, and admission details for analysis.
Data Preprocessing
Cleaning, normalizing, and anonymizing raw healthcare data before modeling.
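A minimal sketch of the three preprocessing steps named above, using only the Python standard library. The field names (`patient_id`, `glucose`), the salt, and the helper names are hypothetical. Note that salted hashing is pseudonymization, which is weaker than full anonymization, and is used here only to illustrate the idea.

```python
import hashlib

def pseudonymize(patient_id: str, salt: str) -> str:
    # Replace the raw identifier with a truncated salted SHA-256 digest.
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return digest[:12]

def min_max_scale(values):
    # Normalize values to the range [0, 1].
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def preprocess(records, salt="demo-salt"):
    # Cleaning: drop records with a missing glucose reading.
    cleaned = [r for r in records if r.get("glucose") is not None]
    # Normalizing: rescale glucose across the remaining records.
    scaled = min_max_scale([r["glucose"] for r in cleaned])
    # Anonymizing (here: pseudonymizing) the patient identifier.
    return [
        {"pid": pseudonymize(r["patient_id"], salt), "glucose_scaled": s}
        for r, s in zip(cleaned, scaled)
    ]

# Hypothetical raw records.
raw = [
    {"patient_id": "P001", "glucose": 80},
    {"patient_id": "P002", "glucose": None},
    {"patient_id": "P003", "glucose": 180},
    {"patient_id": "P004", "glucose": 130},
]
for row in preprocess(raw):
    print(row)
```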
Feature Engineering
Creating meaningful variables (e.g., age, medication adherence) to improve model performance.
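The two example features mentioned above, age and medication adherence, could be derived from raw fields roughly as follows. The function names and the definition of adherence (doses taken divided by doses prescribed, capped at 1.0) are assumptions for illustration; real studies use a range of adherence measures.

```python
from datetime import date

def age_in_years(birth_date: date, as_of: date) -> int:
    # Whole years elapsed; subtract one if the birthday has not occurred yet.
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def adherence_ratio(doses_taken: int, doses_prescribed: int) -> float:
    # Share of prescribed doses actually taken, capped at 1.0.
    if doses_prescribed <= 0:
        return 0.0
    return min(doses_taken / doses_prescribed, 1.0)

print(age_in_years(date(1960, 6, 15), date(2024, 6, 14)))  # 63
print(adherence_ratio(27, 30))                             # 0.9
```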
Model Training & Selection
Applying algorithms like Random Forest or XGBoost to learn from training data.
Model Evaluation
Assessing accuracy, precision, and recall to validate predictive performance.
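As a worked example of these three metrics, the sketch below computes them from binary labels (1 = positive, e.g., readmitted). The labels are invented for illustration, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    # Count the four confusion-matrix cells for binary labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Accuracy: share of all predictions that are correct.
        "accuracy": (tp + tn) / len(y_true),
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

In this example the model finds 3 of the 4 true positives (recall 0.75) but also raises 2 false alarms (precision 0.6), which is why accuracy alone can be misleading.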
Deployment
Integrating an analytic model into clinical systems for real-time decision support.
Data Privacy & Security
Protecting sensitive health data and complying with HIPAA, GDPR, and related laws.
Data Integration
Combining disparate data sources into a unified, analyzable format.
Scalability
Ability of systems to handle growing data volumes without performance loss.
Interoperability
Seamless exchange of information across different healthcare IT systems.
MIMIC Database
Freely available, de-identified ICU dataset (credentialed access via PhysioNet) widely used for AI research in intensive care.
PhysioNet
Repository of physiological signals and time-series data for health research.