Comprehensive Notes on the Development of a Clinic Optimization System with OCR and Predictive AI

Primary Goal: To optimize the workflow of a clinic through the implementation of Optical Character Recognition (OCR) and predictive scheduling.
Specific Objectives:
- Transition from paper records to a digital capture-and-paste system.
- Reduce no-shows and downtime by predicting the likelihood of a patient showing up for an appointment.
- Overall optimization of the clinic's operational efficiency.
Out of Scope (Non-Clinical/Non-Telemedicine):
- The system is not a standard Telemedicine platform designed for video communication with doctors.
- It does not involve financial billing, insurance recommendations, or medicine recommendations.
- It is not intended to be a fully interoperable Electronic Health Record (EHR) system for external entities; it focuses on migrating a specific client's records within a clinic setting.
- Expensive communication servers and unnecessary APIs that add financial burden are explicitly excluded.

Frameworks:
- Python: The primary programming language.
- Django 6.8: The core web framework.
- Django Rest Framework (DRF) 3.16: Used for building the API.
- Simple JWT (djangorestframework-simplejwt): Used for handling authentication (login and logout functions).
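A minimal settings sketch for the JWT login/logout flow, assuming the JWT library named above is the djangorestframework-simplejwt package. The token lifetimes and the `JWT_APPS` name are illustrative assumptions, not values from these notes.

```python
from datetime import timedelta

# Assumption: djangorestframework-simplejwt supplies the authentication class.
REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": (
        "rest_framework_simplejwt.authentication.JWTAuthentication",
    ),
}

# Lifetimes are illustrative placeholders, not figures from the notes.
SIMPLE_JWT = {
    "ACCESS_TOKEN_LIFETIME": timedelta(minutes=15),
    "REFRESH_TOKEN_LIFETIME": timedelta(days=1),
    "ROTATE_REFRESH_TOKENS": True,
    "BLACKLIST_AFTER_ROTATION": True,
}

# Hypothetical helper list to append to INSTALLED_APPS; the blacklist app
# lets a logout endpoint invalidate a refresh token.
JWT_APPS = ["rest_framework", "rest_framework_simplejwt.token_blacklist"]
```

With the blacklist app enabled, logout can be treated as blacklisting the refresh token, since a JWT access token cannot otherwise be revoked before it expires.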
Database Management:
- SQLite: Used during the development phase. It applies a "lock on write," so simultaneous writes by multiple staff members block one another; for that reason it is used only at the dev/personal level.
- PostgreSQL: Used for live production and stress testing. It is the preferred choice for performance and handling multiple concurrent users.
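The SQLite-for-development, PostgreSQL-for-production split could be sketched as a small settings helper. The function name `database_config` and the `POSTGRES_*` environment variable names are assumptions made for illustration; only the two Django engine paths are standard.

```python
import os


def database_config(env=None):
    """Return a Django DATABASES dict: PostgreSQL when configured, else SQLite."""
    env = os.environ if env is None else env
    if env.get("POSTGRES_DB"):
        # Production / stress testing: handles concurrent staff sessions.
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["POSTGRES_DB"],
                "USER": env.get("POSTGRES_USER", ""),
                "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
                "HOST": env.get("POSTGRES_HOST", "localhost"),
                "PORT": env.get("POSTGRES_PORT", "5432"),
            }
        }
    # Development: a single local file, subject to the write lock noted above.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "db.sqlite3",
        }
    }
```

In `settings.py` this would be used as `DATABASES = database_config()`, so the same code base runs against either database without edits.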
Alternatives Rejected:
- Flask and FastAPI: Considered too bare-bones as they lack built-in admin panels, easy migrations, and comprehensive authentication systems provided by Django.
- Node.js/Express, Spring Boot, and .NET: Deemed too heavy or requiring a larger development team compared to the agility of Django for a small project.

Framework: React + Vite + Tailwind CSS.
Design Philosophy: Plug-and-play for authentication functionality.
Alternatives Rejected:
- Next.js: Deemed "overkill" for this specific use case because the system mostly requires simple sign-in/sign-out functionality and a landing page rather than complex server-side rendering or heavy SEO optimization.
- Angular: Mentioned as a complex alternative with a steeper learning curve (JS complexity).

Backend Hosting: Railway is the chosen platform for backend deployment.
Cost Structure:
- Thesis Demo: Covered under the free tier.
- Production: Approximately $\$5/\text{month}$ for the clinic implementation.
Deployment Tool: Nixpacks (auto-detects language and handles migrations during deployment).
AI/Model Hosting: Serverless GPU hosting for AI inference.
- GPU Cost: Approximately $\$30/\text{month}$.
- Operational Costs: On AWS, idle time in hibernation mode is estimated at approximately $\$200$ in credits. Individual inference runs cost roughly $\$0.12/\text{run}$.
Backend Model Strategy: The AI logic resides behind an HTTP boundary, separated from the main CPU-based hosting.
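One way to picture the HTTP boundary is a thin client that the CPU-hosted backend uses to call the GPU-hosted model. This is a sketch only: the endpoint URL, the bearer token, and the `image_b64` field name are hypothetical, and the response is assumed to be JSON.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes, endpoint, token):
    """Build (but do not send) a POST request carrying one scanned page."""
    payload = json.dumps(
        {"image_b64": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_ocr(image_bytes, endpoint, token, timeout=60):
    """Send the page to the inference service and decode its JSON reply."""
    request = build_ocr_request(image_bytes, endpoint, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the model behind a plain HTTP call means the backend host needs no GPU libraries, and the model host can be swapped without touching the Django code.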

Firebase:
- Function: User account creation, sign-up, and verification.
- Financials: Utilizes a 3-month trial yielding $\$90$ in free credits for live testing purposes.
- Security: Supports six-digit authenticator codes for login verification.
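For context on the six-digit codes: authenticator apps commonly derive them with the time-based one-time password scheme (RFC 6238). Firebase performs the verification itself, so the sketch below is only an illustration of how such a code is computed, under the assumption that the codes in question are standard TOTP codes.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at_time=None, step=30, digits=6):
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    at_time = time.time() if at_time is None else at_time
    # The moving factor is the number of 30-second steps since the Unix epoch.
    counter = int(at_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): pick 4 bytes at an offset taken from the digest.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The code changes every 30 seconds and depends only on a shared secret and the clock, which is why no SMS or network round trip is needed to generate it.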
Semaphore:
- Function: SMS gateway used for external notifications.
- Use Cases: Account approval notifications and appointment schedule alerts.
- Setup: Requires a phone number and a secret key.
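The two use cases above could be sketched as message templates plus a form-encoded request body. The template wording and the form field names (`apikey`, `number`, `message`) are assumptions for illustration and should be checked against the SMS gateway's own documentation before use.

```python
import urllib.parse

# Hypothetical message templates for the two notification types.
TEMPLATES = {
    "account_approved": "Hello {name}, your clinic account has been approved.",
    "appointment_reminder": "Hello {name}, you have an appointment on {when}.",
}


def build_sms_form(kind, phone_number, secret_key, **fields):
    """Render a template and return the form-encoded body for an SMS request."""
    message = TEMPLATES[kind].format(**fields)
    form = {
        "apikey": secret_key,    # assumed field name for the secret key
        "number": phone_number,  # assumed field name for the recipient
        "message": message,
    }
    return urllib.parse.urlencode(form).encode("utf-8")
```

The secret key should come from an environment variable rather than source code, since anyone holding it can send messages billed to the clinic.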

Goal: To convert medical handwriting and documents into a structured JSON format.
Metric for Success: Character Error Rate (CER). A high CER (greater than $50\%$) is a failure state.
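CER is conventionally computed as the character-level edit distance between the model output and the ground-truth text, divided by the length of the ground truth. The sketch below follows that standard definition; the function names are illustrative, and the $50\%$ threshold is the failure cutoff stated above.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def is_failure(cer, threshold=0.50):
    """Apply the failure cutoff: a CER above 50% counts as a failure state."""
    return cer > threshold
```

For example, reading "amoxicillin" as "amoxicilin" is one deletion over 11 reference characters, a CER of about $9\%$. Because insertions count as errors, CER can exceed $100\%$ when the model outputs far more text than the reference contains.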
Models and Alternatives Evaluated:
- Tesseract: Mentioned as a traditional alternative.
- Microsoft Read API and Google Vision: Rejected despite high quality because they are paid APIs that introduce external dependencies.
- Donut: Rejected due to limitations with fixed templates.
- Qwen 2V: High sequence accuracy but lower overall performance.
- Qwen 7B: Identified as having a tendency to hallucinate; therefore, the team uses a fall-back strategy involving individual word extraction.
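The notes do not say what triggers the fallback, so the following is a sketch under one plausible assumption: the full-page model output is accepted only if it parses as JSON with the expected fields, and otherwise the pipeline falls back to individually extracted words. The field names in `REQUIRED_FIELDS` are hypothetical.

```python
import json

# Hypothetical schema; the real field list is not given in the notes.
REQUIRED_FIELDS = ("patient_name", "date", "diagnosis")


def parse_model_output(raw_text):
    """Return the parsed record, or None if the output is unusable."""
    try:
        record = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in REQUIRED_FIELDS):
        return None
    return record


def extract_record(raw_text, extracted_words):
    """Prefer the full-page result; fall back to word-level extraction."""
    record = parse_model_output(raw_text)
    if record is not None:
        return {"source": "full_page", "record": record}
    words = [word for word in extracted_words if word.strip()]
    return {"source": "word_fallback", "words": words}
```

A structural check like this catches malformed output but not a well-formed hallucination, so staff review of the captured record would still be needed.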
Iterative Model Training (R-Series):
- $R_0$ (Untouched): The base, non-fine-tuned model.
- $R_1$ (Synthetic): Fine-tuned using synthetic data to handle freehand and cursive. Targeted CER: $6.7\%$ to $7.2\%$.
- $R_2$ (Medical Data): Targeted specifically at medical documents. Note: Medical data is private and cannot legally be accessed without authorization, so strict data privacy controls are required. Targeted CER: $20\%$ to $30\%$.
- $R_3$ (Clinic Specific): Fine-tuned for specific clinic documents such as medical certificates, lab results, and "baby books."

Layer 1: Individual Patient Prediction (Beta-Binomial):
- Uses a Beta-Binomial model to predict behavior from a small sample (e.g., 5 to 10 previous appointments).
- Useful for predicting if a specific patient will be late at a certain time (e.g., late afternoon appointments).
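A minimal sketch of the Beta-Binomial idea for one patient: start from a Beta prior over the patient's show-up rate, add the observed shows and no-shows, and read off the predicted probability. The uniform prior (`prior_alpha = prior_beta = 1`) and the function names are assumptions; the notes do not specify the prior used.

```python
from math import comb, exp, lgamma


def posterior_params(shows, no_shows, prior_alpha=1.0, prior_beta=1.0):
    """Update a Beta(prior_alpha, prior_beta) prior with attendance history."""
    if shows < 0 or no_shows < 0:
        raise ValueError("counts must be non-negative")
    return prior_alpha + shows, prior_beta + no_shows


def show_probability(shows, no_shows, prior_alpha=1.0, prior_beta=1.0):
    """Predicted probability that the patient attends the next appointment."""
    alpha, beta = posterior_params(shows, no_shows, prior_alpha, prior_beta)
    return alpha / (alpha + beta)


def _log_beta(a, b):
    """Logarithm of the Beta function, computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of exactly k shows in the next n appointments."""
    return comb(n, k) * exp(_log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))
```

A patient with 4 shows and 1 no-show gets a predicted show probability of $5/7 \approx 0.71$ under the uniform prior, while a new patient with no history gets $0.5$. The prior keeps estimates from jumping to 0 or 1 after only a handful of appointments, which is why the approach suits small samples.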
Layer 2: Overall Clinic Prediction (CatBoost + Random Forest):
- Analyzes overall clinic trends.
- Requires a larger dataset, typically at least $200$ total appointments across the clinic to achieve accurate prediction levels.
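The notes do not state how the CatBoost and Random Forest outputs are combined, so the sketch below assumes a simple weighted average of their predicted probabilities, gated on the 200-appointment minimum. It takes plain lists of probabilities and does not call either library; the function names and the default weight are illustrative.

```python
MIN_CLINIC_APPOINTMENTS = 200  # minimum dataset size cited above


def layer2_ready(total_appointments, minimum=MIN_CLINIC_APPOINTMENTS):
    """True once the clinic has enough history for the clinic-wide models."""
    return total_appointments >= minimum


def blend_probabilities(catboost_probs, forest_probs, catboost_weight=0.5):
    """Weighted average of two models' show-up probabilities (assumed scheme)."""
    if len(catboost_probs) != len(forest_probs):
        raise ValueError("both models must score the same appointments")
    if not 0.0 <= catboost_weight <= 1.0:
        raise ValueError("catboost_weight must be between 0 and 1")
    forest_weight = 1.0 - catboost_weight
    return [
        catboost_weight * c + forest_weight * f
        for c, f in zip(catboost_probs, forest_probs)
    ]
```

Below the threshold, the system could rely on the per-patient layer alone and enable the clinic-wide layer once `layer2_ready` returns true.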
Maya Bot: A rule-based AI bot triggered by specific actions (e.g., booking an appointment). It provides basic automated responses like "please contact the clinic" and handles basic triggers rather than complex generative conversation.
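A rule-based bot of this kind can be as small as a lookup table from trigger events to canned replies. The event names and the booking and cancellation messages below are hypothetical; only the "please contact the clinic" default comes from the notes.

```python
# Hypothetical trigger names and replies.
RULES = {
    "appointment_booked": "Your appointment request has been received.",
    "appointment_cancelled": "Your appointment has been cancelled.",
}

# Default reply quoted in the notes, used for anything without a rule.
FALLBACK_REPLY = "Please contact the clinic."


def maya_reply(event):
    """Return the canned reply for a trigger, or the default reply."""
    return RULES.get(event, FALLBACK_REPLY)
```

Because every reply is fixed in advance, the bot cannot produce unexpected text, which fits the stated preference for basic triggers over generative conversation.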

Accuracy Standards:
- Question: What is the target accuracy for the system?
- Response: The target is a bare minimum of $70\%$ accuracy. While the general medical industry standard for AI is often cited as $80\%$, the team is aiming for $70\%$ as a functional baseline during the development phase.
Character Error Rate Specifics: For $R_1$, accuracy is high (CER between $6.7\%$ and $7.2\%$). For $R_2$ (actual medical documents), the CER is expected to be within $20\%$ to $30\%$, meaning approximately 2 to 3 errors for every 10 characters.
Data Feasibility: The project was confirmed as "doable," though obtaining the necessary datasets for medical records is a significant hurdle due to privacy regulations.
Unexpected Findings: The speaker noted that real-world clinic datasets often prove more "legit" and more "unexpected" than synthetic data, so the final system needs robust handling of inputs that the synthetic training data did not anticipate.